Relational Source (Buhler, Erl, Khattak)

How can large amounts of data be imported into a Big Data platform from a relational database?

Problem

Importing large amounts of relational data by first exporting it to a delimited file and then importing that file into the Big Data platform is both time-consuming and inefficient.

Solution

To import relational data, a direct connection is made from within the Big Data platform to the backend relational database.

Application

A data transfer engine is used, employing different connectors to directly connect to different relational databases and execute SQL queries for selecting the data that needs importing.

A relational data transfer engine component is introduced within the Big Data platform. This component internally uses different drivers and connectors to connect to different relational databases. The user specifies the connection string and either the table to import from or an SQL query that customizes the import. In some cases, to accelerate the import of very large amounts of relational data, the relational data transfer engine may internally make use of a processing engine that parallelizes the import by executing multiple SQL commands at the same time. Provided that suitable connectors are available, this pattern can also be applied to extract data from data warehouses.
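The parallel import described above can be sketched as follows. This is a minimal illustration under stated assumptions, not how any particular transfer engine works: SQLite stands in for the relational database, a thread pool stands in for the processing engine, and the `orders` table, its integer `id` key, and all function names are hypothetical. The key range is split into contiguous sub-ranges, and each sub-range is fetched by its own SQL query on its own connection.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_demo_db(row_count):
    """Create a throwaway SQLite file with a hypothetical `orders` table."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    with conn:  # commits the inserts on success
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
        conn.executemany(
            "INSERT INTO orders (id, amount) VALUES (?, ?)",
            [(i, i * 1.5) for i in range(1, row_count + 1)],
        )
    conn.close()
    return path


def split_ranges(lo, hi, parts):
    """Split the inclusive key range [lo, hi] into at most `parts` sub-ranges."""
    step = -(-(hi - lo + 1) // parts)  # ceiling division
    return [(start, min(start + step - 1, hi)) for start in range(lo, hi + 1, step)]


def fetch_range(db_path, table, key, bounds):
    """Run one range-bounded SELECT on its own connection."""
    conn = sqlite3.connect(db_path)
    try:
        # Table and key names are assumed to be trusted identifiers;
        # only the range bounds are passed as query parameters.
        query = f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ?"
        return conn.execute(query, bounds).fetchall()
    finally:
        conn.close()


def parallel_import(db_path, table, key, workers=4):
    """Fetch all rows of `table` using several range queries in parallel."""
    conn = sqlite3.connect(db_path)
    try:
        lo, hi = conn.execute(
            f"SELECT MIN({key}), MAX({key}) FROM {table}"
        ).fetchone()
    finally:
        conn.close()
    if lo is None:  # empty table: nothing to import
        return []
    ranges = split_ranges(lo, hi, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(
            pool.map(lambda b: fetch_range(db_path, table, key, b), ranges)
        )
    return [row for chunk in chunks for row in chunk]
```

Splitting on a numeric key works well only when the key values are spread fairly evenly; a skewed key would leave some workers with far more rows than others.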

The application of this pattern may be impeded if a database-specific connector is not available. In such circumstances a generic connector can normally be used instead, although it typically provides slower data transfer speeds.
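The fallback from a database-specific connector to a generic one might look like the sketch below. The registry, the connector names, and the idea of keying on the connection-string scheme are all assumptions made for illustration; real transfer engines have their own lookup rules.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Connector:
    name: str
    database_specific: bool  # False for the slower generic fallback


# Hypothetical registry keyed by connection-string scheme.
SPECIFIC_CONNECTORS = {
    "mysql": Connector("mysql-connector", True),
    "postgresql": Connector("postgresql-connector", True),
}
GENERIC_CONNECTOR = Connector("generic-connector", False)


def select_connector(connection_string):
    """Prefer a database-specific connector; otherwise fall back to the generic one."""
    scheme = urlsplit(connection_string).scheme.lower()
    return SPECIFIC_CONNECTORS.get(scheme, GENERIC_CONNECTOR)
```

A caller could inspect `database_specific` on the returned connector to warn the user that the import may run slower than usual.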

A capability is added to the Big Data platform that enables it to connect directly to the relational database through a user interface. The user interface is used to establish the connection and to specify which data needs importing. Besides providing a single, uniform interface for connecting to multiple databases, applying this pattern saves time because the user does not have to move between two systems.

  1. The user configures the relational data transfer engine to extract the required data.
  2. The relational data transfer engine automatically extracts the required data from the relational database.
  3. The relational data transfer engine then automatically inserts the data into the storage device, without requiring any human intervention.
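The three steps above can be sketched end to end as follows. This is an illustrative sketch only: SQLite stands in for the relational database, a newline-delimited JSON file stands in for the platform's storage device, and `ImportJob`, `run_import`, `demo`, and the `customers` table are hypothetical names. The job object captures the user's configuration (step 1), and a single call then extracts and stores the data (steps 2 and 3) with no manual export in between.

```python
import json
import os
import sqlite3
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImportJob:
    """Step 1: what the user configures."""
    source: str                  # stand-in for a connection string (SQLite path)
    target: str                  # stand-in for a location on the storage device
    table: Optional[str] = None  # import a whole table ...
    query: Optional[str] = None  # ... or customize the import with SQL

    def sql(self):
        if self.query:
            return self.query
        if self.table:
            # The table name is assumed to be a trusted identifier.
            return f"SELECT * FROM {self.table}"
        raise ValueError("specify either a table or a query")


def run_import(job):
    """Steps 2 and 3: extract the rows and write them to storage."""
    conn = sqlite3.connect(job.source)
    try:
        cursor = conn.execute(job.sql())
        columns = [col[0] for col in cursor.description]
        count = 0
        with open(job.target, "w", encoding="utf-8") as out:
            for row in cursor:  # stream rows instead of loading them all
                out.write(json.dumps(dict(zip(columns, row))) + "\n")
                count += 1
        return count
    finally:
        conn.close()


def demo():
    """Build a small source database, run one job, and read back the result."""
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "source.db")
    target = os.path.join(workdir, "customers.jsonl")
    conn = sqlite3.connect(source)
    with conn:  # commits the inserts on success
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
        conn.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            [(1, "EU"), (2, "US"), (3, "EU")],
        )
    conn.close()
    job = ImportJob(
        source,
        target,
        query="SELECT id, region FROM customers WHERE region = 'EU' ORDER BY id",
    )
    count = run_import(job)
    with open(target, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
    return count, records
```

Passing `table="customers"` instead of `query=...` would import the whole table, which mirrors the two configuration options the pattern describes.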