Relational Source (Buhler, Erl, Khattak)

How can large amounts of data be imported into a Big Data platform from a relational database?

Problem

Importing large amounts of relational data by first exporting it to a delimited file and then importing that file is both time-consuming and inefficient.

Solution

To import relational data, a direct connection is made from within the Big Data platform to the backend relational database.

Application

A data transfer engine is used that employs different connectors to connect directly to different relational databases and execute SQL queries that select the data to be imported.

A relational data transfer engine component is introduced within the Big Data platform. This component internally uses different drivers and connectors to connect to different relational databases. The user specifies the connection string along with either the table from which data is to be imported or an SQL query that customizes the import. In some cases, to accelerate the import of very large amounts of relational data, the relational data transfer engine may internally use a processing engine that parallelizes the import by executing multiple SQL commands concurrently. Provided suitable connectors are available, this pattern can also be applied to extract data from data warehouses.
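The parallelized import described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pattern's prescribed implementation: it assumes an integer column (passed as the hypothetical `split_column` argument) whose values are spread reasonably evenly, uses Python's built-in `sqlite3` module as a stand-in for the relational database, and uses a thread pool in place of a cluster processing engine. The MIN/MAX range of the split column is divided into sub-ranges, and one SQL query per sub-range runs concurrently. Apache Sqoop's `--split-by` option takes a broadly similar approach.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_query(table=None, query=None):
    """Return the base SELECT: a whole table or a user-supplied query (exactly one)."""
    if (table is None) == (query is None):
        raise ValueError("specify exactly one of table or query")
    # Identifiers are interpolated directly, so this sketch assumes trusted input.
    return f"SELECT * FROM {table}" if table is not None else query


def split_ranges(lo, hi, parts):
    """Split the inclusive integer range [lo, hi] into at most `parts` half-open ranges."""
    step = max(1, -(-(hi - lo + 1) // parts))  # ceiling division
    return [(start, min(start + step, hi + 1)) for start in range(lo, hi + 1, step)]


def fetch_range(db_path, base_query, split_column, bounds):
    """Run one range query; each worker opens its own connection because
    sqlite3 rejects cross-thread connection use by default."""
    conn = sqlite3.connect(db_path)
    try:
        sql = (f"SELECT * FROM ({base_query}) AS src "
               f"WHERE {split_column} >= ? AND {split_column} < ?")
        return conn.execute(sql, bounds).fetchall()
    finally:
        conn.close()


def parallel_import(db_path, split_column, parts=4, table=None, query=None):
    """Import rows by running one SQL query per key range concurrently."""
    base = build_query(table, query)
    conn = sqlite3.connect(db_path)
    try:
        lo, hi = conn.execute(
            f"SELECT MIN({split_column}), MAX({split_column}) FROM ({base}) AS src"
        ).fetchone()
    finally:
        conn.close()
    if lo is None:  # the base query returned no rows
        return []
    ranges = split_ranges(lo, hi, parts)
    # Threads stand in for the parallel processing engine described above.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        chunks = pool.map(lambda b: fetch_range(db_path, base, split_column, b), ranges)
        return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sales.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)",
                         [(i, i * 1.5) for i in range(1, 101)])
        conn.commit()
        conn.close()
        print(len(parallel_import(path, "id", parts=4, table="orders")))  # 100
        print(len(parallel_import(
            path, "id", query="SELECT id, amount FROM orders WHERE amount > 75")))  # 50
```

Because each range query is independent, a failed range could be retried on its own; a skewed key distribution, however, produces unevenly sized ranges and therefore uneven work per worker.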

The application of this pattern may be impeded if a database-specific connector is not available. However, a generic connector can normally be used in such circumstances, albeit with suboptimal data transfer speeds.
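The fallback behaviour can be illustrated with a small connector lookup. In this sketch the connector names, the registry, and the use of the connection string's URL scheme as the lookup key are all hypothetical; real data transfer engines have their own configuration mechanisms for choosing drivers.

```python
import logging
from dataclasses import dataclass
from typing import Dict
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Connector:
    name: str
    optimized: bool  # True for a database-specific connector


# Hypothetical registry of database-specific connectors, keyed by URL scheme.
SPECIFIC_CONNECTORS: Dict[str, Connector] = {
    "mysql": Connector("mysql-native", optimized=True),
    "postgresql": Connector("postgresql-native", optimized=True),
}

# Hypothetical generic connector: works with any database, but more slowly.
GENERIC_CONNECTOR = Connector("generic", optimized=False)


def select_connector(connection_string: str) -> Connector:
    """Prefer a database-specific connector; fall back to the generic one."""
    scheme = urlsplit(connection_string).scheme.lower()
    connector = SPECIFIC_CONNECTORS.get(scheme)
    if connector is None:
        logging.warning("No specific connector for %r; using the generic "
                        "connector (expect slower transfers)", scheme)
        return GENERIC_CONNECTOR
    return connector


if __name__ == "__main__":
    print(select_connector("mysql://db.example.com/sales").name)   # mysql-native
    print(select_connector("oracle://db.example.com/sales").name)  # generic
```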

A capability is added to the Big Data platform that enables it to connect directly to the relational database via a user interface. The user interface is used to establish the connection and specify which data needs to be imported. Apart from providing a single, uniform interface for connecting to multiple databases, applying this pattern also saves time because users do not have to move between two systems.

  1. The user configures the relational data transfer engine to extract the required data.
  2. The relational data transfer engine automatically extracts the required data from the relational database.
  3. The relational data transfer engine then automatically inserts the data into the storage device without requiring any human intervention.
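The three steps can be sketched as a configuration object plus a job that runs unattended. In this hypothetical example, `ImportJob` holds what the user configures in step 1, a SQLite file stands in for the source relational database, and a local JSON Lines file stands in for the platform's storage device (for example, a distributed file system). Rows are streamed in batches with the DB-API `fetchmany` call, so no intermediate export file has to be produced and moved between systems by hand.

```python
import json
import os
import sqlite3
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImportJob:
    """Step 1: settings the user configures (hypothetical field names)."""
    database_path: str            # stands in for a connection string
    target_path: str              # stands in for a location on the storage device
    table: Optional[str] = None   # import a whole table...
    query: Optional[str] = None   # ...or the result of a custom SQL query
    batch_size: int = 1000


def run_import(job: ImportJob) -> int:
    """Steps 2 and 3: extract rows and write them to storage; returns the row count."""
    if (job.table is None) == (job.query is None):
        raise ValueError("configure exactly one of table or query")
    sql = job.query if job.query is not None else f"SELECT * FROM {job.table}"
    conn = sqlite3.connect(job.database_path)
    written = 0
    try:
        cursor = conn.execute(sql)                     # step 2: extract
        columns = [d[0] for d in cursor.description]
        with open(job.target_path, "w", encoding="utf-8") as out:
            while True:
                batch = cursor.fetchmany(job.batch_size)
                if not batch:
                    break
                for row in batch:                      # step 3: insert into storage
                    out.write(json.dumps(dict(zip(columns, row))) + "\n")
                written += len(batch)
    finally:
        conn.close()
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "crm.db")
        conn = sqlite3.connect(src)
        conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
        conn.executemany("INSERT INTO customers VALUES (?, ?)",
                         [(1, "Ada"), (2, "Grace"), (3, "Linus")])
        conn.commit()
        conn.close()
        job = ImportJob(src, os.path.join(tmp, "customers.jsonl"), table="customers")
        print(run_import(job))  # 3
```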

This pattern is covered in BDSCP Module 10: Fundamental Big Data Architecture.

For more information regarding the Big Data Science Certified Professional (BDSCP) curriculum,
visit www.arcitura.com/bdscp.

The official textbook for the BDSCP curriculum is:

Big Data Fundamentals: Concepts, Drivers & Techniques
by Paul Buhler, PhD, Thomas Erl, Wajid Khattak
(ISBN: 9780134291079, Paperback, 218 pages)

Please note that this textbook covers fundamental topics only and does not cover design patterns.
For more information about this book, visit www.arcitura.com/books.