Direct Data Access (Buhler, Erl, Khattak)

How can large amounts of raw data be analyzed in place by contemporary data analytics tools without having to export data?

Problem

Analyzing large amounts of raw data with advanced analytics tools by first exporting the data from the Big Data platform into external storage and then retrieving it from that storage is both inefficient and time-consuming.

Solution

A direct connection is made between the Big Data platform and the analytics tool, using a standard technology, so that the tool can access the raw data in place.

Application

A two-way connector is introduced between the Big Data platform and the analytics tool. It translates calls made by the analytics tool into calls the Big Data platform can process, and returns the resulting data to the tool.
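
The two directions of translation can be sketched as follows. This is a minimal illustration under stated assumptions, not a real connector's API: `ToolRequest`, `to_platform_query` and `to_tool_records` are hypothetical names, the tool-side request format is invented for the example, and an in-memory SQLite database stands in for the platform's query engine.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any

# Hypothetical request format issued by the analytics tool (an assumption,
# not the API of any real tool).
@dataclass
class ToolRequest:
    dataset: str
    columns: list[str]
    filters: list[tuple[str, str, Any]] = field(default_factory=list)

ALLOWED_OPS = {"=", "<", ">", "<=", ">="}

def _ident(name: str) -> str:
    # Identifiers are spliced into SQL text, so only plain names are accepted.
    if not name.isidentifier():
        raise ValueError(f"invalid identifier: {name!r}")
    return name

def to_platform_query(req: ToolRequest) -> tuple[str, list[Any]]:
    """Outbound direction: tool request -> parameterized SQL for the query engine."""
    cols = ", ".join(_ident(c) for c in req.columns)
    sql = f"SELECT {cols} FROM {_ident(req.dataset)}"
    params: list[Any] = []
    clauses = []
    for column, op, value in req.filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        clauses.append(f"{_ident(column)} {op} ?")
        params.append(value)  # values travel as bound parameters, not SQL text
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

def to_tool_records(columns: list[str], rows: list[tuple]) -> list[dict[str, Any]]:
    """Inbound direction: engine result rows -> records keyed by column name."""
    return [dict(zip(columns, row)) for row in rows]

if __name__ == "__main__":
    # An in-memory SQLite database stands in for the platform's query engine.
    engine = sqlite3.connect(":memory:")
    engine.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    engine.executemany("INSERT INTO sales VALUES (?, ?)",
                       [("east", 10.0), ("west", 25.0), ("east", 40.0)])
    req = ToolRequest("sales", ["region", "amount"], [("amount", ">", 15.0)])
    sql, params = to_platform_query(req)
    print(sql)  # SELECT region, amount FROM sales WHERE amount > ?
    print(to_tool_records(req.columns, engine.execute(sql, params).fetchall()))
```

Filter values are passed as bound parameters rather than concatenated into the SQL string, which is why the sketch validates only identifiers and operators.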

A two-way connector is used to enable a direct connection between the analytics tool and the Big Data platform. To access data stored in the Big Data platform, the user first specifies a connection string, which the connector uses to locate both the underlying resource in the Big Data platform to which the connection needs to be made and the file or dataset that needs to be retrieved. Generally, a separate connector is used for each type of resource. The actual connection is made between the analytics tool and either the query engine or the storage device. Once the connection is established, the user specifies the operations that need to be performed on the data. At runtime, the connector connects to the storage device or the query engine and retrieves the data the analytics tool needs to execute those operations.
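
The connection-string flow described above might look roughly like the following sketch. The URL-style connection string format (`storage://host/path` and `query://host/table`), the connector classes and the in-memory "resources" are all hypothetical; they only illustrate one connector per resource type, resolution of resource and dataset from the connection string, and retrieval deferred until the operation runs.

```python
from typing import Any, Callable
from urllib.parse import urlsplit

# Simulated platform resources (hypothetical names and data, for illustration).
STORAGE_DEVICES = {
    "cluster1": {"/datasets/sales": [("east", 10.0), ("west", 25.0), ("east", 40.0)]},
}
QUERY_ENGINES = {
    "engine1": {"sales": [("east", 10.0), ("west", 25.0), ("east", 40.0)]},
}

class StorageConnector:
    """Connector for storage devices: the dataset is a file-like path."""
    def __init__(self, host: str, path: str) -> None:
        self.host, self.dataset = host, path

    def fetch(self) -> list[tuple]:
        device = STORAGE_DEVICES[self.host]  # connect to the storage device
        return list(device[self.dataset])    # retrieve the file/dataset

class QueryEngineConnector:
    """Connector for query engines: the dataset is a table name."""
    def __init__(self, host: str, path: str) -> None:
        self.host, self.dataset = host, path.lstrip("/")

    def fetch(self) -> list[tuple]:
        engine = QUERY_ENGINES[self.host]    # connect to the query engine
        return list(engine[self.dataset])    # retrieve the table

# One connector per resource type, selected by the connection string's scheme.
CONNECTORS = {"storage": StorageConnector, "query": QueryEngineConnector}

def open_connection(conn_str: str):
    parts = urlsplit(conn_str)  # scheme = resource type, netloc = host, path = dataset
    connector_cls = CONNECTORS.get(parts.scheme)
    if connector_cls is None:
        raise ValueError(f"no connector for resource type {parts.scheme!r}")
    return connector_cls(parts.netloc, parts.path)

def run(conn_str: str, operation: Callable[[list[tuple]], Any]) -> Any:
    connector = open_connection(conn_str)  # locate resource and dataset
    rows = connector.fetch()               # data is pulled only at runtime
    return operation(rows)

def total_by_region(rows: list[tuple]) -> dict[str, float]:
    """Example user-specified operation: sum amounts per region."""
    totals: dict[str, float] = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals

if __name__ == "__main__":
    print(run("storage://cluster1/datasets/sales", total_by_region))  # {'east': 50.0, 'west': 25.0}
    print(run("query://engine1/sales", total_by_region))             # {'east': 50.0, 'west': 25.0}
```

Keying the registry on the scheme keeps the analytics-tool side unchanged when a new resource type is added; only a new connector class needs to be registered.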

Direct Data Access: Functionality is added that enables the analytics tool to connect directly to the Big Data platform. Depending on the type of functionality required, the connection is made to either the query engine or the storage device.

  1. A large dataset is stored in a storage device.
  2. A data analyst uses a contemporary analytics tool to apply a machine learning algorithm to the dataset.
  3. A connection is made to the storage device via a connector.
  4. The storage device provides the required data via the connector.
  5. The tool then applies the required machine learning algorithm to the data.
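
The five steps above can be mapped onto a small, self-contained sketch. The storage device, the `Connector` class and the dataset are hypothetical, and simple linear regression (ordinary least squares) is used only as a stand-in for "a machine learning algorithm"; the source does not name a specific algorithm.

```python
# Step 1: a dataset held by a simulated storage device (hypothetical data).
STORAGE_DEVICE = {"measurements": [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]}

class Connector:
    """Hypothetical connector that reads a named dataset from the device."""
    def __init__(self, device: dict[str, list[tuple[float, float]]]) -> None:
        self.device = device  # Step 3: connection to the storage device

    def read(self, name: str) -> list[tuple[float, float]]:
        return list(self.device[name])  # Step 4: data provided via the connector

def fit_linear_regression(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Step 5: ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

if __name__ == "__main__":
    # Step 2: the analyst's tool requests the data through the connector.
    connector = Connector(STORAGE_DEVICE)
    data = connector.read("measurements")
    slope, intercept = fit_linear_regression(data)
    print(f"slope={slope}, intercept={intercept}")  # slope=2.0, intercept=1.0
```

No intermediate export file is written: the algorithm operates on the rows returned directly by the connector.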