File-based Sink (Buhler, Erl, Khattak)

How can processed data be ported from a Big Data platform to systems that use proprietary, non-relational storage technologies?

Problem

The relational egress technique cannot be used to export processed data from a Big Data platform to systems that use non-relational or proprietary storage technologies.

Solution

Processed data is exported from the Big Data platform in a delimited or hierarchical file format to the target system’s location.
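As a minimal sketch of the two file formats mentioned above, the following Python snippet serializes the same records once as a delimited file and once as a hierarchical (JSON) document. The `records` data, the pipe delimiter, and the function names `to_delimited` and `to_hierarchical` are assumptions made for illustration; they are not part of the pattern itself.

```python
import csv
import io
import json

# Hypothetical processed records, standing in for Big Data platform output.
records = [
    {"customer_id": 1, "region": "EMEA", "total": 120.5},
    {"customer_id": 2, "region": "APAC", "total": 87.0},
]


def to_delimited(rows, delimiter="|"):
    """Serialize rows as delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=list(rows[0]),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_hierarchical(rows):
    """Serialize rows as a hierarchical JSON document."""
    return json.dumps({"records": rows}, indent=2)


if __name__ == "__main__":
    print(to_delimited(records))
    print(to_hierarchical(records))
```

Which format to choose depends on what the target system can parse: flat, tabular results fit a delimited file, while nested results are easier to carry in a hierarchical format.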

Application

A file-based data transfer engine is implemented that copies textual data from the storage device to a configured location.

The file data transfer engine is configured to copy data from the storage device to a target location, such as a directory or a URI. Internally, the engine may use polling or file-watcher functionality to detect and copy files from the source location. The file to be copied may not be in the format or model that the target system requires, in which case some processing is needed to convert it before it is delivered.
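The polling behavior described above might look roughly like the following Python sketch. It assumes that both the source and the target are local directories, that files are UTF-8 text, and that already-copied files are tracked by name in an in-memory set. The names `poll_once` and `commas_to_pipes` are hypothetical, and the comma-to-pipe conversion is a deliberately naive stand-in for whatever format processing the target system needs.

```python
import shutil
from pathlib import Path


def poll_once(source_dir, target_dir, seen, transform=None):
    """Copy files not yet seen from source_dir to target_dir.

    Returns the names of the files copied in this pass. If transform is
    given, it is applied to the file's text before the file is written.
    """
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.iterdir()):
        if not path.is_file() or path.name in seen:
            continue
        if transform is None:
            shutil.copy2(path, target / path.name)
        else:
            text = path.read_text(encoding="utf-8")
            (target / path.name).write_text(transform(text), encoding="utf-8")
        seen.add(path.name)
        copied.append(path.name)
    return copied


def commas_to_pipes(text):
    """Naive format conversion; assumes fields contain no quoted commas."""
    return text.replace(",", "|")
```

A real engine would call a function like `poll_once` in a loop with a sleep between passes, or replace the loop with an operating system file-watching facility. Persisting the `seen` set would be needed to survive restarts.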

Processed data is exported in a common textual format, such as a delimited or hierarchical file format, and automatically copied to the target system’s configured location. A scheduling system is further used to export files at regular intervals. Applying this pattern helps integrate the Big Data platform with legacy and other proprietary systems.

  1. The user configures the file data transfer engine mechanism to export textual data from the Big Data platform to the specified location.
  2. Delimited files containing textual data are automatically copied from the storage device by the file data transfer engine.
  3. The file data transfer engine then automatically inserts the delimited files into the configured location.
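The three steps above can be sketched in Python as a configuration object plus a single transfer run. Everything named here (`SinkConfig`, `run_transfer`, the `*.csv` pattern, the `.part` suffix) is a hypothetical illustration rather than part of the pattern. Writing each file under a temporary name and then renaming it is an added design assumption, intended to keep the target system from reading a half-copied file.

```python
import os
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SinkConfig:
    """Step 1: the user's configuration of the transfer engine."""
    source_dir: str         # where the platform writes exported files
    target_dir: str         # the target system's configured location
    pattern: str = "*.csv"  # which delimited files to pick up


def run_transfer(config):
    """Steps 2 and 3: copy matching files, then place them in the target."""
    source = Path(config.source_dir)
    target = Path(config.target_dir)
    target.mkdir(parents=True, exist_ok=True)
    delivered = []
    for path in sorted(source.glob(config.pattern)):
        if not path.is_file():
            continue
        # Copy under a temporary name first (assumption: the target
        # system ignores files ending in ".part").
        partial = target / (path.name + ".part")
        shutil.copyfile(path, partial)
        # Rename into place; atomic when both names are on one filesystem.
        os.replace(partial, target / path.name)
        delivered.append(path.name)
    return delivered
```

A scheduler, such as a cron job, could invoke `run_transfer` at regular intervals to match the scheduled export described earlier.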