Streaming Source (Buhler, Erl, Khattak)

How can high velocity data be imported reliably into a Big Data platform in realtime?

Problem

Data that originates as a stream needs to be captured continuously and in realtime. Batch ingress techniques, which accumulate data before importing it, introduce unacceptable latency.

Solution

A realtime data ingestion system collects data from the configured source(s) as it is produced and continuously forwards it to the configured destination(s).

Application

A publish-subscribe system based on a queuing system is implemented. It captures the incoming stream of data as events and then forwards these events to the subscriber(s).

An event data transfer engine mechanism is introduced within the Big Data platform. The event data transfer engine is configured to specify the data sources and the destination. Once configured, the event data transfer engine automatically ingests events as they are generated by the source. Once an event is ingested, it is published to the configured subscribers. A queue is generally used to store the events, providing fault-tolerance and scalability.
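
The ingest, queue, and publish flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `EventTransferEngine` and its methods are hypothetical, and the in-memory `queue.Queue` stands in for the durable, distributed queue a real deployment would need to obtain fault tolerance and scalability.

```python
import queue
import threading


class EventTransferEngine:
    """Hypothetical in-memory stand-in for an event data transfer engine."""

    def __init__(self):
        self._queue = queue.Queue()   # buffers events between source and subscribers
        self._subscribers = []        # callables invoked once per event
        self._worker = threading.Thread(target=self._dispatch, daemon=True)

    def subscribe(self, callback):
        """Register a destination; call this before start()."""
        self._subscribers.append(callback)

    def start(self):
        """Begin forwarding queued events in a background thread."""
        self._worker.start()

    def ingest(self, event):
        """Called by the source as soon as an event is produced."""
        self._queue.put(event)

    def drain(self):
        """Block until every ingested event has been published."""
        self._queue.join()

    def _dispatch(self):
        while True:
            event = self._queue.get()
            try:
                for callback in self._subscribers:
                    callback(event)   # publish to each configured subscriber
            finally:
                self._queue.task_done()


# Usage: one subscriber that simply collects what it receives.
received = []
engine = EventTransferEngine()
engine.subscribe(received.append)
engine.start()
for reading in ({"meter": "m-1", "kwh": 0.42}, {"meter": "m-1", "kwh": 0.40}):
    engine.ingest(reading)
engine.drain()
```

The queue decouples the source from the subscribers, so a slow subscriber delays delivery but does not block the source from handing over new events.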

This pattern is generally applied together with the Realtime Access Storage and High Velocity Realtime Processing patterns.

Streaming Source: Instead of collating the individual data events into a file, a system is implemented that captures the events as they are produced by the data source and forwards them to the Big Data platform for immediate processing. Doing so enables realtime capture of data without the delay that batch accumulation would add.

  1. Individual readings transmitted by a smart meter every 30 seconds are captured by an event data transfer engine.
  2. Each event is imported into the Big Data platform as it gets captured by the event data transfer engine.
  3. Data is then analyzed to find insights.
  4. The whole process, from capture to analysis, completes with very little delay because no events are accumulated into a batch first.
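
The four steps above can be approximated with a small simulation. Everything here is an assumption made for illustration: the `Reading` and `Platform` names are hypothetical, timestamps are simulated instead of waiting 30 seconds between readings, and a running average stands in for whatever analysis the platform actually performs.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Reading:
    meter_id: str
    timestamp_s: int   # seconds since the start of the simulation
    kwh: float


def smart_meter(meter_id: str, values: Iterable[float],
                interval_s: int = 30) -> Iterator[Reading]:
    """Step 1: emit one reading per interval (simulated, no sleeping)."""
    for i, kwh in enumerate(values):
        yield Reading(meter_id, i * interval_s, kwh)


class Platform:
    """Hypothetical stand-in for the Big Data platform."""

    def __init__(self):
        self.count = 0
        self.total_kwh = 0.0

    def import_event(self, reading: Reading) -> float:
        """Steps 2 and 3: import the event and update the analysis at once."""
        self.count += 1
        self.total_kwh += reading.kwh
        return self.total_kwh / self.count   # running average consumption


platform = Platform()
# Each reading is imported as soon as the meter yields it; nothing is batched.
averages = [platform.import_event(r)
            for r in smart_meter("m-1", [0.5, 1.0, 1.5])]
```

Because the generator hands over one reading at a time, the running average is refreshed after every event instead of once per file.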