Cloud-based Big Data Storage (Buhler, Erl, Khattak)

How can large amounts of data be stored without investing in any Big Data storage infrastructure and only paying for the used storage space?

Problem

Building a storage cluster for large amounts of data requires an upfront investment that not every enterprise can afford. Even where the investment is possible, an in-house storage cluster is generally underutilized, so part of the capacity that was paid for goes to waste.

Solution

Instead of creating an in-house storage infrastructure, cloud storage is used for storing large datasets as a cost-saving measure.

Application

The cloud is used to store data in a distributed file system or NoSQL database on a pay-per-use basis.

A distributed file system or a NoSQL database deployed in the cloud is used for data storage. Applying this pattern requires the IT team to have cloud skills, such as knowledge of cloud provider-specific APIs, in order to import data into the cloud and manipulate it there. The Cloud-based Big Data Storage pattern is generally applied together with the Cloud-based Big Data Processing pattern. If the datasets are not already in the cloud, data processing may be delayed by the time required to import large datasets into the cloud.
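The import delay can be roughed out with simple arithmetic before the pattern is applied. The following is a minimal sketch, not a provider-specific tool: the function name `estimate_import_hours`, the `efficiency` factor (the assumed fraction of the nominal uplink that is actually achieved) and the example figures are illustrative assumptions, not values from the pattern description.

```python
def estimate_import_hours(dataset_tb: float, uplink_mbps: float,
                          efficiency: float = 0.8) -> float:
    """Rough time, in hours, to upload a dataset to cloud storage.

    dataset_tb  -- dataset size in terabytes (decimal, 1 TB = 10**12 bytes)
    uplink_mbps -- nominal uplink bandwidth in megabits per second
    efficiency  -- assumed fraction of the nominal bandwidth actually
                   achieved (hypothetical default of 0.8)
    """
    if dataset_tb < 0 or uplink_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, 0 < efficiency <= 1")
    total_bits = dataset_tb * 1e12 * 8                 # bytes -> bits
    effective_bps = uplink_mbps * 1e6 * efficiency     # usable bits per second
    return total_bits / effective_bps / 3600           # seconds -> hours


if __name__ == "__main__":
    # Hypothetical example: 50 TB over a 1 Gbit/s link at 80% efficiency.
    hours = estimate_import_hours(50, 1000, 0.8)
    print(f"{hours:.1f} hours")  # prints "138.9 hours"
```

Under these assumed figures the import alone takes several days, which is why processing may have to wait when the data does not already reside in the cloud.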

The pay-per-use and elastic nature of the cloud is put to use by storing data in the cloud. Cloud storage can also scale further than an in-house cluster, potentially because of the larger infrastructure backing the cloud provider.

In the diagram, importing large amounts of structured, semi-structured and unstructured data requires a cluster-based storage infrastructure. The enterprise opts for cloud storage to store the data. The resulting costs stay within the enterprise's allocated IT budget because the enterprise pays only for the storage space it actually uses.
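The cost argument can be illustrated with a small comparison between paying for stored terabytes month by month and provisioning a cluster for peak capacity up front. This is a sketch under stated assumptions: the function names, the flat per-terabyte pricing model and all prices are hypothetical, and real provider pricing (tiers, request and egress charges) is more involved.

```python
from typing import Sequence


def pay_per_use_cost(monthly_usage_tb: Sequence[float],
                     price_per_tb_month: float) -> float:
    """Total cloud cost when each month is billed for the space actually used."""
    return sum(used * price_per_tb_month for used in monthly_usage_tb)


def provisioned_cost(capacity_tb: float, upfront_per_tb: float,
                     monthly_opex_per_tb: float, months: int) -> float:
    """Total in-house cost: the full capacity is paid for whether used or not."""
    return capacity_tb * (upfront_per_tb + monthly_opex_per_tb * months)


def average_utilization(monthly_usage_tb: Sequence[float],
                        capacity_tb: float) -> float:
    """Average fraction of the provisioned capacity that holds data."""
    if not monthly_usage_tb or capacity_tb <= 0:
        raise ValueError("need at least one month and a positive capacity")
    return sum(monthly_usage_tb) / (len(monthly_usage_tb) * capacity_tb)


if __name__ == "__main__":
    usage = [10, 20, 30]  # hypothetical TB stored in each of three months
    print(pay_per_use_cost(usage, price_per_tb_month=20.0))
    print(provisioned_cost(100, upfront_per_tb=300.0,
                           monthly_opex_per_tb=5.0, months=3))
    print(average_utilization(usage, capacity_tb=100))
```

With these assumed numbers the in-house cluster sits at roughly one fifth utilization while its full capacity has already been paid for, which is the waste the pattern aims to avoid.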