Components of A Big Data Architecture

The document outlines the key components of a big data architecture, including data sources, data storage, real-time message ingestion, batch processing, stream processing, an analytical data store, and orchestration. Data sources can include application data stores, static files, and real-time sources like IoT devices. Data is stored in distributed file storage and processed via batch or stream processing before being loaded into an analytical data store for analysis and reporting. Orchestration coordinates repeated workflows that transform and move data between components.

Uploaded by

AMIT RAJ
Copyright: © All Rights Reserved

Components of a big data architecture

The following diagram shows the logical components that fit into a big data
architecture. Most big data architectures include some or all of the following
components:

 Data sources.

o A big data environment can manage both batch processing and real-
time processing. All big data solutions start with one or more data
sources. Examples include:

 Application data stores, such as relational databases.
 Static files produced by applications, such as web server log files.
 Real-time data sources, such as IoT devices.

 Data storage.

o Data is typically stored in a distributed file store that can hold high
volumes of large files in various formats. The data is kept mostly in its
raw format, and this kind of store is often referred to as a data lake.
o The storage layer converts the data into a format that the data analytics
tools can understand, and stores the data according to its format.
o For example, a big data architecture stores unstructured data in
distributed file storage systems such as HDFS or in a NoSQL database, and
stores structured data in an RDBMS.

 Real-time Message Ingestion.

o It is the mechanism in a big data architecture that captures and stores
real-time data, e.g. customer clickstreams, so that stream processing
consumers can consume it.
o In its simplest form, it is a data store where new messages are dropped
into a folder.
o Options include Apache Kafka, Azure Event Hubs, and Apache Flume.
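
The folder-based form of message ingestion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `drop_message` and `consume_messages`, the JSON file layout, and the timestamp-based file names are hypothetical and do not come from any particular product.

```python
import json
import time
import uuid
from pathlib import Path


def drop_message(inbox: Path, payload: dict) -> Path:
    """Write one message as a JSON file into the inbox folder (hypothetical layout)."""
    inbox.mkdir(parents=True, exist_ok=True)
    # The timestamp prefix keeps files roughly ordered; the uuid avoids name clashes.
    name = f"{time.time_ns()}-{uuid.uuid4().hex}.json"
    tmp = inbox / (name + ".tmp")
    tmp.write_text(json.dumps(payload))
    final = inbox / name
    tmp.rename(final)  # rename last, so consumers never read a half-written file
    return final


def consume_messages(inbox: Path) -> list:
    """Read and delete every complete message currently in the inbox."""
    messages = []
    for path in sorted(inbox.glob("*.json")):  # "*.json" skips the ".tmp" files
        messages.append(json.loads(path.read_text()))
        path.unlink()
    return messages
```

A producer would call `drop_message(inbox, {"user": "a", "page": "/home"})` for each click event, and a stream processing consumer would poll `consume_messages(inbox)`. Dedicated message brokers add buffering, delivery guarantees, and scale-out that this sketch does not attempt.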

 Batch processing.
o Because the data sets are so large, often a big data solution must process
data files using long-running batch jobs to filter, aggregate, and
otherwise prepare the data for analysis.
o Usually these jobs involve reading source files, processing them, and
writing the output to new files.
o A commonly used solution for batch processing is Apache Hadoop.
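
The read, filter, aggregate, and write pattern of a batch job can be sketched on a single machine as follows. This is a minimal illustration, not a Hadoop job: the function name `run_batch_job`, the `*.log` file pattern, and the three-field line layout `<method> <path> <status>` are all assumptions made for the example.

```python
import csv
from collections import Counter
from pathlib import Path


def run_batch_job(source_dir: Path, output_file: Path) -> Counter:
    """Count successful requests per URL path across all *.log source files."""
    hits = Counter()
    for log_file in sorted(source_dir.glob("*.log")):
        with log_file.open() as handle:
            for line in handle:
                parts = line.split()
                # Assumed line layout: "<method> <path> <status>"
                if len(parts) != 3:
                    continue  # skip malformed lines
                _method, path, status = parts
                if status.startswith("2"):  # filter: keep 2xx responses only
                    hits[path] += 1  # aggregate: count per path
    # Write the prepared result to a new file, leaving the sources untouched.
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "hits"])
        for path, count in sorted(hits.items()):
            writer.writerow([path, count])
    return hits
```

A distributed batch framework applies the same idea, but splits the source files across many machines and merges the partial aggregates.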

 Stream processing.

o There is a slight difference between stream processing and real-time
message ingestion: ingestion captures and stores the incoming messages,
whereas stream processing works on the streaming data itself, either in
windows or as continuous streams.
o After capturing real-time messages, the solution must process them by
filtering, aggregating, and otherwise preparing the data for analysis.
o The processed stream data is then written to an output sink.
o Options include Apache Spark, Apache Storm, and Apache Flink.
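
One way to picture windowed stream processing is a tumbling (fixed, non-overlapping) window that counts events per key and emits each window as soon as it closes. The sketch below is an assumption-laden illustration, not the API of any of the engines listed: the function name `tumbling_window_counts` and the `(timestamp, key)` event shape are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, key); hypothetical shape


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts per key) for each fixed-size window.

    Assumes events arrive in timestamp order, so a window can be emitted
    as soon as the first event of a later window shows up.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        start = int(timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # window closed: emit it
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the final, still-open window
```

Each yielded pair could be appended to a list, written to a file, or sent to any other output sink. Production stream processors also deal with late and out-of-order events, which this sketch ignores.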
 Analytical data store.
o The processed data is loaded into an analytical data store, where data
analytics and visualization tools analyze it and generate reports or
dashboards. Companies use these reports to make data-driven decisions.

 Orchestration.

o Most big data solutions consist of repeated data processing operations,
encapsulated in workflows that transform source data, move data
between multiple sources and sinks, load the processed data into an
analytical data store, or push the results straight to a report or
dashboard.
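
At its core, an orchestrator runs the steps of a workflow in an order that respects their dependencies. The following sketch shows that idea using Python's standard `graphlib` module (Python 3.9 or later). The function name `run_workflow`, the shared `context` dictionary, and the task names in the usage note are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple


def run_workflow(
    tasks: Dict[str, Callable[[dict], None]],
    depends_on: Dict[str, List[str]],
) -> Tuple[List[str], dict]:
    """Run every task once, after all the tasks it depends on have finished.

    Each task receives a shared context dict, which it may read and update.
    Returns the execution order and the final context.
    """
    # Map each task to its predecessors; tasks without an entry have none.
    graph = {name: depends_on.get(name, []) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    context: dict = {}
    for name in order:
        tasks[name](context)
    return order, context
```

For example, with three tasks `extract`, `transform`, and `load`, and the dependencies `{"transform": ["extract"], "load": ["transform"]}`, the tasks run in exactly that order. Real orchestration tools add scheduling, retries, and monitoring on top of this ordering logic.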
