Components of A Big Data Architecture
The following diagram shows the logical components that fit into a big data
architecture. Most big data architectures include some or all of the following
components:
Data sources.
o All big data solutions start with one or more data sources, and a big
data environment can handle both batch processing and real-time
processing of the data they produce. Examples include:
Data storage.
o Data is typically stored in a distributed file store that can hold high
volumes of large files in various formats. The data is kept mostly in its
raw format, and this kind of store is referred to as a data lake.
o The storage layer converts the data into a format that the data analytics
tool can read, and stores the data according to its format.
o For example, a big data architecture stores unstructured data in a
distributed file storage system such as HDFS or in a NoSQL database, and
it stores structured data in an RDBMS.
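The idea of storing data according to its format can be sketched as a small
routing function. This is only an illustration under stated assumptions: the
function name choose_store, the three target labels, and the rule "flat JSON
object means structured" are hypothetical and are not part of any particular
product.

```python
import json


def choose_store(payload: bytes) -> str:
    """Return a hypothetical storage target label for a raw payload."""
    try:
        record = json.loads(payload)
    except ValueError:
        # Not parseable as JSON: treat as unstructured data (assumption)
        # and keep it raw in a distributed file store such as HDFS.
        return "distributed_file_store"

    scalar_types = (str, int, float, bool, type(None))
    if isinstance(record, dict) and all(
        isinstance(value, scalar_types) for value in record.values()
    ):
        # A flat record with scalar fields maps naturally onto a table row.
        return "rdbms"

    # Nested or irregular documents go to a NoSQL database (assumption).
    return "nosql"
```

A real ingestion layer would look at schemas, file types, and access
patterns rather than a single parse attempt; the sketch only shows the
routing decision itself.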
Batch processing.
o Because the data sets are so large, a big data solution must often
process data files using long-running batch jobs that filter, aggregate,
and otherwise prepare the data for analysis.
o Usually these jobs involve reading source files, processing them, and
writing the output to new files.
o A commonly used solution for batch processing is Apache Hadoop.
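The read, process, and write pattern described above can be sketched on a
single machine as follows. This is a minimal illustration, not a Hadoop job:
the function name run_batch_job, the CSV layout, and the column names region
and amount are assumptions made for the example.

```python
import csv
from collections import defaultdict
from pathlib import Path


def run_batch_job(input_dir: Path, output_file: Path) -> dict:
    """Read every CSV file in input_dir, filter and aggregate the rows,
    and write the result to a new file."""
    totals = defaultdict(float)

    # Read the source files in a stable order.
    for source in sorted(input_dir.glob("*.csv")):
        with source.open(newline="") as handle:
            for row in csv.DictReader(handle):
                amount = float(row["amount"])
                if amount <= 0:
                    continue  # filter: drop non-positive amounts
                totals[row["region"]] += amount  # aggregate per region

    # Write the prepared data to a new output file.
    with output_file.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["region", "total"])
        for region in sorted(totals):
            writer.writerow([region, totals[region]])

    return dict(totals)
```

Calling run_batch_job(Path("raw"), Path("summary.csv")) would read every CSV
file under a hypothetical raw directory and write one total per region. A
framework such as Hadoop applies the same steps, but splits the reading and
aggregation across many machines.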
Stream processing.
Orchestration.