Lecture8-IoT Data Management

The document outlines the fundamentals of IoT data management, detailing the layered architecture of IoT cloud systems, including device management, data ingestion, processing, storage, and visualization. It discusses various frameworks and technologies like Apache Kafka, NiFi, and MongoDB, emphasizing the importance of handling large volumes of diverse data efficiently. Additionally, it highlights best practices for IoT data management, including leveraging edge and fog computing, hybrid storage solutions, and ensuring data security.


3/15/2025

Lecture 8: IoT Data Management

Seyed-Hosein Attarzadeh-Niaki

Fundamentals of IoT 1

Recap: Layered IoT Cloud Architecture


• IoT Cloud architecture is divided into functional layers:
– Device Management Layer: Handles IoT device operations.
– Data Ingestion Layer: Collects and routes data.
– Data Processing Layer: Analyzes data (brief mention).
– Data Storage Layer: Stores data (brief mention).
– Application Layer: Manages IoT services.
– Data Visualization and Reporting Layer: Presents data to users.
– Orchestration Layer: Coordinates operations.


Data Management in IoT


• Manages exponential data growth from IoT devices.
• Ensures data accessibility for applications.
• Tackles challenges: volume, velocity, variety.
• Enables real-time decision-making.
• Optimizes efficiency and reduces costs.


Data Ingestion in IoT Cloud


• Functions
– Collect data from heterogeneous sources (sensors, logs, etc.).
– Route data to appropriate destinations in the cloud (batch/stream).
• Key design considerations
– Scalability for growing data.
– Data integrity and fault tolerance.
• Challenges:
Challenge | Description
Velocity/Volume | Massive data streams require high throughput.
Heterogeneous Sources | Diverse formats (e.g., CSV, JSON, sensor logs).
Rapid Evolution | Protocols and frameworks change frequently.
Data Independence | Data semantics evolve without prior notice.

Apache Flume Architecture


• Distributed system for aggregating streaming data.
• Components: Source, Channel, Sink.
• Horizontal scaling via multiple agents.
• Use cases: Streaming logs to HDFS.


Apache Kafka: Publish-Subscribe Model
• Publish-subscribe messaging system with topics/partitions.
• Topics: Data categories (e.g., “sensor-logs”).
• Partitions: Scale horizontally; each partition is ordered.
• Producers/Consumers: Write/read data via topics.


Kafka Partitions and Replication


• Partitions ensure scalability and parallelism.
• Replication: Fault tolerance via leader-follower model.
• Offsets track consumer progress.
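
The topic, partition, and offset ideas above can be sketched without a broker. This is a toy in-memory model, not the Kafka client API: the class names `MiniTopic` and `MiniConsumer` are hypothetical, and CRC32 is only a stand-in for whatever hash a real partitioner uses.

```python
import zlib
from collections import defaultdict


class MiniTopic:
    """Toy stand-in for a topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Records with the same key always land in the same partition,
        # so their relative order is preserved inside that partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


class MiniConsumer:
    """Keeps one offset per partition to remember how far it has read."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self, partition, max_records=10):
        log = self.topic.partitions[partition]
        start = self.offsets[partition]
        batch = log[start:start + max_records]
        self.offsets[partition] = start + len(batch)  # advance the offset
        return batch


topic = MiniTopic("sensor-logs", num_partitions=3)
p = topic.publish("device-42", 21.5)
topic.publish("device-42", 21.7)

consumer = MiniConsumer(topic)
first = consumer.poll(p, max_records=1)
second = consumer.poll(p, max_records=1)
```

Because the offset lives with the consumer rather than the log, a second consumer could read the same partition from offset 0 without affecting the first.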


Apache NiFi: Flow-Based Data Movement
• Automates data flow with drag-and-drop UI.
• Features
– FlowFile: Data unit with metadata.
– Processors: Transform or route data, or interact with external systems.
– Connections: Queues with backpressure.
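
Backpressure on a connection can be illustrated with a bounded queue. This is a sketch using Python's standard `queue` module, not NiFi itself; the threshold of 3 and the dictionary layout of the FlowFile are assumptions made for the example.

```python
import queue

# A connection modeled as a bounded queue; the limit of 3 is an assumed threshold.
connection = queue.Queue(maxsize=3)


def upstream_emit(flowfile):
    """Try to enqueue a FlowFile; return False when the queue is full."""
    try:
        connection.put_nowait(flowfile)
        return True
    except queue.Full:
        return False  # backpressure: the upstream processor must wait


# Five FlowFiles arrive while the downstream processor is idle.
accepted = [
    upstream_emit({"attributes": {"id": i}, "content": b"reading"})
    for i in range(5)
]

# The downstream processor takes one item, which frees a slot.
drained = connection.get_nowait()
resumed = upstream_emit({"attributes": {"id": 5}, "content": b"reading"})
```

The upstream side is refused as soon as the queue is full and is accepted again once the downstream side drains an item.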


Elastic Logstash
• Ingests data from multiple sources (logs, databases).
• Transforms: Normalize, enrich, or filter data.
• Outputs to Elasticsearch for indexing.


Data Ingestion Frameworks


Framework | Key Features | Use Cases
Apache Flume | Aggregates streaming data into Hadoop | Log data collection
Apache Kafka | High-throughput messaging system | Real-time data pipelines
Apache NiFi | Automates data flow, supports diverse formats | Data routing and transformation
Elastic Logstash | Ingests and transforms data for Elasticsearch | Log and event data processing


Data Processing Layer: Batch vs. Stream
• Transforms raw IoT data into actionable insights.
• Types of processing
– Batch: Processes data in scheduled blocks (e.g., weekly sales reports).
– Stream: Real-time processing as data arrives (e.g., temperature alarm).
• Latency
– Stream: milliseconds;
– Batch: minutes/hours.
• IoT often demands stream processing for immediacy.
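
The difference between the two processing styles can be shown on the same data. In this sketch the readings, the 30 °C threshold, and the function names are all assumptions chosen for illustration.

```python
from statistics import mean

readings = [21.0, 22.5, 35.5, 23.0]  # hypothetical temperature samples (deg C)
ALARM_THRESHOLD = 30.0               # assumed alarm limit


def batch_report(samples):
    """Batch: runs on a schedule over the whole accumulated block."""
    return {"count": len(samples), "mean": mean(samples)}


def stream_alarms(samples, threshold):
    """Stream: inspects each reading as it arrives and reacts immediately."""
    for index, value in enumerate(samples):
        if value > threshold:
            yield index, value


report = batch_report(readings)
alarms = list(stream_alarms(readings, ALARM_THRESHOLD))
```

The batch report only exists once all four samples are available, while the stream generator can raise the alarm at the third sample without waiting for the rest.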


Lambda Architecture
• Combines batch and stream processing:
– Batch Layer: Stores raw data, processes for long-term analysis.
– Speed Layer: Real-time processing for short-term insights.
– Serving Layer: Merges batch and real-time views.
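
A minimal sketch of how the three layers relate, assuming the query is "total event count per device". The data and function names are hypothetical; in a real deployment each layer would be a separate system.

```python
from collections import Counter

# Events already covered by the last batch run (the raw master dataset).
master_dataset = [("d1", 3), ("d2", 5), ("d1", 4)]
# Events that arrived after that batch run.
recent_events = [("d1", 1), ("d3", 2)]


def build_view(events):
    """Sum counts per device; used here by both the batch and speed layers."""
    totals = Counter()
    for device, count in events:
        totals[device] += count
    return totals


batch_view = build_view(master_dataset)  # batch layer: complete but stale
speed_view = build_view(recent_events)   # speed layer: fresh but partial

# Serving layer: answer queries from the merged views.
merged = dict(batch_view + speed_view)
```

In this toy version both layers share one function; in practice the batch and speed layers usually run different code, which is the duplication that Kappa aims to remove.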


Kappa Architecture
• Simplifies Lambda by using only stream processing
– All data treated as streams.
– Reprocesses data for updates.
• Advantages: Single codebase, reduced complexity.
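
Reprocessing in Kappa amounts to replaying the stored stream through new logic. The sketch below assumes a small in-memory event log and two hypothetical versions of the processing step.

```python
# An immutable, replayable event log (hypothetical device readings).
event_log = [("d1", 24.0), ("d2", 30.0), ("d1", 20.0)]


def latest_reading(view, event):
    """Version 1 of the stream job: keep the most recent value per device."""
    device, value = event
    view[device] = value
    return view


def max_reading(view, event):
    """Version 2 of the stream job: keep the highest value per device."""
    device, value = event
    view[device] = max(value, view.get(device, value))
    return view


def replay(log, step):
    """Rebuild a view from scratch by streaming the whole log through `step`."""
    view = {}
    for event in log:
        view = step(view, event)
    return view


view_v1 = replay(event_log, latest_reading)
view_v2 = replay(event_log, max_reading)  # logic changed: replay, no batch job
```

There is one code path for both live and historical data, which is where the single-codebase advantage comes from.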


Data Processing Frameworks


Apache Storm
• Real-time stream processing, fault-tolerant.
Apache Flink
• Scalable streaming with checkpointing.
Apache Spark
• In-memory processing for batch and streams.
Figure: Apache Storm topology

Data Storage Layer


• Requires hybrid storage for diverse IoT data needs.
• Storage types
– Cold Storage: Data lakes (e.g., HDFS) for raw data.
– Warm Storage: Databases (e.g., MySQL) for analytics.
– Hot Storage: In-memory DBs (e.g., Redis) for real-time queries.
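
One common way to decide which tier a record belongs to is its age. The slide defines the tiers by purpose, so the age-based rule and both time windows below are assumptions made only to give a concrete sketch.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(hours=1)   # assumed: served from an in-memory store
WARM_WINDOW = timedelta(days=30)  # assumed: kept in a queryable database


def choose_tier(recorded_at, now):
    """Pick a storage tier from the age of a record."""
    age = now - recorded_at
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


now = datetime(2025, 3, 15, 12, 0)
ages = (timedelta(minutes=5), timedelta(days=2), timedelta(days=90))
tiers = [choose_tier(now - age, now) for age in ages]
```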


ACID: Atomicity, Consistency, Isolation, Durability

SQL vs. NoSQL Databases


Feature | SQL (RDBMS) | NoSQL
Schema | Fixed | Flexible
Transactions | ACID | Eventual Consistency
Use Case | Structured data | Unstructured data
Scaling | Vertical | Horizontal
Examples | MySQL, PostgreSQL | MongoDB, Cassandra

When to Use SQL
• Building custom dashboards
• Analyzing behavior-related or custom sessions
• Need to store or extract database information quickly
• When using joins and complex queries
• Need ACID transactions
• The schema (structure of data) is known and does not change

When to Use NoSQL
• ACID support is not required
• Traditional RDBMS model is insufficient
• Data requires flexible schema
• Constraints or validation logic is not needed
• Logging data from distributed sources
• Storing temporary data or wish lists and session data

MongoDB: Document-Oriented Storage
• Document-oriented database
– Collections: Equivalent to tables.
– Embedded vs. Referenced Data: Supports 1:1, 1:N, N:1, N:N relationships.
– Aggregation Framework: Built-in ETL capabilities.
• Use cases: Unstructured data, real-time analytics.
RDBMS | MongoDB
Database | Database
Table | Collection
Row | Document
Column | Field
Primary key | Primary key (MongoDB provides the default key _id itself)
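
The embedded and referenced styles for a 1:N relationship can be compared using plain Python dictionaries as stand-ins for documents. No MongoDB driver is used here, and the field names other than `_id` are hypothetical.

```python
# Embedded: the readings live inside the device document,
# so a single read returns the device together with its readings.
device_embedded = {
    "_id": "device-42",
    "location": "lab",
    "readings": [{"t": 1, "temp_c": 21.5}, {"t": 2, "temp_c": 21.7}],
}

# Referenced: readings are separate documents that point back
# to their device through a device_id field.
devices = {"device-42": {"_id": "device-42", "location": "lab"}}
readings = [
    {"_id": 1, "device_id": "device-42", "temp_c": 21.5},
    {"_id": 2, "device_id": "device-42", "temp_c": 21.7},
    {"_id": 3, "device_id": "device-7", "temp_c": 19.0},
]


def readings_for(device_id):
    """Follow the reference on the application side (a manual join)."""
    return [r["temp_c"] for r in readings if r["device_id"] == device_id]


embedded_temps = [r["temp_c"] for r in device_embedded["readings"]]
referenced_temps = readings_for("device-42")
```

Embedding keeps related data together and suits small, bounded lists; referencing avoids ever-growing documents, which matters when a device produces readings indefinitely.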

Cassandra: Wide-Column Store


• Wide-column NoSQL database
– Keyspaces: Equivalent to databases.
– Column Families: Flexible schema (rows with arbitrary columns).
– Replication: No single point of failure.
– CQL: SQL-like query language.
• Use cases: Time-series data, write-heavy applications.


Redis: In-Memory Database


• Key-Value Store: Fast read/write operations.
• Data Structures: Strings, lists, hashes, sets.
• Use Cases: Caching, session management, real-time analytics.


Elasticsearch & Kibana


• Elasticsearch
– Distributed search engine for logs/analytics.
– Components: Cluster, Node, Shards.
• Kibana
– Visualization tool for Elasticsearch data.


CAP Theorem
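• CAP properties
– Consistency: All nodes see the same data.
– Availability: Every request receives a response.
– Partition Tolerance: System works despite network failures.
• Trade-offs: A distributed system can guarantee at most two of the three properties at once; when a network partition occurs, it must choose between consistency and availability.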


Data Warehouse
• A data integration point
– Combining data from a variety of diverse relational databases
• Compared to traditional RDBMS
– Run quick and/or complicated queries involving all (even historic) data
– Structure of Data: not (necessarily) normalized


Data Lakes in IoT


• Definition: Stores raw structured/unstructured data for future analysis.
• Advantages
– Cost-effective storage for high-volume data.
– Schema-on-read (ELT vs. ETL).
• Use Cases: Sensor logs, machine learning training.


Data Lakes vs. Data Warehouses


Data Lake | Data Warehouse/RDBMS
Raw, unstructured data | Processed, structured data
Physical collection of un-curated raw data | Data of common meaning
System of insight: unknown data used for experimentation and data discovery | System of record: well-understood data used for operational reporting
Any type of data | Limited set of data types (i.e., relational)
Not suitable for transactions | Suitable for ACID transactions
Schema-on-read (ELT) | Schema-on-write (ETL)
Cost-effective for large data | High-performance for structured queries


ETL vs. ELT Processes


• ETL (Extract, Transform, Load): Data transformed before loading (used in data warehouses).
• ELT (Extract, Load, Transform): Data transformed after loading (used in data lakes).
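
The only difference between the two processes is where the transform step sits relative to the load. The sketch below assumes raw records are `device,temp` text lines and uses Python lists as stand-ins for the warehouse and the lake.

```python
def transform(record):
    """Parse a raw 'device,temp' line into a typed record."""
    device, temp = record.split(",")
    return {"device": device, "temp_c": float(temp)}


raw_lines = ["d1,21.5", "d2,22.0"]  # hypothetical extracted data

# ETL: transform first, then load only the structured result (schema-on-write).
warehouse = [transform(line) for line in raw_lines]

# ELT: load the raw data unchanged, apply the schema when reading (schema-on-read).
data_lake = list(raw_lines)


def read_lake():
    """Apply the transform at query time, leaving the stored data raw."""
    return [transform(line) for line in data_lake]
```

Both paths yield the same records here, but the lake still holds the original lines, so a different transform can be applied later without extracting the data again.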


Distributed File Systems


• HDFS: Scalable, fault-tolerant file system for big data (part of Hadoop).
• Amazon S3: Cloud-based object storage for data lakes.
• Data is split into blocks and replicated across nodes (as in HDFS).
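
Block splitting and replication can be sketched in a few lines. The tiny block size, the node names, and the round-robin placement are assumptions for illustration; production file systems use far larger blocks and more careful placement policies.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


blocks = split_into_blocks(b"0123456789", block_size=4)
placement = place_replicas(
    len(blocks), ["node-a", "node-b", "node-c"], replication=2
)
```

With two replicas per block, losing any single node still leaves one copy of every block available.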


Data Lake Challenges


• Key Challenges
– Data swamps (unmanaged metadata).
– Security and access control.
– Data lineage and quality assurance.
• Solution: Governance frameworks (e.g., data catalogs).
• Data Lake Tiers & File Formats


Data Visualization and Reporting Layer


Data Visualization Frameworks
• Kibana: data visualization based on Elasticsearch
– Explore and analyze log data stored in Elasticsearch
• Grafana: graph composer and dashboard
– Visualize continuous time-series and streaming data

Business Intelligence Frameworks
• Tableau: manipulate big data, user friendly
• Microsoft Power BI: analyze data, then visualize and share the results
• QlikView: effective data visualization, business intelligence, and enterprise reporting options

Advanced Data Analytical and Machine Learning Frameworks


• Scikit-learn, TensorFlow, Caffe, RapidMiner, Splunk, etc.


IoT Compute Stack and Fog Computing


• Traditional IT: Centralized cloud processing.
• IoT model: Distributed with edge and fog layers.
– Edge: Near-device processing.
– Fog: Intermediate processing/storage.
– Cloud: Centralized resources.
• Fog extends cloud capabilities to the network edge.
– Benefits:
  • Low latency for real-time needs.
  • Bandwidth savings via local processing.
  • Enhanced security with on-site data.
– Fog nodes: Routers, gateways, etc.
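
The bandwidth benefit comes from sending summaries upstream instead of every raw sample. In this sketch the window size of 3 and the summary fields are assumptions; a real fog node would pick them to match the application.

```python
def summarize_window(samples):
    """Reduce a window of raw samples to one summary message."""
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": sum(samples) / len(samples),
        "count": len(samples),
    }


def fog_node(samples, window):
    """Aggregate locally and forward one message per window to the cloud."""
    return [
        summarize_window(samples[i:i + window])
        for i in range(0, len(samples), window)
    ]


raw = [20.0, 22.0, 24.0, 30.0, 30.0, 36.0]  # hypothetical sensor readings
uplink = fog_node(raw, window=3)
```

Six raw readings become two uplink messages, and the raw values never leave the site.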


IoT Data Management Best Practices


• Leverage edge and fog for distributed processing.

• Use hybrid storage (hot, warm, cold) for efficiency.

• Prioritize data security and governance policies.

• Adopt scalable tools for data ingestion and processing.

• Regularly monitor and refine data pipelines.


Next Lecture
• Artificial Intelligence in IoT
