3/15/2025
Lecture 8: IoT Data Management
Seyed-Hosein Attarzadeh-Niaki
Fundamentals of IoT 1
Recap: Layered IoT Cloud Architecture
• IoT Cloud architecture is divided into functional layers:
– Device Management Layer: Handles IoT device operations.
– Data Ingestion Layer: Collects and routes data.
– Data Processing Layer: Analyzes data (brief mention).
– Data Storage Layer: Stores data (brief mention).
– Application Layer: Manages IoT services.
– Data Visualization and Reporting Layer: Presents data to users.
– Orchestration Layer: Coordinates operations.
Data Management in IoT
• Manages exponential data growth from IoT devices.
• Ensures data accessibility for applications.
• Tackles challenges: volume, velocity, variety.
• Enables real-time decision-making.
• Optimizes efficiency and reduces costs.
Data Ingestion in IoT Cloud
• Functions
– Collect data from heterogeneous sources (sensors, logs, etc.).
– Route data to the appropriate cloud destinations (batch/stream).
• Key design considerations
– Scalability for growing data.
– Data integrity and fault tolerance.
• Challenges:
Challenge | Description
Velocity/Volume | Massive data streams require high throughput.
Heterogeneous Sources | Diverse formats (e.g., CSV, JSON, sensor logs).
Rapid Evolution | Protocols and frameworks change frequently.
Data Independence | Data semantics evolve without prior notice.
Apache Flume Architecture
• Distributed system for aggregating streaming data.
• Components: Source, Channel, Sink.
• Horizontal scaling via multiple agents.
• Use cases: Streaming logs to HDFS.
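The Source, Channel, Sink split can be illustrated with a minimal Python sketch. This is not Flume's API: the `Channel` class, `run_source`, `run_sink`, and the list standing in for HDFS are all hypothetical names chosen for illustration.

```python
from collections import deque


class Channel:
    """Buffer that decouples the source from the sink."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        return self._events.popleft() if self._events else None


def run_source(lines, channel):
    """Source: turn each raw log line into an event and buffer it."""
    for line in lines:
        channel.put({"body": line.strip()})


def run_sink(channel, destination):
    """Sink: drain the channel into the destination (a list standing in for HDFS)."""
    while (event := channel.take()) is not None:
        destination.append(event["body"])


channel = Channel()
stored = []
run_source(["temp=21.5\n", "temp=22.0\n"], channel)
run_sink(channel, stored)
```

Because the source and sink only share the channel, either side can be scaled or replaced without touching the other, which is the idea behind running multiple agents.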
Apache Kafka:
Publish-Subscribe Model
• Publish-subscribe messaging system with topics/partitions.
• Topics: Data categories (e.g., “sensor-logs”).
• Partitions: Scale horizontally; each partition is ordered.
• Producers/Consumers: Write/read data via topics.
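A rough sketch of how keyed messages could be spread over partitions while staying ordered within each one. It is a plain-Python model, not the Kafka client API: the partition count, the `produce` helper, and the use of CRC32 are assumptions (real Kafka clients apply their own hash function).

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for a hypothetical topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# partition id -> append-only, ordered list of (key, value) records
topic = {p: [] for p in range(NUM_PARTITIONS)}


def produce(key: str, value: str) -> tuple[int, int]:
    """Append to the key's partition and return (partition, offset)."""
    p = partition_for(key)
    topic[p].append((key, value))
    return p, len(topic[p]) - 1


# All readings of one sensor share a key, so they land in one partition in order.
for reading in ["20.1", "20.4", "20.9"]:
    produce("sensor-7", reading)
```

Keying by device ID is one common design choice: ordering is preserved per device, while different devices can be processed in parallel on different partitions.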
Kafka Partitions and Replication
• Partitions ensure scalability and parallelism.
• Replication: Fault tolerance via leader-follower model.
• Offsets track consumer progress.
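Offset tracking can be sketched as follows. This is an in-memory model under simplifying assumptions (one partition, no replication); `PartitionLog` and its methods are hypothetical names, not Kafka classes.

```python
class PartitionLog:
    """Append-only log for one partition, plus a committed offset per consumer group."""

    def __init__(self):
        self.records = []
        self.committed = {}  # group id -> next offset to read

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        """Return records starting at the group's committed offset."""
        start = self.committed.get(group, 0)
        return self.records[start:start + max_records]

    def commit(self, group, count):
        """Advance the group's offset after `count` records were processed."""
        self.committed[group] = self.committed.get(group, 0) + count


log = PartitionLog()
for value in ["a", "b", "c"]:
    log.append(value)

first = log.poll("dashboard", max_records=2)  # reads offsets 0 and 1
log.commit("dashboard", len(first))
second = log.poll("dashboard")                # resumes at offset 2
```

Each consumer group keeps its own offset, so a second group (for example an alerting service) would still read the log from the beginning.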
Apache NiFi:
Flow-Based Data Movement
• Automates data flow with drag-and-drop UI.
• Features
– FlowFile: Data unit with metadata.
– Processors: Transform or route data, or interact with external systems.
– Connections: Queues with backpressure.
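The backpressure idea behind connections can be sketched with a bounded queue. This is a simplification, not NiFi code: in NiFi the upstream processor is no longer scheduled once a threshold is reached, whereas the hypothetical `offer` helper below simply reports that the queue is full. The threshold of 2 is an assumed value.

```python
import queue

connection = queue.Queue(maxsize=2)  # assumed backpressure threshold of 2 FlowFiles


def offer(flowfile) -> bool:
    """Upstream processor: enqueue if there is room, otherwise signal backpressure."""
    try:
        connection.put_nowait(flowfile)
        return True
    except queue.Full:
        return False


# Three FlowFiles arrive while the downstream processor is idle.
results = [offer({"attributes": {"id": i}, "content": b"..."}) for i in range(3)]

connection.get_nowait()  # downstream processor consumes one FlowFile
accepted_after_drain = offer({"attributes": {"id": 3}, "content": b"..."})
```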
Elastic Logstash
• Ingests data from multiple sources (logs, databases).
• Transforms: Normalize, enrich, or filter data.
• Outputs to Elasticsearch for indexing.
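The normalize, enrich, filter steps can be sketched in plain Python. This is not Logstash configuration syntax; the field names, the `LOCATIONS` lookup table, and the plausibility range are assumptions made for illustration.

```python
def normalize(event):
    """Normalize: lower-case field names and cast the reading to float."""
    out = {k.lower(): v for k, v in event.items()}
    out["temp"] = float(out["temp"])
    return out


def enrich(event, locations):
    """Enrich: attach a site name looked up from the device id."""
    return {**event, "site": locations.get(event["device"], "unknown")}


def keep(event):
    """Filter: drop readings outside an assumed plausible range."""
    return -40.0 <= event["temp"] <= 85.0


LOCATIONS = {"d1": "lab"}  # hypothetical lookup table
raw = [{"Device": "d1", "Temp": "21.5"}, {"Device": "d2", "Temp": "999"}]

# The resulting list stands in for the documents sent on for indexing.
indexed = [e for e in (enrich(normalize(r), LOCATIONS) for r in raw) if keep(e)]
```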
Data Ingestion Frameworks
Framework | Key Features | Use Cases
Apache Flume | Aggregates streaming data into Hadoop | Log data collection
Apache Kafka | High-throughput messaging system | Real-time data pipelines
Apache NiFi | Automates data flow, supports diverse formats | Data routing and transformation
Elastic Logstash | Ingests and transforms data for Elasticsearch | Log and event data processing
Data Processing Layer:
Batch vs. Stream
• Transforms raw IoT data into actionable insights.
• Types of processing
– Batch: Processes data in scheduled blocks (e.g., weekly sales reports).
– Stream: Real-time processing as data arrives (e.g., temperature alarm).
• Latency
– Stream: milliseconds.
– Batch: minutes/hours.
• IoT often demands stream processing for immediacy.
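The difference between the two modes can be sketched on the same readings. The threshold value and the `(timestamp, temperature)` tuple layout are assumptions for illustration.

```python
THRESHOLD_C = 30.0  # assumed alarm threshold


def stream_alarms(readings):
    """Stream: evaluate each reading as it arrives and yield alarms immediately."""
    for ts, temp in readings:
        if temp > THRESHOLD_C:
            yield ts


def batch_average(readings):
    """Batch: compute one aggregate over the whole collected block."""
    temps = [temp for _, temp in readings]
    return sum(temps) / len(temps)


readings = [(0, 24.0), (1, 31.0), (2, 26.0)]
alarms = list(stream_alarms(readings))  # timestamps at which an alarm fires
average = batch_average(readings)       # available only after the block is complete
```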
Lambda Architecture
• Combines batch and stream processing:
– Batch Layer: Stores raw data and processes it for long-term analysis.
– Speed Layer: Real-time processing for short-term insights.
– Serving Layer: Merges batch and real-time views.
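A minimal sketch of how the three layers could fit together, using event counts per device as the example query. The function names and the event layout are hypothetical, and real systems would keep these views in separate stores.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute counts per device from all stored raw events."""
    return Counter(e["device"] for e in master_dataset)


def speed_view(recent_events):
    """Speed layer: counts for events not yet covered by the batch view."""
    return Counter(e["device"] for e in recent_events)


def serve(batch, speed, device):
    """Serving layer: answer a query by merging both views."""
    return batch[device] + speed[device]


master = [{"device": "d1"}, {"device": "d1"}, {"device": "d2"}]
recent = [{"device": "d1"}]  # arrived after the last batch run
total_d1 = serve(batch_view(master), speed_view(recent), "d1")
```

Note that the same counting logic appears twice, once per layer; this duplication is the maintenance cost usually attributed to Lambda.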
Kappa Architecture
• Simplifies Lambda by using only stream processing
– All data treated as streams.
– Reprocesses data by replaying the stream when the processing logic changes.
• Advantages: Single codebase, reduced complexity.
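Reprocessing in Kappa can be sketched as replaying a retained event log through the updated stream function. The `replay` helper, the two processing versions, and the `faulty` flag are hypothetical names used only for illustration.

```python
def replay(log, process, state=None):
    """Run one stream-processing function over the whole log, oldest first."""
    state = {} if state is None else state
    for event in log:
        process(state, event)
    return state


def max_temp_v1(state, event):
    """Original logic: track the maximum temperature per device."""
    device = event["device"]
    state[device] = max(state.get(device, float("-inf")), event["temp"])


def max_temp_v2(state, event):
    """Updated logic: ignore readings flagged as faulty."""
    if not event.get("faulty", False):
        max_temp_v1(state, event)


log = [
    {"device": "d1", "temp": 22.0},
    {"device": "d1", "temp": 90.0, "faulty": True},
]
old_view = replay(log, max_temp_v1)
new_view = replay(log, max_temp_v2)  # rebuilt by replaying, no separate batch job
```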
Data Processing Frameworks
Apache Storm
• Real-time stream processing, fault-tolerant.
Apache Flink
• Scalable streaming with checkpointing.
Apache Spark
• In-memory processing for batch and streams.
[Figure: Apache Storm topology]
Data Storage Layer
• Requires hybrid storage for diverse IoT data needs.
• Storage types
– Cold Storage: Data lakes (e.g., HDFS) for raw data.
– Warm Storage: Databases (e.g., MySQL) for analytics.
– Hot Storage: In-memory DBs (e.g., Redis) for real-time queries.
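One possible way to decide which tier serves a piece of data is by its age. The sketch below assumes two cut-off windows; the one-hour and 30-day values are illustrative assumptions, not figures from the lecture.

```python
from datetime import timedelta

HOT_WINDOW = timedelta(hours=1)   # assumed: recent data served from an in-memory store
WARM_WINDOW = timedelta(days=30)  # assumed: older data served from a database


def storage_tier(age: timedelta) -> str:
    """Pick a storage tier from the age of a record."""
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"
```

For example, `storage_tier(timedelta(minutes=5))` selects the hot tier, while a reading from last year falls through to cold storage.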
ACID: Atomicity, Consistency, Isolation, Durability
SQL vs. NoSQL Databases
Feature | SQL (RDBMS) | NoSQL
Schema | Fixed | Flexible
Transactions | ACID | Eventual Consistency
Use Case | Structured data | Unstructured data
Scaling | Vertical | Horizontal
Examples | MySQL, PostgreSQL | MongoDB, Cassandra
When to use SQL
• Building custom dashboards
• Analyzing behavior-related or custom sessions
• Storing or extracting database information quickly
• Using joins and complex queries
• ACID transactions are needed
• The schema (structure of data) is known and does not change
When to use NoSQL
• ACID support is not required
• The traditional RDBMS model is insufficient
• Data requires a flexible schema
• Constraints or validation logic are not needed
• Logging data from distributed sources
• Storing temporary data, wish lists, and session data
MongoDB:
Document-Oriented Storage
• Document-oriented database
– Collections: Equivalent to tables.
– Embedded vs. Referenced Data: Supports 1:1, 1:N, N:1, N:N relationships.
– Aggregation Framework: Built-in ETL capabilities.
• Use cases: Unstructured data, real-time analytics.
RDBMS | MongoDB
Database | Database
Table | Collection
Row | Document
Column | Field
Primary key | Primary key (MongoDB provides the default key _id itself)
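The embedded and referenced styles can be compared using plain Python dictionaries as stand-ins for documents. No MongoDB driver is used here; the field names and the `readings_for` helper are hypothetical.

```python
# Embedded: the readings live inside the device document (1:N in one document).
device_embedded = {
    "_id": "d1",
    "type": "thermometer",
    "readings": [{"ts": 1, "temp": 21.5}, {"ts": 2, "temp": 22.0}],
}

# Referenced: readings are separate documents that point back to the device.
devices = [{"_id": "d1", "type": "thermometer"}]
readings = [
    {"_id": "r1", "device_id": "d1", "ts": 1, "temp": 21.5},
    {"_id": "r2", "device_id": "d1", "ts": 2, "temp": 22.0},
]


def readings_for(device_id):
    """Resolve the reference with a second lookup, as an application would."""
    return [r for r in readings if r["device_id"] == device_id]
```

Embedding returns everything in one read but lets the document grow with every reading; referencing keeps documents small at the cost of an extra lookup. For high-rate sensor data the referenced form is often the safer default.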
Cassandra: Wide-Column Store
• Wide-column NoSQL database
– Keyspaces: Equivalent to databases.
– Column Families: Flexible schema (rows with arbitrary columns).
– Replication: No single point of failure.
– CQL: SQL-like query language.
• Use cases: Time-series data, write-heavy applications.
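For the time-series use case, a common modeling pattern (an assumption here, not stated on the slide) is to group rows into partitions by device and day and keep them sorted by timestamp inside each partition. The sketch below models that layout in plain Python rather than CQL; `table`, `insert`, and `range_query` are hypothetical names.

```python
import bisect
from collections import defaultdict

# partition key (device_id, day) -> rows kept sorted by clustering key (timestamp)
table = defaultdict(list)


def insert(device_id, day, ts, temp):
    """Write a row into its partition, keeping the partition sorted by timestamp."""
    bisect.insort(table[(device_id, day)], (ts, temp))


def range_query(device_id, day, ts_from, ts_to):
    """Read a time slice (inclusive bounds) from a single partition."""
    rows = table[(device_id, day)]
    lo = bisect.bisect_left(rows, (ts_from, float("-inf")))
    hi = bisect.bisect_right(rows, (ts_to, float("inf")))
    return rows[lo:hi]


# Rows may arrive out of order; the partition stays sorted.
insert("d1", "2025-03-15", 30, 22.0)
insert("d1", "2025-03-15", 10, 21.5)
insert("d1", "2025-03-15", 20, 21.8)
```

A query for one device and one time range then touches a single partition and reads a contiguous slice, which is why this layout suits write-heavy sensor workloads.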
Redis: In-Memory Database
• Key-Value Store: Fast read/write operations.
• Data Structures: Strings, lists, hashes, sets.
• Use Cases: Caching, session management, real-time analytics.
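The caching use case can be sketched with a tiny in-memory key-value store with expiry. This is a pure-Python stand-in, not the Redis client API; the `TTLCache` class, the key name, and the injected fake clock are assumptions chosen so the example runs deterministically without a server.

```python
import time


class TTLCache:
    """Tiny in-memory key-value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries are removed lazily on read
            return None
        return value


now = [0.0]  # fake clock so the example does not depend on wall time
cache = TTLCache(clock=lambda: now[0])
cache.set("sensor:7:last", "21.5", ttl_seconds=60)
fresh = cache.get("sensor:7:last")    # within the TTL
now[0] = 61.0
expired = cache.get("sensor:7:last")  # after the TTL has passed
```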
Elasticsearch & Kibana
• Elasticsearch
– Distributed search engine for logs/analytics.
– Components: Cluster, Node, Shards.
• Kibana
– Visualization tool for Elasticsearch data.
CAP Theorem
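• CAP properties
– Consistency: All nodes see the same data.
– Availability: Every request receives a response.
– Partition Tolerance: System works despite network failures.
• Trade-offs: At most two of the three properties can be guaranteed at once. Since network partitions cannot be ruled out in a distributed system, the practical choice during a partition is between consistency and availability.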
Data Warehouse
• A data integration point
– Combining data from a variety of diverse relational databases
• Compared to traditional RDBMS
– Run quick and/or complicated queries involving all (even historic) data
– Structure of Data: not (necessarily) normalized
Data Lakes in IoT
• Definition: Stores raw structured/unstructured data for future analysis.
• Advantages
– Cost-effective storage for high-volume data.
– Schema-on-read (ELT vs. ETL).
• Use Cases: Sensor logs, machine learning training.
Data Lakes vs. Data Warehouses
Data Lake | Data Warehouse/RDBMS
Raw, unstructured data | Processed, structured data
Physical collection of uncurated raw data | Data of common meaning
System of insight: unknown data for experimentation and data discovery | System of record: well-understood data for operational reporting
Any type of data | Limited set of data types (i.e., relational)
Not suitable for transactions | Suitable for ACID transactions
Schema-on-read (ELT) | Schema-on-write (ETL)
Cost-effective for large data | High-performance for structured queries
ETL vs. ELT Processes
ETL (Extract, Transform, Load)
• Data transformed before loading (used in DWH).
ELT (Extract, Load, Transform)
• Data transformed after loading (used in Data Lakes).
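The ordering difference can be sketched with one shared transformation applied at two different points. The lists standing in for the warehouse and the lake, the record layout, and the `transform` function are assumptions for illustration.

```python
import json


def transform(record):
    """Shared transformation: parse the raw line and keep typed fields only."""
    obj = json.loads(record)
    return {"device": obj["device"], "temp": float(obj["temp"])}


raw_lines = ['{"device": "d1", "temp": "21.5", "fw": "1.2"}']

# ETL: transform first, then load only the structured result (schema-on-write).
warehouse = [transform(line) for line in raw_lines]

# ELT: load the raw lines unchanged, transform later when a query needs them
# (schema-on-read).
lake = list(raw_lines)
query_result = [transform(line) for line in lake]
```

Both paths give the same query result here, but the lake still holds the `fw` field that the ETL path discarded, so a later analysis can use it without re-ingesting the data.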
Distributed File Systems
• HDFS: Scalable, fault-tolerant file system for big data (e.g., Hadoop).
• Amazon S3: Cloud-based object storage for data lakes.
• In HDFS, data is split into blocks and replicated across nodes.
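Block splitting and replication can be sketched as below. The round-robin placement is a deliberate simplification (real HDFS placement also considers racks), and the block size, node names, and replication factor are illustrative assumptions.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes) so the chosen nodes are distinct.
    """
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


blocks = split_into_blocks(b"x" * 10, block_size=4)  # block sizes 4, 4, 2
placement = place_replicas(len(blocks), ["n1", "n2", "n3"], replication=2)
```

With two replicas per block, losing any single node still leaves one copy of every block available.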
Data Lake Challenges
• Key Challenges
– Data swamps (unmanaged metadata).
– Security and access control.
– Data lineage and quality assurance.
• Solution: Governance frameworks (e.g., data catalogs).
• Data Lake Tiers & File Formats
Data Visualization and Reporting Layer
Data Visualization Frameworks
• Kibana: data visualization based on Elasticsearch
– Explore and analyze log data stored in Elasticsearch
• Grafana: graph composer and dashboard
– Visualize continuous time-series and streaming data
Business Intelligence Frameworks
• Tableau: user-friendly manipulation of big data
• Microsoft Power BI: analyze data, then visualize and share the results
• QlikView: effective data visualization, business intelligence, and enterprise reporting options
Advanced Data Analytical and Machine Learning Frameworks
• Scikit-learn, TensorFlow, Caffe, RapidMiner, Splunk, etc.
IoT Compute Stack and Fog Computing
• Traditional IT: Centralized cloud processing.
• IoT model: Distributed with edge and fog layers.
– Edge: Near-device processing.
– Fog: Intermediate processing/storage.
– Cloud: Centralized resources.
• Fog extends cloud capabilities to the network edge.
– Benefits:
• Low latency for real-time needs.
• Bandwidth savings via local processing.
• Enhanced security with on-site data.
– Fog nodes: Routers, gateways, etc.
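The bandwidth benefit can be sketched as a fog node that reduces a window of raw samples to one summary before sending it to the cloud. The window contents and the summary fields are assumptions for illustration.

```python
from statistics import mean


def summarize_window(samples):
    """Fog node: reduce a window of raw samples to one summary message."""
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": mean(samples),
    }


window = [21.0, 21.5, 22.0, 21.5]  # raw readings collected locally
uplink_message = summarize_window(window)  # one message instead of four
```

The raw samples stay on site, which also relates to the security benefit above: only the aggregate leaves the local network.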
IoT Data Management Best Practices
• Leverage edge and fog for distributed processing.
• Use hybrid storage (hot, warm, cold) for efficiency.
• Prioritize data security and governance policies.
• Adopt scalable tools for data ingestion and processing.
• Regularly monitor and refine data pipelines.
Next Lecture
• Artificial Intelligence in IoT