Overview
Streaming data is the continuous flow of real-time information, and the foundation of the event-driven architecture software model. Modern applications can use streaming data to enable data processing, storage, and analysis.
One way to think about streaming data is as a running log of changes or events that have occurred to a data set—often one changing at an extremely high velocity.
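The "running log" idea can be made concrete with a minimal Python sketch. It assumes a hypothetical change log in which each event either sets a key to a new value or deletes it. Replaying the log in order rebuilds the current state of the data set. The `ChangeEvent` type, the `replay` function, and the sensor readings are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in a hypothetical change log: set `key` to `value`, or delete `key`."""
    key: str
    value: Optional[float]  # None marks a deletion


def replay(log: list[ChangeEvent]) -> dict[str, float]:
    """Rebuild the current state of a data set by applying every change in log order."""
    state: dict[str, float] = {}
    for event in log:
        if event.value is None:
            state.pop(event.key, None)  # deletion: drop the key if present
        else:
            state[event.key] = event.value  # later changes overwrite earlier ones
    return state


change_log = [
    ChangeEvent("sensor-1", 21.5),
    ChangeEvent("sensor-2", 19.0),
    ChangeEvent("sensor-1", 21.7),  # supersedes the first reading
    ChangeEvent("sensor-2", None),  # sensor-2 is removed from the data set
]
print(replay(change_log))  # {'sensor-1': 21.7}
```

The log keeps every change. The current state is simply what you get by reading the log from the beginning.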
The large, fast-moving data sets that can be sources of streaming data are as varied as financial transactions, Internet of Things (IoT) sensor data, logistics operations, retail orders, or hospital patient monitoring. As a kind of next-generation messaging, data streaming is suited to situations that demand real-time responses to events.
One example of streaming data is event data, which forms the foundation of event-driven architecture. Event-driven architecture brings together loosely coupled microservices as part of agile development.
Why is streaming data important?
Application users expect real-time digital experiences. Apps that can consume and process streaming data raise the level of performance and improve customer satisfaction.
Traditionally, applications that needed real-time responses to events relied on databases and message processing systems. Such systems often cannot keep up with the torrent of data produced today. For example, traditional request-driven systems can struggle to react quickly to fast-moving data requests from multiple sources.
With an event-streaming model, events are written to a log rather than stored to a database. Event consumers can read from any part of the stream and can join the stream at any time.
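Writing events to a log differs from storing them in a database, and a small in-memory log shows how. This is a simplified illustration, not how Kafka is implemented, and `EventLog` and `Consumer` are hypothetical names. Reading never removes events, and each consumer keeps its own offset. That lets one consumer replay the full history while another joins partway through the stream.

```python
class EventLog:
    """A minimal in-memory, append-only event log (an illustration, not Kafka)."""

    def __init__(self) -> None:
        self._events: list[str] = []

    def append(self, event: str) -> int:
        """Write an event to the end of the log and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset: int, max_events: int = 10) -> list[str]:
        """Return up to `max_events` events starting at `offset`; nothing is removed."""
        return self._events[offset:offset + max_events]


class Consumer:
    """Keeps its own offset, so each consumer moves through the log independently."""

    def __init__(self, log: EventLog, start_offset: int = 0) -> None:
        self.log = log
        self.offset = start_offset

    def poll(self) -> list[str]:
        """Fetch the next batch of events and advance this consumer's offset."""
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


orders = EventLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    orders.append(event)

replayer = Consumer(orders, start_offset=0)     # reads the full history
late_joiner = Consumer(orders, start_offset=2)  # joins partway through the stream
print(replayer.poll())     # ['order-created', 'order-paid', 'order-shipped']
print(late_joiner.poll())  # ['order-shipped']
orders.append("order-delivered")
print(late_joiner.poll())  # ['order-delivered']
```

Each consumer owns its position, so adding a new consumer does not disturb the existing ones.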
Event stream processing can be used to detect meaningful patterns in streams. Event stream processing uses a data streaming platform to ingest events and process or transform the event stream.
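One example of detecting a pattern in a stream is a simple fraud-detection rule. The sketch below flags a card that makes several transactions within a short sliding window. The `Txn` record, the 60-second window, and the threshold of three are hypothetical choices for illustration. The sketch also assumes events arrive in timestamp order. Stream processing frameworks commonly offer windowing as a built-in operation.

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator, NamedTuple


class Txn(NamedTuple):
    card: str
    timestamp: float  # seconds since some reference point


def detect_bursts(
    events: Iterable[Txn], window_s: float = 60.0, threshold: int = 3
) -> Iterator[Txn]:
    """Yield each transaction that brings its card to `threshold` or more
    transactions within the last `window_s` seconds."""
    recent: dict[str, deque] = defaultdict(deque)  # per-card timestamps inside the window
    for txn in events:
        times = recent[txn.card]
        times.append(txn.timestamp)
        # Evict timestamps that have slid out of the window (assumes in-order arrival).
        while txn.timestamp - times[0] > window_s:
            times.popleft()
        if len(times) >= threshold:
            yield txn


stream = [
    Txn("card-A", 0.0),
    Txn("card-B", 5.0),
    Txn("card-A", 20.0),
    Txn("card-A", 45.0),   # third card-A transaction within 60 s
    Txn("card-A", 200.0),  # earlier ones have expired by now
]
print(list(detect_bursts(stream)))  # [Txn(card='card-A', timestamp=45.0)]
```

The detector processes each event once as it arrives and keeps only a small amount of state per card. It never needs to query the full history.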
What are some common use cases for data streams?
When you think of streaming data, think of real-time applications. Some common use cases include:
- Digital experiences that rely on immediate access to information.
- Microservices applications that support agile software development.
- Streaming scenarios that modernize database-driven applications previously built on batch processing.
- Real-time analytics, especially ones that ingest data from multiple sources.
- Edge computing that brings together data from diverse and disparate devices and systems.
Apps built around messaging, geolocation, stock trades, fraud detection, inventory management, marketing analytics, IT systems monitoring, and industrial IoT data are some popular use cases for data streams.
How does Apache Kafka work with streaming data?
Apache Kafka is an open-source distributed messaging platform that has become one of the most popular ways to work with large quantities of streaming, real-time data.
Software developers use Kafka to build data pipelines and streaming applications. With Kafka, applications can:
- Publish and subscribe to streams of records.
- Store streams of records.
- Process records as they occur.
Kafka is designed to manage streaming data while being fast, horizontally scalable, and fault-tolerant. Because Kafka minimizes the need for point-to-point integrations for data sharing in certain applications, it can bring latency down to milliseconds. Data therefore reaches users faster, which is an advantage in use cases that require real-time data availability, such as IT operations and e-commerce.
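Much of Kafka's horizontal scalability comes from splitting a topic into partitions that can be spread across brokers. Records with the same key go to the same partition, so their relative order is preserved while different keys are handled in parallel. The sketch below imitates that routing. `partition_for` and `distribute` are hypothetical helpers, and the sketch uses `zlib.crc32` as a stand-in hash, whereas Kafka's default partitioner hashes keys with murmur2.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition with a stable hash (crc32 here, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(
    records: list[tuple[str, str]], num_partitions: int
) -> dict[int, list[tuple[str, str]]]:
    """Append each (key, value) record to its partition, keeping arrival order per partition."""
    partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


records = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add-to-cart"),
    ("user-1", "checkout"),
]
for partition, recs in sorted(distribute(records, num_partitions=3).items()):
    print(partition, recs)
```

Which partition each user lands on depends on the hash. Every `user-1` event still ends up in one partition, in the order it was produced.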
Apache Kafka can handle millions of data points per second, which makes it well suited for big data challenges. In many data processing use cases, such as IoT and social media, data volumes are growing exponentially and can quickly overwhelm an application designed for today's data volume.
What are some of the challenges of data streaming?
By definition, data streams must deliver sequenced information in real time. Streaming data applications depend on streams that are consistent and highly available, even during times of high activity. Delivering and/or consuming a data stream that meets these qualities can be challenging.
The amount of raw data in a stream can surge rapidly. Consider the sudden exponential growth of new data created by stock trades during a market selloff, social media posts during a big sporting event, or log activity during a system failure. Data streams must be scalable by design. Even during times of high activity, they need to prioritize proper data sequencing, data consistency, and availability. Streams also must be designed for durability in the event of a partial system failure.
Across a distributed hybrid cloud environment, a streaming data cluster demands special considerations. Typical streaming data brokers are stateful, so their state must be preserved in the event of a restart. Scaling requires careful orchestration to make sure messaging services behave as expected and no records are lost.
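Consumers commonly avoid losing records across a failure by committing their read position only after an event has been processed. This means some events may be delivered twice after a restart (at-least-once delivery), so processing is made idempotent to keep duplicates harmless. The sketch below is a minimal illustration of that technique. `consume` is a hypothetical function, and the `store` dict stands in for durable storage that is assumed to survive a crash.

```python
from typing import Optional


def consume(log: list[tuple[str, int]], store: dict, crash_at: Optional[int] = None) -> None:
    """Process events from the last committed offset, committing only after each event is applied.

    `store` is a hypothetical durable store holding the committed offset, the IDs of
    applied events, and a running total; it is assumed to survive a simulated crash.
    """
    for offset in range(store["committed"], len(log)):
        event_id, amount = log[offset]
        # Idempotent apply: an event already applied before a crash is skipped on redelivery.
        if event_id not in store["applied"]:
            store["applied"].add(event_id)
            store["total"] += amount
        if offset == crash_at:
            return  # simulate a failure after applying the event but before committing
        store["committed"] = offset + 1  # commit the position only after processing


events = [("e1", 10), ("e2", 5), ("e3", 7)]
store = {"committed": 0, "applied": set(), "total": 0}
consume(events, store, crash_at=1)  # applies e1 and e2, but commits only past e1
consume(events, store)              # restarts at e2: skips it as a duplicate, then applies e3
print(store["total"], store["committed"])  # 22 3
```

Without the duplicate check, the restart would apply `e2` twice and the total would be 27. No event is skipped, and none is counted twice.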
Why use a streaming data service?
The challenge of delivering a complex, real-time, highly available streaming data platform can consume significant resources. It often takes expertise and hardware beyond the capabilities of an in-house IT organization.
For these reasons, many streaming data users opt for a managed cloud service, in which infrastructure and system management are offloaded to a service provider. This option helps organizations focus on their core competencies rather than on managing and administering a complex streaming data solution.