Stream & Parallel Stream - Java 8
A Stream is a pipeline of data operations that transforms or processes the elements of a collection (such as a List or Set) in a clean, functional way, without mutating the original source.
Think of it as a chain of steps: Take a list → 🎛 Filter it → 🛠 Transform it → 📦 Collect the result
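The chain of steps above can be sketched as follows. This is a minimal illustration, not a canonical form; the class name, method name, and sample data are assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamPipelineSketch {
    // Keeps names longer than 3 characters and upper-cases them.
    static List<String> longNamesUpper(List<String> names) {
        return names.stream()
                .filter(n -> n.length() > 3)     // filter it
                .map(String::toUpperCase)        // transform it
                .collect(Collectors.toList());   // collect the result
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Brian", "Cleo", "Dev");
        System.out.println(longNamesUpper(names)); // [BRIAN, CLEO]
        System.out.println(names);                 // the source list is unchanged
    }
}
```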
In-Built Functions with Examples
🔹 findFirst() / findAny(): Fetch the first or any element from the stream.
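A small sketch of the difference, assuming a hypothetical list of integers: findFirst() respects encounter order, while findAny() is free to return whichever match is found first, which matters mainly on parallel streams. Both return an Optional, which is empty when nothing matches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FindFirstAnySketch {
    // First even number in encounter order, if there is one.
    static Optional<Integer> firstEven(List<Integer> nums) {
        return nums.stream().filter(n -> n % 2 == 0).findFirst();
    }

    // Some even number; on a parallel stream this need not be the first one.
    static Optional<Integer> anyEven(List<Integer> nums) {
        return nums.parallelStream().filter(n -> n % 2 == 0).findAny();
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(3, 8, 5, 10);
        System.out.println(firstEven(nums));                            // Optional[8]
        System.out.println(anyEven(nums).isPresent());                  // true
        System.out.println(firstEven(Arrays.asList(1, 3)).isPresent()); // false
    }
}
```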
Combined Example
Here are some handy, real-world-inspired, and slightly complex Java 8 Stream examples:
1. Group Employees by Department and Count Them
✅ Real-world use: Billing/inventory systems to get all sold items for a day.
10. Map of Product Name: Discounted Price (using Collectors.toMap)
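One possible shape for this example is sketched below. The Product class, the discount rate parameter, and the choice to keep the first price when a name repeats are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DiscountMapSketch {
    // Hypothetical product type, introduced only for this example.
    static class Product {
        private final String name;
        private final double price;
        Product(String name, double price) { this.name = name; this.price = price; }
        String getName() { return name; }
        double getPrice() { return price; }
    }

    // Maps each product name to its price after the given discount rate (0.10 = 10%).
    static Map<String, Double> discountedPrices(List<Product> products, double rate) {
        return products.stream()
                .collect(Collectors.toMap(
                        Product::getName,
                        p -> p.getPrice() * (1 - rate),
                        (first, second) -> first)); // assumption: keep the first price if a name repeats
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(new Product("Pen", 10.0), new Product("Book", 200.0));
        System.out.println(discountedPrices(products, 0.10));
    }
}
```

Without the third argument, Collectors.toMap throws an IllegalStateException on duplicate keys, so a merge function is worth supplying whenever names may repeat.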
In Java 8, parallel streams provide a way to process large collections of data concurrently, using multiple threads (by default, those of the common ForkJoinPool). When you use a parallel stream, the data is split into smaller chunks, and each chunk is processed in parallel. This can significantly speed up operations, especially CPU-bound work on large datasets.
However, parallel streams are not always faster than sequential streams. Their effectiveness depends on the nature of the operation and the size of the dataset. For lightweight tasks or small datasets, the cost of splitting the data, coordinating threads, and merging the results can outweigh the gain, and a parallel stream may actually perform worse.
Let’s dive deeper into parallel streams and their built-in features with basic examples.
How to Use Parallel Stream:
To convert a stream into a parallel stream, you can simply use the parallel() method on an existing stream or call parallelStream() directly on
a collection.
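Both routes can be sketched as below; the class and method names are made up for the example. Stream.isParallel() reports which mode the pipeline would run in.

```java
import java.util.Arrays;
import java.util.List;

class ParallelCreationSketch {
    // Route 1: ask the collection for a parallel stream directly.
    static boolean viaParallelStream(List<Integer> nums) {
        return nums.parallelStream().isParallel();
    }

    // Route 2: switch an existing sequential stream to parallel mode.
    static boolean viaParallel(List<Integer> nums) {
        return nums.stream().parallel().isParallel();
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3);
        System.out.println(viaParallelStream(nums));     // true
        System.out.println(viaParallel(nums));           // true
        System.out.println(nums.stream().isParallel());  // false
    }
}
```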
1. Basic Parallel Stream Example: This is a simple example of converting a list to a parallel stream and applying a map operation.
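A minimal sketch of such an example, assuming a list of integers that is squared; the names are illustrative only. Note that collecting to a list keeps the encounter order even though the mapping itself runs in parallel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelMapSketch {
    // Squares every number using a parallel stream.
    static List<Integer> squares(List<Integer> nums) {
        return nums.parallelStream()
                .map(n -> n * n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squares(Arrays.asList(1, 2, 3, 4))); // [1, 4, 9, 16]
    }
}
```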
2. Using filter() with Parallel Streams: filter() is a stateless operation, so each chunk can be filtered independently and it parallelizes well.
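As a rough sketch, assuming we want only the even numbers from a list (the names below are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelFilterSketch {
    // Keeps only the even numbers; each element is tested independently.
    static List<Integer> evens(List<Integer> nums) {
        return nums.parallelStream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evens(Arrays.asList(1, 2, 3, 4, 5, 6))); // [2, 4, 6]
    }
}
```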
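3. Using reduce() with Parallel Streams: The reduce() operation can also benefit from parallelism, provided the accumulator is associative and the identity value is a true identity (for example, 0 for addition). In this case, it's calculating the sum of numbers.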
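A minimal sketch of a parallel sum, with illustrative names. The identity 0 and the associative Integer::sum are what make the partial results of each chunk safe to combine.

```java
import java.util.Arrays;
import java.util.List;

class ParallelReduceSketch {
    // Sums the list; 0 is the identity and Integer::sum is associative.
    static int sum(List<Integer> nums) {
        return nums.parallelStream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4, 5))); // 15
    }
}
```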
4. Using forEach() with Parallel Streams: Note that forEach() does not guarantee the order of processing when used with parallel streams. The order may differ from run to run.
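One way to observe this is to record the visit order, as in the sketch below. The class name and the use of a ConcurrentLinkedQueue as a thread-safe recorder are choices made for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ForEachOrderSketch {
    // Records the order in which a parallel forEach actually visits the elements.
    static Queue<Integer> visitOrder(List<Integer> nums) {
        Queue<Integer> seen = new ConcurrentLinkedQueue<>(); // thread-safe on purpose
        nums.parallelStream().forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        // Every element is visited exactly once, but the printed order can change between runs.
        System.out.println(visitOrder(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8)));
    }
}
```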
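5. collect() with Parallel Streams: Collecting results with parallel streams is a common operation. Built-in collectors such as Collectors.toList() are safe to use in parallel, but care should be taken not to add elements to a shared non-thread-safe collection (e.g., a plain ArrayList) from inside forEach().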
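A short sketch of the safe pattern, with illustrative names: let collect() build the result instead of mutating a shared list from several threads.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelCollectSketch {
    // Doubles 0..n-1 in parallel and gathers the results with a built-in collector.
    static List<Integer> doubled(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * 2)
                .boxed()
                .collect(Collectors.toList()); // each thread fills its own list; the lists are merged in order
    }

    public static void main(String[] args) {
        System.out.println(doubled(5)); // [0, 2, 4, 6, 8]
        // Risky alternative (not shown running): calling forEach(list::add) on a shared ArrayList
        // from a parallel stream can lose elements or throw.
    }
}
```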
6. map() and flatMap() in Parallel Streams: These operations can also benefit from parallel execution, especially for transformations like
merging data from different sources.
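As an illustration, assume several hypothetical sources that each supply a list of tags; flatMap() merges them into one stream and map() normalizes each entry.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapMergeSketch {
    // Flattens the per-source lists into one list and lower-cases every entry.
    static List<String> mergeAndNormalize(List<List<String>> sources) {
        return sources.parallelStream()
                .flatMap(List::stream)       // one stream over all sources
                .map(String::toLowerCase)    // per-element transformation
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> sources = Arrays.asList(
                Arrays.asList("Java", "STREAM"),
                Arrays.asList("Parallel"));
        System.out.println(mergeAndNormalize(sources)); // [java, stream, parallel]
    }
}
```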
7. Using parallelStream() for Big Data Processing: Parallel streams shine when processing large datasets.
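A sketch of a CPU-heavy job over a large range, assuming we count primes with a deliberately simple trial-division test. The names and the workload are illustrative; whether the parallel version is actually faster on a given machine should be measured rather than assumed.

```java
import java.util.stream.IntStream;

class ParallelPrimeCountSketch {
    // Simple trial division; intentionally CPU-bound.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Counts the primes in [2, limit] using a parallel stream.
    static long countPrimes(int limit) {
        return IntStream.rangeClosed(2, limit)
                .parallel()
                .filter(ParallelPrimeCountSketch::isPrime)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100));       // 25
        System.out.println(countPrimes(2_000_000)); // large enough for parallelism to matter
    }
}
```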
8. Parallel Stream with Grouping and Summing: Imagine you have a list of orders, and you need to group them by customer, and calculate
the total amount spent by each customer, all while processing the data in parallel.
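A minimal sketch of this grouping, assuming a hypothetical Order class with a customer name and an amount:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class OrderTotalsSketch {
    // Hypothetical order type, introduced only for this example.
    static class Order {
        private final String customer;
        private final double amount;
        Order(String customer, double amount) { this.customer = customer; this.amount = amount; }
        String getCustomer() { return customer; }
        double getAmount() { return amount; }
    }

    // Total amount spent per customer, computed on a parallel stream.
    static Map<String, Double> totalByCustomer(List<Order> orders) {
        return orders.parallelStream()
                .collect(Collectors.groupingBy(
                        Order::getCustomer,
                        Collectors.summingDouble(Order::getAmount)));
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("alice", 20.0), new Order("bob", 5.5), new Order("alice", 30.0));
        System.out.println(totalByCustomer(orders));
    }
}
```

For very large inputs, Collectors.groupingByConcurrent is an alternative worth measuring, since it accumulates into a single concurrent map instead of merging per-thread maps.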
Explanation:
We are grouping the orders by customer and then calculating the sum of their total amount using groupingBy and summingDouble. This
entire process happens in parallel, which speeds up execution for large datasets.
9. Parallel Stream with Custom Object Transformation: In this example, we have a list of products. We want to apply a discount and then
transform the results into a list of discounted products in parallel.
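A sketch of this transformation, assuming a hypothetical immutable Product class. Each discounted product is a new object, so no shared state is modified by the parallel threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ProductDiscountSketch {
    // Hypothetical immutable product type, introduced only for this example.
    static class Product {
        private final String name;
        private final double price;
        Product(String name, double price) { this.name = name; this.price = price; }
        String getName() { return name; }
        double getPrice() { return price; }
    }

    // Returns new Product objects with the price reduced by the given rate (0.10 = 10%).
    static List<Product> applyDiscount(List<Product> products, double rate) {
        return products.parallelStream()
                .map(p -> new Product(p.getName(), p.getPrice() * (1 - rate)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Product> catalog = Arrays.asList(new Product("Laptop", 1000.0), new Product("Mouse", 20.0));
        for (Product p : applyDiscount(catalog, 0.10)) {
            System.out.println(p.getName() + " -> " + p.getPrice());
        }
    }
}
```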
Explanation:
In this case, each product’s price is reduced by 10%, and this transformation happens in parallel across the stream. For large product
catalogs, parallel processing can make the transformation process faster.
10. Parallel Stream with Complex Grouping and Mapping: Here we want to group students by their department and calculate the average
marks for each department, but we also want to filter students who have marks greater than a threshold.
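A sketch under the assumption of a hypothetical Student class with a department and marks; the threshold is passed in as a parameter so that 80 is just one possible value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DepartmentAverageSketch {
    // Hypothetical student type, introduced only for this example.
    static class Student {
        private final String name;
        private final String department;
        private final double marks;
        Student(String name, String department, double marks) {
            this.name = name; this.department = department; this.marks = marks;
        }
        String getDepartment() { return department; }
        double getMarks() { return marks; }
    }

    // Average marks per department, counting only students above the threshold.
    static Map<String, Double> averageAbove(List<Student> students, double threshold) {
        return students.parallelStream()
                .filter(s -> s.getMarks() > threshold)
                .collect(Collectors.groupingBy(
                        Student::getDepartment,
                        Collectors.averagingDouble(Student::getMarks)));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("A", "CS", 90), new Student("B", "CS", 70),
                new Student("C", "CS", 86), new Student("D", "EE", 95));
        System.out.println(averageAbove(students, 80));
    }
}
```

A department with no student above the threshold does not appear in the resulting map at all.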
Explanation:
This example demonstrates how you can filter elements while using parallel streams. The students are first filtered for those with marks
above 80, and then they are grouped by their department. The average marks are computed using averagingDouble.
11. Parallel Stream with FlatMap and Reduction: Let’s say you have a list of lists (like a list of orders for each customer), and you need to
flatten the lists and then calculate the total revenue.
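A minimal sketch, assuming a hypothetical Order class that carries only an amount, with one inner list of orders per customer:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TotalRevenueSketch {
    // Hypothetical order type, introduced only for this example.
    static class Order {
        private final double amount;
        Order(double amount) { this.amount = amount; }
        double getAmount() { return amount; }
    }

    // Flattens the per-customer order lists and sums all order amounts.
    static double totalRevenue(List<List<Order>> ordersPerCustomer) {
        return ordersPerCustomer.parallelStream()
                .flatMap(List::stream)
                .mapToDouble(Order::getAmount)
                .sum();
    }

    public static void main(String[] args) {
        List<List<Order>> orders = Arrays.asList(
                Arrays.asList(new Order(10.0), new Order(20.0)),
                Arrays.asList(new Order(5.0)),
                Collections.<Order>emptyList());
        System.out.println(totalRevenue(orders));
    }
}
```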
Explanation:
We used flatMap to flatten the list of lists of orders into a single stream, then mapToDouble to extract the order amounts. Finally, sum is
used to get the total revenue. This entire process is done in parallel for better performance on large data.
12. Parallel Stream with forEachOrdered(): If you need to maintain order when using a parallel stream (which is not guaranteed by default),
you can use forEachOrdered().
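A short sketch with illustrative names: the mapping step may run on several threads, but forEachOrdered() hands the results to the action one at a time in encounter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ForEachOrderedSketch {
    // Upper-cases items in parallel, then consumes them strictly in encounter order.
    static List<String> inEncounterOrder(List<String> items) {
        List<String> out = new ArrayList<>();
        items.parallelStream()
                .map(String::toUpperCase)
                .forEachOrdered(out::add); // processed one element at a time, in order
        return out;
    }

    public static void main(String[] args) {
        System.out.println(inEncounterOrder(Arrays.asList("a", "b", "c", "d"))); // [A, B, C, D]
    }
}
```

The ordering guarantee comes at a price: the final step is effectively serialized, so some of the benefit of parallelism is given up.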
Explanation:
In this example, even though the operation is done in parallel, forEachOrdered() ensures that the items are processed in the original order.
This is useful when you need to maintain sequence, even in parallel execution.
13. Parallel Stream with Custom Collector: Let’s create a custom collector that calculates the sum of squares of numbers using parallel
streams.
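One way such a collector could look is sketched below. The accumulator type (a one-element long array) and the UNORDERED characteristic are design choices made for this illustration, not the only valid ones.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

class SumOfSquaresCollector implements Collector<Integer, long[], Long> {
    @Override
    public Supplier<long[]> supplier() {
        return () -> new long[1]; // one running total per thread
    }

    @Override
    public BiConsumer<long[], Integer> accumulator() {
        return (acc, n) -> acc[0] += (long) n * n;
    }

    @Override
    public BinaryOperator<long[]> combiner() {
        return (left, right) -> { left[0] += right[0]; return left; }; // merge partial totals
    }

    @Override
    public Function<long[], Long> finisher() {
        return acc -> acc[0];
    }

    @Override
    public Set<Characteristics> characteristics() {
        return EnumSet.of(Characteristics.UNORDERED); // addition does not depend on order
    }

    // Convenience wrapper that applies the collector to a parallel stream.
    static long sumOfSquares(List<Integer> nums) {
        return nums.parallelStream().collect(new SumOfSquaresCollector());
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4))); // 30
    }
}
```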
Explanation:
Here, we create a custom collector SumOfSquaresCollector to calculate the sum of squares in parallel. This gives us more flexibility when
working with complex parallel stream scenarios.
14. Parallel Stream with Sorting: If you want to sort a collection using parallel streams, you should be careful, as sorting is a stateful operation: it must see every element before it can emit any, and the sorted chunks have to be merged, so it may not benefit much from parallelism. However, for larger datasets, parallel sorting can still provide a speedup.
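A minimal sketch with illustrative names; the result is the same as with a sequential stream, only the work is divided differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelSortSketch {
    // Returns a sorted copy of the list; the input list is left untouched.
    static List<Integer> sortedCopy(List<Integer> nums) {
        return nums.parallelStream()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortedCopy(Arrays.asList(5, 3, 9, 1))); // [1, 3, 5, 9]
    }
}
```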
Explanation:
Even though sorting is a stateful operation, parallel streams can speed up the sorting process on large collections by dividing the work
across multiple threads.
1. Thread Safety: Be cautious when writing to non-thread-safe collections (e.g., ArrayList) from inside parallel operations, especially when calling add() on a shared collection in forEach().
2. Overhead: For small datasets or simple operations, parallel streams can have more overhead than using sequential streams.
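3. Stateful Operations: Built-in stateful operations (e.g., sorted(), distinct()) still give correct results in parallel but need extra buffering and merging, while lambdas that read or modify shared state can give wrong results. Be sure to check that your own operations are stateless.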
4. Order of Execution: Parallel streams do not guarantee the order of processing. If you need the order to be maintained, either use
forEachOrdered() or avoid parallelism.
Efficiency: Parallel streams are effective when working with large datasets or CPU-intensive operations that can benefit from parallel
processing.
Thread Safety: Ensure that operations inside parallel streams are thread-safe (e.g., using thread-safe collections or Collectors.toList()).
Performance Trade-Off: Always test the performance, as parallel streams can sometimes introduce more overhead due to thread
management, especially for smaller datasets.
🔹 Parallel Stream
Execution: Multi-threaded (uses ForkJoinPool).
Performance: Better for large datasets or CPU-intensive tasks.
Thread Safety: Needs care with shared mutable state.
Order: Not guaranteed unless using forEachOrdered().
Use When:
Dataset is large.
Task is heavy (e.g., big calculations).
Tasks are independent and can run in parallel.
Scenario | Sequential Stream | Parallel Stream
Small dataset | ✅ | ❌
Large dataset | ❌ | ✅
CPU-intensive transformation | ❌ | ✅