DISTRIBUTED PROCESSING
What is Distributed Processing?
➢ Distributed processing means that a specific task can be broken up into functions, and the
functions are dispersed across two or more interconnected processors (nodes).
➢ A distributed application is an application for which the component application programs
are distributed between two or more interconnected processors.
➢ Distributed data is data that is dispersed across two or more interconnected systems.
Architecture Diagram For Distributed Processing
Key Components of Distributed Processing:
➢ Nodes: Independent computers/devices working together on the task (e.g., servers,
laptops, edge devices).
➢ Internetwork: The communication infrastructure connecting nodes (e.g., LAN, WAN,
cloud network).
➢ Operating System (OS): Manages resources and provides an interface for applications on
each node.
➢ Middleware: Software facilitating communication and coordination between nodes,
including:
i. Distributed File System (DFS): Allows seamless access to shared data across
nodes.
ii. Remote Procedure Call (RPC): Enables nodes to invoke procedures on other
nodes as if they were local.
iii. Message Queuing System (MQS): Facilitates asynchronous communication and
task distribution.
➢ Applications: Programs designed to leverage the distributed environment for parallel
processing.
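The middleware patterns above, in particular the message queuing system, can be sketched in miniature. The snippet below is an illustrative sketch only: Python's standard-library `queue.Queue` and threads stand in for a real broker and remote worker nodes (production systems use middleware such as RabbitMQ or Kafka, which add persistence, routing, and delivery guarantees).

```python
import queue
import threading

# Minimal sketch of asynchronous task distribution via a message queue.
# queue.Queue stands in for the broker; threads stand in for worker nodes.
task_queue = queue.Queue()
result_queue = queue.Queue()

def worker():
    # Each "node" pulls tasks asynchronously and pushes results back.
    while True:
        task = task_queue.get()
        if task is None:                 # sentinel: no more work
            task_queue.task_done()
            break
        result_queue.put(task * task)    # stand-in for real processing
        task_queue.task_done()

# The producer (master node) enqueues subtasks without waiting for consumers.
for n in range(1, 6):
    task_queue.put(n)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for _ in threads:
    task_queue.put(None)                 # one sentinel per worker
for t in threads:
    t.join()

results = sorted(result_queue.queue)
print(results)                           # squares of 1..5, order-independent
```

Because producers and consumers never call each other directly, either side can be added, removed, or restarted independently, which is the flexibility the MQS bullet refers to.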
Working of Distributed Processing:
1. Task partitioning: The master node or another designated node splits the larger task into
smaller, independent subtasks.
2. Subtask distribution: Subtasks are assigned to available nodes based on load balancing
algorithms.
3. Subtask execution: Each node independently executes its assigned subtask using its local
resources.
4. Result integration: Results from individual nodes are sent back to the master node or
another designated node.
5. Final output generation: The master node/designated node merges and presents the final
output.
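The five steps above can be sketched as a master/worker program. This is a simplified illustration, not a real distributed system: a thread pool stands in for remote nodes (and, in CPython, threads do not truly parallelize CPU-bound work), while a real deployment would dispatch subtasks over the network.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    """Step 1: split the larger task into roughly equal subtasks."""
    size = (len(data) + n_parts - 1) // n_parts
    return [data[i:i + size] for i in range(0, len(data), size)]

def execute_subtask(chunk):
    """Step 3: each 'node' processes its chunk using local resources."""
    return sum(x * x for x in chunk)

data = list(range(1, 101))                # overall task: sum of squares 1..100
chunks = partition(data, 4)               # step 1: task partitioning

# Steps 2-4: the pool distributes subtasks and collects the results.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(execute_subtask, chunks))

total = sum(partial_sums)                 # step 5: final output generation
print(total)                              # 338350
```

Note that the subtasks are independent, so no worker needs to coordinate with another until the results are merged in step 5.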
Features of Distributed Processing:
➢ Parallel processing: Subtasks are executed concurrently, reducing overall processing
time.
➢ Scalability: Adding more nodes increases processing power and storage capacity.
➢ Fault tolerance: System remains operational even if individual nodes fail
(redundancy).
➢ Resource sharing: Resources (computation, storage, network) are shared among
nodes.
➢ Transparency: Users and applications can interact with the system as if it were a
single machine, unaware of its distributed nature.
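The fault-tolerance feature above rests on redundancy: if one node fails, a replica can take over the same subtask. The sketch below is a toy illustration under stated assumptions; the node names and the simulated failure are hypothetical, and a real system would issue remote calls and detect failures via timeouts or heartbeats.

```python
class NodeDown(Exception):
    """Raised when a simulated node is unavailable."""
    pass

def run_on(node, task):
    # Simulate a failed node; a real system would make a remote call here.
    if node == "node-a":
        raise NodeDown(node)
    return task()

def run_with_failover(replicas, task):
    """Try each replica in turn until one succeeds."""
    for node in replicas:
        try:
            return run_on(node, task)
        except NodeDown:
            continue          # node failed: fall back to the next replica
    raise RuntimeError("all replicas failed")

result = run_with_failover(["node-a", "node-b"], lambda: 2 + 2)
print(result)                 # computed on node-b after node-a failed
```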
Characteristics of Distributed Processing:
➢ Geographic decentralization: Nodes can be located anywhere in a network.
➢ Resource independence: Nodes utilize their own resources for subtask execution.
➢ Heterogeneity: Nodes may have different hardware, software, and OS setups.
➢ Concurrency: Subtasks are often executed concurrently for faster results.
➢ Asynchronous communication: Nodes may interact asynchronously using MQS for
flexibility.
Pros and Cons of Distributed Processing:
Pros of Distributed Processing:
➢ Performance: Significantly improved processing speed, especially for large tasks.
➢ Scalability: Easy to add more nodes to accommodate growing demands.
➢ Fault tolerance: System continues to function even if some nodes fail.
➢ Load balancing: Efficiently distributes workload across nodes to optimize
utilization.
➢ Data locality: Placing computation near the data it needs reduces access latency
and network traffic.
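The load-balancing advantage above can be made concrete with a least-loaded assignment policy: each subtask goes to whichever node currently has the smallest accumulated load. This is one simple strategy among many (round-robin, hashing, work stealing); the task names and costs below are illustrative.

```python
import heapq

def balance(tasks, nodes):
    """Assign each (name, cost) task to the currently least-loaded node."""
    heap = [(0, node) for node in nodes]      # (accumulated load, node)
    heapq.heapify(heap)
    assignment = {node: [] for node in nodes}
    for name, cost in tasks:
        load, node = heapq.heappop(heap)      # pick the least-loaded node
        assignment[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return assignment

tasks = [("t1", 5), ("t2", 3), ("t3", 2), ("t4", 4)]
assignment = balance(tasks, ["n1", "n2"])
print(assignment)                             # both nodes end with load 9
```

With these costs, both nodes finish with a total load of 9, which is exactly the utilization-optimizing behavior the bullet describes.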
Cons of Distributed Processing:
➢ Complexity: Higher design, development, and management overhead compared to
centralized systems.
➢ Communication overhead: Network communication may introduce latency and
consume resources.
➢ Security risks: Increased attack surface due to multiple entry points.
➢ Data consistency: Maintaining data consistency across multiple nodes can be
challenging.
➢ Troubleshooting: Diagnosing issues in a distributed system can be complex.
Applications / Uses of Distributed Processing:
➢ High-performance computing (HPC): Complex scientific simulations, weather
forecasting, large-scale data analysis.
➢ Big data processing: Analytics, machine learning, real-time data pipelines.
➢ Cloud computing: Virtualization, infrastructure management, distributed applications.
➢ Content delivery networks (CDNs): Efficient delivery of online content across
geographically dispersed locations.
➢ Internet of Things (IoT): Sensor data processing, edge computing, real-time
monitoring.
Distributed processing is a powerful computing paradigm that divides a task into smaller
subtasks and distributes them across multiple nodes for parallel execution. The architecture
typically comprises nodes, communication channels, and coordination mechanisms, as illustrated
in the architecture diagram. Its key strengths are improved performance, scalability, and fault
tolerance, achieved through task division, parallel execution, inter-node communication, and
load balancing; the trade-offs are greater system complexity and communication overhead.
Applications span many domains, including data analytics (e.g., MapReduce, Apache Spark),
distributed databases (e.g., Apache Cassandra), and cloud computing. In conclusion, distributed
processing offers significant advantages for large-scale computing tasks, provided its inherent
challenges in system complexity and communication management are handled carefully.
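The MapReduce model mentioned above can be illustrated with a toy word count. This is a sketch of the pattern only: the map phase runs independently per document (and so can run on separate nodes), and the reduce phase merges the partial results. Frameworks such as Hadoop and Spark wrap this same pattern with distribution, shuffling, and fault tolerance.

```python
from collections import Counter
from functools import reduce

documents = ["to be or not to be", "be here now"]

def map_phase(doc):
    """Map: emit partial word counts for one document."""
    return Counter(doc.split())

def reduce_phase(a, b):
    """Reduce: merge two partial counts."""
    return a + b

partials = [map_phase(d) for d in documents]           # runs per node
totals = reduce(reduce_phase, partials, Counter())     # merge results
print(totals["be"])                                    # 3
```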