DSCC Unit 4
A Distributed System is a network of machines that can exchange information with each other through message passing. It is very useful because it enables resource sharing: computers can coordinate their activities and share the resources of the system so that users perceive the system as a single, integrated computing facility.
1. Client/Server Systems: The client-server system is the most basic communication model: the client sends input to the server and the server replies to the client with an output. The client requests a resource or a task from the server; the server allocates the resource or performs the task and sends the result back as a response to the client's request. A client-server system can also be deployed with multiple servers.
2. Peer-to-Peer Systems: The peer-to-peer communication model is decentralized: every node acts as both client and server. Each node performs its tasks using its own local memory and shares data through the supporting communication medium, working as a server or as a client as required. Programs in a peer-to-peer system communicate at the same level, without any hierarchy.
3. Middleware: Middleware can be thought of as an application that sits between two separate applications and provides service to both. It serves as a base for interoperability between applications running on different operating systems; data can be transferred from one application to another through this service.
4. Three-tier: A three-tier system uses a separate layer and server for each function of a program. Client data is stored in the middle tier rather than on the client system or on the client's own server, which simplifies development. It includes a Presentation Layer, an Application Layer, and a Data Layer. This architecture is mostly used in web or online applications.
5. N-tier: N-tier is also called a multitier distributed system. An N-tier system can contain any number of functional tiers in the network and has a structure similar to the three-tier architecture. It is used whenever an application needs to forward a request to another application to perform a task or provide a service. N-tier architectures are commonly used in web applications and data systems.
SCHEDULING ALGORITHMS
LOCAL SCHEDULING
In a distributed system, local scheduling means how an individual workstation should schedule those processes assigned to it in order to maximize the
overall performance. It seems that local scheduling is the same as the scheduling approach on a stand-alone workstation. However, they are different in
many aspects. In a distributed system, the local scheduler may need global information from other workstations to achieve the optimal overall
performance of the entire system. For example, in the extended stride scheduling of clusters, the local schedulers need global ticket information in
order to achieve fairness across all the processes in the system. In recent years, there have been many scheduling techniques developed in different
models. Here, we introduce two of them: one is a proportional-share scheduling approach, in which the resource consumption rights of each active
process are proportional to the relative shares that it is allocated. The other is predictive scheduling, which is adaptive to the CPU load and resource
distribution of the distributed system. Traditional priority-based schedulers are difficult to understand and give more processing time to users with many jobs, which leads to unfairness among users. Much research has tried to find a scheduler that is easy to implement and can allocate resources to users fairly over time. In this environment, proportional-share scheduling was brought out to effectively solve
this problem. With proportional-share scheduling, the resource consumption rights of each active process are proportional to the relative shares that it
is allocated.
STRIDE SCHEDULING
As a kind of proportional-share scheduling strategy, stride scheduling allocates resources to competing users in proportion to the number of tickets
they hold. Each user has a time interval, or stride, inversely proportional to his/her ticket allocation, which determines how frequently it is used. A pass
is associated with each user. The user with a minimum pass is scheduled at each interval; a pass is then incremented by the job's stride.
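To make the mechanism concrete, here is a minimal sketch of a stride scheduler in Python; the constant STRIDE1, the class and function names, and the ticket values are illustrative choices, not part of any particular implementation.

```python
# Minimal sketch of stride scheduling (illustrative only; names are hypothetical).
# Each client holds some tickets; its stride is inversely proportional to its
# tickets. At every time slice the client with the smallest pass runs, then its
# pass advances by its stride, so CPU time converges to the ticket proportions.

STRIDE1 = 1 << 20  # large constant used to compute integer strides

class Client:
    def __init__(self, name, tickets):
        self.name = name
        self.tickets = tickets
        self.stride = STRIDE1 // tickets   # stride is proportional to 1 / tickets
        self.passvalue = self.stride       # start one stride ahead of zero

def schedule(clients, slices):
    """Run `slices` time slices and count how often each client is chosen."""
    counts = {c.name: 0 for c in clients}
    for _ in range(slices):
        current = min(clients, key=lambda c: c.passvalue)  # smallest pass runs
        counts[current.name] += 1
        current.passvalue += current.stride                # advance its pass
    return counts

if __name__ == "__main__":
    # Client A holds 3 tickets, B holds 1: A should get roughly 75% of the slices.
    print(schedule([Client("A", 3), Client("B", 1)], 100))
```

With 3 tickets against 1, client A is selected in roughly three quarters of the slices, matching its ticket proportion.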
Extension to Stride Scheduling: The original stride scheduling only deals with CPU-bound jobs. If proportional-share schedulers are to handle interactive and I/O-intensive job workloads, they must be extended to improve response time and I/O throughput, while not penalizing competing users. Here we discuss two extensions to stride scheduling that give credits to jobs not competing for resources. In this way, jobs are given an incentive to relinquish the processor when it is not needed, and they receive their share of resources over a longer time interval. Thus, because interactive jobs are scheduled more frequently when they awaken, they can receive better response time. The first approach is loan & borrow, and the second approach is system credit. Both approaches are built upon exhaustible tickets, which are simple tickets with an expiration time.
Loan & Borrow: In this approach, exhausted tickets are traded among competing clients. When a user temporarily exits the system, other users can
borrow these otherwise inactive tickets. The borrowed tickets expire when the user rejoins the system. When the sleeping user wakes up, it stops
loaning tickets and is paid back in exhaustible tickets by the borrowing users. In general, the lifetime of the exhaustible tickets equals the length of time for which the original tickets were borrowed. This policy keeps the total number of tickets in the system constant over time; thus, users can accurately determine the amount of resources they receive. However, it also introduces an excessive amount of computation into the scheduler on every sleep and wake-up event, which is undesirable.
System Credit: This second approach is an approximation of the first one. With system credits, clients are given exhaustible tickets from the system when they awaken. The idea behind this policy is that after a client sleeps and awakens, the scheduler calculates the number of exhaustible tickets the client needs to receive its proportional share over some longer interval. The system credit policy is easy to implement and does not add significant overhead to the scheduler on sleep and wakeup events. A proportional share of resources can be allocated to clients running sequential jobs in a cluster. In the cluster, users are guaranteed a proportional share of resources if each local stride-scheduler is aware of the number of tickets issued in its currency across the cluster and if the total number of base tickets allocated on each workstation is balanced. The solution for the first assumption is simple: each local scheduler is informed of the number of tickets issued in each currency, and then correctly calculates the base funding of each local job. The solution for distributing tickets to the stride-schedulers is to run a user-level ticket server on each of the nodes in the cluster. Each stride-scheduler periodically contacts the local ticket server to update and determine the value of currencies. Further, for parallel jobs in a distributed cluster, proportional-share resources can be provided through a combination of stride scheduling and implicit coscheduling. Preliminary simulations of implicit coscheduling for a range of communication patterns and computation granularities indicate that the stride-scheduler with system credit performs similarly to the Solaris time-sharing scheduler used in the Berkeley NOW environment.
PREDICTIVE SCHEDULING
Predictive scheduling differs from other scheduling approaches in that it provides intelligence, adaptivity and proactivity, so that a system implementing predictive scheduling can adapt to new architectures, algorithms and environmental changes automatically. Predictive scheduling can learn new architectures, algorithms and methods that are embedded into the system, and it provides some guarantees of service. Furthermore, it is able to anticipate significant changes in its environment and to prevent those changes from becoming the system's performance bottleneck.
Predictive scheduling can be roughly decomposed into three components: the H-cell, the S-cell and the allocator. The H-cell receives information about hardware resource changes, such as disk traffic, CPU usage and memory availability, and provides near-real-time control. Meanwhile, the S-cell provides long-term control of computational demands, such as a task's deadline and its real-time requirements, by interrogating the parallel program code. The H-cell and S-cell respectively collect information about computational supply and computational demand, and provide the allocator with raw data or intelligent recommendations. The allocator reconciles the recommendations sent by the H-cells and S-cells and schedules jobs according to their deadlines, while guaranteeing constraints and enforcing the deadlines. In the allocator, the previous inputs, in the form of a vector of performance information (such as memory, CPU and disk usage), are aggregated into sets. Each set corresponds to a scheduling decision. The allocator re-organizes the sets dynamically to keep memory demand bounded by splitting or merging sets. If a new input matches one of the pattern categories, a decision is made according to the corresponding decision of that pattern set; otherwise a new pattern category is built to associate this new input pattern with a corresponding scheduling decision.
Most scheduling policies are invoked either when a process blocks or at the end of a time slice, which may reduce performance because a considerable lapse of time can pass before scheduling is done. Predictive scheduling solves this problem by predicting when a scheduling decision is necessary, or by predicting the parameters needed by the scheduling decision when they are not known in advance. Based on the collected static information (machine type, CPU power, etc.) and dynamic information (free memory, CPU load, etc.), predictive scheduling tries to make an educated guess about future behavior, such as CPU idle time slots, which can be used to make scheduling decisions in advance. Predicting future performance based on past information is a common strategy, and it can achieve satisfactory performance in practice. Predictive scheduling is very effective at enhancing performance and reliability, even with the simplest methods, but at the cost of design complexity and management overhead. Furthermore, it is observed that the more complicated the method used, the greater the design complexity and management overhead, and the smaller the performance and reliability enhancement.
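As a rough illustration of the prediction idea, the sketch below estimates near-future CPU load from past samples with exponential smoothing and uses it for a placement decision; the smoothing factor, idle threshold, and node histories are assumed for the example and are not the actual H-cell/S-cell design.

```python
# Illustrative sketch: predicting near-future CPU load from past samples with
# exponential smoothing, then making a placement decision ahead of time.
# The smoothing factor and idle threshold are assumptions, not part of any
# specific predictive scheduler described above.

def predict_load(samples, alpha=0.5):
    """Exponentially weighted moving average of past load samples (0.0-1.0)."""
    estimate = samples[0]
    for s in samples[1:]:
        estimate = alpha * s + (1 - alpha) * estimate
    return estimate

def pick_node(history_per_node, idle_threshold=0.3):
    """Choose the node whose predicted load is lowest and below the threshold."""
    predictions = {node: predict_load(h) for node, h in history_per_node.items()}
    node, load = min(predictions.items(), key=lambda kv: kv[1])
    return node if load < idle_threshold else None  # None: defer the decision

if __name__ == "__main__":
    history = {"ws1": [0.9, 0.8, 0.7], "ws2": [0.2, 0.25, 0.1]}
    print(pick_node(history))   # expected: "ws2"
```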
COSCHEDULING
In 1982, Ousterhout introduced the idea of coscheduling, which schedules the interacting activities (i.e., processes) in a job so that all the activities execute simultaneously on distinct workstations. It can produce benefits in both system and individual job efficiency. Without coordinated scheduling, processor thrashing may lead to high communication latencies and consequently degraded overall performance. With systems connected by high-performance networks that already achieve latencies within tens of microseconds, the success of coscheduling becomes a more important factor in determining performance.
GANG SCHEDULING
Gang scheduling is a typical coscheduling approach; it was introduced a long time ago but still plays a fundamental role, and many research projects are still in progress to improve it. The approach identifies a job as a gang and its components as gang members. Further, each job is assigned to a class that has the minimum number of workstations that meets the requirement of its gang members, based on a one-process-one-workstation policy. The class has a local scheduler, which can have its own scheduling policy. When a job is scheduled, each of its gang members is allocated to a distinct workstation, and thus the job executes in parallel. When a time slice finishes, all running gang members are preempted simultaneously, and all processes from a second job are scheduled for the next time slice. When a job is rescheduled, effort is also made to run the same processes on the same processors. The strategy bypasses the busy-waiting problem by scheduling all processes at the same time. Experience shows that it works well for parallel jobs that have a lot of inter-process communication. However, it also has several disadvantages. First, it is a centralized scheduling strategy, with a single scheduler making decisions for all jobs and all workstations; this centralized nature can easily become the bottleneck when the load is heavy. Second, although this scheduler can achieve high system efficiency on regular parallel applications, it has difficulty selecting alternate jobs to run when processes block, since that requires simultaneous multi-context switches across the nodes. Third, achieving good performance requires long scheduling quanta, which can interfere with interactive response, making gang scheduling a less attractive choice for use in a distributed system.
These limitations motivate the integrated approaches. The requirement of centralized control and the poor timesharing response of previous scheduling approaches have motivated new, integrated coscheduling approaches. Such approaches extend local timesharing schedulers, preserving their interactive response and autonomy. Further, such approaches do not need explicitly identified sets of processes to be coscheduled, but rather integrate the detection of a coscheduling requirement with actions to produce effective coscheduling.
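The placement idea behind gang scheduling can be sketched with an Ousterhout-style matrix, where rows are time slices and columns are workstations and all members of a gang share a row; the first-fit placement rule and the data layout below are simplifying assumptions made for illustration.

```python
# Sketch of gang scheduling with an Ousterhout-style matrix: rows are time
# slices, columns are workstations. All members of a gang are placed in the
# same row, so they run simultaneously and are preempted together.

def build_schedule(gangs, num_workstations):
    """gangs: dict {gang_name: number_of_members (one process per workstation)}."""
    rows = []   # each row is a list of length num_workstations
    for name, members in gangs.items():
        if members > num_workstations:
            raise ValueError(f"{name} needs more workstations than exist")
        # first-fit: find a row with enough free columns, else open a new row
        for row in rows:
            free = [i for i, slot in enumerate(row) if slot is None]
            if len(free) >= members:
                for i in free[:members]:
                    row[i] = name
                break
        else:
            row = [None] * num_workstations
            for i in range(members):
                row[i] = name
            rows.append(row)
    return rows   # row k runs during time slice k, k + len(rows), ...

if __name__ == "__main__":
    for slice_no, row in enumerate(build_schedule({"J1": 3, "J2": 2, "J3": 2}, 4)):
        print(f"time slice {slice_no}: {row}")
```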
IMPLICIT COSCHEDULING
Implicit coscheduling is a distributed algorithm for time-sharing communicating processes in a cluster of workstations. By observing and reacting to
implicit information, local schedulers in the system make independent decisions that dynamically coordinate the scheduling of communicating
processes. The principal mechanism involved is two-phase spin-blocking: a process waiting for a message response spins for some amount of time, and then relinquishes the processor if the response does not arrive. The spin time before a process relinquishes the processor at each communication event consists of three components. First, a process should spin for the baseline time needed for the communication operation to complete; this component keeps coordinated jobs in synchrony. Second, the process should increase the spin time according to a local cost-benefit analysis of spinning versus blocking. Third, the pairwise cost-benefit: the process should spin longer when receiving messages from other processes, thus considering the impact of this process on others in the parallel job.
● The baseline time comprises the round-trip time of the network, the overhead of sending and receiving messages, and the time to awake the
destination process when the request arrives.
● The local cost-benefit is the point at which the expected benefit of relinquishing the processor exceeds the cost of being scheduled again. For
example, if the destination process will be scheduled later, it may be beneficial to spin longer and avoid the cost of losing coordination and being
rescheduled later. On the other hand, when a large load imbalance exists across processes in the parallel job, it may be wasteful to spin for the entire
load-imbalance even when all the processes are coscheduled.
● The pairwise spin-time only applies when other processes are sending to the currently spinning process, and is therefore conditional. Consider a pair of processes: a receiver performing a two-phase spin-block while waiting for a communication operation to complete, and a sender sending a request to the receiver. When waiting for a remote operation, the process spins for the base and local amount, while recording the number of incoming messages. If the average interval between requests is sufficiently small, the process assumes that it will remain beneficial in the future to be scheduled and continues to spin for an additional spin time. The process continues conditionally spinning for intervals of spin time until no messages are received in an interval.
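A minimal sketch of the two-phase spin-block just described is shown below, assuming hypothetical callbacks for checking whether the response has arrived and for reporting the incoming-message interval; the timing constants stand in for the baseline, local, and pairwise components.

```python
# Sketch of two-phase spin-block: spin for a baseline interval, extend the
# spin while messages keep arriving at short intervals, and only then block
# (relinquish the processor). Timings and the callbacks are hypothetical; a
# real implementation would hook into the messaging layer and the scheduler.

import time

def two_phase_wait(response_ready, incoming_interval,
                   baseline=0.0005, local_benefit=0.0005, pairwise=0.0005):
    """Return 'spun' if the response arrived while spinning, else 'blocked'."""
    deadline = time.monotonic() + baseline + local_benefit
    while time.monotonic() < deadline:
        if response_ready():
            return "spun"
        # conditional pairwise spin: extend the deadline while other
        # processes are still sending to us at short intervals
        if incoming_interval() < pairwise:
            deadline = time.monotonic() + pairwise
    return "blocked"   # give up the processor (e.g., block on a condition variable)

if __name__ == "__main__":
    start = time.monotonic()
    # Response arrives after 0.3 ms, no incoming traffic: we expect 'spun'.
    result = two_phase_wait(lambda: time.monotonic() - start > 0.0003,
                            lambda: float("inf"))
    print(result)
```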
DYNAMIC COSCHEDULING
Dynamic coscheduling makes scheduling decisions driven directly by message arrivals. When an arriving message is directed to a process that isn't running, a scheduling decision is made. The idea derives from the observation that only communicating processes need to be coscheduled. Therefore, it doesn't require explicit identification of the processes that need coscheduling.
The implementation consists of three parts:
Monitoring Communication/Thread Activity: Firmware on the network interface card monitors thread activity by periodically reading the host's kernel memory. If the incoming message is directed to the process currently running, the scheduler does nothing.
Causing Scheduling Decisions: If a received message is not directed to the process currently running, an interrupt is raised, invoking the interrupt routine. When the routine finds that it would be fair to preempt the currently running process, the process receiving the message has its priority raised to the maximum allowable priority for user-mode timesharing processes and is placed at the front of the dispatcher queue. Flags are set to cause a scheduling decision based on the new priorities. This causes the process receiving the message to be scheduled unless the currently running process has a higher priority than the maximum allowable priority for user mode.
Making a Decision Whether to Preempt: In dynamic coscheduling, the process receiving the message is scheduled only if doing so would not cause unfair CPU allocation. Fairness is implemented by limiting the frequency of priority boosts, which in turn limits the frequency of preemption. In jobs with fine-grained communication, the sender and receiver are scheduled together and run until one of them blocks or is preempted. Larger collections of communicating processes are coscheduled by transitivity. Experiments in the HPVM project indicate that dynamic coscheduling can provide good performance for a parallel process running on a cluster of workstations in competition with serial processes. Performance was close to ideal: CPU times were nearly the same as for batch processing, and job response times were reduced by up to 20% over implicit scheduling while maintaining near-perfect fairness. Further, it is claimed that dynamic-coscheduling-like approaches can be used to implement coordinated resource management in a much broader range of cases, although most of these are still to be explored.
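The decision path described above can be sketched as follows; the priority ceiling, the rate limit on boosts, and the set_priority/reschedule hooks are assumptions made for illustration rather than details of any particular kernel.

```python
# Sketch of the dynamic-coscheduling decision path: when a message arrives for
# a process that is not running, boost its priority and request a reschedule,
# but rate-limit the boosts to preserve fairness.

import time

MAX_USER_PRIORITY = 59          # hypothetical ceiling for user time-sharing
MIN_BOOST_INTERVAL = 0.010      # seconds between boosts for the same process

last_boost = {}                 # pid -> time of last priority boost

def on_message_arrival(receiver_pid, running_pid, set_priority, reschedule):
    """Called by the interrupt routine when a message is delivered."""
    if receiver_pid == running_pid:
        return "already running"            # nothing to do
    now = time.monotonic()
    if now - last_boost.get(receiver_pid, 0.0) < MIN_BOOST_INTERVAL:
        return "boost rate-limited"         # preempting now would be unfair
    last_boost[receiver_pid] = now
    set_priority(receiver_pid, MAX_USER_PRIORITY)   # front of dispatcher queue
    reschedule()                                    # may preempt the running process
    return "preemption requested"

if __name__ == "__main__":
    print(on_message_arrival(42, 7,
                             set_priority=lambda pid, p: None,
                             reschedule=lambda: None))
```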
TASK ASSIGNMENT APPROACH
Each process is viewed as a collection of tasks, and these tasks are scheduled onto suitable processors to improve performance. This is not a widely used approach because:
● It requires the characteristics of all the processes to be known in advance.
● It does not take into consideration the dynamically changing state of the system.
In this approach, a process is considered to be composed of multiple tasks and the goal is to find an optimal assignment policy for the tasks of an individual process. Typical goals of the task assignment approach are:
● Minimizing IPC cost (this problem can be modeled using a network flow model)
● Efficient resource utilization
● Quick turnaround time
● A high degree of parallelism
Resource Management: One of the functions of system management in distributed systems is resource management. When a user requests the execution of a process, the resource manager performs the allocation of resources to the process submitted by the user for execution. In addition, the resource manager routes processes to appropriate nodes (processors) based on the assignments. Since multiple resources are available in the distributed system, there is a need for system transparency for the user. A resource in the system can be logical or physical, for example, a data file in sharing mode or a Central Processing Unit (CPU). As the name implies, the task assignment approach is based on the division of a process into multiple tasks. These tasks are assigned to appropriate processors to improve performance and efficiency. This approach has a major setback in that it needs prior knowledge of the features of all the participating processes. Furthermore, it does not take into account the dynamically changing state of the system. The major objective of this approach is to allocate the tasks of a single process in the best possible manner, since it is based on the division of tasks in a system. For that, there is a need to identify the optimal policy for its implementation.
Working of Task Assignment Approach:
In the working of the Task Assignment Approach, the following are the assumptions:
• The division of an individual process into tasks.
• Each task’s computing requirements and the performance in terms of the speed of each processor are known.
• The cost incurred in the processing of each task performed on every node of the system is known.
• The IPC (Inter-Process Communication) cost is known for every pair of tasks performed between nodes.
• Other constraints, such as job resource requirements, the resources available at each node, and task priority connections, are also known.
Goals of Task Assignment Algorithms:
• Reducing Inter-Process Communication (IPC) Cost
• Quick Turnaround Time or Response Time for the whole process
• A high degree of Parallelism
• Utilization of System Resources in an effective manner
The above-mentioned goals often conflict. For example, goal 1 suggests that all the tasks of a process be allocated to a single node to reduce the Inter-Process Communication (IPC) cost, whereas goal 4, the efficient utilization of system resources, implies that the tasks of a process should be divided and processed by appropriate nodes across the system.
Note: The possible number of assignments of tasks to nodes: for m tasks and n nodes there are n^m possible assignments, since each task can be placed on any of the n nodes.
But in practice, the number of feasible assignments is smaller than n^m because of constraints on allocating tasks to appropriate nodes arising from their particular requirements, such as memory space.
Need for Task Assignment in a Distributed System:
The need for task assignment in distributed systems arises from the desire to achieve the set performance goals. For that, optimal assignments should be carried out with respect to cost and time functions, such as: task assignment that minimizes the total execution and communication costs; task completion time; the total of three costs (execution, communication, and interference); total execution and communication costs with a limit imposed on the number of tasks assigned to each processor; and a weighted product of the cost functions of total execution and communication costs and task completion time. All these factors can be taken into account in task allocation and, in turn, lead to the best outcome for the system.
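As a small worked example of these cost functions, the sketch below enumerates all possible assignments of m tasks to n nodes and picks the one that minimizes total execution cost plus inter-task communication (IPC) cost; the cost tables are made-up illustrative numbers.

```python
# Brute-force task assignment: enumerate all n**m assignments of m tasks to
# n nodes and pick the one minimizing execution cost plus IPC cost.

from itertools import product

def best_assignment(exec_cost, ipc_cost, nodes):
    """exec_cost[task][node] = execution cost; ipc_cost[(t1, t2)] = cost paid
    only if t1 and t2 end up on different nodes (zero if colocated)."""
    tasks = list(exec_cost)
    best, best_total = None, float("inf")
    for choice in product(nodes, repeat=len(tasks)):       # n**m assignments
        placement = dict(zip(tasks, choice))
        total = sum(exec_cost[t][placement[t]] for t in tasks)
        total += sum(c for (a, b), c in ipc_cost.items()
                     if placement[a] != placement[b])
        if total < best_total:
            best, best_total = placement, total
    return best, best_total

if __name__ == "__main__":
    exec_cost = {"t1": {"n1": 5, "n2": 10}, "t2": {"n1": 4, "n2": 4},
                 "t3": {"n1": 12, "n2": 3}}
    ipc_cost = {("t1", "t2"): 6, ("t2", "t3"): 2}
    print(best_assignment(exec_cost, ipc_cost, ["n1", "n2"]))
```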
LOAD-BALANCING APPROACH
A load balancer is a device that acts as a reverse proxy and distributes network or application traffic across a number of servers. Load balancing is the approach of distributing load units (i.e., jobs/tasks) across the network of nodes that make up the distributed system. Load balancing is performed by the load balancer, a component that manages the load and is used to distribute tasks to the servers. For example, the load balancer allocates the first task to the first server and the second task to the second server.
Purpose of Load Balancing in Distributed Systems:
• Security: A load balancer can add safety to your site with practically no changes to your application.
• Protect applications from emerging threats: The Web Application Firewall (WAF) in the load balancer shields your site.
• Authenticate User Access: The load balancer can request a username and password before granting access to your site, safeguarding against unauthorized access.
• Protect against DDoS attacks: The load balancer can detect and drop distributed denial-of-service (DDoS) traffic before it gets to your site.
• Performance: Load balancers can reduce the load on your web servers and optimize traffic for a better user experience.
• SSL Offload: Terminating SSL (Secure Sockets Layer) traffic on the load balancer removes that overhead from the web servers, leaving more resources available for your web application.
• Traffic Compression: A load balancer can compress site traffic, giving your users a much better experience with your site.
Load Balancing Approaches:
• Round Robin
• Least Connections
• Least Time
• Hash
• IP Hash
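A minimal sketch of the first two approaches in the list above (round robin and least connections) might look like this; the server names and connection bookkeeping are illustrative.

```python
# Minimal sketch of two load-balancing approaches: round robin and least
# connections. Server names and connection counts are illustrative.

from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, servers):
        self._next = cycle(servers)              # rotate through servers in order

    def pick(self):
        return next(self._next)

class LeastConnectionsBalancer:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}    # current open connections

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

if __name__ == "__main__":
    rr = RoundRobinBalancer(["s1", "s2", "s3"])
    print([rr.pick() for _ in range(5)])     # s1 s2 s3 s1 s2
    lc = LeastConnectionsBalancer(["s1", "s2"])
    print([lc.pick() for _ in range(3)])     # s1 s2 s1 (ties go to s1 first)
```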
Classes of Load Balancing Algorithms:
The following are some of the different classes of load balancing algorithms.
• Static: In this model, if any node is found to be heavily loaded, a task can be picked at random and moved to some other randomly chosen node.
• Dynamic: These algorithms use the current state information of the system for load balancing, and therefore perform better than static algorithms.
• Deterministic: These algorithms use the characteristics of the processors and the processes to allocate processes to nodes.
• Centralized: The system state information is collected by a single node.
Advantages of Load Balancing:
• Load balancers minimize server response time and maximize throughput.
• Load balancers ensure high availability and reliability by sending requests only to servers that are online.
• Load balancers perform continuous health checks to monitor each server's capability of handling requests.
LOAD-SHARING APPROACH
Load sharing denotes the process by which a router shares the forwarding of traffic when multiple paths are available in the routing table. If the paths are of equal cost, forwarding follows the load-sharing algorithm. In load-sharing systems, all nodes share the overall workload, and the failure of some nodes increases the pressure on the remaining nodes. The load-sharing approach ensures that no node is kept idle, so that every node shares the load.
For example, suppose there are two server connections with different bandwidths, one of 500 Mbps and another of 250 Mbps, and there are 2 packets to send. Instead of sending both packets over the same 500 Mbps connection, one packet is forwarded over the 500 Mbps connection and the other over the 250 Mbps connection. The goal here is not to use the same amount of bandwidth on the two connections but to share the load so that each connection can handle it sensibly without congestion.
A load-sharing algorithm includes policies such as a location policy, a process transfer policy, a state information exchange policy, a load estimation policy, a priority assignment policy, and a migration limiting policy.
1. Location Policies: The location policy determines the sender node or the receiver node of a process that is to be moved within the system for load sharing. Depending on which type of node takes the initiative and searches globally for a suitable node for the process, the location policies are of the following kinds:
• Sender-initiated policy: Here the sender node of the process decides where the process is to be sent. Heavily loaded nodes search for lightly loaded nodes to which part of the workload can be transferred. Whenever a node's load rises above the threshold value, it either broadcasts a message or randomly probes other nodes one by one to find a lightly loaded node that can accept one or more of its processes. If a suitable receiver node is not found, the node on which the process originated must execute that process itself.
• Receiver-initiated policy: Here the receiver node of the process decides from where to receive the process. In this policy, lightly loaded nodes search for heavily loaded nodes from which the execution of processes can be accepted. Whenever the load on a node falls below the threshold value, it broadcasts a message to all nodes, or probes nodes one by one, to search for heavily loaded nodes. A heavily loaded node may transfer one of its processes if such a transfer does not reduce its load below the normal threshold.
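A simple sketch of the two location policies, assuming each node can probe a few random peers for their current load, is given below; the threshold, probe limit, and load numbers are illustrative assumptions.

```python
# Sketch of the two location policies: a sender-initiated transfer probes
# random nodes when the local load crosses the threshold, while a
# receiver-initiated transfer probes for work when the node is underloaded.

import random

THRESHOLD = 4        # number of processes above which a node is "heavy"
PROBE_LIMIT = 3      # how many random nodes to probe before giving up

def sender_initiated(my_load, loads, probe_limit=PROBE_LIMIT):
    """Heavy node looks for a lightly loaded receiver; None = run locally."""
    if my_load <= THRESHOLD:
        return None
    for node in random.sample(list(loads), min(probe_limit, len(loads))):
        if loads[node] < THRESHOLD:
            return node              # transfer one process to this node
    return None                      # no suitable receiver found

def receiver_initiated(my_load, loads, probe_limit=PROBE_LIMIT):
    """Light node looks for a heavily loaded sender to take work from."""
    if my_load >= THRESHOLD:
        return None
    for node in random.sample(list(loads), min(probe_limit, len(loads))):
        if loads[node] > THRESHOLD:
            return node              # pull one process from this node
    return None

if __name__ == "__main__":
    others = {"n2": 1, "n3": 7, "n4": 2}
    print("send to:", sender_initiated(my_load=6, loads=others))
    print("pull from:", receiver_initiated(my_load=0, loads=others))
```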
2. Process Transfer Policy: An all-or-nothing approach is used in this policy. The threshold value of all the nodes is set to 1. A node becomes a receiver node if it has no process, and a node becomes a sender node if it has more than 1 process. If nodes turn idle, they cannot accept a new process immediately, which wastes processing power. To overcome this problem, the process is transferred to a node that is expected to become idle in the future. Sometimes, to avoid wasting processing power on the nodes, the load-sharing algorithm raises the threshold value from 1 to 2.
3. State Information Exchange Policy: In a load-sharing algorithm, nodes are not required to exchange state information regularly; they only need to know the state of other nodes when they are either underloaded or overloaded. Thus two sub-policies are used here:
• Broadcast when the state changes: A node broadcasts a state information request only when its state changes. In the sender-initiated location policy, the state information request is broadcast only when a node becomes overloaded. In the receiver-initiated location policy, the state information request is broadcast only when a node becomes underloaded.
• Poll when the state changes: In a large network a polling operation is performed instead. A node randomly asks different nodes for state information until it finds an appropriate one or reaches the probe limit.
4. Load Estimation Policy: Load-sharing algorithms aim to keep nodes from being idle, so it is sufficient to know whether a node is busy or idle. Consequently, these algorithms typically use the simplest load estimation policy: counting the total number of processes on a node.
5. Priority Assignment Policy: This policy uses rules to determine the priority of processes on a particular node. The rules are:
• Selfish: Local processes are given higher priority than remote processes. This gives the worst response time performance for remote processes and the best response time performance for local processes.
• Altruistic: Remote processes are given higher priority than local processes. It has the best overall response time performance.
• Intermediate: The numbers of local and remote processes on a node decide the priority. When the number of local processes is greater than or equal to the number of remote processes, local processes are given higher priority; otherwise remote processes are given higher priority than local processes.
6. Migration Limiting Policy: This policy decides the total number of times a process can migrate. One of the following two strategies may be used.
• Uncontrolled: A remote process arriving at a node is handled in the same way as a process originating at that node, so a process can migrate any number of times.
• Controlled: A migration count parameter is used to fix a limit on the number of migrations of a process. Thus, a process can migrate only a fixed number of times. This removes the instability of the uncontrolled strategy.
PROCESS MANAGEMENT
Process management is a systematic approach to ensure that effective and efficient business processes are in place. It is a methodology used to align
business processes with strategic goals. In contrast to project management, which is focused on a single project, process management addresses
repetitive processes carried out on a regular basis. It looks at every business process, individually and as a whole, to create a more efficient
organization. It analyzes current systems, spots bottlenecks, and identifies areas of improvement. Process management is a long-term strategy that
constantly monitors business processes so they maintain optimal efficiency. Implemented properly, it significantly helps boost business growth.
DISTRIBUTED FILE SYSTEM (DFS)
A Distributed File System (DFS), as the name suggests, is a file system that is distributed across multiple file servers or multiple locations. It allows programs to access or store remote files just as they do local ones, allowing programmers to access files from any network or computer. The main purpose of the Distributed File System (DFS) is to allow users of physically distributed systems to share their data and resources by using a common file system.
FEATURES OF DFS:
Structure transparency: The client does not need to know the number or locations of file servers and storage devices. Multiple file servers should be provided for performance, adaptability, and dependability.
Access transparency: Both local and remote files should be accessible in the same manner. The file system should automatically locate the accessed file and deliver it to the client's side.
Naming transparency: The name of the file should give no hint as to the location of the file. Once a name is given to a file, it should not change when the file is transferred from one node to another.
Replication transparency: If a file is copied on multiple nodes, the copies of the file and their locations should be hidden from the users.
Performance: Performance is measured by the average amount of time needed to satisfy client requests. This time covers CPU time + the time taken to access secondary storage + network access time. It is desirable that the performance of a Distributed File System be comparable to that of a centralized file system.
Security: A distributed file system should be secure so that its users may trust that their data will be kept private. To
safeguard the information contained in the file system from unwanted & unauthorized access, security mechanisms must be
implemented.
APPLICATIONS OF DFS:
NFS: NFS stands for Network File System. It is a client-server architecture that allows a computer user to view, store, and
update files remotely. The protocol of NFS is one of the several distributed file system standards for Network-Attached
Storage (NAS).
CIFS: CIFS stands for Common Internet File System. CIFS is a dialect of SMB; that is, CIFS is an implementation of the SMB protocol designed by Microsoft.
SMB: SMB stands for Server Message Block. It is a file-sharing protocol that was invented by IBM. The SMB protocol was created to allow computers to perform read and write operations on files on a remote host over a Local Area Network (LAN). The directories present on the remote host that can be accessed via SMB are called "shares".
Hadoop: Hadoop is a group of open-source software services. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. The core of Hadoop contains a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part, the MapReduce programming model.
NetWare: NetWare is a discontinued computer network operating system developed by Novell, Inc. It primarily used cooperative multitasking to run different services on a personal computer.
Advantages:
● DFS allows multiple users to access or store data.
● It allows data to be shared remotely.
● It improves the availability of files, access time, and network efficiency.
● It improves the capacity to change the size of the data and the ability to exchange data.
● A Distributed File System provides transparency of data even if a server or disk fails.
Disadvantages:
● In a Distributed File System, nodes and connections need to be secured, so security is at stake.
● Messages and data may be lost in the network while moving from one node to another.
● Database connection in a Distributed File System is complicated.
● Handling the database is also not easy in a Distributed File System compared to a single-user system.
● Overloading may occur if all nodes try to send data at once.
FILE MODELS
There are mainly two classes of file models in a distributed operating system.
1. Structure Criteria
2. Modifiability Criteria
Structure Criteria
There are two types of file models under the structure criteria. These are as follows:
1. Structured Files
2. Unstructured Files
STRUCTURED FILES
The Structured file model is presently a rarely used file model. In the structured file model, a file is seen as a collection of records by the
file system. Files come in various shapes and sizes and with a variety of features. It is also possible that records from various files in the
same file system have varying sizes. Despite belonging to the same file system, files have various attributes. A record is the smallest unit of
data from which data may be accessed. The read/write actions are executed on a set of records. Different "File Attributes" are provided in
a hierarchical file system to characterize the file. Each attribute consists of two parts: a name and a value. The file system used determines
the file attributes. It provides information on files, file sizes, file owners, the date of last modification, the date of file creation, access
permission, and the date of last access. Because of the varied access rights, the Directory Service function is utilized to manage file
attributes.
1. Files with Non-Indexed records: Records in non-indexed files are retrieved based on their position within the file, for instance, the second record from the beginning or the second record from the end of the file.
2. Files with Indexed records: In a file containing indexed records, each record has one or more key fields and may be accessed by specifying their values. The file is stored as a B-tree, a hash table, or a similar data structure so that records can be found quickly.
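As an illustration of indexed access, the sketch below builds an in-memory index on a key field and retrieves records by key value; a real file system would use a B-tree or hash table on disk, and the record layout here is invented for the example.

```python
# Sketch of the indexed-record idea: records are retrieved by the value of a
# key field rather than by position. A dictionary stands in for the B-tree or
# hash table a real file system would use.

records = [
    {"id": 101, "owner": "alice", "size": 4096},
    {"id": 102, "owner": "bob",   "size": 1024},
    {"id": 103, "owner": "alice", "size": 2048},
]

# Build an index on the "id" key field: key value -> record.
index_by_id = {rec["id"]: rec for rec in records}

def read_record(key):
    """Access a record directly by its key field value."""
    return index_by_id.get(key)

if __name__ == "__main__":
    print(read_record(102))               # indexed access by key
    print(records[1])                     # non-indexed access by position
```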
Unstructured Files:
It is the most important and widely used file model. In the unstructured model, a file is an unstructured sequence of data. It does not support any substructure. The data and the structure of each file in the file system are an uninterpreted sequence of bytes, as in UNIX or DOS. Most modern operating systems prefer the unstructured file model over the structured file model because files are shared by multiple applications: since a file has no structure, it can be interpreted in different ways by different applications.
Modifiability Criteria:
There are two file models under the modifiability criteria.
These are as follows:
1. Mutable Files
2. Immutable Files
1. Mutable Files: Existing operating systems employ the mutable file model. A file is represented as a single sequence of records, because the same file is updated repeatedly as new material is added. When a file is updated, the existing contents are changed by the new contents.
2. Immutable Files: The immutable file model is used by the Cedar File System (CFS). In the immutable file model, a file may not be modified once it has been created; it can only be deleted. File updates are implemented by creating several versions of the same file: when a file is changed, a new version of the file is created. Sharing is consistent because only immutable files are shared in this paradigm, which allows distributed systems to use caching and replication strategies while still maintaining consistency among the many copies. The disadvantages of the immutable file model are increased space usage and disk allocation activity. CFS uses the "keep" parameter to keep track of the current version number of a file. When the parameter value is 1, creating a new version of the file causes the previous version to be erased and its disk space reused for the new one. When the parameter value is greater than 1, several versions of the file exist. If the version number is not specified, CFS uses the lowest version number for actions such as "delete" and the highest version number for other activities such as "open".
FILE ACCESSING MODELS
A client's request for accessing a particular file is serviced on the basis of the file accessing model used by the distributed file system. The file accessing model basically depends on (1) the unit of data access and (2) the method used for accessing remote files.
On the basis of the unit of data access, the following file access models may be used to access a specific file.
1. File-level transfer model
2. Block-level transfer model
3. Byte-level transfer model
4. Record-level transfer model
1. File-level transfer model: In the file-level transfer model, the complete file is moved whenever an operation requires the file data to be transmitted across the distributed computing network between client and server. This model has better scalability and is efficient.
2. Block-level transfer model: In the block-level transfer model, file data is transferred through the network between client and server in units of file blocks. In short, the unit of data transfer in the block-level transfer model is the file block. The block-level transfer model may be used in distributed computing environments comprising several diskless workstations.
3. Byte-level transfer model: In the byte-level transfer model, file data is transferred through the network between client and server in units of bytes. In short, the unit of data transfer in the byte-level transfer model is the byte. The byte-level transfer model offers more flexibility than the other file transfer models, since it allows the retrieval and storage of an arbitrary sequential subrange of a file. The major disadvantage of the byte-level transfer model is the difficulty of cache management, because of the variable-length data involved in different access requests.
4. Record-level transfer model: The record-level transfer model may be used with file models in which the file contents are structured in the form of records. In the record-level transfer model, file data is transferred through the network between client and server in units of records; the unit of data transfer is the record.
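To contrast the file-level and block-level models, here is a small sketch in which a remote file is simulated by an in-memory byte string; the block size and the data are illustrative assumptions.

```python
# Sketch contrasting the file-level and block-level transfer models: one call
# fetches the whole file, the other fetches a fixed-size block on demand.

BLOCK_SIZE = 4096                      # illustrative block size

REMOTE_FILE = bytes(range(256)) * 64   # stands in for a file held by the server

def fetch_whole_file():
    """File-level transfer model: the complete file crosses the network once."""
    return REMOTE_FILE

def fetch_block(block_no):
    """Block-level transfer model: only the requested block crosses the network."""
    start = block_no * BLOCK_SIZE
    return REMOTE_FILE[start:start + BLOCK_SIZE]

if __name__ == "__main__":
    print(len(fetch_whole_file()))     # 16384 bytes moved in one transfer
    print(len(fetch_block(1)))         # 4096 bytes moved for one block
```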