Distributed Operating Systems
PRINCIPLES OF OPERATING SYSTEMS
Outline
Introductory material
Distributed IPC
Distributed file systems
Security for distributed systems
Outline of Introductory Materials
Why distributed operating systems?
Important issues in distributed OSes
Important distributed OS tools and mechanisms
Why Bother?
Economics of hardware
Local autonomy
Resource sharing
Effective use of networks
Reliability
Economics of Hardware
Cheaper to build many small machines than one large one
Due to economies of scale and chip design and fabrication issues
Gives purchasers easy options to increase computing power
Local Autonomy
Single-user machines better suited for most computer tasks
Allow dedication of resources to a user's task
E.g., easier to guarantee response time
Owning user can control his own computing power
Resource Sharing
But users need to share resources
Hardware resources: printers and tape drives
Software resources: data, access to software services
Network Usage
Users often want to communicate
With other local users
And to make data available to world
System needs to support user interactions
Generally demands cooperation among multiple machines
Reliability
Failure of a single machine no longer halts everyone
Generally graceful degradation of the overall system's resources
Ability to apply fault tolerance for important tasks at a high architectural level
Problems with Distributed Systems
More complex model of the system
Harder to provide correct operation
Harder to allocate resources properly
Security
Dealing with partial failures
Scaling issues
Heterogeneity
Complexity of the Model
Problem for designers, users, and system software
Harder to understand what will happen in any given case
Harder to design software to handle even understood complexities
Difficulties with Correct Operation
Distribution requires more complex synchronization
Differences between remote and local versions of similar operations
New sources of nonuniform timings
Difficulties of Allocating Resources
Local machine may have inadequate resources for a task
While a remote machine lies idle
Infeasible to control resources centrally
Do I need to go remote to satisfy malloc()?
Using remote resources conflicts with local autonomy
Security
Security problems much trickier when no centralized control
Data communications more subject to eavesdropping
Physical security measures typically infeasible for many problems
In very wide distributed systems, very tricky problems
Dealing with Partial Failures
Single machines usually have easy failure modes
Distributed systems face complications
Even detecting failure of a remote machine is nontrivial
E.g., what's the difference between a slow network, a failed network, and a crashed machine?
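The ambiguity in the last bullet can be sketched with a timeout-based heartbeat monitor. This is only an illustration under stated assumptions: `HeartbeatMonitor`, its `timeout` parameter, the machine name `node2`, and the explicit clock values are hypothetical and do not come from the slides. The monitor can only ever report "suspected", because silence looks the same whether the network is slow, the network has failed, or the machine has crashed.

```python
class HeartbeatMonitor:
    """Tracks the last time each remote machine was heard from.

    Hypothetical sketch: times are passed in explicitly so the example is
    deterministic; a real system would read a clock and receive heartbeats
    over the network.
    """

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of silence tolerated
        self.last_heard = {}        # machine name -> time of last heartbeat

    def heartbeat(self, machine, now):
        """Record that a heartbeat from `machine` arrived at time `now`."""
        self.last_heard[machine] = now

    def status(self, machine, now):
        """Return 'alive' or 'suspected'; never 'crashed'.

        Silence past the timeout may mean a slow network, a failed network,
        or a crashed machine. The monitor cannot tell these apart.
        """
        last = self.last_heard.get(machine)
        if last is not None and now - last <= self.timeout:
            return "alive"
        return "suspected"


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("node2", now=0.0)
early = monitor.status("node2", now=2.0)     # still within the timeout
late = monitor.status("node2", now=10.0)     # silent for too long
monitor.heartbeat("node2", now=11.0)         # a delayed heartbeat finally arrives
recovered = monitor.status("node2", now=11.5)
```

In this run the machine is suspected at time 10 and then turns out to be alive at time 11, which is exactly the case a simple timeout cannot rule out.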
Scaling Issues
Distributed systems control much larger pools of resources
So algorithms that scale well become much more important
Scaling puts severe limits on close cooperation
Heterogeneity Problems
Most distributed systems must address problems of differing hardware and software
Problems with data formats, executable formats
Problems with software versioning
Problems with different OSes
Resource Sharing
Resource sharing helps with some of the problems
Motivations for resource sharing
Information exchange
Load distribution
Computational parallelism
The fundamental distributed system problem
Distribution Complicates Everything
Process control and synchronization
Interprocess communications
File systems
Security
Device management
Important Research Areas in Distributed Operating Systems
In the area of processes
Remote interprocess communications
Synchronization
Naming
Distributed process management
More Research Areas
In the area of resource management
Resource allocation
Distributed deadlock mechanisms
Protection and security
Managing communication resources
Taxonomy of Distributed Systems
                               Single data stream    Multiple data streams
Single instruction stream      SISD                  SIMD
Multiple instruction streams   MISD                  MIMD
Network OSes vs. Distributed OSes
Network OSes control a single machine, plus some remote access facilities
Distributed OSes control a collection of machines
Not a hard and fast distinction
Network OS Diagram
[Figure: five machines, each labeled "Network OS"]
Distributed OS Diagram
[Figure: nodes 1 through 5, each labeled "Network OS", grouped around a single "Distributed Operating System"]
Characteristics of Network OSes
Private per-machine OS
Normal operations only on local machine
Machine boundaries are explicit
Little per-user fault tolerance
Characteristics of Distributed OSes
Single system controls multiple machines
Use of remote machines invisible
Users treat system as virtual uniprocessor
Strong fault tolerance
Reality is Somewhere in Between
Relatively few true distributed OSes
Network OS model
But many modern systems have distributed OS-like capabilities
Like remote file access
And they also support network OS operations
Like rlogin and remote shell
WWW access is in between
The Role of the Network
Distributed OSes made possible by network
Two fundamental types
Local area networks
Long haul networks
With very different characteristics
Local Area Networks
High bandwidth
Low delay
Shared by modest number of machines
Covers modest geographical area
Dedicated to small group of users
Can be regarded as extension to computer's backplane
Long Haul Networks
Lower bandwidth
Longer delays
Shared by large numbers of machines
Covers very wide area
Typically shared by many independent groups
Communication Protocols
Well defined methods of intermachine data exchange
To automatically handle problems of connecting over a network
Many different types required/available
Using Protocols in Distributed Operating Systems
Any intermachine operation requires a protocol to control it
So all machines involved can understand data exchange
Fundamental choice
General vs. special purpose protocols
General vs. Special Purpose Protocols
General protocols try to handle any kind of traffic
Special purpose protocols are customized for one situation
General protocols simplify everything
Special purpose protocols may perform better
Important Issues in Distributed Operating Systems
Communication model
Process interaction
Transparency
Heterogeneity
Autonomy
Consistency and transactions
Communication Models for Distributed Operating Systems
How do machines communicate?
Generally message-based, at some level
ISO model adds too much overhead
So special purpose protocols or a simplified protocol stacking model are typically used
Process Interaction in Distributed Operating Systems
How do processes interact in a distributed system?
Pipe model
Uninterpreted message model
Client/server model
Peer-to-peer model
Integrated model
RPC model
Shared memory model
Pipe Model
Processes interact through pipes
Named or unnamed
Local or remote
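As a small illustration of the pipe model, the sketch below pushes a block of bytes through an unnamed local pipe using Python's `os.pipe`. It is a simplification under stated assumptions: both ends live in one process to keep the example short, whereas in practice the writer and reader would be separate processes, and a remote pipe would need a network transport that is not shown here.

```python
import os

# Create an unnamed local pipe: bytes written to write_fd come out of read_fd.
read_fd, write_fd = os.pipe()

message = b"block of data sent through a pipe"
os.write(write_fd, message)
os.close(write_fd)              # closing the write end signals end-of-stream

# Read until end-of-stream; the reader sees an uninterpreted byte stream.
chunks = []
while True:
    chunk = os.read(read_fd, 1024)
    if not chunk:               # empty read means the writer closed its end
        break
    chunks.append(chunk)
os.close(read_fd)

received = b"".join(chunks)
```

The reader gets a plain byte stream with no message boundaries or structure, which is part of why the model is simple but offers little organizational help.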
Pros/Cons of Pipe Model
+ Simple transfer of large blocks of data
+ Hides many aspects of distribution
- Offers little organizational benefit
- Short on flexibility
- May be hard to get good performance
Uninterpreted Message Model
Processes send explicit messages
System provides general message delivery service
Higher level semantics handled by processes
Libraries can provide useful message services
Example: Isis
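The division of labor described above can be sketched as follows. The `MessageService` class, its mailbox names, and the JSON request format are hypothetical choices for this example, and an in-process queue stands in for real network delivery. The point is only that the delivery service moves opaque bytes while the communicating processes supply all the meaning.

```python
import json
import queue


class MessageService:
    """Hypothetical delivery service: moves opaque bytes between mailboxes."""

    def __init__(self):
        self.mailboxes = {}     # mailbox name -> queue of pending messages

    def register(self, name):
        self.mailboxes[name] = queue.Queue()

    def send(self, destination, payload):
        # The service checks only that it was given bytes; it never parses them.
        if not isinstance(payload, bytes):
            raise TypeError("payload must be bytes")
        self.mailboxes[destination].put(payload)

    def receive(self, name):
        # Raises queue.Empty if no message is waiting.
        return self.mailboxes[name].get_nowait()


service = MessageService()
service.register("worker")

# Sender side: the process decides on an encoding (JSON here, as an assumption).
request = {"op": "sum", "values": [1, 2, 3]}
service.send("worker", json.dumps(request).encode("utf-8"))

# Receiver side: the process, not the service, interprets the bytes.
raw = service.receive("worker")
decoded = json.loads(raw.decode("utf-8"))
answer = sum(decoded["values"]) if decoded["op"] == "sum" else None
```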
Pros/Cons of Uninterpreted Message Model
+ Simple and powerful
+ Relatively easy to implement
+ Can scale well
- Offers little organizational support
- Encourages asynchrony
- Not everyone's favorite programming paradigm
Client/Server Process Interaction Model
Processes are either clients or servers
Clients send request messages to servers
Servers send response messages to clients
Clients compete for server resources
Control of total system effectively distributed among servers
Examples: name servers, IPC servers, file servers, WWW servers, etc.
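A toy name server illustrates the request/response pattern. Everything named here (`name_server`, `lookup`, the table contents, and the use of in-process queues in place of a network) is an assumption made for the sketch. All requests pass through one queue, so clients compete for the single server in the way the bullets describe.

```python
import queue
import threading

requests = queue.Queue()    # shared channel carrying client requests to the server


def name_server(table):
    """Server loop: answer lookups until a None request tells it to stop."""
    while True:
        request = requests.get()
        if request is None:
            break
        name, reply_box = request
        # Send a response message back to the requesting client.
        reply_box.put(table.get(name, "unknown"))


def lookup(name):
    """Client side: send one request message and wait for the response."""
    reply_box = queue.Queue(maxsize=1)
    requests.put((name, reply_box))
    return reply_box.get(timeout=5)


server = threading.Thread(
    target=name_server,
    args=({"printer": "node3", "files": "node1"},),
)
server.start()

printer_host = lookup("printer")
missing = lookup("scanner")

requests.put(None)          # shut the server down
server.join()
```

Because the server owns the table and processes one request at a time, it is both the point of control and a potential bottleneck.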
Pros/Cons of Client/Server Model
+ Simple model
+ Hides much distribution
- Control of resources centralized in server
- Servers are bottlenecks
- Using multiple implementations of servers to overcome bottlenecks increases complexity
Peer-to-Peer Model
A process serves as a client and a server
Control of the total system is distributed among peers
Pros/Cons of Peer-to-Peer Model
+ No centralized bottleneck
+ Can scale well
- Difficult to control the global behavior
Integrated Process Interaction Model
All system resources implemented in integrated way
Remote/local resources treated identically
System makes decisions on resource allocation
E.g., Locus
Pros/Cons of Integrated Process Interaction Model
+ Hides distributed complexity
+ Reduces bottlenecks
- Hard to implement correctly
- Performance problems likely
- Big scaling problems
RPC Model
Processes communicate through RPC
Client/server often built on top of this
But this model makes lower level more explicit
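The lower level that RPC makes explicit is roughly: marshal the call, ship it, dispatch it, and marshal the reply. The sketch below shows those steps with a client stub and a server dispatcher. The names `RemoteStub`, `server_dispatch`, and `PROCEDURES`, and the JSON wire format, are assumptions for illustration; the "transport" is a direct function call standing in for a network.

```python
import json


def add(a, b):
    return a + b


PROCEDURES = {"add": add}   # procedures the server is willing to run


def server_dispatch(request_bytes):
    """Server side: unmarshal the request, call the procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    procedure = PROCEDURES.get(request["proc"])
    if procedure is None:
        reply = {"error": "no such procedure: " + request["proc"]}
    else:
        reply = {"result": procedure(*request["args"])}
    return json.dumps(reply).encode("utf-8")


class RemoteStub:
    """Client side: makes a remote call look like a local method call."""

    def __init__(self, transport):
        self.transport = transport  # callable: request bytes -> reply bytes

    def call(self, proc, *args):
        request = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
        # The caller blocks here until the reply comes back.
        reply = json.loads(self.transport(request).decode("utf-8"))
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]


stub = RemoteStub(transport=server_dispatch)
total = stub.call("add", 2, 3)
```

The blocking call inside `call` is where the model's simplicity comes from, and also where its potential for blocking and deadlock comes from.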
Pros/Cons of RPC Model
+ Simple programming model
+ Good scaling potential
+ Potentially good performance
- Potential for deadlock and blocking
- Implicit close connection between processes
- Potential bottleneck problems
Shared Memory Model
Provide distributed shared memory as the basic interprocess communication mechanism
Emulating local shared memory as closely as possible
Possibly without substantial hardware support
Pros/Cons of Shared Memory Model
+ Simple user model
+ Easy to build other mechanisms on top
- Hard to provide complete transparency
- Hard to provide good performance
- Serious scaling, heterogeneity questions
Transparency
Hiding machine boundaries
From both users and system itself
Transparent systems much easier to work with
Providing transparency at a low level has strong benefits
Not everything should be transparent
Kinds of Transparency
Data transparency
Process access transparency
Location transparency
Name transparency
Control transparency
Execution transparency
Performance transparency
Data Transparency
Allow transparent access to remote data
Benefit: allows use of remote data resources
NFS is (largely) data transparent
Process Access Transparency
Local resources accessed with same mechanisms as remote resources
Benefit: user doesn't need to worry what's local and what's not
NFS, RPC are process access transparent
WWW is not process access transparent
Location Transparency
Where resources are located is invisible
Benefit: resources can be moved without disruption
RPC can be location transparent
WWW is not location transparent
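One common way to get this property is a level of indirection: clients name a resource, and a registry maps the name to its current machine. The sketch below is a hypothetical illustration only; `NameRegistry`, `read_resource`, `migrate`, the node names, and the in-memory `STORAGE` table are assumptions, not mechanisms taken from the slides.

```python
class NameRegistry:
    """Hypothetical registry mapping resource names to their current machine."""

    def __init__(self):
        self._locations = {}

    def bind(self, name, machine):
        self._locations[name] = machine

    def resolve(self, name):
        return self._locations[name]


# Stand-in for per-machine storage: machine name -> {resource name: contents}.
STORAGE = {"node1": {}, "node4": {}}


def read_resource(registry, name):
    """Client code: uses only the name, never a machine address."""
    machine = registry.resolve(name)
    return STORAGE[machine][name]


def migrate(registry, name, destination):
    """Move a resource to another machine and update the registry."""
    source = registry.resolve(name)
    STORAGE[destination][name] = STORAGE[source].pop(name)
    registry.bind(name, destination)


registry = NameRegistry()
STORAGE["node1"]["report"] = "quarterly figures"
registry.bind("report", "node1")

before = read_resource(registry, "report")
migrate(registry, "report", "node4")
after = read_resource(registry, "report")   # same call, resource now elsewhere
```

The client call is identical before and after the move. A name that embeds the host, by contrast, would have to change when the resource moved.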
Name Transparency
A given name has the same meaning throughout the distributed system
Benefit: same name gets to same resource from anywhere
Fully qualified WWW names are name transparent
/tmp in most distributed FSes is not
Control Transparency
Control of system resources is transparent to its users (e.g., remote processes controlled like local)
Benefit: easier control of distributed applications
Locus provides control transparency on processes
Typical UNIX network of workstations does not provide it on processes
Execution Transparency
Allows processes to execute on any machine in system (and more, perhaps)
Benefit: easier handling of distributed applications, load balancing
Java is execution transparent (not load balancing, though)
NFS provides no execution transparency
Performance Transparency
Users don't notice difference when something must be done remotely
Benefit: if achievable, frees user from worrying about costs of going remote
NFS has high degree of performance transparency
WWW often does not
Benefits of Transparency
Easier software development
Support for incremental changes
Potentially better reliability
Simpler user model
Flexibility in resource location
Support for scaling
When can you provide transparency?
In applications (especially databases)
In programming languages
In operating system itself
When don't you want transparency?
When it's too complex to provide
E.g., heterogeneous systems
When you want particular resources
E.g., /tmp
When remote performance is terrible
E.g., over very slow links
Must be able to bypass transparency
Heterogeneity
How transparent should heterogeneous networks be?
And at what cost?
Generally, how does the network deal with heterogeneity?
Types of Heterogeneity
Computer heterogeneity
Network heterogeneity
Operating system heterogeneity
Computer Heterogeneity
Handling different types of computers
Most IPC mechanisms are easier if machines are homogeneous
Easier sharing of certain kinds of data
Technology trends towards homogeneity
But that can change
Network Heterogeneity
Handling different types of networks
E.g., Ethernet vs. AppleTalk
Dominance of IP making network interoperability a reality
But problems remain with differing network performance
OS Heterogeneity
Different OSes are not generally prepared to work together
Prevents easy load sharing, migration of tasks
Microsoft wants to crush this form of heterogeneity
Solutions to Heterogeneity Problems
Enforced coherence
Happening at de facto level
High level standards
E.g., external data representations
Bridges
Largely an unsolved problem
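An external data representation can be sketched with Python's `struct` module: both sides agree on one fixed wire layout regardless of their native byte order. The record layout below (a 32-bit id, a 16-bit port, and a 64-bit load figure) and the function names are hypothetical, chosen only to show the idea; this is not the format of any particular standard.

```python
import struct

# "!" selects network (big-endian) byte order with standard sizes and no padding,
# so the bytes are the same no matter which kind of machine produces them.
RECORD_FORMAT = "!iHd"      # int32 id, uint16 port, float64 load


def encode_record(record_id, port, load):
    """Convert native values into the agreed external representation."""
    return struct.pack(RECORD_FORMAT, record_id, port, load)


def decode_record(data):
    """Convert the external representation back into native values."""
    return struct.unpack(RECORD_FORMAT, data)


wire = encode_record(42, 8080, 0.75)
decoded = decode_record(wire)

# Why a common format is needed: the same integer has different byte layouts
# on little-endian and big-endian machines.
little = struct.pack("<i", 1)
big = struct.pack(">i", 1)
```

Each machine converts only between its own native form and the shared form, so adding a new machine type does not require pairwise converters.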