Chapter 12
Distributed File System
• Prepared by: Zainab Zafari
• Instructor: Aslamzai
A Distributed File System:
How To Design A DFS?
• Some design considerations:
Providing location transparency
Stateful or stateless server?
Failure handling
Caching
Cache consistency
Location transparency:
• Location transparency: the clients are unaware of the server's location
How do you name a remote file?
Is the name structure different from that of a local file?
Do you include the name of the server in the file name?
Existing file systems have used both designs
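One common way to provide location transparency is a client-side mount table that maps local path prefixes to servers, so file names never embed the server's identity. A minimal sketch (the server names and export paths here are invented examples):

```python
# Hypothetical sketch: a client-side mount table provides location
# transparency -- path names do not embed the server's identity.

MOUNT_TABLE = {
    "/home": ("fileserver1", "/export/home"),   # example servers, assumed
    "/proj": ("fileserver2", "/export/proj"),
}

def resolve(path):
    """Map a transparent local path to (server, remote path)."""
    for prefix, (server, remote_root) in MOUNT_TABLE.items():
        if path == prefix or path.startswith(prefix + "/"):
            return server, remote_root + path[len(prefix):]
    raise FileNotFoundError(path)

# The client names "/home/alice/notes.txt"; the server stays invisible.
print(resolve("/home/alice/notes.txt"))
```

Embedding the server name in the path (the other design) makes resolution trivial but gives up transparency: moving a file to another server changes its name.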
Failures:
• Server crash (fail-stop failures):
All data that is in server memory will be lost
The server may write to the local disk and send an acknowledgement to the client, but
the data might not have made it to disk, because it is still in the OS buffer or in the disk
buffer
• Message loss (omission failures):
Usually taken care of by the underlying communication protocol (e.g., RPC)
Stateless And Stateful Servers
• Stateless server: loses its state in the case of a crash
After a crash and recovery, a stateless server looks to clients like a slow server
Simple server design
Quick recovery after reboot
Clients need to maintain state: if the server crashed, the client does not know
whether the operation succeeded, so it must retry
• Stateful server: remembers its state and recovers it after a crash
Complex server design yet simple client design
Longer server recovery time
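The client-side retry behavior for a stateless server can be sketched as follows; a minimal toy, assuming the request is idempotent so repeating it is safe (the class and method names are invented for illustration):

```python
# Hypothetical sketch: with a stateless server, the client cannot tell a
# crash from slowness, so it simply retries idempotent requests.

class CrashingServer:
    """Toy stateless server that fails its first few requests."""
    def __init__(self, failures):
        self.failures = failures
    def read_block(self, handle, offset):
        if self.failures > 0:
            self.failures -= 1
            raise TimeoutError("no reply")       # simulated crash window
        return f"data@{offset}"                  # idempotent: safe to repeat

def client_read(server, handle, offset, retries=5):
    for _ in range(retries):
        try:
            return server.read_block(handle, offset)
        except TimeoutError:
            pass                                 # server looks "slow"; retry
    raise IOError("server unavailable")

print(client_read(CrashingServer(failures=2), "h1", 4096))
```

The crash is invisible to the caller: the failed attempts just make the read take longer.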
Caching:
• File access exhibits temporal locality
If a file has been accessed, it will likely be accessed again soon
• Caching: keeping a recently accessed copy of a file (or part of a file) close to where
it is accessed
• Caching can be done at various points in a DFS:
Caching in the client's memory
Caching on the client's disk
Caching in the server's memory
Caching on the server's disk
Caching and consistency:
• Consistency is about ensuring that a copy of a file is up-to-date
• Caching makes consistency an issue: a file may be modified in the cache, but not
at its true source
• A DFS should maintain consistency to prevent data loss
When the client modifies the file in its cache, the client copy becomes different from the
server copy
We care about consistency because if the client crashes, the data may be lost
• A DFS should maintain consistency to facilitate file sharing
Clients A and B share a file
Client A modifies the file in its local cache; client B sees an out-of-date copy of the file
Consistency protocols:
• Consistency protocols determine when modified data is propagated to its source
• Consistency protocols answer the question: how do we keep file copies consistent?
• We have two design schemes:
Write-through: instant propagation
Write-back: delayed propagation (lazy propagation)
Design schemes:
• Write-through:
The client propagates dirty data to the server as soon as the data is written
Reduces the risk of data loss on a crash
May result in a large number of protocol messages
• Write-back:
The client propagates dirty file data when the file is closed or after a delay (e.g., 30 seconds)
Higher risk of data loss
Smaller number of protocol messages
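The message-count trade-off between the two schemes can be sketched with toy clients; a minimal illustration, not a real protocol (all class names are invented):

```python
# Hypothetical sketch contrasting the two schemes: write-through sends one
# message per write; write-back batches dirty data until close().

class Server:
    def __init__(self):
        self.data = {}
        self.messages = 0
    def store(self, name, content):
        self.messages += 1          # count protocol messages
        self.data[name] = content

class WriteThroughClient:
    def __init__(self, server):
        self.server = server
    def write(self, name, content):
        self.server.store(name, content)   # instant propagation
    def close(self, name):
        pass

class WriteBackClient:
    def __init__(self, server):
        self.server, self.cache = server, {}
    def write(self, name, content):
        self.cache[name] = content         # dirty data stays local
    def close(self, name):
        self.server.store(name, self.cache.pop(name))  # lazy propagation

for cls in (WriteThroughClient, WriteBackClient):
    srv = Server()
    client = cls(srv)
    for i in range(10):
        client.write("f", f"v{i}")
    client.close("f")
    print(cls.__name__, srv.messages)      # 10 messages vs 1
```

Ten writes cost ten messages with write-through but only one with write-back; the price is that the nine unpropagated versions are lost if the write-back client crashes before `close()`.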
Consistency protocols and file sharing:
• When files are shared among clients, one client may modify the data, causing the
copies to become inconsistent
• Data consistency should be validated
• Approaches: client validation, server validation
• Client validation:
• The client contacts the server for validation
• Server validation:
• The server notifies the clients when their cached data is out-of-date
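Client validation can be sketched as a cheap attribute check before each cached read; a minimal toy, assuming the server exposes a modification time per file (the method names `getattr` and `read` are invented for this sketch):

```python
# Hypothetical sketch of client validation: before using a cached copy,
# the client asks the server for the file's modification time.

class Server:
    def __init__(self):
        self.files = {}              # name -> (mtime, content)
    def getattr(self, name):
        return self.files[name][0]   # cheap: attributes only
    def read(self, name):
        return self.files[name][1]   # expensive: full data transfer

class ValidatingClient:
    def __init__(self, server):
        self.server, self.cache = server, {}   # name -> (mtime, content)
    def read(self, name):
        mtime = self.server.getattr(name)      # validation call
        if name in self.cache and self.cache[name][0] == mtime:
            return self.cache[name][1]         # cached copy still valid
        content = self.server.read(name)       # refetch stale/missing copy
        self.cache[name] = (mtime, content)
        return content

srv = Server()
srv.files["f"] = (1, "old")
client = ValidatingClient(srv)
print(client.read("f"))              # first read: fetched from server
srv.files["f"] = (2, "new")          # another client modified the file
print(client.read("f"))              # validation detects the change
```

Server validation inverts the roles: the server tracks which clients cache which files and notifies them on change, which saves the per-read check but requires server state.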
Granularity of Data Access:
• Block granularity: the file is transferred block by block
• Cache consistency is done on block-by-block access: the consistency protocol may
generate many messages
• File granularity: the file is transferred as a whole
• If you have a large file and do not use the whole file, you waste resources by
transferring the whole file
• Cache consistency is done at whole-file granularity, so there are fewer consistency
messages
DFS Usage Patterns:
• A study by Mary Baker:
Most files are small
Read operations are much more frequent than write operations
Most accesses are sequential; random access is rare
Files are mostly read in their entirety
Data in files tends to be overwritten often
Most files are read and written by one user
When users share a file, typically only one user modifies it
File references show substantial temporal locality
Designing A Good DFS:
Introduction to NFS (Sun Network File System):
• NFS: Network File System
• Used in many organizations
• Developed by Sun, implemented over Sun RPC; can use either TCP or UDP
• Key features:
Access and location transparency (even inside the kernel)
Block granularity of file access and caching
Delayed update propagation
Client validation
Stateless server
Access transparency in NFS:
• VFS is a software layer that redirects file system calls to
the right file system (such as NFS)
• VFS provides access transparency at user level and inside the kernel
Block granularity of file access and caching:
• NFS uses the VFS cache
• VFS caching is at block granularity, so NFS files are accessed and cached at
block granularity
• The typical block size is 8 KB
• Files are cached on the client only in memory
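Block-granularity access means a byte range in a file maps to one or more fixed-size blocks; a minimal sketch of that mapping, assuming the 8 KB block size mentioned above:

```python
# Hypothetical sketch: with block-granularity access, a byte range maps to
# one or more fixed-size blocks (8 KB is a typical NFS block size).

BLOCK_SIZE = 8 * 1024

def blocks_for(offset, length):
    """Return the block numbers covering [offset, offset + length)."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return list(range(first, last + 1))

# Reading 10 KB starting at byte 4096 touches blocks 0 and 1.
print(blocks_for(4096, 10 * 1024))
```

Each of those blocks is fetched, cached, and validated independently, which is why block granularity can generate more consistency traffic than whole-file granularity.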
Server caching and failure modes:
• The NFS server caches files in its local VFS cache
• Writes to the local file system use delayed propagation:
data is first written into the VFS cache
NFS client/server interaction in the presence of
server caching:
• Option one: the client writes data to the server. The server sends an acknowledgement to the
client after writing the data to the local file cache, not to disk
• Advantage: fast response to the client
• Disadvantage: if the server crashes before the data is written to disk, the data is lost
• Option two: the client writes data to the server. The server syncs the data to disk, then
responds to the client
• Advantage: the data is not lost if the server crashes
• Disadvantage: each write operation takes longer to complete, which can limit the server's
scalability
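The two acknowledgement policies can be contrasted with a toy server; a minimal sketch, not NFS's actual implementation (the class and flag names are invented):

```python
# Hypothetical sketch of the two acknowledgement policies: ack after
# caching (fast, unsafe) versus ack after syncing to disk (slow, safe).

class DiskServer:
    def __init__(self, sync_before_ack):
        self.sync_before_ack = sync_before_ack
        self.cache, self.disk = {}, {}
    def write(self, name, content):
        self.cache[name] = content       # always lands in the VFS cache
        if self.sync_before_ack:
            self.disk[name] = content    # option two: sync, then ack
        return "ack"
    def crash(self):
        self.cache.clear()               # fail-stop: memory contents lost

for sync in (False, True):
    srv = DiskServer(sync_before_ack=sync)
    srv.write("f", "important")
    srv.crash()
    print("synced:", sync, "survived:", "f" in srv.disk)
```

Option one returns the acknowledgement before the data is durable, so a crash between the ack and the disk write silently loses acknowledged data; option two trades latency for durability.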
NFS Statelessness
• The NFS server is stateless
• It forgets its state when it crashes; the local file system recovers its own state
if the crash left it inconsistent
• The client keeps retrying its operations until the server reboots
Thank you for your attention…