
Distributed File Systems

What Is a Distributed File System?


A distributed file system (DFS) is a file system that spans multiple file servers or locations, such as file servers situated in different physical places. Files are accessible just as if they were stored locally, from any device and from anywhere on the network. A DFS makes it convenient to share information and files among users on a network in a controlled, authorized way.

Why Is a Distributed File System Important?


The main reason enterprises choose a DFS is to provide access to the same data from multiple locations. For example, you might have a team distributed all over the world that still has to work from the same files to collaborate. Likewise, in today's increasingly hybrid cloud world, whenever you need access to the same data from the data center to the edge to the cloud, a DFS is the tool for the job.

A DFS is critical in situations where you need:

• Transparent local access — Data is accessed as if it were local to the user, for high performance.
• Location independence — Users do not need to know where file data physically resides.
• Scale-out capabilities — The ability to scale out massively by adding more machines. DFS deployments can grow to exceedingly large clusters with thousands of servers.
• Fault tolerance — The system continues operating properly even if some of its servers or disks fail. A fault-tolerant DFS handles such failures by spreading replicated data across multiple machines (see the sketch after this list).
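
To make the fault-tolerance point concrete, here is a minimal Python sketch of a read that fails over between replicas. The server names and the in-memory stores are invented for illustration; a real DFS client would issue network calls to actual file servers.

```python
# A minimal sketch of fault-tolerant reads, assuming three replicas of each
# file. The in-memory "nodes" below stand in for real file servers; in a
# real DFS the fetch would be a network call.
DOWN = object()  # marker for a failed server

nodes = {
    "server-a": DOWN,                            # simulate a crashed node
    "server-b": {"/data/report.txt": b"hello"},  # healthy replica
    "server-c": {"/data/report.txt": b"hello"},  # healthy replica
}

def read_with_failover(path: str) -> bytes:
    """Return file contents from the first reachable replica."""
    for name, store in nodes.items():
        if store is DOWN:
            continue                # skip failed servers
        if path in store:
            return store[path]      # any replica can serve the read
    raise IOError(f"all replicas of {path} are unavailable")

print(read_with_failover("/data/report.txt"))  # succeeds despite server-a being down
```

Because every block exists on several machines, a single failed server only narrows the set of replicas a client can read from; it does not make the data unavailable.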

What Are the Benefits of a DFS?


The benefits of a DFS follow directly from its design: authorized users can share information and files in a controlled way, access those files as if they were stored locally from any device at any location, scale the system out across many machines as data grows, and keep working even when individual servers or disks fail.

What Are the Different Types of Distributed File Systems?

These are the most common DFS implementations:

• Windows Distributed File System
• Network File System (NFS)
• Server Message Block (SMB)
• Google File System (GFS)
• Lustre
• Hadoop Distributed File System (HDFS)
• GlusterFS
• Ceph
• MapR File System

What Are DFS and NFS?


NFS stands for Network File System, and it is one example of a distributed file system (DFS). Built on a client-server architecture, the NFS protocol lets users view, store, and update remotely located files as if they were local. NFS is one of several DFS standards for network-attached storage (NAS).
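
Part of what makes NFS useful is that ordinary file code does not change when the file is remote. The sketch below assumes a hypothetical NFS export already mounted at /mnt/shared; the paths are placeholders.

```python
# Because NFS makes remote files look local, ordinary file APIs work
# unchanged. The mount point below is hypothetical; it assumes an NFS
# export already mounted on the client, e.g. with:
#   mount -t nfs fileserver:/export/shared /mnt/shared
from pathlib import Path

local_file = Path("/tmp/notes.txt")
local_file.write_text("kept on this machine\n")
remote_file = Path("/mnt/shared/notes.txt")  # actually lives on the NFS server

# The same code path handles both -- the DFS hides the difference.
for f in (local_file, remote_file):
    if f.exists():
        print(f, "->", f.read_text()[:80])
```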

What Is a Distributed File System in Big Data?


One of the challenges of working with big data is that it is too big to manage on a single server, no matter how massive that server's storage capacity or computing power. After a certain point, it no longer makes economic or technical sense to keep scaling up, that is, adding more and more capacity to that single server. Instead, the data needs to be distributed across multiple machines (nodes), typically organized into clusters, by scaling out to use the computing power of every node. A distributed file system (DFS) lets businesses manage access to big data across those nodes and clusters, so they can read it quickly and perform many parallel reads and writes.
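
As a rough illustration of why scale-out speeds up reads, the following Python sketch fetches the blocks of one large file from several nodes in parallel. The block-to-node mapping and node names are invented for the example.

```python
# A sketch of why scale-out helps with big data: blocks of one large file
# can be read from different nodes in parallel. The block-to-node mapping
# here is invented for illustration.
from concurrent.futures import ThreadPoolExecutor

# Pretend metadata: which node holds each block of "bigfile".
block_locations = {0: "node-1", 1: "node-2", 2: "node-3", 3: "node-1"}

def read_block(block_id: int) -> bytes:
    node = block_locations[block_id]
    # In a real DFS this would be a network read from `node`.
    return f"<block {block_id} from {node}>".encode()

# Each node serves its blocks concurrently instead of one server doing all IO.
with ThreadPoolExecutor() as pool:
    blocks = list(pool.map(read_block, sorted(block_locations)))

data = b"".join(blocks)  # reassemble the file in block order
print(data)
```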

How Does a Distributed File System Work?


A distributed file system works as follows:

• Distribution: First, a DFS distributes datasets across multiple nodes or clusters. Each node contributes its own computing power, which lets the DFS process the datasets in parallel.
• Replication: A DFS also replicates datasets onto different nodes, storing copies of the same pieces of information in multiple places. Replication gives the distributed file system fault tolerance (the data can be recovered after a node or cluster failure) as well as high concurrency, because multiple clients can work with the same piece of data at the same time. A toy version of both steps appears after this list.
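
The following Python sketch walks through both steps, assuming 4 MB blocks and a replication factor of 3; real systems such as HDFS, GFS, or Ceph use far more sophisticated placement policies than this hash-based one.

```python
# A toy illustration of distribution plus replication, assuming 4 MB blocks
# and a replication factor of 3. The node names are invented.
import hashlib

NODES = ["node-1", "node-2", "node-3", "node-4", "node-5"]
BLOCK_SIZE = 4 * 1024 * 1024   # split files into 4 MB blocks
REPLICAS = 3                   # keep 3 copies of every block

def split_into_blocks(data: bytes):
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def place_replicas(block: bytes):
    """Pick REPLICAS distinct nodes for a block, derived from its hash."""
    start = int(hashlib.sha256(block).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

data = b"x" * (10 * 1024 * 1024)        # a 10 MB file -> 3 blocks
for n, block in enumerate(split_into_blocks(data)):
    print(f"block {n} ({len(block)} bytes) -> {place_replicas(block)}")
```

Losing any single node still leaves two replicas of every block it held, which is exactly the fault tolerance described above.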

What Is Distributed File System Replication?


DFS Replication is a multi-master replication engine in Microsoft Windows Server that you can use to synchronize folders between servers across limited-bandwidth network connections. As the data in each replicated folder changes, the changes are replicated across those connections.
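
The sketch below is not Microsoft's DFS Replication engine, only a toy one-way sync in Python that captures the core idea: detect files that changed in a replicated folder and copy just those across the connection. The folder paths are hypothetical.

```python
# A toy one-way folder sync: copy only new or modified files, so that
# limited bandwidth is spent on changes rather than full copies. This is
# an illustration of the idea, not the DFS Replication protocol itself.
import shutil
from pathlib import Path

def sync_changed(src: Path, dst: Path) -> None:
    """Copy files from src to dst only when they are new or modified."""
    if not src.is_dir():
        return
    for f in src.rglob("*"):
        if not f.is_file():
            continue
        target = dst / f.relative_to(src)
        # Replicate only changed data to conserve limited bandwidth.
        if not target.exists() or f.stat().st_mtime > target.stat().st_mtime:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)

sync_changed(Path("/srv/share-a"), Path("/srv/share-b"))  # hypothetical folders
```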

Where Is a Distributed File System Located?


The goal of using a distributed file system is to let users of physically distributed systems share their data and resources. As such, a DFS can run on any collection of workstations, servers, mainframes, or cloud resources connected by a local area network (LAN) or wide area network (WAN).

Why Is a Distributed File System Required?


The advantages of using a DFS include:

• Transparent local access — Data is accessed as if it were on a user's own device or computer.
• Location independence — Users need not know where file data physically resides.
• Massive scaling — Teams can add as many machines as they want to a DFS to scale out.
• Fault tolerance — A DFS continues to operate even if some of its servers or disks fail, because machines are interconnected and the DFS can fail over gracefully.

Cohesity and Distributed File Systems


To effectively consolidate storage silos, enterprises need a distributed file system (DFS) that can manage multiple use cases simultaneously. It must provide standard NFS, SMB, and S3 interfaces, strong I/O performance for both sequential and random workloads, inline variable-length deduplication, and frequent persistent snapshots.
It also must provide native integration with the public cloud to support a multicloud data
fabric, enabling enterprises to send data to the cloud for archival or more advanced use
cases like disaster recovery, agile dev/test, and analytics.

All of this must be done on a web-scale architecture to manage ever-increasing volumes of data effectively.

To enable enterprises to take back control of their data at scale, Cohesity has built a
completely new file system: SpanFS. SpanFS is designed to effectively consolidate and
manage all secondary data, including backups, files, objects, dev/test, and analytics
data, on a web-scale, multicloud platform that spans from core to edge to cloud.

With Cohesity SpanFS, you can consolidate data silos across locations by uniquely
exposing industry-standard, globally distributed NFS, SMB, and S3 protocols on a single
platform.
These are among the top benefits of SpanFS:

• Unlimited scalability — Start with as few as three nodes, grow limitlessly on-premises or in the cloud, and pay as you grow.
• Automated global indexing — Perform powerful, actionable global wildcard searches for any virtual machine (VM), file, or object.
• Guaranteed data resiliency — Maintain strict consistency across nodes within a cluster to ensure data resiliency at scale.
• Dedupe across workloads and clusters — Reduce your data footprint with global variable-length dedupe across workloads and protocols.
• Cloud-ready — Use the Cohesity Helios multicloud data platform to eliminate dependency on bolt-on cloud gateways.
• Multiprotocol access — Seamlessly read and write the same data volume with simultaneous multiprotocol access over NFS, SMB, and S3 (see the sketch after this list).
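
As a purely hypothetical illustration of multiprotocol access, the sketch below writes a file through a mounted file path and reads the same data back through an S3 client. The mount point, endpoint URL, bucket name, and credentials are placeholders, not real Cohesity values, and the code assumes a cluster that actually exposes both interfaces over the same view.

```python
# Hypothetical illustration of multiprotocol access: the same object is
# written through a file mount (the NFS/SMB view) and read back through S3.
# Endpoint, bucket, mount point, and credentials are placeholders.
import boto3

# 1) Write through the file interface (the view exposed over NFS/SMB).
with open("/mnt/cohesity-view/reports/q1.csv", "w") as f:
    f.write("region,revenue\nemea,100\n")

# 2) Read the same data back through the S3 interface of the same view.
s3 = boto3.client("s3", endpoint_url="https://cluster.example.com:3000")
obj = s3.get_object(Bucket="cohesity-view", Key="reports/q1.csv")
print(obj["Body"].read().decode())
```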
