Storage Area Network
I was originally going to write an article about object storage (currently the
hottest storage option in the cloud). But before discussing that part of the
storage arena, I thought it better to first cover the two main storage methods
that have coexisted for a very long time and that companies use internally for
their needs.
The storage type you choose will depend on many factors.
When you begin your career as a system administrator, you will often hear your
colleagues talking about different storage methods like SAN, NAS, DAS, etc.
Without a little bit of digging, you are bound to get confused by the different
storage terms. The confusion often arises because of the similarities between the
different approaches to storage. The only hard and fast rule for staying up to date
with technical terms is to keep reading (especially about the concepts behind a
given technology).
Today we will be discussing two different methods that define the structure of
storage in your environment. Which of the two you choose for your architecture
should depend only on your use case and the type of data that you store.
By the end of this tutorial, I hope you will have a clear picture of the two main
types of storage methods, and which one to select for your needs.
A few main things differentiate these technologies from each other.
Let's discuss SAN first and then NAS, and at the end compare the two
technologies to make the differences between them clear.
In cases where you need high performance and fast I/O, we can use SAN.
SAN is nothing but a high speed network that makes connections between storage
devices and servers.
Traditionally, application servers had their own storage devices attached
to them. Servers talk to these devices using a protocol known as SCSI (Small
Computer System Interface). SCSI is nothing but a standard used to communicate
between servers and storage devices. Hard disks, tape drives, etc. in servers
commonly use SCSI. In the beginning, the storage needs of a server were fulfilled
by storage devices included inside the server (the server talked to those
internal storage devices using SCSI. This is very similar to how a normal
desktop talks to its internal hard disk).
Devices like compact disc drives that are part of the server are also attached
to it using SCSI. The main advantage of SCSI for connecting devices to a server
was its high throughput. Although this architecture is sufficient for low-end
requirements, it has a few limitations, such as the ones mentioned below.
The server can only access data on the devices that are directly attached to it.
If something happens to the server, access to the data will fail (because the
storage device is part of the server and is attached to it using SCSI).
There is a limit on the number of storage devices the server can access. If the
server needs more storage space, no more can be attached once the bus is full,
as the SCSI bus can accommodate only a finite number of devices.
Also, the server using the SCSI storage has to be near the storage device (because
parallel SCSI, which is the normal implementation in most computers and servers,
has distance limitations. It can work up to 25 meters).
Some of these limitations can be overcome using DAS (Directly Attached Storage).
The medium used to directly connect storage to the server can be any one of SCSI,
Ethernet, fiber channel, etc. Low complexity, low investment, and simplicity of
deployment led many to adopt DAS for normal requirements. The
solution was good even performance-wise, if used with a faster medium like fiber
channel.
Even an external USB drive attached to a server is DAS (well, conceptually it's
DAS, as it's directly attached to the server's USB bus). But USB drives are normally
not used, due to the speed limitation of the USB bus. Normally, for heavy and large
DAS storage solutions, the medium used is SAS (Serial Attached SCSI). Internally, the
storage device can use RAID (which is normally the case) or anything else to provide
storage volumes to servers. SAS storage options provide 6Gb/s speed these days.
To the server, DAS storage will appear very similar to its own internal drive
or an external drive that you plugged in.
Although DAS is good for normal needs and gives good performance, it has
limitations, like the number of servers that can access it. The DAS storage
device also has to be near the server (in the same rack or within the
accepted distance of the medium used).
It can be argued that directly attached storage (DAS) is faster than any other
storage method. This is because it does not involve any overhead of data transfer
over the network (all data transfer occurs on a dedicated connection between the
server and the storage device, mostly Serial Attached SCSI, or SAS). However,
due to the latest improvements in fiber channel and caching mechanisms, SAN
also provides speeds similar to DAS, and in some cases surpasses the
speed provided by DAS.
Before getting into SAN, let's understand several media types and methods that
are used to interconnect storage devices (when I say storage device, please don't
think of a single hard disk. Think of an array of disks, probably in some
RAID level, something like Dell's MD1200).
What are SAS (Serial Attached SCSI), FC (Fibre Channel), and iSCSI (Internet Small
Computer System Interface)?
Traditionally, SCSI devices like internal hard disks are connected to a
shared parallel SCSI bus. This means all attached devices use the same
bus to send and receive data. But shared parallel connections are not good for
accuracy, and create issues during high speed transfers. A serial
connection between the device and the server, however, can increase the overall
throughput of the data transfer. SAS connections between storage devices and
servers use a dedicated 300 MB/s link per disk. Compare this with a parallel SCSI
bus, where all connected devices share the same speed.
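To make the shared versus dedicated difference concrete, here is a minimal Python
sketch of the arithmetic. The 300 MB/s per-disk figure comes from the text above.
The 320 MB/s shared bus figure is an assumption (roughly the Ultra320 generation of
parallel SCSI), the function names are made up for illustration, and the sketch
assumes the ideal case where every disk is busy at once and the bus is split evenly.

```python
def per_device_on_shared_bus(bus_mb_s, busy_devices):
    """Bandwidth each device gets when all busy devices split one shared bus evenly."""
    if busy_devices < 1:
        raise ValueError("need at least one device")
    return bus_mb_s / busy_devices


def total_with_dedicated_links(link_mb_s, devices):
    """Aggregate bandwidth when every device has its own dedicated link."""
    return link_mb_s * devices


SHARED_BUS_MB_S = 320  # assumption: an Ultra320-class parallel SCSI bus
SAS_LINK_MB_S = 300    # from the text: dedicated bandwidth per SAS disk

for n in (1, 4, 8):
    shared = per_device_on_shared_bus(SHARED_BUS_MB_S, n)
    dedicated = total_with_dedicated_links(SAS_LINK_MB_S, n)
    print(f"{n} disks: shared bus gives {shared:.0f} MB/s each, "
          f"dedicated links give {SAS_LINK_MB_S} MB/s each ({dedicated} MB/s total)")
```

The point is only the trend: on a shared bus the per-disk share shrinks as disks
are added, while with dedicated links it stays constant.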
SAS uses the same SCSI commands to send and receive data from a device. Also,
please do not think that SCSI is only used for internal storage. It is also used
to connect external storage devices to the server.
If data transfer performance and reliability are the priority, then SAS is the
best solution. In terms of reliability and error rate, SAS disks are much better
than the old SATA disks. SAS was designed with performance in
mind, which is why it is full-duplex. This means data can be sent and received
simultaneously from a device using SAS. Also, a single SAS host port can connect to
multiple SAS drives using expanders. SAS uses point-to-point data transfer, with
serial communication between devices (storage devices like disk drives and
disk arrays) and hosts.
The first generation of SAS provided around 3Gb/s of speed. The second
generation of SAS improved this to 6Gb/s. And the third generation (which is
currently used by many organizations for extremely high throughput) improved this
to 12Gb/s.
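Note that these generation speeds are line rates in gigabits per second, while the
300 MB/s per-disk figure quoted earlier is in megabytes per second. The short Python
sketch below shows how the two relate. It assumes 8b/10b line encoding (every 8 data
bits travel as 10 bits on the wire) for these three generations; the names in the
code are made up for illustration.

```python
# Line rates from the text, in gigabits per second.
SAS_LINE_RATE_GBPS = {"SAS generation 1": 3, "SAS generation 2": 6, "SAS generation 3": 12}


def usable_mb_per_s(line_rate_gbps, data_bits=8, wire_bits=10):
    """Convert a line rate in Gb/s to usable MB/s, assuming 8b/10b encoding."""
    wire_bits_per_s = line_rate_gbps * 1_000_000_000
    data_bits_per_s = wire_bits_per_s * data_bits / wire_bits
    return data_bits_per_s / 8 / 1_000_000  # bits -> bytes -> megabytes


for name, gbps in SAS_LINE_RATE_GBPS.items():
    print(f"{name}: {gbps} Gb/s on the wire is about {usable_mb_per_s(gbps):.0f} MB/s of data")
```

Under that assumption, the first generation works out to 300 MB/s, which matches
the per-disk figure mentioned earlier.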
The components needed to make a fiber channel connection are listed below. This
is the bare minimum required for a point-to-point connection.
Typically this can be used for a direct connection between a storage array and a
host.
An HBA (Host Bus Adapter) with a fiber channel port
A driver for the HBA card
Cables to interconnect devices through the HBA's fiber channel port
You might be wondering why we need this mapping and re-mapping of SCSI onto fiber
channel, and why we can't use SCSI directly to deliver data. It's because SCSI
cannot deliver data over greater distances to a large number of devices (or a
large number of hosts).
Fiber channel can be used to interconnect systems as far as 10 km apart (if used
with optical fiber; you can increase this distance by placing repeaters in between).
You can also transfer data up to 30 m over copper wire, for a lower-cost
fiber channel setup.
With the emergence of fiber channel switches from a variety of major vendors,
connecting a large number of storage devices and servers has now become
an easy task (provided you have the budget to invest). The networking ability of
fiber channel led to the wider adoption of SAN (Storage Area Network) for
faster, long-distance, and reliable data access. Most high-performance computing
environments (which require fast, large-volume data transfers) use fiber
channel SAN with optical fiber cables.
The current fiber channel standard (called 16GFC) can transmit data at a rate
of 1600MB/s (don't forget that this standard was released in 2011). The
upcoming standards in the coming years are expected to provide 3200MB/s and
6400MB/s.
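To get a feel for what these rates mean in practice, here is a small Python sketch
that estimates how long a bulk transfer would take at each rate. The 1 TB data set
size is a made-up example, and the calculation ignores protocol overhead, disk
speed, and contention, so treat the results as best-case lower bounds.

```python
# Throughput figures from the text, in megabytes per second.
FC_RATES_MB_S = (1600, 3200, 6400)


def transfer_seconds(size_gb, rate_mb_s):
    """Best-case time to move size_gb gigabytes at rate_mb_s (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_s


DATASET_GB = 1000  # hypothetical 1 TB data set

for rate in FC_RATES_MB_S:
    seconds = transfer_seconds(DATASET_GB, rate)
    print(f"{rate} MB/s: about {seconds / 60:.1f} minutes for {DATASET_GB} GB")
```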
iSCSI is nothing but an IP-based standard for interconnecting storage arrays and
hosts. It is used to carry SCSI traffic over IP networks. This is the simplest and
cheapest solution (although not the best) for connecting to a storage device.
This is a nice technology for location-independent storage, because it can
establish a connection to a storage device over a local area network or a wide area
network. It's a Storage Area Network interconnection standard. It does not require
special cabling and equipment, unlike a fiber channel network.
To the system using a storage array with iSCSI, the storage appears as a locally
attached disk. This technology came after fiber channel and was widely adopted
due to its low cost.
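The idea of carrying SCSI traffic over an IP network can be sketched in a few lines
of Python. The sketch builds a standard 10-byte SCSI READ(10) command and then wraps
it in a tiny header so it could be written to a TCP connection. The wrapper is
deliberately a toy: its 4-byte header is made up for illustration and is not the
real iSCSI packet layout, and the function names are hypothetical.

```python
import struct


def build_read10_cdb(lba, num_blocks):
    """Build a 10-byte SCSI READ(10) command descriptor block.

    Layout: opcode 0x28, flags, 4-byte start block, group number,
    2-byte block count, control byte (all big-endian).
    """
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, num_blocks, 0)


def wrap_for_ip(cdb, lun):
    """Toy encapsulation (NOT real iSCSI): made-up header of LUN + command length."""
    return struct.pack(">HH", lun, len(cdb)) + cdb


def unwrap(payload):
    """Reverse of wrap_for_ip: recover the LUN and the original SCSI command."""
    lun, length = struct.unpack(">HH", payload[:4])
    return lun, payload[4:4 + length]


cdb = build_read10_cdb(lba=2048, num_blocks=8)
payload = wrap_for_ip(cdb, lun=1)   # these bytes would travel over TCP/IP
lun, received = unwrap(payload)     # the storage side recovers the same command
print(lun, received.hex())
```

The server still issues an ordinary SCSI command; only the transport underneath
changes, which is why the storage looks like a locally attached disk.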
It's a networking protocol built on top of TCP/IP. As you can guess, it does
not perform as well as fiber channel (simply
because everything runs over TCP with no special hardware and no
modifications to your architecture).
iSCSI introduces a little bit of CPU load on the server, because the server has to do
the extra processing for all storage requests over the network, using regular
TCP.
iSCSI introduces a little more latency than fiber channel, due to the
overhead of IP headers.
Database applications perform small read and write operations, which will see
more latency when done over iSCSI.
When iSCSI runs on the same LAN as other normal traffic (infrastructure traffic
other than iSCSI), reads and writes will lag, or in other words performance will
be low.
The maximum speed/bandwidth is limited by your Ethernet and network speed.
Even if you aggregate multiple links, it does not scale to the level of fiber channel.
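The header overhead point, and why small database-style operations suffer most, can
be illustrated with a rough Python estimate. This is a simplified sketch under
stated assumptions: a 1500-byte MTU, standard header sizes with no IP or TCP
options and no VLAN tags, and all data carried behind a single 48-byte iSCSI
header. It ignores acknowledgements, the request/response exchange, and CPU cost.

```python
ETHERNET_OVERHEAD = 14 + 4  # Ethernet header plus frame check sequence
IP_HEADER = 20              # IPv4 header without options
TCP_HEADER = 20             # TCP header without options
ISCSI_HEADER = 48           # iSCSI basic header segment
MTU = 1500                  # assumption: standard Ethernet MTU, no jumbo frames


def wire_bytes(payload_bytes):
    """Approximate bytes on the wire to carry one payload behind one iSCSI header."""
    tcp_payload = ISCSI_HEADER + payload_bytes
    per_packet = MTU - IP_HEADER - TCP_HEADER    # TCP payload that fits in one packet
    packets = -(-tcp_payload // per_packet)      # ceiling division
    return tcp_payload + packets * (ETHERNET_OVERHEAD + IP_HEADER + TCP_HEADER)


def efficiency(payload_bytes):
    """Fraction of the transmitted bytes that is actual data."""
    return payload_bytes / wire_bytes(payload_bytes)


for size in (512, 4096, 65536):
    print(f"{size:>6} byte I/O: {efficiency(size):.1%} of wire bytes are data")
```

Under these assumptions a small 512-byte operation spends a noticeably larger
share of its bytes on headers than a large 64 KB transfer does.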
The simplest definition of NAS is: "Any server that shares its own storage with
others on the network and acts as a file server is the simplest form of NAS".
Please note that Network Attached Storage shares files over
the network, not storage devices.
NAS uses an Ethernet connection to share files over the network. The
NAS device has an IP address, and is accessible over the network
through that IP address. When you access files on a file server from your Windows
system, it's basically NAS.
The main difference is in how your computer or server treats a particular
storage device. If the computer treats the storage as part of itself (similar to
how you attach DAS to your server), in other words, if the server's processor is
responsible for managing the attached storage, it is some sort of DAS. And if the
computer/server treats the attached storage as another computer, which is
sharing its data over the network, then it's NAS.
Directly attached storage (DAS) can be viewed like any other peripheral device,
such as a mouse or keyboard, because to the server/computer, it's a directly attached
storage device. NAS, however, is another server, or say a piece of equipment with
its own computing capability, that can share its own storage with others.
Even SAN storage can be considered equipment that has its own
processing/computing power. So the main difference between NAS, SAN, and DAS
is how the server/computer accessing it sees it. A DAS storage device appears to the
server as part of itself. The server sees it as its own physical part. Although the
DAS storage device might not be inside the server (it's normally another device
with its own storage array), the server sees it as its own internal storage.
When we talk about NAS, we need to speak of shares rather than storage
devices, because NAS appears to a server as a shared folder instead of a shared
device over the network. Please do not forget that NAS devices are
computers in themselves, which can share their storage space with others. When
you share a folder with access control using Samba, it's NAS.
Although NAS is a cheaper option for your storage needs, it really is not suited
to enterprise-level, high-performance applications. Never think of putting
database storage (which needs to be high performing) on NAS. The main
downsides of NAS are its performance and its dependency on the
network (most of the time, the LAN used for normal traffic is also used
for sharing storage with NAS, which makes it more congested).
When you share an export with NFS over the network, it's also a form of NAS.
So NAS is file-level storage (because it's basically a file sharing technology).
It hides the actual file system under the hood, and gives users an
interface to access its shared storage space using NFS or CIFS.
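The difference between file-level and block-level access can be sketched in Python.
With file-level access (what an NFS or CIFS share offers) the client asks for a
file by name and never sees the disk layout. With block-level access the client
asks for block number N and has to impose its own file system on top. In this
sketch a temporary file stands in for a raw block device, the 512-byte block size
is an assumption, and all function names are made up for illustration.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumption: a common logical block size


def read_file_level(path):
    """File-level access: ask for a file by name, get its contents back."""
    with open(path, "rb") as f:
        return f.read()


def read_block_level(device_path, block_number, block_size=BLOCK_SIZE):
    """Block-level access: ask for one numbered block of a device."""
    with open(device_path, "rb") as dev:
        dev.seek(block_number * block_size)
        return dev.read(block_size)


def make_fake_device(num_blocks=4, block_size=BLOCK_SIZE):
    """Create a temp file standing in for a block device; block n is filled with byte n."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for n in range(num_blocks):
            f.write(bytes([n]) * block_size)
    return path


device = make_fake_device()
try:
    block = read_block_level(device, 2)
    print(len(block), block[:4])
finally:
    os.remove(device)  # clean up the stand-in device
```

On a real system the file-level path would point into a mounted share, while the
block-level path would be a device node presented by DAS or SAN storage.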
Advantages of NAS
Disadvantages of NAS
NAS is slow
Lower throughput and higher latency, due to which it cannot be used for
high-performance applications
Now let's get back to our discussion of SAN (Storage Area Network), which we
started at the beginning.
The first and foremost thing to understand about SAN (apart from the things we
already discussed) is that it's a block-level storage
solution. SAN is optimized for high volumes of block-level data transfer, and
performs best when used with a fiber channel medium (optical fiber and a fiber
channel switch).
Both NAS and SAN solve the problem of having to keep the storage device near the
server accessing it (which was the case with DAS). SAN storage can be allotted to
a server, which in turn can share it with others using NAS. Please do not forget
that the underlying disks in DAS, NAS, and SAN can be in any form of
RAID (what makes the real difference is how the server accesses these
storage devices, using which protocol and medium).
The name Storage Area Network itself implies that the storage resides in its own
dedicated network. Hosts can attach the storage device to themselves using either
fiber channel or a TCP/IP network (SAN uses iSCSI when used over a TCP/IP network).
SAN can be considered a technology that combines the best features of both
DAS and NAS. If you remember, DAS appears to the computer as its own storage
device and is known for good speed. DAS is also a block-level storage solution (if
you remember, we never talked about CIFS or NFS when discussing DAS). NAS is known
for its flexibility, primary access through the network, access control, etc. SAN
combines the best features of both of these worlds because:
SAN storage also appears to the server as its own storage device
It's a block-level storage solution
Good performance/speed
Networking features using iSCSI
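The way SAN sits between DAS and NAS can be summarized as a small lookup in
Python. This is a simplified sketch of the distinctions made in this article, not
a formal taxonomy: it reduces each storage type to two properties (block-level or
file-level access, and whether it is reached over a network), and the class and
function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageType:
    name: str
    access_level: str   # "block" or "file"
    networked: bool     # reached over a network rather than a direct attachment
    typical_access: tuple


STORAGE_TYPES = (
    StorageType("DAS", "block", False, ("SAS", "SCSI", "USB")),
    StorageType("NAS", "file", True, ("NFS", "CIFS")),
    StorageType("SAN", "block", True, ("fiber channel", "iSCSI")),
)


def classify(access_level, networked):
    """Return the storage type matching the two properties, or None if none fits."""
    for storage in STORAGE_TYPES:
        if storage.access_level == access_level and storage.networked == networked:
            return storage.name
    return None


print(classify("block", True))   # block-level access over a network
print(classify("file", True))    # file-level access over a network
```

Seen this way, SAN shares its access level with DAS and its network reach with NAS.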
SAN and NAS are not competing technologies, but were designed for different
needs and purposes. As SAN is a block-level storage solution, it's best suited for
high-performance database storage, email storage, etc. Most modern SAN
solutions also provide disk mirroring, archiving, backup, and replication
features.
SAN is a dedicated network of storage devices (which can include tape drives,
RAID disk arrays, etc.) all working together to provide excellent block-level
storage, while NAS is a single device/server/computing appliance sharing its own
storage over the network.
SAN NAS