Storage1
Introduction
• The popular interface protocols used for host-to-storage communication are IDE/ATA
and SCSI.
IDE/ATA & SATA
• Integrated Device Electronics (IDE)/Advanced Technology Attachment (ATA) is a popular
interface protocol standard used for connecting storage devices.
• This protocol supports parallel transmission and is therefore also known as Parallel ATA (PATA)
or simply ATA.
• IDE/ATA has a variety of standards and names. The Ultra DMA/133 version of ATA supports a
throughput of 133 MB per second.
• Serial ATA (SATA), which offers high performance, high capacity, and low cost, has largely
replaced PATA in newer systems. SATA revision 3.0 provides a data transfer rate of up to 600 MB/s.
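To put the two throughput figures side by side, here is a minimal sketch that computes the best-case time to move a file at each interface's peak rate. The 4000 MB file size and the function name are illustrative assumptions; real drives rarely sustain the peak interface rate.

```python
# Peak interface rates quoted above, in MB per second.
PEAK_MB_PER_S = {"Ultra DMA/133 (PATA)": 133, "SATA 3.0": 600}


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Best-case time to move size_mb megabytes at the given peak rate."""
    return size_mb / rate_mb_per_s


# Hypothetical 4000 MB file, chosen only for illustration.
for name, rate in PEAK_MB_PER_S.items():
    print(f"{name}: {transfer_seconds(4000, rate):.1f} s")
```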
[Figures: IDE/ATA cable, SATA cable, SATA hard disk. Video: https://www.youtube.com/watch?v=myU2x27FIIc&t=262s]
SCSI and SAS
• SCSI has emerged as a preferred connectivity protocol in high-end
computers.
• This protocol supports parallel transmission and offers improved
performance, scalability, and compatibility compared to ATA.
• However, the high cost associated with SCSI limits its popularity among
home or personal desktop users
• Serial Attached SCSI (SAS) is a serial protocol that provides an alternative to
parallel SCSI.
• SAS disks are more expensive and offer less storage capacity than SATA disks, but are
generally faster. As a result, they are not widely used in personal computers.
• SAS disks typically have platters spinning at 10,000 or 15,000 rpm.
• They typically have 25% of the capacity of SATA disks.
Storage Options
• Magnetic Tapes
• Hard disks
• Optical disks
Magnetic Tapes
• They are a low-cost option for long-term data storage
• Preferred option for backup and archiving
• Limitations:
• Sequential data access: Search and retrieval of data are done sequentially
• Physical wear and tear: The read/write head touches the tape surface, so
the tape degrades or wears out after repeated use
Optical Disks
• Optical disks have limited capacity and speed, which limit the use of optical
media as a business data storage solution
• The capability to write once and read many (WORM) is one advantage of optical
disk storage. Optical disks, to some degree, guarantee that the content has not
been altered
Hard Disks
• Hard disks are the most popular storage medium used in modern computers
for storing and accessing data for performance-intensive, online applications.
• Disks support rapid access to random data locations. This means that data
can be written or retrieved quickly for a large number of simultaneous users
or applications.
• They have a large capacity. Disk storage arrays are configured with multiple
disks to provide increased capacity and enhanced performance
• Two available types of hard disks:
• Mechanical drives
• SSD disks
Mechanical Disk Drives Components
Platter:
• A typical HDD consists of one or more flat circular disks called platters. The data
is recorded on these platters in binary codes (0s and 1s).
• Data can be written to or read from both surfaces of the platter. The number of
platters and the storage capacity of each platter determine the total capacity of
the drive.
Spindle:
• The spindle connects all the platters and is connected to a motor. The spindle motor
rotates at a constant speed.
• The disk platters spin at several thousand revolutions per minute (rpm). Common
spindle speeds are 5,400 rpm, 7,200 rpm, 10,000 rpm, and 15,000 rpm. Platter speeds
have been increasing with improvements in technology.
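Spindle speed directly bounds how long the head waits for a sector to rotate underneath it. The sketch below converts the common spindle speeds into a time per revolution and an average rotational latency, assuming (this is not stated above) that the target sector is on average half a revolution away.

```python
def rotation_time_ms(rpm: int) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


def avg_rotational_latency_ms(rpm: int) -> float:
    """Assumption: on average the target sector is half a revolution away."""
    return rotation_time_ms(rpm) / 2


for rpm in (5_400, 7_200, 10_000, 15_000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```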
Read/write head:
• It reads data from and writes data to the hard drive's disk platter. Hard drives usually
have one read/write (R/W) head for each platter side.
• It never touches the surface of the platter. When the spindle is rotating, there is a
microscopic air gap maintained between the R/W heads and the platters, known as the
head flying height.
• This air gap is removed when the spindle stops rotating and the R/W head rests on a
special area on the platter called the landing zone.
• Heads are moved to the landing zone before they touch the surface. If the drive
malfunctions and the R/W head accidentally touches the surface of the platter outside the
landing zone, a head crash occurs resulting in data loss
• The R/W heads for all platters on a drive are attached to one actuator arm assembly and
move across the platters simultaneously
Track:
• Data on the disk is recorded on tracks, which are concentric rings on the platter around
the spindle. The tracks are numbered, starting from zero, from the outer edge of the
platter
• A cylinder is a set of identical tracks on both surfaces of each drive platter. The location of
R/W heads is referred to by the cylinder number, not by the track number
Sectors:
• Each track is divided into smaller units called sectors. A sector is the smallest
addressable unit of storage.
• A sector typically holds 512 bytes of data. Some disks can be formatted with larger sector sizes.
• The track and sector structure is written on the platter by the drive manufacturer
using a formatting operation
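The platter, track, and sector structure described above determines the raw capacity of a drive. Here is a minimal sketch, assuming an idealized geometry in which every track holds the same number of sectors (real drives vary this across the platter). The geometry values are hypothetical examples, not taken from the slides.

```python
SECTOR_BYTES = 512  # sector size stated above; some disks use larger sectors


def drive_capacity_bytes(cylinders: int, heads: int, sectors_per_track: int,
                         sector_bytes: int = SECTOR_BYTES) -> int:
    """Capacity = cylinders x heads (one per platter surface) x sectors x bytes."""
    return cylinders * heads * sectors_per_track * sector_bytes


# Hypothetical geometry: 16,383 cylinders, 16 heads, 63 sectors per track.
capacity = drive_capacity_bytes(16_383, 16, 63)
print(f"{capacity} bytes (about {capacity / 10**9:.1f} GB)")
```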
Solid State Disk Drives
• SSDs are new-generation drives that deliver the ultra-high performance required by
performance-sensitive applications.
• They have no mechanical parts; instead, they use semiconductor-based memory to store
and retrieve data. SSDs can be connected using a standard disk interface such as SAS.
• The main advantage of SSDs is performance. Because they have no moving parts, data can be
accessed much faster than on mechanical disks.
• SSDs deliver a high number of I/O operations per second (IOPS) with very low response
times. Also, being semiconductor-based devices, SSDs consume less power than mechanical
drives.
• The main disadvantage is the price of SSDs compared to mechanical hard drives.
REDUNDANT ARRAY OF INDEPENDENT DISKS (RAID)
RAID
• Redundant Array of Independent Disks (RAID) solutions can provide high availability
of data and/or improvements of performance through combining multiple disk drives.
• RAID is an enabling technology that leverages multiple drives as part of a set that
provides data protection against drive failures. In general, RAID implementations
also improve the storage system performance by serving I/Os from multiple disks
simultaneously
• RAID can be implemented in several configurations, called RAID levels, each with
their own pros and cons
• RAID 0 - Striping
• RAID 1 - Mirroring
• RAID 10 - Striping and Mirroring
• RAID 3 - Striping with dedicated parity
• RAID 5 - Striping with distributed parity
• RAID 6 - Striping with distributed double parity
RAID 0
• RAID 0 (also known as striping) configuration uses data striping techniques,
where data is striped across all the disks within a RAID set.
• RAID 0 is a good option for applications that need high I/O throughput
• RAID 0 uses multiple disks, each with a part of the data on it. When data is read,
part of the data comes from one disk and another part from another disk, which in a
two-disk set effectively doubles the read performance.
• The write performance is faster than using a single disk as well, as different data
blocks are written to the disks in parallel.
• RAID 0 actually lowers availability – if one of the disks in a RAID 0 set fails, all
data is lost.
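The striping described above can be sketched as a simple mapping from logical blocks to disks. This is a minimal illustration assuming round-robin placement with a stripe unit of one block; the function names are hypothetical and not tied to any particular RAID controller.

```python
def stripe_location(logical_block: int, num_disks: int) -> tuple:
    """Return (disk index, block offset on that disk) for a logical block."""
    return logical_block % num_disks, logical_block // num_disks


def stripe(blocks: list, num_disks: int) -> list:
    """Distribute blocks round-robin across num_disks disks."""
    disks = [[] for _ in range(num_disks)]
    for index, block in enumerate(blocks):
        disk, _ = stripe_location(index, num_disks)
        disks[disk].append(block)
    return disks


# Four blocks on two disks: each disk ends up holding half of the data.
print(stripe([b"A", b"B", b"C", b"D"], 2))
```

Because consecutive blocks land on different disks, they can be read or written in parallel. The same layout also shows why losing one disk makes the whole data set unusable.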
RAID 1
• RAID 1 (also called mirroring) is based on the mirroring technique. In this
RAID configuration, data is mirrored to provide fault tolerance.
• A RAID 1 set consists of two disk drives and every write is written to both
disks. If one disk fails, data is not lost as it is still available on the mirror
disk.
• The write performance is a bit slower, since writes are only finished after
the data is written on both disks
• RAID 1 is thought to be the most reliable RAID level, but its price is
relatively high – 50% of the disks are used for redundancy only
RAID 10
• RAID 10 combines striping (RAID 0) with mirroring (RAID 1).
• This RAID type requires an even number of disks, the minimum being
four.
• Only 50% of the disk space is used for data (the rest of the disk space is used for
mirroring).
RAID 3
• RAID 3 stripes data for performance and uses parity, stored on a dedicated
disk, for fault tolerance.
• For example, in a set of five disks, four are used for data and one for
parity.
• It requires at least 3 disks. The set keeps working if one drive fails, but it
cannot tolerate more than one disk failure.
RAID 5
• RAID 5 uses striping with distributed parity. Data is written in disk blocks
on all disks in parallel (like RAID 0 striping), and a parity block of the
written disk blocks is stored as well.
• RAID 5 spreads the parity blocks over the available disks to overcome
the write bottleneck of a dedicated parity disk
• It requires at least 3 disks. The set keeps working if one drive fails, but it
cannot tolerate more than one disk failure.
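Parity protection can be illustrated with a bytewise XOR, the operation commonly used for single parity. The sketch below computes a parity block for one stripe and rebuilds a lost data block from the survivors. The parity rotation in `parity_disk` is one possible layout, assumed here for illustration; actual controllers may rotate parity differently.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe_number: int, num_disks: int) -> int:
    """Assumed layout: the parity block moves one disk to the left per stripe."""
    return (num_disks - 1 - stripe_number) % num_disks


data = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks of one stripe
parity = xor_blocks(data)            # parity block stored on a fourth disk

# Simulate losing the second data block and rebuilding it from the rest.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

XOR-ing the surviving blocks with the parity cancels everything except the missing block. That is why a single-parity set survives exactly one disk failure.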
RAID 6
• RAID 6 works the same way as RAID 5, except that RAID 6 includes a
second parity element to enable survival if two disk failures occur in a
RAID set.
• RAID 5 writes perform better than RAID 6. The rebuild operation in RAID
6 may take longer than that in RAID 5 due to the presence of two parity
sets
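The capacity overhead of the RAID levels described above can be compared with a short sketch. It assumes one disk's worth of capacity is lost to parity in RAID 3 and RAID 5, two in RAID 6, and half of the disks in mirrored sets. It also assumes the usual rule that only the capacity of the smallest drive is used from each disk. The function name and minimum disk counts for RAID 0 and RAID 6 are assumptions, not taken from the slides.

```python
MIN_DRIVES = {"0": 2, "1": 2, "10": 4, "3": 3, "5": 3, "6": 4}


def usable_capacity_gb(level: str, drive_sizes_gb: list) -> int:
    """Usable capacity of a RAID set, counting the smallest drive size per disk."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    n = len(drive_sizes_gb)
    if n < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level in ("1", "10"):
        if n % 2:
            raise ValueError("mirrored sets need an even number of drives")
        data_drives = n // 2          # half of the disks hold mirror copies
    elif level in ("3", "5"):
        data_drives = n - 1           # one disk's worth of parity
    elif level == "6":
        data_drives = n - 2           # two disks' worth of parity
    else:
        data_drives = n               # RAID 0: no redundancy
    return data_drives * min(drive_sizes_gb)


for level in ("0", "10", "5", "6"):
    print(f"RAID {level}: {usable_capacity_gb(level, [1000] * 4)} GB usable of 4000 GB")
```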
RAID Impact on Performance
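• In RAID 5, every write to a data block also requires the parity to be updated.
The controller first reads the old data and the old parity from the disks,
amounting to two read I/Os, and computes the new parity from them and the
new data.
• After computing the new parity, the controller completes the write I/O
by writing the new data and the new parity onto the disks, amounting to
two write I/Os. Therefore, the controller performs two disk reads and
two disk writes for every write operation, and the write penalty is 4.
• In RAID 6, which maintains dual parity, a disk write requires three read
operations: two parity and one data.
• After calculating both new parities, the controller performs three write
operations: two parity and the new data. The write penalty in RAID 6 is
therefore 6.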
• The drives in a RAID set should be identical. If drives of different capacities
are mixed in a RAID set, the capacity of the smallest drive is used from each disk
in the set to make up the RAID set's overall capacity. The remaining capacity of
the larger drives remains unused.
• Likewise, mixing higher revolutions per minute (RPM) drives with lower
RPM drives lowers the overall performance of the RAID set.
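The write penalties discussed in this section (4 for RAID 5, 6 for RAID 6) translate directly into load on the disks. The sketch below estimates the disk I/O generated by a given host workload. The RAID 1 penalty of 2 is an assumption derived from the fact that every write goes to both disks, and the workload numbers are hypothetical.

```python
# Disk I/Os generated per host write; RAID 1 value is an assumption (see above).
WRITE_PENALTY = {"RAID 1": 2, "RAID 5": 4, "RAID 6": 6}


def disk_iops(read_iops: int, write_iops: int, level: str) -> int:
    """Disk I/Os per second needed to serve the given host reads and writes."""
    return read_iops + WRITE_PENALTY[level] * write_iops


# Hypothetical workload: 600 host reads and 400 host writes per second.
for level in WRITE_PENALTY:
    print(f"{level}: {disk_iops(600, 400, level)} disk IOPS")
```

Each host read costs one disk I/O, while each host write is multiplied by the penalty. Write-heavy workloads therefore suffer most on parity-based RAID levels.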
HOST ACCESS TO STORAGE
Host Access to Storage
• The storage device can be internal and/or external to the host.
• In either case, the host controller card accesses the storage devices
using predefined protocols, such as IDE/ATA, SCSI, or Fibre Channel (FC)
• IDE/ATA and SCSI are popularly used in small and personal computing
environments for accessing internal storage
• FC and iSCSI protocols are used for accessing data from an external
storage device
• External storage devices can be connected to the host directly or
through the storage network
• Data can be accessed over a network in one of the following ways:
block level, file level, or object level
File system
• A file system controls how and where data on a storage device are accessed,
stored and organized. Data in file systems are organized into folders or directories.
• Without a file system, data placed in a storage medium would be one large body of
data with no way to tell where one piece of data stops and the next begins.
• File systems provide naming for files, metadata (size, type, directory hierarchy,
access type) and file location
• Files on a storage device are kept in sectors. The file system identifies the
size and position of the files, as well as which sectors are free to be used.
• In general, the application requests data from the file system by specifying the
filename and location. The file system maps the file attributes to the logical block
address of the data and sends the request to the storage device. The storage
device converts the logical address to a cylinder-head-sector (CHS) address and
fetches the data
• Common Windows file systems are NTFS and FAT. Common file-sharing protocols
are SMB/CIFS for Windows and NFS for Unix/Linux.
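The translation between a logical block address (LBA) and a cylinder-head-sector (CHS) address mentioned above can be sketched with the conventional mapping. This assumes a fixed geometry with the same number of sectors on every track and CHS sectors numbered from 1. Modern drives use more complex internal mappings, so treat it as an illustration only.

```python
def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> tuple:
    """Convert a logical block address to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1   # CHS sectors start at 1


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int, sectors_per_track: int) -> int:
    """Inverse mapping: (cylinder, head, sector) back to a logical block address."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


# Hypothetical geometry: 16 heads, 63 sectors per track.
print(lba_to_chs(1008, heads=16, sectors_per_track=63))
```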
Block Level Storage
• In block level access, raw disks are assigned to the host for creating a file
system.
• Block-level storage protocols like iSCSI, Fibre Channel and FCoE (Fibre
Channel over Ethernet) are utilized to make the storage blocks visible and
accessible by the server-based operating system
• It provides higher performance and lower latency than file-level access
File Level Storage
• In file-level access, the file system is created on the storage side, and the
file-level request is sent over a network
• This method has higher overhead at the storage side, as compared to the
data accessed at the block level
• The storage disk is configured with a protocol such as NFS or SMB/CIFS and
the files are stored and accessed from it in bulk
• File level storage is highly scalable and sharable between different users
Object Level Storage
• Unlike file or block storage, object storage does not use a hierarchy or directory
tree. Instead, every distinct unit of data exists at the same level in a storage pool.
• The main difference from the other approaches is that objects are managed
by the application itself, which must support object storage. This means that no real
file system is needed; that layer becomes unnecessary.
• Object storage is a good fit for data that does not change much, such as backups and archives.
• Another example is storage that holds vast amounts of video that is
only watched and not changed, as on online movie streaming sites or
YouTube.
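The flat structure of object storage can be sketched as a single pool keyed by object ID, with metadata stored alongside each object. This is a toy in-memory illustration. The class name is hypothetical, and deriving the ID from a SHA-256 hash of the content is an assumption; real object stores may assign IDs differently.

```python
import hashlib


class ObjectStore:
    """Toy object store: a flat pool keyed by object ID, with no directories."""

    def __init__(self):
        self._pool = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store data with its metadata and return the object ID."""
        object_id = hashlib.sha256(data).hexdigest()  # assumed ID scheme
        self._pool[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str) -> tuple:
        """Return (data, metadata) for the given object ID."""
        return self._pool[object_id]


store = ObjectStore()
video_id = store.put(b"...video bytes...", title="Lecture 1", kind="video")
data, metadata = store.get(video_id)
print(metadata["title"])
```

The application addresses each object only by its ID, so no directory tree or file system layer is involved.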
[Figure: Block, File, Object Level Storage]