
Storage1

Uploaded by

Lena
Copyright
© © All Rights Reserved
DATA STORAGE

Introduction

• Every day, approximately 15 petabytes of new information is generated worldwide
• Data generated from social media, e-commerce, patient records, web portals, etc. needs to be stored, managed, retrieved, and shared
• The total amount of digital
data doubles approximately
Storage Model

• Most servers use external storage, sometimes combined with internal storage

• A model of storage building blocks is shown on the right
STORAGE OPTIONS
Disk Controllers
• The circuits that control data transfer to and from the
disk drive (floppy disk, hard disk, optical disc)
• When the computer wants to transfer data to or from
the disk, it tells the disk controller. The controller in
turn sends electronic commands to the disk drive
making the disk spin and move its magnetic heads to
the proper location on the disk.
• The controller then transfers the data between the
computer and the disk drive.
• A protocol enables communication between the host and storage. Protocols are
implemented using controllers at both source and destination.

• The popular interface protocols used for host-to-storage communication are IDE/ATA and SCSI
IDE/ATA & SATA
• Integrated device electronics (IDE)/ Advanced Technology Attachment (ATA) is a popular
interface protocol standard used for connecting storage devices.

• This protocol supports parallel transmission and therefore is also known as Parallel ATA (PATA)
or simply ATA.

• IDE/ATA has a variety of standards and names. The Ultra DMA/133 version of ATA supports a
throughput of 133 MB per second

• The serial version of this protocol is known as Serial ATA (SATA).

• High performance, high capacity and low-cost SATA has largely replaced PATA in the newer
systems. SATA revision 3.0 provides data transfer rate up to 600MB/s.
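As a rough illustration of the two throughput figures quoted above, the sketch below computes best-case transfer times. The 6,000 MB file size and the function name are assumptions chosen for the example; real transfers are slower because drives rarely sustain the interface's peak rate.

```python
def transfer_seconds(size_mb, rate_mb_per_s):
    """Best-case seconds to move size_mb megabytes at a sustained rate."""
    return size_mb / rate_mb_per_s


FILE_MB = 6000  # hypothetical file size, chosen only for illustration

pata_seconds = transfer_seconds(FILE_MB, 133)  # Ultra DMA/133 peak rate
sata_seconds = transfer_seconds(FILE_MB, 600)  # SATA revision 3.0 peak rate

print(f"PATA: {pata_seconds:.1f} s, SATA 3.0: {sata_seconds:.1f} s")
```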
IDE/ATA: ATA cable (video: https://www.youtube.com/watch?v=myU2x27FIIc&t=262s)
SATA: SATA cable, SATA HD
SCSI and SAS
• SCSI has emerged as a preferred connectivity protocol in high-end
computers.
• This protocol supports parallel transmission and offers improved
performance, scalability, and compatibility compared to ATA.
• However, the high cost associated with SCSI limits its popularity among
home or personal desktop users
• Serial attached SCSI (SAS) is a serial protocol that provides an alternative to
parallel SCSI
• SAS disks are more expensive and offer less storage capacity than SATA disks, but are
generally faster. Therefore, they are not widely used in personal computers
• SAS disks use platters spinning at 10,000 or 15,000 rpm
• SAS disks typically have 25% of the capacity of SATA disks

Storage Options
• Magnetic Tapes

• Hard disks. There are two Types of hard disks:


• Mechanical hard disks
• Solid state drive (SSD)

• Optical disks
Magnetic Tapes
• They are a low-cost option for long-term data storage
• Preferred option for backup and archiving
• Limitations:
• Sequential data access: Search and retrieval of data are done sequentially

• Single application access at a time: In a shared computing environment,
data stored on tape cannot be accessed by multiple applications
simultaneously, restricting its use to one application at a time

• Physical wear and tear: The read/write head touches the tape surface, so
the tape degrades or wears out after repeated use
Optical Disks

• They are popular in small, single-user computing environments.

• They are frequently used by individuals to store photos or as a backup medium on
personal or laptop computers. They are also used as a distribution medium for small
applications.

• Optical disks have limited capacity and speed, which limit the use of optical
media as a business data storage solution

• The capability to write once and read many (WORM) is one advantage of optical
disk storage. Optical disks, to some degree, guarantee that the content has not
been altered
Hard Disks

• Hard disks are the most popular storage medium used in modern computers
for storing and accessing data for performance-intensive, online applications.
• Disks support rapid access to random data locations. This means that data
can be written or retrieved quickly for a large number of simultaneous users
or applications.
• They have a large capacity. Disk storage arrays are configured with multiple
disks to provide increased capacity and enhanced performance
• Two available types of hard disks:
• Mechanical drives
• SSD disks
Mechanical Disk Drives Components
Platter:
• A typical HDD consists of one or more flat circular disks called platters. The data
is recorded on these platters in binary codes (0s and 1s).
• Data can be written to or read from both surfaces of the platter. The number of
platters and the storage capacity of each platter determine the total capacity of
the drive.
Spindle:
• The spindle connects all the platters and is connected to a motor. The motor of the
spindle rotates at a constant speed.
• The disk platter spins at a speed of several thousands of revolutions per minute
(rpm). Common spindle speeds are 5,400 rpm, 7,200 rpm, 10,000 rpm, and
15,000 rpm. The speed of the platter is increasing with improvements in
technology
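The spindle speed directly bounds how long a head waits for data to rotate under it. The sketch below uses the common approximation that the average rotational latency is the time of half a revolution; that approximation and the function name are assumptions added here, not taken from the slides.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds, taken as half a revolution."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms per minute / revolutions per minute
    return ms_per_revolution / 2


# The common spindle speeds listed above.
for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

Under this approximation, faster spindles shorten the wait, which is one reason 10,000 and 15,000 rpm drives are used for performance-sensitive workloads.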
Mechanical Disk Drives Components
Read/write head:
• It reads data from and writes data to the hard drive's disk platter. Hard drives usually
have one read/write head for each platter side

• It never touches the surface of the platter. When the spindle is rotating, there is a
microscopic air gap maintained between the R/W heads and the platters, known as the
head flying height.

• This air gap is removed when the spindle stops rotating and the R/W head rests on a
special area on the platter called the landing zone.

• Heads are moved to the landing zone before they touch the surface. If the drive
malfunctions and the R/W head accidentally touches the surface of the platter outside the
landing zone, a head crash occurs resulting in data loss
Mechanical Disk Drives Components

Actuator Arm Assembly:
• Each platter has two R/W heads, one for each surface. R/W heads are mounted on the
actuator arm assembly, which positions the R/W head at the location on the platter where
the data needs to be written or read

• The R/W heads for all platters on a drive are attached to one actuator arm assembly and
move across the platters simultaneously

Track:
• Data on the disk is recorded on tracks, which are concentric rings on the platter around
the spindle. The tracks are numbered, starting from zero, from the outer edge of the
platter

• A cylinder is a set of identical tracks on both surfaces of each drive platter. The location of
R/W heads is referred to by the cylinder number, not by the track number
Mechanical Disk Drives Components

Sectors:
• Each track is divided into smaller units called sectors. A sector is the smallest
addressable unit of storage.

• A sector typically holds 512 bytes of user data. Some disks can be formatted with larger sector sizes.

• The track and sector structure is written on the platter by the drive manufacturer
using a formatting operation
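The platter, track, and sector structure described above determines a drive's raw capacity. The following is a minimal sketch under a simplifying assumption: every track holds the same number of sectors (real drives place more sectors on outer tracks). The geometry values and names are hypothetical.

```python
BYTES_PER_SECTOR = 512  # the typical sector size mentioned above


def drive_capacity_bytes(cylinders, heads, sectors_per_track,
                         bytes_per_sector=BYTES_PER_SECTOR):
    """Raw capacity, assuming a uniform number of sectors on every track."""
    # One track per head per cylinder, so tracks = cylinders * heads.
    return cylinders * heads * sectors_per_track * bytes_per_sector


# Hypothetical geometry: 16,383 cylinders, 16 heads (8 platters), 63 sectors per track.
capacity = drive_capacity_bytes(16383, 16, 63)
print(f"{capacity:,} bytes (about {capacity / 10**9:.2f} GB)")
```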
Solid State Disk Drives
• SSDs are new generation drives that deliver ultra-high performance required by
performance-sensitive applications.

• They don’t have mechanical parts; they use semiconductor-based memory to store
and retrieve data. SSDs are connected using a standard disk interface such as SAS or SATA

• SSD’s main advantage is performance. SSDs have no moving parts, so data can be
accessed much faster than using mechanical disks

• SSDs deliver a high number of IOPS with very low response times. Also, being
semiconductor-based devices, SSDs consume less power than mechanical
drives
• The main disadvantage is the price of SSDs compared to mechanical hard drives
REDUNDANT ARRAY OF INDEPENDENT DISKS (RAID)
RAID
• Redundant Array of Independent Disks (RAID) solutions can provide high availability
of data and/or improvements of performance through combining multiple disk drives.
• RAID is an enabling technology that leverages multiple drives as part of a set that
provides data protection against drive failures. In general, RAID implementations
also improve the storage system performance by serving I/Os from multiple disks
simultaneously
• RAID can be implemented in several configurations, called RAID levels, each with
its own pros and cons
• RAID 0 - Striping
• RAID 1 - Mirroring
• RAID 10 - Striping and Mirroring
• RAID 3 - Striping with dedicated parity
• RAID 5 - Striping with distributed parity
• RAID 6 - Striping with distributed double parity
RAID 0
• RAID 0 (also known as striping) configuration uses data striping techniques,
where data is striped across all the disks within a RAID set.

• RAID 0 is a good option for applications that need high I/O throughput

• RAID 0 uses multiple disks, each with a part of the data on it. When data is read,
part of the data comes from one disk, another part from another disk, which with two
disks can effectively double the read performance.

• The write performance is faster than using a single disk as well, as different data
blocks are written to the disks in parallel.

• RAID 0 actually lowers availability – if one of the disks in a RAID 0 set fails, all
data is lost.
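The striping described above can be sketched as a simple round-robin mapping from logical block numbers to disks. This is an illustrative model only; the function names are hypothetical, and real controllers stripe in larger units than a single block.

```python
def stripe_location(block, num_disks):
    """Return (disk index, block offset on that disk) for a logical block."""
    return block % num_disks, block // num_disks


def layout(num_blocks, num_disks):
    """List which logical blocks end up on each disk of the RAID 0 set."""
    disks = [[] for _ in range(num_disks)]
    for block in range(num_blocks):
        disk, _offset = stripe_location(block, num_disks)
        disks[disk].append(block)
    return disks


# Eight logical blocks striped over a two-disk set.
for disk, blocks in enumerate(layout(8, 2)):
    print(f"disk {disk}: blocks {blocks}")
```

Because consecutive blocks land on different disks, they can be read or written in parallel; it also shows why losing one disk loses part of every large file.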
RAID 1
• RAID 1 (also called mirroring) is based on the mirroring technique. In this
RAID configuration, data is mirrored to provide fault tolerance.

• A RAID 1 set consists of two disk drives and every write is written to both
disks. If one disk fails, data is not lost as it is still available on the mirror
disk.

• The write performance is a bit slower, since writes are only finished after
the data is written on both disks

• RAID 1 is thought to be the most reliable RAID level, but its price is
relatively high – 50% of the disks are used for redundancy only
RAID 10
• RAID 1+0 combines the performance benefits of RAID 0 with the
redundancy benefits of RAID 1.

• It uses mirroring and striping techniques and combines their benefits.

• This RAID type requires an even number of disks, the minimum being
four

• Only 50% of the disk space is usable (the rest of the disk space is used for
mirroring).

• Read performance is high, just like RAID 0, and write performance is a
bit slower, just like RAID 1
Parity
• Parity is a method to protect striped data from disk drive failure without the
cost of mirroring

• Using the XOR operation, parity allows re-creation of the missing data

• There are some disadvantages of using parity. Parity information is generated
from data on the data disk.

• Therefore, parity is recalculated every time there is a change in data.

• This recalculation is time-consuming and affects the performance of the
RAID array
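To make the XOR idea concrete, here is a minimal sketch that computes a parity block over three data blocks and then rebuilds one of them as if its disk had failed. The block contents and the helper name are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical data blocks, one per data disk.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing the second disk: XOR the survivors with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # the contents of the lost block
```

This works because XOR-ing a value with itself cancels it out, so the surviving blocks cancel out of the parity and only the missing block remains.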
RAID 3
• RAID 3 stripes data for performance and uses parity for fault tolerance.

• Parity information is stored on a dedicated drive so that the data can be
reconstructed if a drive fails in a RAID set

• For example, in a set of five disks, four are used for data and one for
parity.

• It requires at least 3 disks. The array keeps working if one drive fails, but it
cannot tolerate more than one disk failure
RAID 5
• RAID 5 uses striping with distributed parity. Data is written in disk blocks
on all disks in parallel (like RAID 0 striping), and a parity block of the
written disk blocks is stored as well.

• This parity block is used to automatically reconstruct data in a RAID 5
set in case of a disk failure

• RAID 5 spreads the parity blocks over the available disks to overcome
the write bottleneck of a dedicated parity disk

• It requires at least 3 disks. The array keeps working if one drive fails, but it
cannot tolerate more than one disk failure
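One way to picture distributed parity is to rotate the parity block to a different disk on every stripe. The rotation order below is an assumption made for illustration; actual controllers use various layouts, and the function name is hypothetical.

```python
def raid5_layout(num_disks, num_stripes):
    """Return one row per stripe; 'P' marks the parity disk, 'D' a data disk."""
    rows = []
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last disk and moves left each stripe.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        rows.append(["P" if disk == parity_disk else "D"
                     for disk in range(num_disks)])
    return rows


for stripe, row in enumerate(raid5_layout(4, 4)):
    print(f"stripe {stripe}: {' '.join(row)}")
```

Over any run of as many stripes as there are disks, each disk holds exactly one parity block, so parity writes are spread evenly instead of hitting one dedicated disk as in RAID 3.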
RAID 6

• RAID 6 works the same way as RAID 5, except that RAID 6 includes a
second parity element to enable survival if two disk failures occur in a
RAID set.

• Therefore, a RAID 6 implementation requires at least four disks.

• RAID 6 distributes the parity across all the disks.

• The write penalty in RAID 6 is more than that in RAID 5

• RAID 5 writes perform better than RAID 6. The rebuild operation in RAID
6 may take longer than that in RAID 5 due to the presence of two parity
sets
RAID Impact on Performance
• When choosing a RAID type, it is imperative to consider its impact on disk
performance

• In both mirrored and parity RAID configurations, every write operation
translates into more I/O overhead for the disks, which is referred to as a write
penalty.

• In a RAID 1 implementation, every write operation must be performed on two
disks configured as a mirrored pair, whereas in a RAID 5 implementation, a
write operation may manifest as four I/O operations.

• Whenever the controller performs a write I/O, parity must be computed by
reading the old parity (Cp old) and the old data (C4 old) from the disk, which
means two read I/Os. Then, the new parity (Cp new) is computed as follows:

Cp new = Cp old ⊕ C4 old ⊕ C4 new (where ⊕ denotes the XOR operation)
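The parity update rule above can be checked with a small worked example: XOR-ing the old parity with the old and new data gives the same result as recomputing parity from every block in the stripe. The four-bit block values are arbitrary and only serve the illustration.

```python
# Hypothetical stripe of four data blocks, shown as small integers.
c1, c2, c3, c4_old = 0b1010, 0b0110, 0b1111, 0b0001
cp_old = c1 ^ c2 ^ c3 ^ c4_old  # parity before the write

c4_new = 0b1100  # new contents for the fourth block

# Small-write update: needs only the old parity and the old data (two reads).
cp_new = cp_old ^ c4_old ^ c4_new

# Full recomputation from all data blocks, for comparison.
cp_full = c1 ^ c2 ^ c3 ^ c4_new

print(f"updated parity: {cp_new:04b}, recomputed parity: {cp_full:04b}")
```

The update path avoids reading the untouched blocks, which is why a single write costs two reads and two writes regardless of how many disks are in the set.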


RAID Impact on Performance

• After computing the new parity, the controller completes the write I/O
by writing the new data and the new parity onto the disks, amounting to
two write I/Os. Therefore, the controller performs two disk reads and
two disk writes for every write operation, and the write penalty is 4

• In RAID 6, which maintains dual parity, a disk write requires three read
operations: two parity and one data

• After calculating both new parities, the controller performs three write
operations: two parity and new data

• Therefore, in a RAID 6 implementation, the controller performs six I/O
operations for each write I/O, and the write penalty is 6
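The write penalties above (2 for RAID 1, 4 for RAID 5, 6 for RAID 6) can be used to estimate the load that reaches the disks. The sketch below assumes each read costs one disk I/O and each write costs the penalty; the workload of 600 reads and 400 writes per second is a made-up example.

```python
# Write penalties as stated above for each RAID level.
WRITE_PENALTY = {"RAID 1": 2, "RAID 5": 4, "RAID 6": 6}


def disk_iops(read_iops, write_iops, penalty):
    """Disk-side IOPS: each read costs one I/O, each write costs `penalty` I/Os."""
    return read_iops + write_iops * penalty


# Hypothetical host workload: 600 reads and 400 writes per second.
for level, penalty in WRITE_PENALTY.items():
    print(f"{level}: {disk_iops(600, 400, penalty)} disk IOPS")
```

The same host workload therefore demands noticeably more from a RAID 6 set than from a RAID 1 pair, which matters when sizing the number of drives.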
Important Consideration about RAID

• It is highly recommended that the RAID set be created from drives of
the same type, speed, and capacity to ensure maximum usable
capacity, reliability, and consistency in performance.

• For example, if drives of different capacities are mixed in a RAID set, the
capacity of the smallest drive is used from each disk in the set to make
up the RAID set’s overall capacity. The remaining capacity of the larger
drives remains unused.

• Likewise, mixing higher revolutions per minute (RPM) drives with lower
RPM drives lowers the overall performance of the RAID set.
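The capacity rule above (only the smallest drive's capacity is used from each disk) can be sketched as follows. The per-level fractions follow the descriptions in the RAID slides; the minimum of two disks for RAID 0 is an assumption, and the function name and drive sizes are hypothetical.

```python
# Minimum disks per level; the RAID 0 value is an assumption, the rest follow the text.
MIN_DISKS = {"0": 2, "1": 2, "10": 4, "3": 3, "5": 3, "6": 4}


def usable_capacity_gb(drive_sizes_gb, level):
    """Usable capacity of a RAID set, limited by its smallest drive."""
    n = len(drive_sizes_gb)
    if n < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level in ("1", "10") and n % 2:
        raise ValueError(f"RAID {level} needs an even number of disks")
    if level == "0":
        data_drives = n          # no redundancy
    elif level in ("1", "10"):
        data_drives = n // 2     # half the disks hold mirror copies
    elif level in ("3", "5"):
        data_drives = n - 1      # one disk's worth of parity
    else:
        data_drives = n - 2      # RAID 6: two disks' worth of parity
    return min(drive_sizes_gb) * data_drives


mixed = [1000, 1000, 2000, 2000]  # hypothetical mix of 1 TB and 2 TB drives
for level in ("0", "10", "5", "6"):
    print(f"RAID {level}: {usable_capacity_gb(mixed, level)} GB usable")
```

In this example the extra 1,000 GB on each of the larger drives is never used, which is the waste the recommendation above is meant to avoid.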
HOST ACCESS TO STORAGE
Host Access to Storage
• The storage device can be internal to the host, external to it, or both.
• In either case, the host controller card accesses the storage devices
using predefined protocols, such as IDE/ATA, SCSI, or Fibre Channel (FC)
• IDE/ATA and SCSI are commonly used in small and personal computing
environments for accessing internal storage
• FC and iSCSI protocols are used for accessing data from an external
storage device
• External storage devices can be connected to the host directly or
through the storage network
• Data can be accessed over a network in one of the following ways:
block level, file level, or object level
File system
• A file system controls how and where data on a storage device are accessed,
stored and organized. Data in file systems are organized into folders or directories.

• Without a file system, data placed in a storage medium would be one large body of
data with no way to tell where one piece of data stops and the next begins.

• File systems provide naming for files, metadata (size, type, directory hierarchy,
access type) and file location

• Files on a storage device are kept in sectors. The file system identifies the
size and position of the files as well as which sectors are free to be used.

• In general, the application requests data from the file system by specifying the
filename and location. The file system maps the file attributes to the logical block
address of the data and sends the request to the storage device. The storage
device converts the logical address to a cylinder-head-sector (CHS) address and
fetches the data
• Common Windows file systems are NTFS and FAT. Common protocols for file sharing
are SMB/CIFS for Windows and NFS for Unix/Linux
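The last step of the request path above, converting a logical block address (LBA) to a cylinder-head-sector (CHS) address, follows a standard formula. The sketch below assumes a hypothetical geometry of 16 heads and 63 sectors per track; modern drives perform this mapping internally and do not expose their real geometry.

```python
HEADS_PER_CYLINDER = 16   # hypothetical geometry
SECTORS_PER_TRACK = 63    # hypothetical geometry


def lba_to_chs(lba, heads=HEADS_PER_CYLINDER, sectors=SECTORS_PER_TRACK):
    """Convert a logical block address to (cylinder, head, sector)."""
    cylinder = lba // (heads * sectors)
    head = (lba // sectors) % heads
    sector = lba % sectors + 1  # sectors are numbered from 1, not 0
    return cylinder, head, sector


for lba in (0, 63, 1008, 1070):
    print(f"LBA {lba:>4} -> CHS {lba_to_chs(lba)}")
```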
Block Level Storage
• In block level access, raw disks are assigned to the host for creating a file
system.

• The server treats these raw disks as local hard drives

• It is the responsibility of the host to create the file system

• It is deployed by larger businesses and enterprises in storage area
networks (SANs)

• Block-level storage protocols like iSCSI, Fibre Channel and FCoE (Fibre
Channel over Ethernet) are utilized to make the storage blocks visible and
accessible by the server-based operating system

• It provides higher performance and lower latency than file-level access
File Level Storage
• In file-level access, the file system is created at the storage side, and the
file-level request is sent over a network

• It is the storage technology used in Network attached storage (NAS)

• This method has higher overhead at the storage side, as compared to the
data accessed at the block level

• The storage disk is configured with a protocol such as NFS or SMB/CIFS and
the files are stored and accessed from it in bulk

• File level access is easy to use and implement

• File level storage is highly scalable and sharable between different users

• It provides simultaneous read and write to files from multiple users


Object Level Storage
• Much of the data being produced today is unstructured data: content or material
that will never be changed again. This is where object storage comes into play

• Object storage is a storage architecture that manages data as objects, where an
object is defined as a file with metadata and a globally unique identifier called the
object ID

• Unlike file or block storage, object storage does not use a hierarchy or directory
tree. Instead, every distinct unit of data exists at the same level in a storage pool.

• The main difference from the other concepts is that the objects are managed
by the application itself, which must support object storage. That means that no real file
system is needed here; that layer becomes unnecessary.

• Object storage is not designed to be directly accessed by an operating system.
Instead, the interaction occurs through a REST API over HTTP at the application level
Object Level Storage
• It simply uses a "GET" to retrieve an object, a "POST" to create that object,
and a "DELETE" to remove it

• Data in object storage can’t be modified. Instead, if a user makes a change,
another version of the same file is stored as a new object. This makes
object storage unsuitable for frequently changing data.

• But it is a good fit for data that doesn't change much, such as backups and archives.

• It also suits storage that holds vast amounts of video that is
only watched but not changed, for example on online movie streaming
sites or YouTube
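The object model described above (a flat pool, a unique object ID, metadata, and no in-place modification) can be sketched with a small in-memory class. This is a toy model, not a real object storage API: the class and method names are hypothetical, and the methods only mirror the GET, POST, and DELETE verbs rather than making HTTP calls.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    """An immutable object: the data plus its metadata."""
    data: bytes
    metadata: dict


class ObjectStore:
    """Toy flat-namespace store; every object lives at the same level."""

    def __init__(self):
        self._objects = {}  # object ID -> StoredObject, no directory tree

    def post(self, data, metadata):
        """Create a new object and return its globally unique object ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id):
        """Retrieve an object by its ID."""
        return self._objects[object_id]

    def delete(self, object_id):
        """Remove an object by its ID."""
        del self._objects[object_id]


store = ObjectStore()
first_id = store.post(b"draft", {"name": "report"})
# A "change" is stored as a new object with a new ID; the original stays as it was.
second_id = store.post(b"final", {"name": "report"})
print(store.get(first_id).data, store.get(second_id).data)
```

Storing every change as a whole new object is what makes this model a poor fit for frequently changing data and a good fit for backups, archives, and video.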
Block, File, Object Level Storage
