3PAR Sparing (Aug 2015)
HP 3PAR sparing
Table of contents
Executive summary
Definitions
Spare space
Distributed sparing
Rebuilds
How much spare space do I need?
Spare policy: schemes
Spare rate
Sparing algorithms defined
Nearline (NL) drives are special
Spare space implementation: admithw
How does it all work?
Evacuating drives
Viewing spare space
Changing spare space policy
Sparing performance
System performance during sparing
Sizing: NinjaSTARS
Adaptive sparing
Adaptive sparing examples
Predictive drive failures
Conclusion
Technical white paper | HP 3PAR sparing
Executive summary
HP 3PAR storage arrays are designed for high availability and include many features that enable the array to keep going
when a failure occurs. One of these features is the sparing policy.
The HP 3PAR StoreServ sparing policy provides space and processes to handle mechanical failures of spinning media
devices as well as flash devices. The sparing policy gives the user some measure of control in defining different levels of
space overhead to match the needs of the environment. HP 3PAR StoreServ sparing architecture provides many-to-many
rebuilds resulting in speedy recovery. The reduced recovery time limits exposure to a second failure occurring while the
array is still recovering from the first failure.
HP 3PAR StoreServ sparing policy and algorithms are an integral part of the HP 3PAR StoreServ operating system and are
part of every HP 3PAR storage array. This white paper explains the sparing policy and algorithms of HP 3PAR StoreServ
arrays so applications achieve maximum benefit.
Definitions
• Chunklet: Physical disks are divided into chunklets. Each chunklet occupies contiguous space on a disk. Chunklets on
HP 3PAR StoreServ 10000 and 7000 Storage arrays are 1 GB.
• Sparing policy: HP 3PAR StoreServ policy which manages the sparing process. The policies include user options (schemes)
and algorithms to reserve spare chunklets and move data to these chunklets when necessary.
• Devtype: A category of device such as FC, NL, and SSD.
– FC = Fast Class. 10K and 15K RPM spinning media devices
– NL = Near Line. 7.2K spinning media devices
– SSD = Solid State Drive (see below)
• Scheme: Sparing policy configuration options that include:
– Default
– Minimal
– Maximal
– Custom
• Relocation: The movement of data in a chunklet from one place, such as a failing drive, to another place, such as a spare
chunklet on a good drive.
• SSD: Solid State Drive. Non-volatile flash memory chips packaged in a hard disk form factor. Functions as a disk device,
but with the properties of flash including lower power consumption and faster random read performance.
Spare space
Storage arrays today take a variety of approaches to the basic challenge of keeping an array running when
failures occur. RAID has long been the basic building block of data protection and remains so
today. Sparing is a key element of that protection.
Sparing is a process by which a RAID storage system restores redundancy to data stored across a collection of disks after a
single disk fails. All RAID arrays in the industry today implement some type of sparing algorithm.
Traditionally, storage array vendors have architected spare disks as part of array design. Spare disks, sometimes called hot
spares or dedicated spares, are extra disks that are standing by waiting for a failure to occur before they are utilized. When a
primary disk fails, the spare disk is immediately called into service to replace the failed disk.
One problem with the dedicated spare disk approach is loss of performance. A dedicated spare disk is not used until
another disk fails. The space, power, potential performance, and capital used to purchase the disk are all wasted
for the majority of the life of the disk. HP 3PAR has addressed these shortcomings by virtualizing the function of the
spare disk, an approach called distributed sparing.
Distributed sparing
On HP 3PAR StoreServ arrays, physical disks are divided into chunklets when a disk is admitted to the system. Some
chunklets on each disk are used to hold user data and some chunklets are designated as spares. This spare space serves
the same function as a dedicated spare, but provides the performance benefit of having all drives in the system active.
Another benefit of distributed sparing is many-to-many rebuilds. When a drive fails, a process begins to recover the lost
redundancy by rebuilding the data that was on the failed drive. All used chunklets on the failed drive are rebuilt on spare or
free chunklets on other drives. Used chunklets are reconstructed by reading data from the remaining chunklets in the RAID
set and computing the missing data from the parity information if necessary.
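The reconstruction step can be illustrated with a minimal Python sketch of single-parity (RAID 5 style) recovery. It shows only the general XOR principle; the block contents, the 3+1 layout, and the helper name `xor_blocks` are assumptions made for illustration and do not describe the HP 3PAR StoreServ OS implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe (a 3+1 layout) and the parity computed from them.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails: rebuild its block from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # prints True
```

Because XOR is its own inverse, the same routine produces the parity block and recovers any single missing block of the stripe.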
The rebuild process chooses a target spare chunklet using several criteria. These criteria are prioritized to maintain the
same level of performance and availability as the source chunklet if possible. The following list shows the priority used to
select a spare chunklet.
1. Locate a chunklet on the same type of drive (e.g., NL, FC).
2. Maintain the same HA characteristics as the failed chunklet (e.g., HA cage, HA magazine).
3. Keep the chunklet on the same node as the failed drive.
The best case is when spare chunklets are on the same node as the failed drive with the same availability characteristics.
When spare chunklets meeting these criteria are not available, free chunklets with the same characteristics are considered.
During the sparing process, if the number of free chunklets used exceeds a threshold set in the HP 3PAR StoreServ OS,
consideration will be given to spare chunklets on another node. This helps keep the array balanced.
The sparing algorithm will locate target chunklets spread around many different disks. The disk being rebuilt will also have
its remaining good chunklets spread among many disks creating a many-to-many rebuild.
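The selection priority described above can be sketched as a sort key. This is a hedged illustration, not the actual HP 3PAR algorithm: the `Candidate` fields, the `same_ha` flag, and the function names are hypothetical, and the free-chunklet threshold that triggers use of another node is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    pd_id: int
    devtype: str     # "FC", "NL", or "SSD"
    same_ha: bool    # hypothetical flag: placement keeps the HA level (cage, magazine)
    node: int
    is_spare: bool   # True for a reserved spare, False for a free chunklet


def preference(c, failed_devtype, failed_node):
    """Sort key: False sorts before True, so a smaller tuple is a better target."""
    return (
        c.devtype != failed_devtype,  # 1. same type of drive
        not c.same_ha,                # 2. same HA characteristics
        c.node != failed_node,        # 3. same node as the failed drive
        not c.is_spare,               # spare chunklets before free chunklets
    )


def pick_target(candidates, failed_devtype, failed_node):
    """Return the most preferred candidate for a chunklet from the failed drive."""
    return min(candidates, key=lambda c: preference(c, failed_devtype, failed_node))


candidates = [
    Candidate(pd_id=7, devtype="NL", same_ha=True, node=0, is_spare=True),
    Candidate(pd_id=9, devtype="FC", same_ha=True, node=1, is_spare=True),
    Candidate(pd_id=4, devtype="FC", same_ha=True, node=0, is_spare=False),
]
best = pick_target(candidates, failed_devtype="FC", failed_node=0)
print(best.pd_id)  # prints 4
```

In this example the free chunklet on the same node (drive 4) is preferred over the spare chunklet on another node (drive 9), matching the order in which the text says candidates are considered.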
Spare policy: schemes
The HP 3PAR StoreServ OS provides four user-settable options relating to spare space sizing. These options, referred to as
schemes, are:
• Minimal
• Default
• Maximal
• Custom
Best practice: The default spare policy provides the best balance of reserved spare space and usable space and is the
HP recommended option for most configurations.
The spare policy is set when the system is installed and can be changed later using HP 3PAR StoreServ CLI commands.
The setsys command sets the spare policy, and the admithw command implements it.
The first three sparing policies—Minimal, Default, and Maximal—are automatically managed by the HP 3PAR StoreServ OS,
while the custom setting requires the administrator to actively manage spare space. The custom setting requires the use of
the HP 3PAR StoreServ CLI commands createspare, removespare, and showspare.
HP 3PAR StoreServ CLI commands used to manage sparing are documented in the HP 3PAR Command Line Interface
Reference manual.
The HP 3PAR StoreServ CLI setsys command allows many system parameters to be set. The specific form of the command
to set the sparing policy is shown below. This example sets the sparing policy to “Default”.
lab-eosxx cli% setsys SparingAlgorithm Default
Spare rate
A key parameter in the spare policy is the spare rate. The spare rate is a target amount of space to set aside for sparing
expressed relative to the number of disks of a given type in the system. A spare rate of 24, for example, sets target spare
space equal to the size of one drive for every 24 drives in the array.
The spare rate is defined based on the hardware configuration in the array as follows:
• Spare rate = 40 if there are any magazines present in the array (e.g., 10000)
• Spare rate = 24 for all others such as 7400/7400c, 7450/7450c, etc.
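As a worked example of the spare rate definition, the following Python sketch computes the target spare space for one drive type. The function names are hypothetical, and the sketch assumes a single drive size and no rounding; how the array rounds or combines this target with other rules is not stated here.

```python
def spare_rate(has_magazines: bool) -> int:
    """Spare rate as defined above: 40 with magazines present, 24 otherwise."""
    return 40 if has_magazines else 24


def target_spare_gb(drive_count: int, drive_size_gb: float, has_magazines: bool) -> float:
    """Target spare space: the size of one drive for every `spare rate` drives."""
    return drive_count / spare_rate(has_magazines) * drive_size_gb


# 48 drives of 900 GB in an array without magazines (spare rate 24).
print(target_spare_gb(48, 900, has_magazines=False))  # prints 1800.0

# 80 drives of 600 GB in an array with magazines (spare rate 40).
print(target_spare_gb(80, 600, has_magazines=True))   # prints 1200.0
```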
The terms spare rate and sparing rate sound similar but mean different things. Spare rate, as used here, is the ratio of
spare space to total space just discussed. Sparing rate is the rate at which data can be moved to reconstruct a failed
drive; it depends on many factors, including array load and the number of drives.
The admithw command can be run in a production environment to change the sparing scheme. The back-end workload to
implement a sparing policy change is low.
The sparing policy setting can be displayed using the “showsys -param” command. As the example below shows,
this command displays the current value of many system parameters, including the sparing algorithm value, which is set
to Default.
lab-eosxx cli% showsys -param
System parameters from configured settings
------Parameter------ --Value--
RawSpaceAlertFC : 0
RawSpaceAlertNL : 0
RawSpaceAlertSSD : 0
RemoteSyslog : 0
RemoteSyslogHost :
SparingAlgorithm : Default
EventLogSize : 3M
VVRetentionTimeMax : 336 Hours
UpgradeNote :
PortFailoverEnabled : yes
AutoExportAfterReboot : yes
AllowR5OnNLDrives : no
AllowR0 : no
ThermalShutdown : yes
FailoverMatchedSet : no
SessionTimeout : 01:00:00
lab-eosxx cli%
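For scripted checks, output like the listing above can be parsed with a few lines of Python. This is a minimal sketch that assumes the output has already been captured as text (the sample is an excerpt of the listing above); the function name `parse_showsys_param` is hypothetical and the output format may differ between OS versions.

```python
sample_output = """\
System parameters from configured settings
------Parameter------ --Value--
RawSpaceAlertFC : 0
RemoteSyslogHost :
SparingAlgorithm : Default
VVRetentionTimeMax : 336 Hours
SessionTimeout : 01:00:00
"""


def parse_showsys_param(text):
    """Return a dict of parameter name to value from 'showsys -param' output."""
    params = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as 01:00:00 stay intact.
        name, sep, value = line.partition(":")
        if sep and not name.startswith("-"):
            params[name.strip()] = value.strip()
    return params


params = parse_showsys_param(sample_output)
print(params["SparingAlgorithm"])  # prints Default
```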
Evacuating drives
The process of moving chunklets off a disk is called evacuating the drive. This process is used when addressing a failing, but
not yet failed, disk. This process can take considerable time, depending on how much data must be evacuated off the drive
and how much I/O is active on the back-end of the array.
Some service commands also cause drives to be evacuated. These commands are intended for service engineers (e.g., TS).
Here is an example from a 7400c with a failed drive. This example uses the -used option to show only the chunklets
currently in use as spare chunklets.
Lab-eosxx cli% showspare -used
Pdid Chnk LdName       LdCh State  Usage Media Sp Cl From To
  12 1616 tp-3-sd-0.13   10 normal ld    valid Y  N  2:92 ---
  14 1609 tp-3-sd-0.1    24 normal ld    valid Y  N  2:6  ---
  14 1610 tp-3-sd-0.1   192 normal ld    valid Y  N  2:15 ---
  14 1611 tp-3-sd-0.5   112 normal ld    valid Y  N  2:40 ---
  14 1612 tp-3-sd-0.11   42 normal ld    valid Y  N  2:80 ---
  16 1609 tp-3-sd-0.1     0 normal ld    valid Y  N  2:5  ---
  16 1610 tp-3-sd-0.1   181 normal ld    valid Y  N  2:14 ---
  16 1611 tp-3-sd-0.3   114 normal ld    valid Y  N  2:27 ---
Total chunklets: 8
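The showspare -used listing above can be summarized per physical disk to see how the relocated chunklets are spread. The following Python sketch assumes the output has been captured as text and that data rows have the eleven whitespace-separated columns shown; the function name is hypothetical.

```python
from collections import Counter

sample_output = """\
Pdid Chnk LdName LdCh State Usage Media Sp Cl From To
12 1616 tp-3-sd-0.13 10 normal ld valid Y N 2:92 ---
14 1609 tp-3-sd-0.1 24 normal ld valid Y N 2:6 ---
14 1610 tp-3-sd-0.1 192 normal ld valid Y N 2:15 ---
14 1611 tp-3-sd-0.5 112 normal ld valid Y N 2:40 ---
14 1612 tp-3-sd-0.11 42 normal ld valid Y N 2:80 ---
16 1609 tp-3-sd-0.1 0 normal ld valid Y N 2:5 ---
16 1610 tp-3-sd-0.1 181 normal ld valid Y N 2:14 ---
16 1611 tp-3-sd-0.3 114 normal ld valid Y N 2:27 ---
Total chunklets: 8
"""


def spares_in_use_per_pd(text):
    """Count used spare chunklets per physical disk ID (the Pdid column)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 11 columns and start with a numeric Pdid;
        # the header and the total line are skipped.
        if len(fields) == 11 and fields[0].isdigit():
            counts[int(fields[0])] += 1
    return counts


counts = spares_in_use_per_pd(sample_output)
print(dict(counts))          # prints {12: 1, 14: 4, 16: 3}
print(sum(counts.values()))  # prints 8
```

The per-disk counts add up to the "Total chunklets" value reported by the command, which is a simple consistency check on the parsing.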
------Parameter------ --Value--
RawSpaceAlertFC : 0
RawSpaceAlertNL : 0
RawSpaceAlertSSD : 0
RemoteSyslog : 0
RemoteSyslogHost :
SparingAlgorithm : Minimal
EventLogSize : 3M
VVRetentionTimeMax : 336 Hours
UpgradeNote :
PortFailoverEnabled : yes
AutoExportAfterReboot : yes
AllowR5OnNLDrives : no
AllowR0 : no
ThermalShutdown : yes
FailoverMatchedSet : no
SessionTimeout : 01:00:00
Sparing performance
When a drive fails, it reduces the data protection for the RAID set. Since data protection is a high priority, a key question
becomes “How long will the reduction in data protection last?” Stated another way, how long will it take to rebuild the data
from the failed drive to a new location so data protection is restored, or simply, how long will the rebuild take?
There is a wide range of possibilities depending on factors like the size of the failed drive, how much data is on that drive,
how busy the array is, and the configuration. A 600 GB FC disk, for example, in a configuration with 140 other FC disks may
take 90 minutes to rebuild while a 900 GB FC disk configured in RAID 6 on a busy system may take considerably longer.
This window of reduced availability that follows the failure of a physical disk is important for several reasons. First, data
protection is reduced such that a second failure during this window could lead to data loss.
A second dynamic that occurs during this window is a change in the I/O behavior of the data that was on the failed disk.
A write to the failed disk is cached as always. The write is then de-staged to log space, which
is space set aside to handle situations like this. Once the rebuild is complete, the log is replayed to apply the latest
changes to the volume. A read operation requires the data to be reconstructed from the available good data. This
reconstruction may require multiple back-end reads depending on the RAID mode.
Figure 2 shows what’s happening on the back-end of the array in the same example. You can clearly see a large increase in
the read rate (green line goes from ~300 Mbps to ~500 Mbps) and a smaller increase in the write rate (red line).
Figure 2. Disk Throughput during Rebuild
Although the host service times are not impacted by the failed disk in this case, there is an increase in the workload on the
back-end of the array. It is easy to imagine a workload where this increase in back-end I/O resulting from a failed disk could
cause increased host service times.
There are other potential impacts from rebuilding a failed disk. When space is constrained, some data may
be moved between controllers, which can change the workload balance between nodes. In one example, a
balanced system in which each of two controllers handled 20,000 IOPS before a disk failure became imbalanced after the disk
rebuild. Following the disk rebuild, one controller was handling 17,500 IOPS (44 percent) and the other controller was
handling 22,500 IOPS (56 percent). This imbalance was caused by the need to rebuild some chunklets on a different
controller than the controller owning the failed disk.
HP 3PAR storage arrays are highly available and have many features to protect user data from single failures. When a
failure does occur, however, it is possible to introduce a change in performance. It will not be observed in all cases, but
following a failure you should not be surprised if performance changes. There is no guarantee that performance will be
maintained at the same level following a failure.
Sizing: NinjaSTARS
Sparing requires resources and therefore must be considered when sizing the system. NinjaSTARS (version includes
a provision to specify either Default or Minimal as the spare space policy. HP’s recommended sparing option
remains the Default policy. The NinjaSTARS calculations are a bit simpler than the actual algorithms, but are very useful in
understanding the impact of the two spare space policies on usable space, especially in small configurations.
NinjaSTARS (STorage Assessment, Recommendation, and Sizing) is an HP 3PAR sizing tool used by account teams to size
storage solutions. If you are sizing an HP 3PAR array, contact your local account team for more information about
The main menu bar includes a pull-down as highlighted in the screen-shot below. It allows you to choose a sparing policy of
Default or Minimal.
Figure 3. NinjaSTARS Default Sparing Algorithm
Figure 3 shows NinjaSTARS estimating 8.2 TiB of usable capacity for the small configuration when using the Default Sparing
Algorithm. The “Usable vs. Overhead” box on the right of the NinjaSTARS screen shows it is allocating 1.60 TiB for spare
space in this configuration.
In figure 4 NinjaSTARS is now estimating a usable capacity of 8.8 TiB following a change to the configuration to use the
Minimal Sparing Algorithm. The source of this change can be seen in the “Usable vs. Overhead” box on the right where spare
space is now 0.80 TiB.
The usable capacity difference of about 600 GiB results from changing the sparing algorithm from Default to Minimal.
In small configurations (fewer than 48 physical devices), the difference is at most the space of one drive. In this example using
900 GB drives configured in RAID 5 (3+1), the one-drive difference is about 600 GiB as reflected in the NinjaSTARS output.
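The arithmetic behind the one-drive difference can be checked with a short Python sketch. It assumes that RAID 5 (3+1) leaves three quarters of raw space usable and ignores every other overhead; the exact NinjaSTARS calculation and rounding are not known, and the function name is hypothetical.

```python
GB = 10**9    # decimal gigabyte, as used for drive sizes
GIB = 2**30   # binary gibibyte


def raid5_usable(raw, data_drives=3, parity_drives=1):
    """Usable share of raw space for a RAID 5 set, ignoring all other overhead."""
    return raw * data_drives / (data_drives + parity_drives)


# One 900 GB drive of raw space, expressed as usable GiB.
one_drive_usable_gib = raid5_usable(900 * GB) / GIB

# Spare space reported in figures 3 and 4: 1.60 TiB (Default) and 0.80 TiB (Minimal).
freed_usable_tib = raid5_usable(1.60 - 0.80)

print(f"{one_drive_usable_gib:.0f} GiB, {freed_usable_tib:.2f} TiB")  # prints 629 GiB, 0.60 TiB
```

Both results are consistent with the figures: 0.60 TiB (about 614 GiB) is the gap between the 8.2 TiB and 8.8 TiB usable estimates, and it is close to the usable space of a single 900 GB drive.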
Adaptive sparing
HP announced adaptive sparing on some SSDs in 2014 as a unique, patented way to take spare space to the next level.
HP 3PAR has always used distributed sparing to address the need for spare space. This is native to
HP 3PAR, and it eliminates the unused performance that plagues some vendors’ dedicated spare disk implementations.
Adaptive sparing takes the solution to the next level by matching the SSD vendors’ need for extra space to manage
endurance with HP 3PAR spare space needs. Current NAND flash technology has a property where write operations wear
the flash chips until they eventually wear out. Vendors use many features to extend the life of the flash, such as wear leveling
and overprovisioning, and today’s NAND flash endurance is guaranteed by HP in HP 3PAR StoreServ arrays for five years.
NAND flash overprovisioning is key to managing endurance, but it’s expensive and limits the stated capacity of the SSD to
less than the total capacity of the flash chips in the device. Overprovisioning is in many ways the opposite of thin provisioning (TPVV).
Thin provisioning allows HP 3PAR StoreServ arrays to tell a host a particular VV has more capacity available than is currently
written (the host believes there is more storage than is currently provisioned). SSD overprovisioning reports less capacity
than the sum of the flash chips. A 480 GB SSD, for example, might have a total of 576 GB of flash chips where the additional
96 GB (20 percent) is used for overprovisioning.
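The overprovisioning figure in this example is expressed relative to the stated capacity. A minimal Python sketch of that calculation follows; the function name is hypothetical and the numbers are the ones quoted above.

```python
def overprovisioning_percent(raw_flash_gb, stated_gb):
    """Extra flash beyond the stated capacity, as a percentage of the stated capacity."""
    return (raw_flash_gb - stated_gb) * 100 / stated_gb


extra_gb = 576 - 480
print(extra_gb, overprovisioning_percent(576, 480))  # prints 96 20.0
```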
Overprovisioning is a design feature incorporated into SSDs by flash vendors. HP 3PAR Adaptive Sparing works with SSD
vendors’ flash devices to maximize SSD space and endurance. Adaptive sparing matches SSD overprovisioned space with
HP 3PAR spare space, which is allocated, but not put into service until needed. The following figure shows how this works.
In figure 5 the traditional architecture shows the overprovisioned flash chips used to manage endurance in the blue space
labeled Internal OP. Distributed sparing will allocate spare chunklets as indicated by the gray space, further reducing space
available for user data.
The Modern HP 3PAR architecture in figure 5 shows Adaptive Sparing merging the HP 3PAR spare space and part of the
internal overprovisioning space. The result is significantly greater user space while preserving overprovisioned space during
normal operations. Adaptive sparing allows a 1.6 TB SSD, for example, to have 1.92 TB of stated capacity. This represents a
20 percent increase in usable capacity while still allowing HP 3PAR to deliver a five-year lifespan for the drive.
In normal operations, adaptive sparing allows an SSD to operate with both increased space available to users and the full
complement of overprovisioned space designed by the HP partner flash device OEM.
When a failure occurs, however, the SSD will have to operate on a reduced amount of overprovisioned space for a time as
spare chunklets are called into use by the HP 3PAR sparing policy. When the failed component is returned to service, spare
space and overprovisioned space are once again merged.
Adaptive sparing adds an additional consideration to the sparing algorithm. The sparing algorithm starts with a spare
allocation equal to the size of the two largest drives when using the Default setting. An additional consideration with
adaptive sparing is to allocate a minimum of 10 percent spare space per drive for drives with adaptive sparing.
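The two considerations can be compared with a small Python sketch. The text does not state how the array combines them, so taking the larger of the two figures is an assumption made here for illustration only, as are the function names, the drive counts, and the premise that every drive in the example uses adaptive sparing.

```python
def default_base_spare_gb(drive_sizes_gb):
    """Base allocation under Default: the combined size of the two largest drives."""
    return sum(sorted(drive_sizes_gb, reverse=True)[:2])


def adaptive_floor_gb(adaptive_drive_sizes_gb, percent=10):
    """Minimum spare space: 10 percent of every drive that uses adaptive sparing."""
    return sum(size * percent // 100 for size in adaptive_drive_sizes_gb)


def spare_target_gb(drive_sizes_gb):
    # Assumption: all drives use adaptive sparing and the larger figure applies.
    return max(default_base_spare_gb(drive_sizes_gb), adaptive_floor_gb(drive_sizes_gb))


small = [1920] * 8    # eight 1.92 TB SSDs
large = [1920] * 48   # forty-eight 1.92 TB SSDs

print(spare_target_gb(small))  # prints 3840 (two largest drives dominate)
print(spare_target_gb(large))  # prints 9216 (10 percent per drive dominates)
```

Under these assumptions, the two-drive base allocation governs small configurations, while the 10 percent per-drive minimum governs once the drive count grows.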
Conclusion
HP 3PAR distributed sparing provides protection from drive failures while avoiding costly idle resources such as hot spares.
Many-to-many rebuilds quickly restore redundancy and minimize exposure to multiple failures. Sparing schemes offer
choice to the storage administrator, to tailor the sparing policy to the needs of the environment. The CLI offers the
commands needed to monitor the sparing policy and implement changes when necessary. The NinjaSTARS sizing tool is
aware of spare space and includes this calculation in space estimates. Finally, adaptive sparing allows HP 3PAR spare space
to work with SSD overprovisioned space, to provide more usable space and lower the cost per GB.
HP 3PAR Command Line Interface Reference
HP 3PAR StoreServ Storage: optimized for flash
News Advisory: HP Delivers All-flash Arrays for the Mainstream
HP 3PAR StoreServ Storage best practices guide
An Introduction to HP 3PAR StoreServ for the EVA Administrator
Learn more at
© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for
HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as
constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.