How To Build A Disaster Recovery Plan: Best Practices, Templates and Tools


TECH BRIEF


EXECUTIVE SUMMARY
How do you start building a DR plan? While there are lots of tools from
vendors, it's hard to find a practical approach that comes from first-hand
experience.

This 4-step DR planning framework (Business Impact Analysis, Risk Assessment, Risk Management, and Recovery Testing)
was developed by Zetta.net's Director of Operations, Rich Webster, over 20 years of managing large-scale IT
infrastructure environments at companies including Netscape, eBay, and Shutterfly.

No amount of money or planning can stop some IT disasters from happening. But a good disaster
recovery plan can reduce your downtime from a week or a day to hours or even minutes.

Like any important project, DR starts with planning, followed by best-practice templates and procedures, which in turn are
implemented, in part, by the right tools.
In addition to identifying mission-critical applications and any infrastructure they rely on, you should also identify the data
these applications and tasks need to have access to.
This can include recent email, customer databases, and any documents, spreadsheets, presentations and other
"unstructured" files used by project/product management, development, sales, manufacturing, etc.
Your company has accumulated a substantial amount of data over time: hundreds of gigabytes, perhaps terabytes or even
petabytes. But only some of this data, often a small fraction, has to be made available again quickly.
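Below is a minimal sketch of that triage. It assumes a hypothetical two-tier scheme, where tier 1 must come back quickly and tier 2 can wait, and uses made-up dataset names and sizes. It totals the inventory and reports what share of the bytes falls in tier 1.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_gb: float
    tier: int  # hypothetical scheme: 1 = restore quickly, 2 = can wait


def fast_restore_share(datasets: list[Dataset]) -> float:
    """Return the fraction of total stored data that is tier 1 (fast restore)."""
    total = sum(d.size_gb for d in datasets)
    if total == 0:
        return 0.0
    tier1 = sum(d.size_gb for d in datasets if d.tier == 1)
    return tier1 / total


# Illustrative inventory with made-up sizes
inventory = [
    Dataset("recent-email", 40, 1),
    Dataset("customer-db", 60, 1),
    Dataset("project-files-archive", 900, 2),
]
print(f"{fast_restore_share(inventory):.0%} of data is tier 1")  # 10% of data is tier 1
```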

DR Planning 101
What Causes IT Disasters
The cause of an IT disaster may be small and specific. A power supply, CPU, network interface card, RAM, fan, or other component
on an individual server may fail. A brief power fluctuation may scramble data or disrupt a program's activity.
An entire data center going down is rare, but can happen. Weather may take down external power or network service. The resulting
fire, flood, or building damage may bring down your entire computer room or data center.
For real-world examples, Rich Webster, Director of Operations at Zetta, recalls: "While at Zetta, I've had a backhoe break one of the
network connections to my data center, a code upgrade to edge routers not go well, and PDU breakers at a hosting company fail."

DISASTER RECOVERY PLANNING STEP 1: Business Impact Analysis


A Business Impact Analysis (BIA) defines what capabilities your
company can't operate without. This is the first step in creating a
working disaster recovery plan.
A BIA must involve both top-level non-IT management, to identify and
agree on the business tasks considered essential, and IT management,
to map those tasks to the applications that support them, along with
the infrastructure and other services needed to run and use those
applications.
All top stakeholders must be involved in the analysis. You don't want to find out only after a site goes down that there was
an additional application an executive considers essential.
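The sketch below shows the shape of that mapping as data. Essential tasks, which management agrees on, point to applications, and IT maps each application to the infrastructure it needs. The function collects everything that must be recoverable. All task, application, and infrastructure names are hypothetical. An application with no IT mapping raises an error, which is the kind of gap a BIA should surface before a site goes down.

```python
# Hypothetical BIA output: essential business tasks (agreed by non-IT management)
# mapped to the applications they use.
ESSENTIAL_TASKS = {
    "take customer orders": ["order-web-app", "payments-api"],
    "support customers": ["email", "ticketing"],
}

# Hypothetical IT mapping: infrastructure and services each application needs.
APP_DEPENDENCIES = {
    "order-web-app": ["web-server-1", "db-primary", "internet-uplink"],
    "payments-api": ["payments-saas"],
    "email": ["email-saas"],
    "ticketing": ["ticketing-saas", "internet-uplink"],
}


def critical_infrastructure(tasks: dict[str, list[str]],
                            deps: dict[str, list[str]]) -> list[str]:
    """Collect every infrastructure item any essential task depends on.

    Raises KeyError if an application named by the business has no IT mapping.
    """
    needed: set[str] = set()
    for apps in tasks.values():
        for app in apps:
            if app not in deps:
                raise KeyError(f"no infrastructure mapping for application {app!r}")
            needed.update(deps[app])
    return sorted(needed)


print(critical_infrastructure(ESSENTIAL_TASKS, APP_DEPENDENCIES))
```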
www.zetta.net

DR Planning 101
Recovery Point Objective (RPO) and Recovery Time Objective (RTO)
The Recovery Point Objective (RPO) is the point in time to which your data must be recoverable. In other words, it is how much
recent data, measured in time, you can afford to lose: an RPO of one hour means losing at most the last hour of changes.
The Recovery Time Objective (RTO) is how soon the application and its data must be available again after a disaster.
How much IT downtime for mission-critical applications and data is acceptable depends on many factors (notably cost), and will
vary from one company to the next. In general, though, acceptable downtime today is minutes to hours, compared to days or a week
or more a decade ago.
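As a rough illustration, the sketch below checks a backup scheme against both objectives. It assumes the worst-case data loss equals the interval between backups, as when a failure strikes just before the next backup runs, and it ignores transfer lag. The schedule and targets in the example are hypothetical.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta, restore_time: timedelta,
                     rpo: timedelta, rto: timedelta) -> dict[str, bool]:
    """Rough check of a backup scheme against RPO and RTO.

    Assumes worst-case data loss equals the time between backups
    (failure just before the next backup), ignoring replication lag.
    """
    return {
        "rpo_met": backup_interval <= rpo,
        "rto_met": restore_time <= rto,
    }


# Hypothetical: nightly backups and a 6-hour restore, against a 1-hour RPO and 4-hour RTO
result = meets_objectives(timedelta(hours=24), timedelta(hours=6),
                          rpo=timedelta(hours=1), rto=timedelta(hours=4))
print(result)  # {'rpo_met': False, 'rto_met': False}
```

Nightly backups fail a one-hour RPO no matter how fast the restore is. This is why the two objectives are set and checked separately.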

DISASTER RECOVERY PLANNING STEP 2: Risk Assessment


The second step to a complete DR plan for your organization is
mapping two types of IT infrastructure:
1. IT infrastructure you control, whether located in your offices or in
co-location facilities.
2. IT infrastructure you don't control, like web and cloud services or
websites running in a hosting center.
Once the IT infrastructure has been mapped, look for single points of failure, like a server with only one network card.
These are your first places to consider "fortifying" with redundancy.
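Once the map exists, finding single points of failure can be partly mechanized. The sketch below assumes a hypothetical map from each role a system needs to the components that can fill it, and flags any role with fewer than two providers. The role and component names are illustrative.

```python
# Hypothetical infrastructure map: each role, and the components that can fill it.
INFRA = {
    "db-server/network-card": ["nic0"],
    "db-server/power-supply": ["psu-a", "psu-b"],
    "office/internet-uplink": ["isp-1"],
    "web-tier": ["web-1", "web-2"],
}


def single_points_of_failure(infra: dict[str, list[str]]) -> list[str]:
    """Return roles served by fewer than two components, sorted by name."""
    return sorted(role for role, providers in infra.items() if len(providers) < 2)


for role in single_points_of_failure(INFRA):
    print("SPOF:", role)
```

Counting providers only catches the obvious cases. Two power supplies on the same circuit, or two uplinks in the same conduit, still share a failure point, so the map has to record those shared dependencies too.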

DISASTER RECOVERY PLANNING STEP 3: Risk Management


To lower the risk of a data disaster occurring, fortify yourself against the
most common issues. Doing so protects you against 90%-95% of the small
incidents that may impact you. "For example," says Webster, "because of
good DR planning by IT, Zetta's uptime, both for its own operations and
for the service that Zetta provides to its customers, is above 'five nines'
(99.999%), meaning total annual downtime of less than five and a half
minutes, and has been for over five years. From a customer perspective,
we've never had more than a brief blip of unavailability."
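The arithmetic behind "five nines" is simple: the unavailable fraction of the year is 1 minus the availability. The sketch below assumes a 365-day year and ignores leap years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def annual_downtime_minutes(availability_pct: float) -> float:
    """Maximum total downtime per year permitted by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {annual_downtime_minutes(pct):.2f} minutes/year")
```

At 99.999%, the budget works out to about 5.26 minutes per year, consistent with the "less than five and a half minutes" figure above.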


Redundancy is one popular approach to avoiding or minimizing many IT disaster events. For example, servers, storage
and network gear can be configured with two power supplies, connected in turn to separate power sources. Servers,
firewalls, UPSs and other gear, even entire sites, can be duplicated. Network and electrical service can be supplied by two
separate utilities, on separate cables. Data can be stored across multiple hard drives.
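Redundancy helps because the service only goes down if every redundant copy fails at once. If failures are independent, the combined availability is 1 - (1 - a1)(1 - a2)... The sketch below assumes independent failures and a hypothetical 99% availability for a single power supply.

```python
def parallel_availability(availabilities: list[float]) -> float:
    """Availability of redundant components where any one of them suffices.

    Assumes failures are independent; shared dependencies (same circuit,
    same utility, same cable) break this assumption.
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1 - a  # probability that this component is also down
    return 1 - p_all_down


single = 0.99  # hypothetical availability of one power supply
print(f"one PSU:  {single:.4%}")
print(f"two PSUs: {parallel_availability([single, single]):.4%}")
```

Two 99% power supplies reach roughly 99.99% only if they really fail independently. This is why the text pairs dual power supplies with separate power sources and separate utilities.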

DR Planning 101
Hosting Applications vs. Outsourcing
Another critical component of managing the risk of data disasters is assessing whether it's time to outsource any of your IT
applications and services and move them to the cloud. "At Zetta," says Webster, "we had been running our own phone PBX and
email in-house, on-site. By moving these to cloud-based services, we now have redundant phone service centers via the cloud, so
sales and support stay up if we have an 'IT event' at our office."

DISASTER RECOVERY PLANNING STEP 4: DR Testing


There are only two ways to determine whether a DR plan works.
One is when there's a disaster. This, of course, is the wrong time to
discover that you chose wrong, or that one of your tools or services has
failed, or that you didn't include a critical application.
The other way is to periodically conduct tests.
"It is better to uncover a shortcoming in your infrastructure by testing
failure scenarios under controlled circumstances," says Webster. For example, in a controlled test, if you discover that a
network card isn't working properly, you can halt the test, install a new card, and run the test again. If you don't discover
this until a real event, you may miss your target time to restore IT service.
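A controlled restore test can be partly scripted. The sketch below is a hypothetical drill, not a Zetta tool. It copies a backup directory to a restore location, times the copy against an RTO, and verifies each file against SHA-256 digests assumed to have been recorded at backup time. The directory layout, manifest format, and function names are all assumptions for illustration.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def restore_drill(backup_dir: Path, restore_dir: Path,
                  expected: dict[str, str], rto_seconds: float) -> dict:
    """Restore backup_dir into restore_dir, time the copy, and verify contents.

    `expected` maps file names (relative to the backup root) to SHA-256
    digests recorded at backup time. Only the copy is timed; a real drill
    would time the whole runbook, not just the data transfer.
    """
    start = time.monotonic()
    shutil.copytree(backup_dir, restore_dir, dirs_exist_ok=True)
    elapsed = time.monotonic() - start

    missing, corrupt = [], []
    for name, digest in expected.items():
        target = restore_dir / name
        if not target.exists():
            missing.append(name)
        elif sha256_of(target) != digest:
            corrupt.append(name)

    rto_met = elapsed <= rto_seconds
    return {
        "elapsed_s": elapsed,
        "rto_met": rto_met,
        "missing": missing,
        "corrupt": corrupt,
        "passed": rto_met and not missing and not corrupt,
    }


# Usage: a throwaway backup with one file, and a manifest that also expects a
# second file the backup never captured, so the drill should fail.
with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp, "backup")
    backup.mkdir()
    (backup / "orders.csv").write_text("id,total\n1,9.99\n")
    manifest = {"orders.csv": sha256_of(backup / "orders.csv"),
                "ledger.db": "0" * 64}
    report = restore_drill(backup, Path(tmp, "restore"), manifest, rto_seconds=3600)
    print(report["missing"], report["passed"])  # ['ledger.db'] False
```

The missing file is the scripted equivalent of the faulty network card. A test surfaces it while there is still time to fix the backup job.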
External audits can help identify any parts of your DR plan that still need work, in part because not all
organizations will simulate a full disaster scenario or carry through to confirm that a full recovery can be done. "An external
audit can hold you to a higher standard than your company may have set, and conduct full, rigorous tests, forcing you to
follow the best IT practices," says Webster.


DR Planning 101
Make DR Documentation Available
There's a lot of information associated with a disaster recovery plan: contact information for vendors, your employees, utility
companies, and other organizations you may need to talk with; IT equipment inventories, including serial numbers and warranty
information; circuit IDs; building maps; and more.
Make sure that you have copies of this information that you can access even if all your regular IT, and possibly the wired and
wireless phone networks, are offline. Consider storing a copy online, and also keeping a secured copy on your smartphone, tablet,
or notebook, and on a flash drive.
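Keeping several offline copies raises a practical question: is each copy current? One simple approach, sketched below under the assumption that the records live in a small CSV file, is to export the records and publish a SHA-256 digest alongside them. Any copy whose digest differs from the latest one is stale. The record fields and values are hypothetical placeholders.

```python
import csv
import hashlib
import io

# Hypothetical DR contact/inventory records; fields and values are placeholders.
RECORDS = [
    {"type": "vendor", "name": "Example ISP", "phone": "555-0100", "ref": "circuit ID 12345"},
    {"type": "asset", "name": "db-server-1", "phone": "", "ref": "serial SN-0001, warranty to 2026-01"},
]


def export_dr_sheet(records: list[dict[str, str]]) -> tuple[str, str]:
    """Render records as CSV text and return (csv_text, sha256_hex).

    Store the CSV on each offline copy (phone, tablet, flash drive) and keep
    the digest with it, so every copy can be checked against the latest one.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["type", "name", "phone", "ref"])
    writer.writeheader()
    writer.writerows(records)
    text = buf.getvalue()
    return text, hashlib.sha256(text.encode("utf-8")).hexdigest()


csv_text, digest = export_dr_sheet(RECORDS)
print(csv_text)
print("sha256:", digest)
```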

Offsite Backup Approaches


In most IT disaster events, disaster recovery involves restoring data, because the primary copy has been damaged,
destroyed, or rendered inaccessible.
To ensure that a copy of your data is available if (or when) an IT disaster occurs, an offsite backup is critical. It should be
geographically far enough away that a major event like a fire, flood, power outage, explosion, or earthquake doesn't damage or
isolate the backup.
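"Far enough away" can at least be measured. The sketch below computes the great-circle (haversine) distance between a primary site and a backup site and compares it with a minimum separation. The coordinates and the 150 km threshold are hypothetical policy choices, not an industry standard. Distance alone is not sufficient: two distant sites can still share a fault line, a power grid, or a network provider.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


MIN_SEPARATION_KM = 150  # hypothetical policy threshold

primary = (37.77, -122.42)  # illustrative: San Francisco
backup = (39.74, -104.99)   # illustrative: Denver
d = great_circle_km(*primary, *backup)
print(f"{d:.0f} km apart; far enough: {d >= MIN_SEPARATION_KM}")
```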
Tape ruled the offsite backup world for decades. But there are problems with tape-based backups:
- Off-site tapes take time to request, find, and retrieve.
- If a tape is faulty, you don't find out until you need it.
- To read older-generation tapes, you need a working tape drive that supports them, and, if your site is
inaccessible, one at your alternative location. This adds to infrastructure costs.
- You may have to go through the entire tape just to retrieve a few files.
- Many tape-oriented backups use proprietary formats and require vendor software to be read, another recurring cost.
In today's online, 24x7x365 world, a backup that isn't quickly and easily available may be good for preserving important
company data, but it isn't useful for disaster recovery. Today's RTOs are measured in hours or even minutes.
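A back-of-the-envelope estimate shows why retrieval delay matters. Restore time is roughly the time to get the media plus the time to transfer the data. The sketch below uses decimal units and assumes a sustained throughput. The 500 GB data size, 150 MB/s rate, 6-hour tape retrieval delay, and 4-hour RTO are all hypothetical.

```python
def estimated_restore_hours(data_gb: float, throughput_mb_s: float,
                            retrieval_delay_h: float = 0.0) -> float:
    """Hours to get data back: retrieval delay plus transfer time.

    Uses decimal units (1 GB = 1000 MB) and assumes the throughput is
    sustained for the whole restore, which is optimistic.
    """
    transfer_seconds = data_gb * 1000 / throughput_mb_s
    return retrieval_delay_h + transfer_seconds / 3600


RTO_HOURS = 4  # hypothetical objective
# Hypothetical: 500 GB of critical data at 150 MB/s either way;
# off-site tapes add a 6-hour request/find/retrieve delay.
for label, delay in (("off-site tape", 6.0), ("online copy", 0.0)):
    hours = estimated_restore_hours(500, throughput_mb_s=150, retrieval_delay_h=delay)
    print(f"{label}: {hours:.1f} h, meets {RTO_HOURS} h RTO: {hours <= RTO_HOURS}")
```

With identical transfer rates, the retrieval delay alone is enough to push the tape restore past the objective.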

Free DR Plan Templates


Provider                     Link
TechTarget                   http://searchdisasterrecovery.techtarget.com/tip/Top-five-free-disaster-recovery-plan-templates
IBM                          http://publib.boulder.ibm.com/iseries/v5r1/ic2924/index.htm?info/rzaj1/rzaj1sampleplan.htm
Michigan State University    http://www.drp.msu.edu/documentation/stepbystepguide.htm
Texas A&M University         http://www.tamuct.edu/departments/informationtechnology/extras/ITDisasterRecoveryPlan.pdf


ZETTA.NET: An Enterprise-grade Disaster Recovery Tool


With Zetta, your backup data is replicated as a fully instantiated file system in its native format, so
disaster recovery becomes as easy as pulling a file off a file server.

4 Disaster Recovery Options: Local Storage, Mounted Web Drive, Web Browser and Software Client
1. Local Recovery: Recover data from local storage over
the LAN.
2. Mounted Drive: Data backed up to Zetta can be
mounted as a drive letter, enabling full file-system
recoveries, as easily as accessing a network share.
3. Web-Based Recovery: If your data center has been damaged by a fire or tornado, for example, all that is needed
to retrieve lost data is a web browser. Simply log in to your System Management Portal and browse the file system to
your most critical files.
4. ZettaMirror: ZettaMirror is award-winning backup software that replicates data, creating a second copy that is
always available online. This option is most common for Exchange and SQL database recoveries.

Pricing starts at $195 a month and includes:
- Software licenses for an unlimited number of servers/endpoints
- All functionality (backup, DR, archiving)
- Plug-ins and APIs (SQL, Exchange, VMware, Files)
- 500GB of secure online storage
- 24x7 US-based support

If you're lucky and have fortified your IT infrastructure, your company may escape major IT-impacting
disaster events.
"During my 20 years in IT, I haven't yet had to invoke a full DR plan, although I have come close," says
Webster. "But I have had to invoke parts of my DR plan about once a quarter. You have to be ready to do
some level of DR periodically."

CONTACT US

For more information, contact Zetta at sales@zetta.net or call 877-Go-Zetta (650-590-0950).
Zetta Inc. is an award-winning provider of enterprise-grade 3-in-1 backup, disaster recovery, and archiving.

WWW.ZETTA.NET
