Bus Cont Plan
Incident response is a process that is triggered when something unexpected happens in a way that
threatens continuity.
Disaster recovery plays a role when a major incident occurs and the business cannot continue its
operations.
An example of an event is a quarantined e-mail that appears to be suspicious. A security analyst assesses
the e-mail and decides either to release it to the recipient or eradicate it.
System outages, whether malicious or accidental, fall into the adverse event bucket.
Insider threats that remove data without authorization trigger a full-fledged incident response.
The team
The incident response plan identifies the individuals who make up the
incident response team and their roles.
Usually, someone from cybersecurity, at the manager or director level,
owns incident response.
Management: funds incident response, allocates resources,
and controls policy decisions.
IT support: Not everyone in IT will respond to incidents. Unique events call for others to
participate, based on expertise.
Legal department: The general counsel’s presence on the extended team or executive response
team is expected. Engaging the legal department earlier should be expected in certain situations.
Public affairs and media relations: Large breaches garner media attention, and involvement of
personal information requires disclosure.
Human resources: This group’s input becomes necessary when employee involvement is
suspected.
How Vulnerabilities Become Risks
Vulnerabilities represent weaknesses in information systems. Threat actors seek to uncover and exploit
these in a successful attack. Weak passwords, default accounts with default passwords, and unpatched
systems are examples of vulnerabilities commonly exploited.
For a risk to be present, a threat and a vulnerability must exist. Vulnerabilities that no threat actor or
scenario would exploit are not a cybersecurity risk.
A threat actor, in this case, a malicious insider, exploits a vulnerability—default admin credentials—
creating a risk to the confidentiality, integrity, or availability of customer data.
Packet capture helps incident response teams confirm whether suspected events exist.
Organizations implement these solutions based on their incident response and monitoring strategy.
An example is NetFlow, developed by Cisco, which allows entities to capture data on the origin,
destination, and amount of traffic.
Security information and event management (SIEM) is a set of tools and services offering a holistic
view of an organization's information security. SIEM tools
provide real-time visibility across an organization's information security systems.
Containment
- Containment comes after identifying an event and concluding that action is required to limit its
impact.
- Containment is about limiting the damage done by attackers. This is achieved by keeping the
attacker away from key assets not yet compromised. Containing an event or incident requires
identifying indicators of the attack and searching for them in other systems.
- Once a system is suspected of being compromised, it should be isolated. Some ways to do this
include unplugging the network cable, putting the machine in sleep mode (powering it off
causes volatile memory loss and the loss of forensic evidence), or changing DNS and firewall
rules so that the machine cannot receive data.
- Denial of Service (DoS)
DoS and distributed denial of service (DDoS) attacks aim to shut down services and disrupt
business operations. The attacks target web-facing applications and DNS services.
Attempting to contain these attacks involves the following important steps:
1. Assess firewalls, routers, servers, and other affected device logs.
2. Pinpoint how the DDoS attack traffic differs from legitimate traffic and review network
traffic looking for DDoS traffic.
3. Block traffic with perimeter devices.
4. Block outbound traffic responding to the DDoS.
5. Blackhole malicious IPs attributed to the attacker.
6. Temporarily disable applications and services affected by the attack.
7. Contact the Internet service provider to confirm whether it sees the attack.
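As a rough illustration of steps 1, 2, and 5, the sketch below counts requests per source address in a log and lists the addresses that exceed a threshold, as candidates for blocking or blackholing. The log format (source IP as the first whitespace-separated field), the function name, and the threshold value are assumptions made for this example, not part of any particular firewall or router product.

```python
from collections import Counter

def find_heavy_sources(log_lines, threshold):
    """Return source IPs that appear in more than `threshold` log lines.

    Assumes (hypothetically) that each line starts with the source IP.
    """
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return sorted(ip for ip, hits in counts.items() if hits > threshold)

# Hypothetical sample log using documentation-only IP ranges.
sample_log = [
    "203.0.113.9 GET /login",
    "203.0.113.9 GET /login",
    "203.0.113.9 GET /login",
    "198.51.100.4 GET /index.html",
]

print(find_heavy_sources(sample_log, threshold=2))
```

In practice the threshold would come from a baseline of normal traffic, since a simple count cannot by itself distinguish an attack from a legitimate spike.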
- Lost Assets
Assets can be misplaced or stolen by end users and employees, and when these events occur,
several questions must be answered.
Assets can be laptops, tablets, mobile phones, desktops, printers, hard drives, and other types of
removable or portable storage.
Containing these events involves the following important steps:
1. Report the loss as required by policy.
2. Wipe the device remotely.
Eradication starts once systems known to be compromised can be taken offline. Files are removed,
and the registry and configuration changes that malware and attackers made
during the attack are reversed.
Once all the affected machines are identified and isolated and forensic backups are completed, the
company can address weaknesses exploited by the attackers. These vulnerabilities are patched, and
insecure configurations are repaired.
Eradication Techniques
Malware Artifacts
Antivirus solutions remove files and fix changes made to operating systems by malicious
software.
Some malware can only be removed by:
1. Taking the machine offline by removing the network cable
2. Booting the machine in safe mode
3. Using the Malware removal tool
4. Rebooting the machine and confirming that the infection is gone.
6 Business Continuity Strategy
BCM strategy should be aligned with business and IT strategies to ensure that regulatory and
legal requirements are met. BCM policies and procedures should incorporate the necessary
controls to ensure that data integrity and privacy are not compromised during recovery efforts.
While developing a business continuity strategy, the focus should be on the following:
1. Business processes and operations
2. Users
3. Datacenter
4. Networks
5. Facilities
6. Supplies
7. Data (off-site storage of backup data and applications)
The following factors pose a large challenge in the choice of appropriate BCM strategies:
1. Presence in multiple locations
2. Availability of recovery options such as owned, leased, shared or mobile facilities.
3. Increasing number of threats, risks and vulnerabilities
4. Complexity of external dependencies on supply chain channel
Business Continuity strategy is based on worst-case scenarios, and Business Continuity team will
help build these scenarios based on past incidents and future predictions. Some businesses
propose business recovery strategies that are different from the rest of the organization.
1. Protection of assets.
2. Measures to limit loss during disruption.
3. Minimize business loss and loss of customer goodwill.
4. Improving prompt salvage of assets during disaster.
5. Ensuring orderly evacuation of personnel and moving them to safety. Providing resources for
BCM and ensuring proper coordination between BCM teams by properly structuring them across
locations and providing for their backups in case any of them is not available during crisis.
6. Reduction of response time through planning and exercising.
Recovery Options
1. Prevention: a strategy that aims to reduce the chances of a disaster happening. It consists of
deterrent controls that reduce the likelihood of threats occurring. Preventive controls safeguard
vulnerable areas to ward off any threat that occurs and reduce its impact. Having these
measures in place is generally more cost-effective than attempting recovery after the interruption.
Types of preventive controls that can be adopted by the enterprise:
authorized personnel only, and a log of entry other than authorized personnel have to
C. Application controls: They help run business processes. Hence proper access control,
detection systems to study anomalous behavior over the network, annual vulnerability
assessments and penetration testing to overrule risk from open ports, and so on may be
D. Data storage controls: Off-site storage of backups and a proper predefined backup policy
and procedures for backup, storage, testing, restoration, and purging after retention
2. Response: In this stage, the first responses to an incident should be listed. The first response
to an incident is to notify the right people. A point to note is that major recipients of BCM
communication are:
A. CIOs and CTOs
B. IT directors and data center managers
C. Security and risk management officers
D. Data center architects
E. Application owners
Notification of impending disaster can be given by issuing prior warnings through the appointed
Timely notification can ensure orderly shutdown of machines and systems and if necessary, have
an orderly evacuation of premises made in case of risk to premises. This is one of the first
response steps to move to safety all personnel on the premises and to alert the police, fire
service, and hospitals. This is required only if the interruption is of the nature of an accident, act
of sabotage, or natural calamity. Precise notification procedures must be documented, and call
lists for persons to be contacted and informed should exist both at primary site and at the
backup site to facilitate mobilization of notification procedures. Notification can be done using
various tools: pager, short message service (SMS), phone, and e-mail.
3. Resumption: involves resuming only the time-sensitive business processes, either immediately
after the interruption or after the declared mean time between failures (MTBF).
At this stage, not all operations are fully recovered. The focus shifts to the command center once the BCM
teams declare the severity of the disaster and invoke the appropriate plan of action. The
resumption and subsequently the recovery activities are coordinated after this point.
The command center is a facility located near the primary facility that has adequate
communication facilities, PCs, printers, fax machines, and office equipment to support the
It is a 24/7 staffing; it is ready to be operational within a small span of time. In case of extremely
small RTOs and RPOs, it would be good to have the systems up and running in a short time.
Organizations such as financial institutions where they hold a lot of customer data and have a lot
personnel. The element missing here is customer data. An organization can install additional
3) Cold site: A cold site is a type of data center that has its own associated infrastructure that
systems, applications, and data which are installed only when disaster strikes, and the DR plan is
activated.
4) Mobile site: A mobile site is a portable van or trailer that can be used as an emergency-
processing center at the time of disaster. It provides an excellent alternative to the above three
options. After a disaster, the trailer can move to site, all essential equipment, and supplies can
be loaded onto it, and then connections for power and communication are added to it before it
5) Mirrored site: A mirror site is identical in all aspects to the primary site, right down to the
naturally the most expensive option. At the alternate site (or primary site, if still usable), the
work environment is restored. Communication, networks, and workstations are set up and
Restoration: It is the process of repairing and restoring the primary site. At the end of this, the business
operations are resumed in totality from the original site or a completely new site. While the recovery
team is supporting operations from the alternate site, restoration of the primary site for full functionality
is initiated.
7 Operations Security
Information Security Domains
1. Application
2. Physical
3. Network
4. Cryptography
5. Access Control
6. Legal Regulations
7. Operational
8. Risk Management
9. Security Architecture
10. BCP/DRP
Security operations are primarily concerned with the daily tasks required to keep security services
operating reliably and efficiently.
Operations security is a quality of other services. Security operations is a service in its own right.
Issuing policies, standards, baselines, and procedures is part of due diligence. Applying these
types of documents is due care.
Installing patches to mitigate the latest CVE is due care; researching the reason for the CVE
and making sure it has been fully understood is due diligence.
Performing an annual security audit is due diligence, but taking corrective action from the results
of an audit is due care.
Administrative Management
• It is the most important piece of Operations management.
Separation of Duties:
is a preventive administrative control put into place to reduce the potential of fraud. For example, an
employee cannot complete a critical financial transaction by herself. She will need to have her
supervisor’s written approval before the transaction can be completed.
Objective is to ensure one person acting alone cannot compromise the security of a system in
any way.
Prevents any person from becoming too powerful within an organization. This policy also
provides singleness of focus. For instance, a network administrator who is concerned with
providing users access to resources should never be the security administrator. This policy also
helps prevent collusion as there are many individuals with discrete capabilities. Separation of
Duties is a preventative control.
Collusion is an agreement among multiple people to perform unauthorized or illegal actions. It is
hindered by the separation of duties, restricted job responsibilities, audit logging, and job
rotation. Separation of duties also helps prevent mistakes and minimizes conflicts of interest.
Separation of Privilege
Similar to Separation of duties but builds upon the principle of least privilege and applies it to
applications and processes. It mandates that users, accounts, and computing processes only have
minimal rights and access to resources that they absolutely need. Requires the use of granular rights and
permissions.
Segregation of duties
Goal is to ensure the individuals do not have excessive system access that may result in conflict of
interest.
Most common implementation of segregation of duties policy is ensuring that security duties are
separate from other duties.
Need-to-know Access.
Access is granted only to data or resources that are needed to perform a task.
It is commonly associated with security clearances of subject.
Restricting access based on need-to-know helps protect against unauthorized access resulting in
a loss of confidentiality.
Typically, it is focused on user privileges, but it can also be applied to processes and applications.
Two-Person Control
Also called two-man rule, requires approval of two individuals for a critical task.
It ensures peer-review and reduces opportunity for collusion and fraud.
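A minimal sketch of how two-person control could be checked in software follows. The function name is hypothetical, and the rule that the requester's own approval does not count is an assumption made for this illustration.

```python
def is_approved(requester, approvers, required=2):
    """Return True when at least `required` distinct people approved.

    Assumption for this sketch: the requester cannot approve their own request,
    and repeated approvals from the same person count once.
    """
    independent = set(approvers) - {requester}
    return len(independent) >= required

print(is_approved("alice", ["bob", "carol"]))   # two independent approvers
print(is_approved("alice", ["alice", "bob"]))   # self-approval is ignored
```

Using a set means duplicate approvals from one person cannot satisfy the rule, which is the point of requiring two individuals.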
Split knowledge
Combines the concept of separation of duties and two-person control.
No single person has sufficient privileges to compromise the security of the environment.
Job Rotation
Employees are rotated through jobs.
Provides peer-review, controls fraud and enables cross-training.
It can act as a deterrent and detective control.
Mandatory vacation
Employees are required to take vacations of one or two weeks.
Provides peer-review, helps detect fraud and collusion.
It can act as a deterrent and detective control.
Clipping Levels
Predefined thresholds for the number of certain types of errors that will be allowed before the
activity is considered suspicious.
In most cases IDS software is used to track these activities and behavior patterns.
• Important term: the threshold of violation attempts that should be considered normal
and not logged.
• Example: you might not log that a user unsuccessfully tried to log in unless they failed
more than 3 times (the first or second failure might have been a typing mistake or Caps Lock
being on).
• Why use clipping levels: to avoid too many false positives and to avoid overwhelming the analysis unit.
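The login example can be sketched as a small counter that stays silent until a user exceeds the clipping level. The class and method names are hypothetical, and resetting the count after a successful login is an assumption of this sketch rather than something the notes specify.

```python
from collections import defaultdict

CLIPPING_LEVEL = 3  # failures tolerated before the activity is recorded

class LoginMonitor:
    """Records failed logins only once a user exceeds the clipping level."""

    def __init__(self, clipping_level=CLIPPING_LEVEL):
        self.clipping_level = clipping_level
        self.failures = defaultdict(int)
        self.alerts = []  # (user, failure_count) pairs above the threshold

    def record_failure(self, user):
        self.failures[user] += 1
        if self.failures[user] > self.clipping_level:
            self.alerts.append((user, self.failures[user]))

    def record_success(self, user):
        # Assumption: a successful login resets the user's failure count.
        self.failures[user] = 0

monitor = LoginMonitor()
for _ in range(4):
    monitor.record_failure("alice")
print(monitor.alerts)
```

The first three failures produce no alert; only the fourth is recorded, which keeps ordinary typing mistakes out of the logs.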
Control Mechanisms
Protect information and resources from unauthorized disclosure, modification, and destruction
• Configuration Management
• System Hardening
• Change Control
• Trusted Recovery
• Media Management
• Monitoring
• Security Auditing and Reviews
Fault Management
The goal of high availability is to reduce/eliminate downtime within an organization. Though there are a
variety of ways to provide availability most revolve around the idea of fault tolerance and eliminating a
single point of failure.
Spares
Redundant Servers
UPS
Clustering
RAID
Back Ups
Redundancy of Staff
• Mean Time to Repair (MTTR) is the time needed to repair a failed hardware module. In an
operational system, repair generally means replacing a failed hardware part. It is the time it
takes to run a repair after the occurrence of the failure.
Clustering
is a fault tolerant server technology that is similar to redundant servers, except each server takes part in
processing services that are requested. A server cluster is a group of servers that are viewed logically as
one server to users and are managed as a single logical system. Clustering provides for availability and
scalability.
Backups
Full backup
Backs up all the files and folders selected for the backup.
Incremental backup
Backs up all files that have been modified since the last backup of any type.
Differential backup
Backs up all files that have been modified since the last full backup.
Copy backup
Used before upgrades or system maintenance.
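The difference between full, incremental, and differential backups comes down to which cutoff time is used when selecting files. The sketch below illustrates this with a hypothetical dictionary of file names and modification times; real backup tools typically rely on archive bits or catalogs rather than this simplified comparison, and copy backups are not modeled here.

```python
def select_files(mtimes, kind, last_full, last_backup):
    """Return the file names a backup of the given kind would include.

    mtimes: dict of file name -> modification time (any comparable number).
    last_full: time of the last full backup.
    last_backup: time of the most recent backup of any type.
    """
    if kind == "full":
        return sorted(mtimes)
    if kind == "incremental":
        cutoff = last_backup
    elif kind == "differential":
        cutoff = last_full
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, t in mtimes.items() if t > cutoff)

# Hypothetical timeline: full backup at t=3, an incremental at t=7.
files = {"a.txt": 1, "b.txt": 5, "c.txt": 9}
print(select_files(files, "incremental", last_full=3, last_backup=7))
print(select_files(files, "differential", last_full=3, last_backup=7))
```

The incremental run picks up only `c.txt`, while the differential run picks up both `b.txt` and `c.txt`, which is why a differential restore needs only the last full backup plus the latest differential.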
Configuration Management
Is a process of identifying and documenting hardware components, software, and the associated
settings.
The goal is to move beyond the original design to a hardened, operationally sound configuration.
These changes come about as we perform system hardening tasks to secure a system.
Implemented hand in hand with change control.
ESSENTIAL to Disaster Recovery
Important in running a network or business, especially when subject to regulation (e.g., SOX)
• Disable unnecessary ports: Applications and services often have associated ports that are
configured to “listen”. Essentially this provides an attacker with an entry point into the network.
• Ensuring that the latest security patches and service packs are installed is another way to work
towards creating a secure environment.
• Further, keep in mind that systems are rarely configured with the most robust security settings
out of the box. Often, ease of use and performance are the first considerations. Default settings
and accounts, though useful, are well known and easy targets. Everyone knows the
administrative account is often named simply “administrator” and many people may choose a
very simple password, like “password”.
Trusted Recovery
When an operating system or application crashes, it should not put the system in any type of insecure
state.
• It releases resources and returns the system to a more stable and safe state.
In single user mode, the administrator salvages damaged files and attempts to find the cause of the
shutdown to prevent it from happening again.
The administrator then brings the system out of single user mode.
The administrator must validate the contents of configuration files and ensure system files are
consistent with their expected state.
Media Management
Media must reflect the company's security policy and enforce confidentiality, integrity, and proper
access controls (same as Confidentiality)
Backup media need to be protected from people and the environment (how?)
Auditing of media access must be done
Company may have “media librarian”
Media reuse issues?
Media destruction
Sanitization: process of destroying media when it is no longer used.
Data remanence: residual information left on a computer after being erased (object reuse).
Zeroization: overwriting. Don't use simple all zeros or all ones; do multiple passes.
Degaussing: data is exposed to the powerful magnetic field of a degausser and neutralized, rendering
the data unrecoverable.
Physical destruction.
• Preventive Controls stop a bad event from happening. For example, requiring a user ID and
password for access to a system is a preventive control. It prevents (theoretically) unauthorized
people from accessing the system.
• Detective Controls record a bad event after it has happened. For example, logging all activities
performed on a system will allow you to review the logs to look for inappropriate activities after
the event.
• Reactive Controls (aka Corrective Controls) fall between preventive and detective controls. They
do not prevent a bad event from occurring, but they provide a systematic way to detect when
those bad events have happened and correct the situation, which is why they are sometimes
called corrective controls. For example, you might have a central antivirus system that detects
whether each user’s PC has the latest signature files installed.
• If the system or its data were lost, system functionality would be unavailable, resulting in a loss
of your ability to track outstanding receivables or post new payments.
• What are some internal controls that would mitigate this risk?
Internal Audit
• To provide independent assurance to the audit committee (and senior management) that
internal controls are in place at the company and are functioning effectively.
• To improve the state of internal controls at the company by promoting internal controls and by
helping the company identify control weaknesses and develop cost-effective solutions for
addressing those weaknesses.
Therefore, restricting physical access is just as critical as restricting logical access. In a data center
environment, physical access control mechanisms consist of the following:
Environmental Controls
• Computer systems require specific environmental conditions such as controlled temperature and
humidity. Data centers are designed to provide this type of controlled environment. When
auditing a data center, you should verify that there is enough HVAC capacity to service the data
center even in the most extreme conditions.
• The IT auditor needs to review the temperature and humidity logs to verify that each falls within
acceptable ranges over a period of time. In general, data center temperatures should range from
65 to 70°F (with temperatures above 85°F damaging computer equipment) and humidity levels
should be between 45 and 55 percent. However, this will vary depending on the specifications of
the equipment.
• Auditor should verify the temperature and humidity alarms to ensure data center personnel are
notified of conditions when either factor falls outside of acceptable ranges. Sensors should be
placed in all areas of the data center where electronic equipment is present. Ensure that sensors
are placed in appropriate locations either by reviewing architecture diagrams or by touring the
facility.
• The auditor should review the HVAC design to verify that all areas of the data center are covered
appropriately. Determine whether the air flow within the data center has been modeled to
ensure adequate and efficient coverage.
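A review of temperature and humidity logs against the ranges mentioned above can be sketched as follows. The reading format (timestamp, temperature in °F, relative humidity in percent) and the function name are assumptions for this example; the default ranges are the general figures quoted in these notes and should be replaced by the equipment's own specifications.

```python
TEMP_RANGE_F = (65.0, 70.0)        # general range quoted in the notes
HUMIDITY_RANGE_PCT = (45.0, 55.0)  # general range quoted in the notes

def out_of_range(readings, temp_range=TEMP_RANGE_F, humidity_range=HUMIDITY_RANGE_PCT):
    """Return timestamps of readings where temperature or humidity is out of range.

    readings: iterable of (timestamp, temp_f, humidity_pct) tuples (hypothetical format).
    """
    flagged = []
    for timestamp, temp_f, humidity_pct in readings:
        temp_ok = temp_range[0] <= temp_f <= temp_range[1]
        humidity_ok = humidity_range[0] <= humidity_pct <= humidity_range[1]
        if not (temp_ok and humidity_ok):
            flagged.append(timestamp)
    return flagged

# Hypothetical log excerpt.
sample_readings = [
    ("09:00", 68.0, 50.0),
    ("10:00", 72.0, 50.0),  # temperature too high
    ("11:00", 69.0, 40.0),  # humidity too low
]
print(out_of_range(sample_readings))
```

The same check could drive the alarms the auditor is asked to verify, by notifying personnel whenever a reading is flagged.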
Fire Suppression
Since data centers face a significant risk from fire, they typically have sophisticated fire suppression
systems, generally one of two types: gas-based systems and water-based systems.
The auditor should ensure that fire suppression systems are protecting the data center from fire. All data
centers should have a fire suppression system to help contain fires. Most systems are gas-based or
water-based and often use multistage processes, in which the first sensor (usually a smoke sensor)
activates the system and a second sensor (usually a heat sensor) causes a discharge of either water or
gas.
• Gas-Based Systems: Varieties of gas-based fire suppression systems include CO2, FM-200, and
CEA-410. Gas-based systems are expensive and often impractical, but their use does not damage
electronic equipment.
• Water-Based Systems: Water-based systems are less expensive and more common but can cause
damage to computer equipment. To mitigate the risk of damaging all the computer equipment in
a data center or in the extended area of a fire, fire suppression systems are designed to drop
water from sprinkler heads only at the location of the fire.
• The Auditor should ensure that a disaster recovery plan (DRP) exists and is comprehensive and
that key employees are aware of their roles in the event of a disaster. If a disaster strikes your
only data center and you don’t have a DRP, the overwhelming odds are that your organization
will suffer a large enough loss to cause bankruptcy. Disaster recovery, therefore, is a serious
matter.
• An auditor who is auditing an organization’s disaster recovery plan should also interview
personnel who participate in disaster recovery.
• The Auditor should verify that the DRP covers all systems and operational areas. It should
include a formal schedule outlining the order in which systems should be restored and detailed
step-by-step instructions for restoring critical systems.
• The Auditor should verify that the DRP identifies a critical recovery time period during which
business processing must be resumed before suffering significant or unrecoverable loss. Validate
that the plan provides for recovery within that time period.
Data Backup and Restore: System backup is regularly performed on most systems. Often, however,
restore is tested for the first time when it is required because of a system corruption or hard-disk failure.
Sound backup and restore procedures are critical for reconstructing systems after a disruptive event.
• The Auditor should ensure that backup procedures and capacity are appropriate for respective
systems. Backup schedules typically are 1 week in duration, with full backups normally occurring
on weekends and incremental or differential backups at intervals during the week.
• Ensure that backup media can be retrieved promptly from off-site storage facilities.
• The Auditor should determine whether a Business Impact Analysis (BIA) has been performed on
the application to establish backup and recovery needs. A business impact analysis is the first
major task in a disaster recovery or business continuity planning project. A business impact
analysis helps determine which processes in an organization are the most important.
Criticality Analysis
When all of the BIA information has been collected and charted, the criticality analysis (CA) can be
performed. Criticality analysis is a study of each system and process and a consideration of its impact on the
organization.
1 Risk Management
What Is Risk?
• Risk: The likelihood that a loss will occur. Losses occur when a threat exposes a vulnerability.
• Threat: Any activity that represents a possible danger.
• Vulnerability: A weakness.
• Loss: A loss results in a compromise to business functions or assets.
Tangible
Intangible
Threat Assessment
• Process of formally evaluating the degree of threat to an information system or enterprise and
describing the nature of the threat.
• Threats are the tactics, techniques, and methods used by threat actors that have the potential to
cause harm to an organization's assets.
• Vulnerability: unpatched
• Asset: web server
• The process of threat assessment begins with the initial assessment of a threat. It is then
followed by a review of its seriousness and the creation of plans to address the underlying
vulnerability. In the last phase, a follow-up assessment is made and plans for mitigation are drawn up.
Vulnerability assessment
• The vulnerability assessment analyzes how vulnerable, susceptible, and exposed a business or
system is to a particular threat.
• It is useful to know whether a system is vulnerable to a threat that has a 90% chance of occurring, a
50% chance of occurring, or a 1% chance of occurring. The vulnerability and the likelihood of the
event are closely related, and the results are used as inputs to the impact assessment.
• A server that is outside the firewall is far more vulnerable to external attacks than a server that is
inside the firewall.
Impact assessment
• The impact assessment analyzes how great or small the impact of a threat occurrence will be on
the business or system.
• An earthquake has an enormous impact on a business that is in or near the epicenter of the
quake; it has a lesser impact on businesses further from the epicenter.
• Most businesses are more likely to build in state-of-the-art fire suppression systems rather than
construct a building with absolutely no flammable materials. The cost of building a completely
fireproof building is far higher than installing a high-quality fire system.
• Some risks are worth accepting: we drive cars, we cross busy intersections on foot, we eat
unhealthy food.
1. Identify threats
2. Identify vulnerabilities
3. Estimate likelihood of a threat exploiting a vulnerability
Risk Avoidance
• Risk avoidance is a way for businesses to reduce their level of risk by not engaging in certain
high-risk activities. While it’s impossible to eliminate all risks, a risk avoidance strategy can help
prevent some losses from happening.
• The key advantage of this technique is that it’s the most successful method of mitigating risk. You
eliminate the possibility of suffering losses by stopping the threat altogether.
Risk Transfer
• You can transfer all or part of the risk to a third party. The two main types of transfer are
insurance and outsourcing. For example, a company may choose to transfer a collection project
risk by outsourcing the project.
• The advantage here is that you can take some or most of the burden from risks and share it with
a third party.
Risk Mitigation
• Residual Risk: Risk treatments don’t necessarily reduce risks to zero. Remaining risk after
treatment is known as residual risk.
• Residual risk is the level of risk remaining after applying risk controls.
Best Practices for Managing Threats
Risk Analysis
Single Loss Expectancy (SLE) × Annualized Rate of Occurrence (ARO) = Annualized Loss Expectancy (ALE)
• ARO: annual probability of a compromise
• ALE: expected loss per year from this type of compromise
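The relationship between SLE, ARO, and ALE is a single multiplication, sketched below. The function name and the dollar figures are hypothetical values chosen only to show the arithmetic.

```python
def annualized_loss_expectancy(sle, aro):
    """ALE = SLE x ARO.

    sle: single loss expectancy (loss from one occurrence).
    aro: annualized rate of occurrence (expected occurrences per year).
    """
    return sle * aro

# Hypothetical example: a $10,000 loss expected once every two years.
print(annualized_loss_expectancy(sle=10_000, aro=0.5))  # 5000.0
```

Comparing the ALE with the yearly cost of a control is a common way to judge whether that control is worth its price.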
Categories of Risks
• There are multiple ways in which risks can be categorized.
• Final categories used will depend upon each organization / unit’s circumstances.
Financial
1. Reduction in funding
2. Failure to safeguard assets
3. Poor cash flow management
4. Lack of value for money
5. Fraud / theft
6. Poor budgeting
Operational: These risks result from failed or inappropriate policies, procedures, systems, or activities, e.g.:
1. Failure of an IT system
2. Poor quality of services delivered.
3. Lack of succession planning
4. Health & Safety risks
5. Staff skill levels
6. No process to track contractual commitments.
Reputational: Organization engages in activities that could threaten its good name
Risk Register
• A Risk Register is a management tool used to record relevant details relating to risks.
• Regulatory fines
• Provide the resource information from which an appropriate recovery strategy can be
determined
• The network, system, or application outage that is mission-critical would cause extreme
disruption to the business.
• Vital systems might include those that interface with mission-critical systems
Category 3: Necessary functions ---Important
Systems may include e-mail, Internet access, databases, and other business tools
MTD -MAO
Maximum tolerable period of disruption (MTPOD), also known as maximum tolerable downtime
(MTD), maximum tolerable outage (MTO), or maximum allowable outage (MAO).
Methodological steps for developing a business impact analysis.
• Top management should have identified the scope, considering the products and services of the
organization. Several key criteria could be considered to decide the products and services of the
organization that need to be protected to assure continuity; including:
a) market pressure,
• Once the scope has been established, it is recommended that its boundaries be
outlined and precisely defined in terms of the activity with which they begin and the activity with which
they end.
• When the scope is delimited, the organization should identify all the activities involved in the
scope that directly contribute to the generation of its products and services. A good tool that
helps in this step is a flowchart.
Assess Financial and operational impacts.
• The next step is to assess the financial and operational impacts that would affect the
organization in the event of a disruption of the activities identified in the preceding step.
• The financial impact assessment is performed before carrying out the operational impact
assessment.
• A financial impact assessment is carried out for each activity. The question to be asked is “What
would the magnitude and severity of financial loss be if the activities were interrupted following
a disruption?” Losses are estimated on a per-day basis.
The second part of the financial impact assessment ranks each impact in a severity level based on its
monetary loss value. The following scale is recommended:
Critical Assets
Assess MTPDs and prioritize critical activities:
• “The maximum tolerable period of disruption (MTPD) is the duration after which the viability of
the organization will be irrevocably threatened if product and service delivery cannot be
resumed”.
• The estimates of MTPD can be based on either financial or operational impacts. The personnel
responsible for assessing the financial and operational impacts are asked the following question:
“What is the maximum period of time that can be tolerated for this process based on the
financial and operational impact levels?” For example, suppose a disruption costs US $25,000 per
day and the cumulative loss becomes unacceptable once it exceeds US $50,000.
• The MTPD is then two days, because the cumulative loss will exceed US $50,000 if the
disruption continues any longer. This example assumes that the operational impacts are
insignificant relative to the financial losses.
• Usually the analysis requires revising the financial and operational impacts of the disruption to
estimate the MTPD.
• Once the MTPDs are calculated, a priority for their recovery should be established. A critical
activity that has a shorter MTPD compared with another critical activity is assigned a higher
recovery priority.
• Given today’s connectivity and dependency on information technology, MTPDs are tending to
shrink and will probably be close to zero in the near future.
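The MTPD arithmetic and the recovery-priority rule above can be sketched as follows. This is a simplified illustration that assumes a constant daily loss and ignores operational impacts, as the worked example does. The function name mtpd_days, the activity names, and the second activity's figures are hypothetical.

```python
def mtpd_days(daily_loss, max_acceptable_loss):
    """Largest whole number of days whose cumulative loss stays within the limit."""
    if daily_loss <= 0:
        raise ValueError("daily_loss must be positive")
    return int(max_acceptable_loss // daily_loss)


# (daily loss, maximum acceptable cumulative loss) per critical activity.
# "Order processing" uses the figures from the example; "Payroll" is made up.
activities = {
    "Payroll": (5_000, 50_000),
    "Order processing": (25_000, 50_000),
}

mtpds = {name: mtpd_days(loss, limit) for name, (loss, limit) in activities.items()}

# A shorter MTPD means a higher recovery priority, so sort ascending.
ranked = sorted(mtpds.items(), key=lambda item: item[1])
for priority, (name, days) in enumerate(ranked, start=1):
    print(priority, name, days)
```

With these inputs, order processing has an MTPD of two days and is recovered first.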
Estimate the resources that each critical activity will require for
resumption.
• In this step, the organization needs to estimate the resources required for resumption at the
level of each critical activity. Previously, the firm should have identified the minimum level at
which each critical activity needs to be performed upon resumption.
• The sources that a business can use to determine the minimum levels of performance
acceptable are the contractual agreements and service level agreements for the key products
and services involved in the scope. The minimum resources needed for each activity can be
classified as:
• This second category can be subdivided into: ‘physical areas’, ‘human competences’, ‘equipment’
and ‘documents’.
• Survey: this method uses a set of questions that are prepared in advance and sent to each
activity owner. A survey can cover a large number of respondents. However, this method
has two main constraints: (1) the accuracy of the responses suffers if the survey lacks
internal consistency and reliability; (2) survey responses may not be returned within the
time allowed.
• Interview: in this method the BIA information is collected by personally interviewing the activity
owners. The questions can be tailored to each particular activity concerned.
Although this method is very accurate and minimizes the possibility of misinterpreting the
questions, it is more expensive than the survey approach and involves the additional effort of
planning, scheduling, and conducting the interview.
• Workshop: this method, which uses group dynamic techniques, allows a strategically chosen
group of people to work together to provide the BIA information needed. Because of group
dynamics, a large amount of data is generated in a short period of time with this method. This
technique also gives the activity owners a systematic view of the BIA process and lets them
clear up any misunderstanding about it. In addition, an important side effect of this method
is the team spirit it helps to create among owners of critical activities.
• The choice of the appropriate method for gathering BIA information is influenced by its cost,
its efficiency, and the quality of the information it yields. Sometimes the best strategy is to
combine all three techniques.
• Moreover, someone at the strategic level, with appropriate seniority and authority among other
responsibilities, should be accountable for supporting the BIA process and ensuring that the BIA
methodology is implemented in the most effective and efficient manner. It is important to
understand that a BIA is developed within an organizational context.
• It is highly probable that there will be organizational obstacles that could prevent a BIA project
from accomplishing its goals.
• If external consultants are involved, the project manager should ensure that the consultants
work closely together with the critical activity owners.
• Plan interviews and meetings in advance
• Avoid the quick solution
• Use normal project management methods
• Consider the use of technology resources
3 Cost-Benefit Analysis
Performing a Cost-Benefit Analysis
1. Identify losses you expect before, or without, a countermeasure.
2. Identify the losses you expect after implementing the countermeasure.
Calculating projected benefits: Loss Before Countermeasure - Loss After Countermeasure = Projected
Benefits
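The calculation above can be illustrated with a short sketch. The dollar figures are hypothetical. Comparing the benefit against the cost of the countermeasure is a common extension but is an assumption here, not something the steps above state.

```python
def projected_benefit(loss_before, loss_after):
    """Projected benefit = loss before countermeasure - loss after countermeasure."""
    return loss_before - loss_after


def is_worthwhile(loss_before, loss_after, countermeasure_cost):
    """Assumption (not from the text): a countermeasure pays off when its
    projected benefit exceeds what it costs."""
    return projected_benefit(loss_before, loss_after) > countermeasure_cost


benefit = projected_benefit(100_000, 20_000)
print(benefit)  # 80000
print(is_worthwhile(100_000, 20_000, countermeasure_cost=30_000))  # True
```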
Asset Value
Asset Value (AV) – includes the following:
Exposure – percentage loss that would occur from a given vulnerability being exploited.
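Asset value and exposure are commonly combined into a single loss expectancy (SLE = AV x exposure), a standard risk-analysis formula that is not spelled out in these notes. The sketch below assumes exposure is expressed as a fraction between 0 and 1; the asset value and percentage are hypothetical.

```python
def single_loss_expectancy(asset_value, exposure):
    """Expected loss from one occurrence: asset value times the fraction lost."""
    if not 0 <= exposure <= 1:
        raise ValueError("exposure must be a fraction between 0 and 1")
    return asset_value * exposure


# Hypothetical figures: an asset worth 200,000 that would lose 25% of its value.
sle = single_loss_expectancy(200_000, 0.25)
print(sle)  # 50000.0
```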
Other feasibilities
• Organizational feasibility – A firewall may be good from a security point of view, but it may
prevent the free flow of data.
• Behavioral feasibility – user acceptance
• Technical feasibility
• Political feasibility
• Disaster recovery is a part of business continuity and deals with the immediate impact of an
event. Recovering from a server outage, a security breach, or a hurricane falls into this category.
• Disaster recovery involves stopping the effects of the disaster as quickly as possible and
addressing the immediate aftermath. This might include shutting down systems that have been
breached, evaluating which systems are impacted by a flood or earthquake, and determining the
best way to proceed.
• Once the effects of the disaster or event have been addressed, business continuity activities
typically begin.
Components Of Business
The components include people, process, and technology. Technology is implemented by people using
specific processes. Technology is only as good as the people who designed and implemented it, and the
processes developed to utilize it.
People in DR planning
• People are the ones who do the actual planning and implementation of a disaster plan.
• Every company is different, and therefore, every DR planning process will have to be different. A
small retail outlet’s IT planning for DR will be very different from a college, hospital, accounting
firm, or a manufacturing facility.
• According to a survey completed in 2010, human error is responsible for 40% of all data loss, as
compared to just 29% for hardware or system failures. People are responsible for designing,
implementing, and monitoring processes intended to safeguard data. However, people make
mistakes every single day.
• Another key aspect of people in DR planning is that if a disaster hits your company, people
will respond in a wide variety of ways. Some people, especially those with emergency
preparedness training, will rise to the occasion and start taking effective action
through leadership roles. As was seen in many natural disaster responses over the years, people
are often without food, shelter, power, or cellular service.
Questions:
• Process in DR planning has two phases: the planning phase and the implementation phase.
• The processes your company uses to run the day-to-day business are key to the long-term
success of the business. These processes are developed (and hopefully documented) in order to
manage the recurring business tasks. Things outside the normal recurring tasks typically are
handled as exceptions until they recur often enough to create a new process, and the cycle
continues.
Technology in DR planning
Question:
• A power outage, for example, impacts all the technology in a building. As we look at DR
planning, we’ll also look at various vulnerabilities of different technologies and discuss, in broad
strokes, strategies, tools, and techniques that might be helpful to mitigate or avoid some of
these risks.
• Despite the high likelihood that a company will go out of business after a disaster, more than
90% of small businesses lack a disaster recovery plan.
• Even though many companies say they understand the need for a disaster recovery plan, very
few make it a priority.
• There may be substantial financial and legal implications for failing to plan and for failing to take
reasonable precautions. This can add to a company’s burdens after a disaster strikes.
Types of Disasters
• Threats or hazards come in three basic categories: natural hazards, human-caused hazards,
and accidents and technological hazards.
• Natural hazards include weather problems in both hot and cold climates as well as geological
hazards such as earthquakes, tsunamis, volcanic eruption, and land shifting.
• Human-caused hazards can be accidental or intentional. Some intentional human-caused
hazards fall under the category of terrorism, and some are less severe and may be “simply”
criminal or unethical.
• Human-caused hazards include cyber-attacks, rioting, protests, product
tampering, bombs, explosions, and terrorism, to name a few.
• Accidents and technological hazards include such issues as transportation accidents and failures,
infrastructure failures, and hazardous materials accidents, to name a few.
1. Application failure
2. Communication failure
3. Data center disaster
4. Building disaster
5. Campus disaster
6. Citywide disaster
7. Regional disaster
8. National disaster
9. Multinational disaster
Having two servers or routers in the same rack leaves your network vulnerable—the single point of
failure could be as simple as someone tripping and spilling a large cup of coffee on the rack itself.
You might conscientiously make backups, verify the backups, and store them securely but leave them
on-site. The single point of failure could be as minor as something falling on the rack holding your tape
backups or as major as a serious fire in the server room or building.
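One rough way to spot the kind of single point of failure described above is to list where each redundant copy of a service lives and flag any service whose copies all share one location. The inventory format and the function name single_location_services are assumptions made for illustration.

```python
from collections import defaultdict


def single_location_services(inventory):
    """Return services whose every component sits in the same location."""
    locations = defaultdict(set)
    for service, location in inventory:
        locations[service].add(location)
    return sorted(name for name, places in locations.items() if len(places) == 1)


# Hypothetical inventory: one (service, location) pair per component or copy.
inventory = [
    ("core routing", "rack A"),
    ("core routing", "rack A"),
    ("backups", "server room"),
    ("backups", "off-site vault"),
]
flagged = single_location_services(inventory)
print(flagged)  # ['core routing']
```

Here the two routers share a rack and are flagged, while the backups are not because one copy is stored off-site.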
Project initiation
Project initiation is one of the most important elements in disaster recovery planning because, without
full organizational support, the plan will be incomplete. As an IT professional, you may find there are
limits to what you can do to create an organization-wide functional DR plan. For example, if the
application server is destroyed and you have data backups, do you also have a way to access those
backups? Do you have a way to allow users to connect to the application securely? Where are users
located? How will business resume? Can it resume without that application in the near term or not?
You will likely not be able to answer these questions on your own.
Risk assessment
Risk assessment is the process of sitting down with key members of your company and looking at the
potential risks your company faces. These risks run from ordinary to extraordinary: from a fire or minor
flood in a server room to a catastrophic loss such as an earthquake or major hurricane, and everything
in between.
As an IT professional, you can certainly lend your expertise to this process by helping define the likely
impact to technology components in various types of disasters or events, but you can’t do it alone. For example,
it’s likely that your transportation manager understands the potential business impact of bad weather
around the country, not just in your local area. Your marketing manager might best understand the
potential business risk of a contaminated product or a Web site breach.
• For example, you might determine that your Enterprise Resource Planning or your Electronic
Medical Record application cannot be down. Period. E-mail, Web servers, and reporting tools,
however, can go down, even though both events would be disruptive. Once you understand
these parameters, you can develop an IT-based strategy to meet the requirements that result
from this analysis.
Mitigation strategy development
The mitigation strategy might be quite simple for a small company: keep critical data backed up to a
secure cloud location, keep several copies of backups off-site, and keep several copies of key
information such as employee lists, phone numbers, emergency service phone numbers, key suppliers,
and customers in a binder off-site in a secure but accessible location.
Plan development
After you’ve gone through the analysis steps, you’ll be ready to develop your plan. As with other types of
IT project plans, you’ll want to outline the methodology you’re going to follow so that you improve your
chance of success and reduce your chances for errors and gaps. This includes standard processes such as
developing business and technical requirements, defining scope, budget, timeline, quality metrics, and
so forth.
Plan maintenance
Finally, plan maintenance is the last step in the DR planning process, and in many companies, it is “last
and least.” Without a plan to maintain your plan, it will become just another project document on a file
server or sitting in a binder on a shelf. If it doesn’t get maintained, updated, and revalidated from time to
time, you’ll find that the plan may be rendered useless if a disaster does strike. Maintenance doesn’t
have to be an enormous task, but it is one that must be done.
• The recovery point objective (RPO) describes the age of files that must be recovered
from backup storage for normal operations to resume.
• Network disaster recovery plan - Developing a plan for recovering a network gets more
complicated as the complexity of the network increases. It is important to detail the step-by-step
recovery procedure, test it properly and keep it updated. Data in this plan will be specific to the
network, such as its performance and its networking staff.
• Cloud disaster recovery plan - Cloud disaster recovery (cloud DR) can range from a file backup in
the cloud to a complete replication. Cloud DR can be space, time and cost-efficient, but
maintaining the disaster recovery plan requires proper management. The manager must know
the location of physical and virtual servers. The plan must address security, which is a common
issue in the cloud that can be alleviated through testing.
• Data center disaster recovery plan - This type of plan focuses exclusively on the data center
facility and infrastructure. An operational risk assessment is a key element in data center DRPs. It
analyzes key components such as building location, power systems and protection, security, and
office space. The plan must address a broad range of possible scenarios.
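The recovery point objective defined above lends itself to a simple check: the age of the most recent backup at the moment of failure should not exceed the RPO. The following is a minimal sketch; the function name meets_rpo, the timestamps, and the four-hour RPO are hypothetical.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup, failure_time, rpo):
    """True if the backup is no older than the RPO at the time of failure."""
    return failure_time - last_backup <= rpo


failure = datetime(2024, 1, 1, 12, 0)
rpo = timedelta(hours=4)

recent_ok = meets_rpo(datetime(2024, 1, 1, 9, 0), failure, rpo)  # 3 hours old
stale_ok = meets_rpo(datetime(2024, 1, 1, 6, 0), failure, rpo)   # 6 hours old
print(recent_ok, stale_ok)  # True False
```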
Example
Consider the DR plan for a modern company running 200 physical and virtual servers in an on-premises
data center. The company relies on its production environment being available 24/7 to customers, which
is why its DR strategy needs to function perfectly with minimal downtime. This company uses Amazon
Web Services (AWS) as its target DR infrastructure in order to cut costs and improve its RTO and
RPO.