CPM - User's Guide V2.1.0
Contents
1 Introduction to CPM
1.1 What is CPM?
1.2 What you can do with CPM
1.3 Purchasing CPM on the AWS Marketplace
1.3.1 Purchasing
1.3.2 Moving between CPM Editions
1.3.3 Downgrading
1.4 CPM architecture
1.5 The CPM Server Instance
1.5.1 Root Volume
1.5.2 Backing up the CPM Server
1.5.3 CPM Server with HTTP Proxy
1.5.4 Multiple CPM Servers
1.5.5 Upgrading the CPM Server Instance
1.6 CPM Technology
1.7 Browser support
2 Configuring CPM
2.1 General
2.2 Instance ID and License Agreement
2.3 Root user
2.4 Defining a time zone and data volume type
2.5 Fourth stage of configuration
2.5.1 New data volume
2.5.2 Existing data volume
2.5.3 Web server settings
2.5.4 Anonymous Usage Reports
2.6 Registering and finalizing the configuration
2.7 Configuration troubleshooting
2.8 Modifying the configuration of a CPM Server
2.9 Configuring CPM in silent mode
3 Start Using CPM
3.1 Main screen
3.2 Associating an AWS account
3.2.1 Account Type
3.2.2 Authentication
4 Defining Backup Policies
4.1 Schedules
4.1.1 Defining
4.1.2 Scheduling and Time Zones
4.1.3 Disabled times
4.2 Policies
4.2.1 Creating a new policy
4.2.2 Adding backup targets
4.2.3 Instance Configuration
4.2.4 AMI Creation
4.2.5 More Options
5 Introduction to Consistent Backup
5.1 Crash-consistent backup
5.2 Application-consistent backup
5.3 CPM and a “Point in Time”
5.4 Summary or “What Type of Backup to Choose”
5.4.1 Crash-consistent
5.4.2 Application-consistent
6 Windows Instances Backup
6.1 Introduction
6.2 Configuring CPM Thin Backup Agent
6.2.1 Associating an agent with a policy
6.2.2 Support for old 1.8.0 agents
6.2.3 Installing the agent
6.2.4 Changing Agent Configuration
6.2.5 Using the agent with an http proxy
6.3 Using VSS
6.3.1 Introduction
6.3.2 CPM’s use of VSS
6.3.3 Configuring VSS
6.3.4 Excluding and verifying VSS writers
6.3.5 Troubleshooting VSS issues
6.3.6 VSS Recovery
6.4 Using backup scripts on Windows
6.4.1 “before” script
6.4.2 “after” script
6.4.3 “complete” script
6.4.4 Capturing the output of backup scripts
7 Linux/Unix Instances Backup
7.1 Connecting to the CPM Server
7.2 Backup scripts
7.2.1 General
7.2.2 “before” script
7.2.3 “after” script
7.2.4 “complete” script
7.2.5 Capturing the output of backup scripts
7.2.6 Troubleshooting and debugging backup scripts
7.2.7 Example backup scripts
7.2.8 Scripts and SSH access in a multi-user environment
8 Additional Backup Topics
8.1 CPM in a VPC Environment
8.2 Backup when an Instance is stopped
8.3 Backing up independent volumes
8.4 The Freezer
9 Performing Recovery
9.1 Recovery AWS credentials
9.2 Instance recovery
9.2.1 Basic options
9.2.2 Advanced options
9.2.3 AMI Assistant
9.3 Volume recovery
9.4 RDS Database Recovery
9.5 Aurora Cluster Recovery
9.6 Redshift Cluster Recovery
10 Disaster Recovery (DR)
10.1 Introduction
10.2 Configuring DR
10.3 How it actually Works?
10.4 DR and mixed-region policies
10.5 Planning your DR Solution
10.5.1 Considerations
10.5.2 Timing your DR processes
10.5.3 Performing DR on the CPM Server (The cpmdata Policy)
10.6 DR Recovery
10.6.1 DR Instance Recovery
10.6.2 DR of Encrypted Volumes, AMIs and RDS Instances
10.6.3 A Complete Disaster Recovery Scenario
10.7 DR Monitoring and Troubleshooting
11 Cross-account DR, Backup and Recovery
11.1 Introduction
11.2 Snapshot Vaulting
11.3 Configuring cross-account backup
11.4 Cross-account DR and clean-up
11.5 Cross-account with cross-region
11.6 Cross-account recovery
11.7 Cross-account backup and cost
12 File-level Recovery
13 Tag-based Backup Management
13.1 Introduction
13.2 The “cpm backup” tag
13.2.1 Adding to a policy or policies
13.2.2 Creating a policy from a template
13.2.3 Setting backup options for EC2 Instances
13.2.4 Tagging a resource to be removed from all policies
13.3 Tag scanning
13.4 Pitfalls and troubleshooting
13.4.1 Pitfalls
13.4.2 Troubleshooting
14 Security Concerns and Best Practices
14.1 Introduction
14.2 CPM Server
14.3 Best security practices for CPM
14.3.1 Credentials rotation
14.3.2 Passwords
14.3.3 Security Groups
14.4 Using IAM
14.4.1 CPM Server Configuration Process
14.4.2 CPM Server IAM Settings
14.4.3 CPM Agent IAM Role
14.5 Thin Backup Agent
15 Alerts, Notifications and Reporting
15.1 Introduction
15.2 Alerts
15.3 “Pull” Alerts
15.4 Using SNS
15.4.1 Introduction
15.4.2 Configuring SNS
15.5 “Push” Alerts
15.6 Daily Summary
15.7 Raw Reporting Data
15.7.1 Backup view csv report
15.7.2 Snapshot view csv report
15.7.3 Keeping Records after Deletion
15.8 Usage Reports
16 CPM User Management
16.1 Independent Users
16.2 Managed Users
16.3 User definitions
16.4 Delegates
16.4.1 Delegate permissions
16.5 Usage Reports
16.6 Audit Reports
1 Introduction to CPM
1.1 What is CPM?
CPM – Cloud Protection Manager – is an enterprise-class backup, recovery & disaster
recovery solution for the EC2 compute cloud. It is a software product that uses AWS native
technologies (e.g. EBS snapshots).
CPM is marketed as a service. When you register to use the service, you get permission to
launch a virtual machine image (AMI) of an EC2 instance. After you launch the instance, and
after a short configuration process, you can start backing up your data using CPM.
1.3.1 Purchasing
CPM comes in several different editions which represent different usage tiers of the
solution. The price for using the software (CPM) is a fixed monthly price which varies
between the different CPM editions.
To see the different editions with pricing and details, please go to our pricing & purchase
page on N2WS’s web site. Once you subscribe to one of CPM’s editions, you can launch a
CPM Server instance and start working. Only one CPM Server per subscription will actually
perform backups. If you run additional instances, they will only perform recovery operations
(see 1.5.4).
1.3.2 Moving between CPM Editions
If you are already subscribed and using one CPM edition and want to move to another that
better fits your needs, you need to perform the following steps:
- Terminate your existing CPM instance. It is recommended to do so while no backup
  is running.
- Unsubscribe from your current CPM edition. This is important, because you will continue
  to be billed for that edition if you don’t cancel your subscription. You can only
  unsubscribe if you have no running instances of your old edition. You
  manage your subscriptions on the AWS Marketplace site in the “Your Software”
  page.
- It is recommended to create a snapshot of your CPM Data Volume before
  proceeding, just to be on the safe side. You can delete that snapshot once your new
  CPM Server is up and running. The data volume is typically named “CPM Cloud
  Protection Manager Data,” so it’s easy to find.
- Subscribe to the new CPM Edition and launch an instance. You need to launch the
  instance in the same availability zone as the old one. If you want to launch your
  new CPM Server in a different zone or region, you will need to create a snapshot of
  the data volume and either create the volume in another zone, or copy the snapshot
  to another region and create the volume there.
- During configuration, choose “Use Existing Data Volume” and select the existing data
  volume.
- Once configuration completes, you’ll continue to work with your existing
  configuration with the new CPM edition.
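The zone and region rule in the steps above can be sketched as a small decision helper. This is an illustration only and not part of CPM: the function names and the step wording are hypothetical, and the region is assumed to be the zone name without its trailing letter.

```python
from typing import List


def region_of(zone: str) -> str:
    """Derive the region from a zone name, e.g. 'us-east-1a' -> 'us-east-1'.

    Assumption: the zone name is the region name plus one trailing letter.
    """
    return zone[:-1]


def plan_data_volume_move(old_zone: str, new_zone: str) -> List[str]:
    """Return the ordered steps that make the data volume usable in new_zone."""
    if old_zone == new_zone:
        # Same zone: the existing volume can be selected directly.
        return ["select the existing data volume during configuration"]
    steps = ["create a snapshot of the data volume"]
    if region_of(old_zone) != region_of(new_zone):
        # Different region: the snapshot has to be copied there first.
        steps.append(f"copy the snapshot to region {region_of(new_zone)}")
    steps.append(f"create a volume from the snapshot in zone {new_zone}")
    steps.append("select the new volume during configuration")
    return steps
```

For example, `plan_data_volume_move("us-east-1a", "us-east-1b")` yields three steps with no snapshot copy, because both zones belong to the same region.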
1.3.3 Downgrading
If you moved to a lower CPM edition, you may find that you exceed
the resources your new edition allows. For example, you used CPM Advanced Edition to
manage the backup of 30 EC2 instances, and you moved to CPM Standard Edition, which
allows only 25 instances. CPM will detect such a situation as a “compliance issue,” stop
performing backups, display a message, and issue an alert specifying the problem.
To fix the problem, you can move back to a CPM edition that fits your current configuration,
or remove the excess resources, e.g. remove users, AWS accounts or instances from
policies. Once the resources are back in line with the current edition, CPM will automatically
resume normal operations.
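The idea of a compliance issue can be illustrated with a short sketch that compares usage against edition limits. How CPM actually detects the condition is not described here, so the function name and data layout below are hypothetical; the only figures taken from the text are the 30 managed instances and the limit of 25.

```python
from typing import Dict, List


def compliance_issues(usage: Dict[str, int], limits: Dict[str, int]) -> List[str]:
    """List every resource type whose usage exceeds the edition's limit."""
    return [
        f"{name}: {used} in use, {limits[name]} allowed"
        for name, used in sorted(usage.items())
        if name in limits and used > limits[name]
    ]


# Figures from the example in the text: 30 instances are managed,
# while the Standard Edition allows 25.
standard_limits = {"instances": 25}
issues = compliance_issues({"instances": 30}, standard_limits)
print(issues or "compliant")
```

Removing five instances from policies would bring usage back to the limit, and the list of issues would be empty again.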
performs the backup operations. These components reside in the CPM server and should
not concern you as a user.
The architecture of the CPM solution can be seen in Figure 1-1. CPM Server is an EC2
instance inside the cloud, but it also connects to the AWS infrastructure to manage the
backup of other instances. CPM doesn’t need to communicate or interfere in any way with
the operation of other instances. The only case where the CPM server communicates directly
with an instance, and has software installed on it, is when backing up Windows Servers. If you
wish to have VSS or script support for application quiescence, you will need to install the CPM
Thin Backup Agent. The agent gets its configuration from the CPM server, using the
HTTPS protocol.
Figure 1-1
1.5.1 Root Volume
Although you have access to the CPM Server instance by SSH, we expect you to consider the
CPM Server to be a virtual appliance. We expect you not to change the OS and not to start
running additional products or services on it. If you do so and it affects CPM and causes it to
malfunction, we will not be able to provide you with support. Our first requirement will be
for you to launch a clean CPM server. Please remember that all your changes in the OS will
be wiped out as soon as you upgrade to a new release of CPM, which will come in the form
of a new image (AMI). That said, if you need to install software to use with backup scripts
(e.g. Oracle client) or you need to install a Linux OS security update, you can. We
recommend you consult N2W Software support before doing so.
web site. To determine the availability zone of the new instance or to launch it in a
VPC subnet, you'll need to launch the instance using the EC2 console rather than
using the 1-click option.
- Terminate the old instance, preferably while no backup is being performed. Please
  wait until it is in "terminated" state.
- Recommended: go to the volumes view in AWS Management Console and create a
  snapshot of the CPM data volume. The volume is easy to find as it's typically named
  "CPM Cloud Protection Manager Data." The snapshot is only a precaution in case there is
  a problem with the upgrade process, and it can be deleted afterwards.
- When the new instance is in "running" state, connect to it with a browser using
  https.
- Approve the exception to the SSL certificate.
- In step 3 of the configuration, choose "Use Existing Data Volume" and paste in your AWS credentials.
  Select your old data volume from the list of volumes to complete the configuration
  process. Operations will resume automatically.
- If you are using backup scripts that utilize SSH, you may need to log in to the CPM
  Server once and run the scripts manually, so the use of the private key will be
  approved.
2 Configuring CPM
2.1 General
As with most other operations, you use a web interface to configure a new CPM Server.
When launching a new CPM Server, the server will automatically create a new self-signed
SSL certificate. This certificate will be used for the web application at the configuration
stage. If no other SSL certificate is uploaded to the CPM Server, the same certificate will
also be used for the main CPM application. Every CPM Server will get its own certificate. This
means that no two CPM servers will ever have the same certificate, and therefore it is
perfectly safe to use. Since it is not signed by an external authority, you will need to approve
an exception for your browser to start using CPM.
When you configure a CPM server you enter the following settings:
- Credentials for the CPM root user
- The time zone for the server
- Whether to create a new CPM data volume, or attach an existing one (from a
  previous CPM server)
- Proxy settings: configure proxy settings in case the CPM server needs to connect to the
  Internet via a proxy. These settings will also apply to the main application.
- The port the web server will listen on. The default is 443.
- Whether to upload an SSL certificate and private key for the CPM server to use. If
  you provide a certificate you need to provide a key as well. The private key must not
  be protected by a passphrase, or the application will not work.
- Register the AWS account with N2W Software. This is mandatory only for free trials
  but recommended for all users. It will allow us to provide quicker and better
  support. This information will not be shared with anyone.
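Because a passphrase-protected private key will prevent the application from working, it can help to check the key file before uploading it. The following is a minimal heuristic sketch, not a CPM feature: the function name is hypothetical, and it only looks for the two common PEM markers of an encrypted key.

```python
def key_is_passphrase_protected(pem_text: str) -> bool:
    """Return True if a PEM private key looks encrypted.

    Checks two common markers: the PKCS#8 'ENCRYPTED PRIVATE KEY' header
    and the 'Proc-Type: 4,ENCRYPTED' line of the traditional OpenSSL format.
    """
    return (
        "-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text
        or "Proc-Type: 4,ENCRYPTED" in pem_text
    )
```

A key for which this returns True would need its passphrase removed before it is uploaded.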
Furthermore, for the configuration process to work, as well as for normal CPM operations,
CPM needs to have outbound connectivity to the Internet for the HTTPS protocol.
If the CPM server was launched in a VPC, it needs either a public IP, an
Elastic IP attached to it, or connectivity via a NAT setup, Internet Gateway or HTTP
proxy. If CPM cannot connect, please check that the instance has Internet connectivity,
that DNS is configured properly, and that the security group allows outbound connections
on port 443 (HTTPS).
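The connectivity checks above can be approximated from a shell on the server with a short script. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and it tests a direct TCP connection, so it will report failure when the only route to the Internet is an HTTP proxy.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A DNS failure, a refused connection and a timeout all count as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect` with the name of any public HTTPS endpoint exercises DNS resolution and the outbound rule for port 443 in one step.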
We will now go through the stages one by one.
2.3 Root user
The root user is the user that controls all the operations of the CPM server. Root user
credentials are used to log in to the system and to use it. As you can see in Figure 2-1, you need
to define the user name, email, and password. The email may be used when defining SNS-based
alerts and notifications. You can then choose to automatically add this email to the SNS
topic recipients. Also, if you are using the Free Trial & BYOL Edition, you will have the
“license” field. Choose “Start free trial” for a free trial; if you purchased a license
directly from N2W Software, you’ll get instructions.
Figure 2-1
AWS credentials are needed to create a new EBS data volume if needed, and to attach the
volume to the CPM Server instance. If you are using IAM credentials that have limited
permissions, these credentials need to have permissions to view EBS volumes in your
account, to create new EBS volumes, and to attach volumes to instances (see 14.4). These
credentials are kept for file-level recovery later on, and they are used only for these
purposes. If you assigned an IAM Role to the CPM Server instance, and this role includes the
needed permissions, you can “Use Instance’s IAM Role” and then you will not be required to
enter credentials.
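As an illustration, the three permissions named above could be expressed as an IAM policy document. This is a sketch only: it assumes that the EC2 actions shown are the ones corresponding to viewing, creating and attaching volumes, and the settings described in 14.4 remain the authoritative list.

```python
import json

# Minimal policy sketch covering only the three permissions named above.
# Assumption: these EC2 action names correspond to "view volumes",
# "create volumes" and "attach volumes to instances".
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:CreateVolume",
                "ec2:AttachVolume",
            ],
            "Resource": "*",
        }
    ],
}

print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor as a starting point and extended as needed.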
Regarding proxy settings: if the CPM server needs an HTTP proxy to connect to the Internet,
please choose to enable HTTP proxy, and a few fields will be added to the screen, allowing you
to define proxy address, port, user and password. These settings will be kept as the default
for the main application.
Figure 2-2
Figure 2-3
The first thing you need to do is finish configuring your data volume. If you chose to create a new
volume in the previous step, you will see the screen shown in Figure 2-3. If you chose to use an
existing volume, you will see a drop-down select box instead of the capacity field, from
which to choose the volume.
availability zone your volume was created in the first place. Another option is to create a
snapshot from the original volume, and then create a volume from it at any availability zone
you require.
Although CPM data volumes typically have a special name, it is not a requirement. If, for
some reason, you select as the existing data volume a volume that was not created by a CPM
server, the application will simply not work.
Figure 2-4
Click on “Configure System” to finalize the configuration. The configuration will take
somewhere between 30 seconds and 2-3 minutes for new volumes, and usually less for
attaching existing volumes. After the configuration is complete, you will be redirected to a
screen that will indicate it:
Figure 2-5
After you see the success page, you know that CPM was configured correctly. Click on the
“here” link and wait a few seconds. You should be redirected to the login screen of the CPM
application. If, for some reason, you are not redirected, try to refresh the browser manually.
If that doesn’t work, just reboot the CPM server via AWS Management Console (or another
management tool), and it will come back up configured and running.
2.7 Configuration troubleshooting
Most inputs you have in the different configuration steps are checked when you click “next,”
and you will usually get a clear and straightforward message indicating what went wrong.
These errors are easy to correct.
A less obvious problem you may encounter is reaching the third step and getting the existing
instance select box with only one value in it: “No Instances found.” This can happen for two
reasons. The first: if you chose to use an existing volume and there are no available EBS
volumes in the CPM Server’s availability zone, you will get this response. In this case, your
existing data volume is probably in a different availability zone. To correct this, you can
either terminate the CPM server instance, relaunch it in the correct zone and start the
configuration process over, or you can take a snapshot of the data volume and create a volume
from it in the zone the server is in. The second reason is a problem with the credentials you
typed in. In this case the “No Instances found” message may appear even if you chose to create
a new data volume. This usually happens if the credentials are invalid or mistyped. To fix
this, go back and enter the credentials correctly.
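The snapshot route described above can be sketched with the AWS CLI. This is a minimal illustration, assuming the AWS CLI is installed and configured with credentials that may describe volumes, create snapshots and create volumes. The function names volume_az and copy_volume_to_az are hypothetical and are not part of CPM.

```shell
#!/bin/bash
# Hypothetical helpers (not part of CPM). They assume a configured AWS CLI.

# Print the availability zone an existing volume lives in.
volume_az() {
  aws ec2 describe-volumes --volume-ids "$1" \
    --query 'Volumes[0].AvailabilityZone' --output text
}

# Snapshot a volume and create a new volume from that snapshot in another zone.
# Prints the ID of the new volume.
copy_volume_to_az() {
  local volume_id target_az snapshot_id
  volume_id=$1
  target_az=$2
  snapshot_id=$(aws ec2 create-snapshot --volume-id "$volume_id" \
    --description "Copy of $volume_id for a CPM server" \
    --query SnapshotId --output text) || return 1
  # Wait until the snapshot is complete before creating a volume from it.
  aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id" || return 1
  aws ec2 create-volume --snapshot-id "$snapshot_id" \
    --availability-zone "$target_az" --query VolumeId --output text
}
```

For example, compare the output of volume_az for your data volume with the zone of the CPM server instance, and call copy_volume_to_az only if they differ. The new volume can then be selected as the existing data volume.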
In very rare cases, you may encounter a more difficult error, one that is discovered only after
you have approved configuring the server. In this case you will usually get a clear message
regarding the nature of the problem. This type of problem can occur for several reasons: a
connectivity problem between the instance and the Internet (low probability); AWS credentials
that are correct but lack the permissions they need (in case they were created using IAM); a
bad port (e.g. the SSH port, which is already in use); or an invalid SSL certificate and/or
private key file. If you can’t figure out the problem, you can try again. If it persists,
please contact N2W Software support (support@n2ws.com).
In any case, if the error occurred after approving the last configuration stage, it is
recommended to terminate the CPM server instance, delete the new data volume (if one
was already created), and try again with a fresh instance.
All you need to do is to configure the server as you wish, and connect to the old data
volume.
As for the CPM root user, you may change the email or the password. The username of the
root user can’t be changed. If, during the configuration process, you type a different
username than the original, CPM will assume you forgot the root username. In that case the
username will not change, and a file, “/tmp/username_reminder” will be created on the
CPM server. It will contain the username. You can connect to CPM server using SSH to view
this file (see 7.1).
CPMCONFIG
[SERVER]
user=<username for the cpm user>
password=<password>
volume_option=<new or existing>
volume_size=<in GB, used only for the new volume option>
volume_id=<Volume ID for the data volume, used only in the existing volume option>
snapshot_id=<snapshot ID to create the data volume from, used only with the existing volume option, and only if volume_id is not present>
Additionally, if you need the CPM server to connect to the internet via an HTTP proxy, you
need to add a proxy section:
[PROXY]
proxy_server=<address of the proxy server>
proxy_port=<proxy port>
proxy_user=<user to authenticate, if needed>
proxy_password=<password to authenticate, if needed>
The snapshot option is something that does not exist in the GUI, and can be used for
automation of a DR server recovery. Additionally, if you state an existing volume ID from
another Availability Zone, CPM will attempt to create a snapshot of that volume and migrate
it to the AZ of the new CPM server. This option can be used in a high availability setup.
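As an illustration of the snapshot option, the following sketch writes a configuration text for a DR server that builds its data volume from a snapshot. It only writes the text to a local file. The function name write_dr_config, the output file and the user name cpmadmin are hypothetical, and the password is left as a placeholder.

```shell
#!/bin/bash
# Hypothetical helper (not part of CPM): write a sample configuration text
# that uses the snapshot option. All values are placeholders.
write_dr_config() {
  local snapshot_id outfile
  snapshot_id=$1
  outfile=$2
  # volume_id is deliberately left out, so snapshot_id is the one that applies.
  cat > "$outfile" <<EOF
CPMCONFIG
[SERVER]
user=cpmadmin
password=<password>
volume_option=existing
snapshot_id=${snapshot_id}
EOF
}
```

Calling write_dr_config with a snapshot ID and a file name produces the [SERVER] section shown above with that snapshot ID filled in. Add a [PROXY] section in the same way if the DR server needs one.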
Please note that you are not required to click to approve the license terms when using the
silent configuration option, but since you already approved the terms when subscribing to
the product on AWS Marketplace, it does not matter.
3 Start Using CPM
3.1 Main screen
As soon as you log in to CPM with the root user credentials you created during
configuration, you are redirected to the main screen. CPM is a very simple application to
work with. The user interface is simple, intuitive, and user-friendly. Most operations are
only one mouse-click away from the main screen.
Figure 3-1
As you can see in Figure 3-1, the main screen is divided into the following tabs:
Backup Monitor – Here you will see all your backups. For each backup you can see
the start and end times, policy, status and DR status. All operations regarding a
backup are present in this tab – viewing the list of snapshots, opening the backup
log, recovering from a backup, and moving it to the freezer (see 8.4).
Sometimes you have many backups and are looking for a specific one. You can filter
by policy and status, sort by all relevant columns, or browse between pages. You can
also choose how many records to view in one page.
Policies – Backup Policies defined in the system. From this tab you can create,
modify, configure and delete backup policies.
Schedules – Backup Schedules can be created, configured and deleted in this tab.
You attach a schedule to a policy in the policy definition screen.
Agents – Thin Backup Agents that are connected to this CPM server can be viewed
here. Currently, Thin Backup Agents are needed only when application consistency is
needed for Windows Servers. In any other case, the backup is done agent-less.
Freezer – The freezer is a place where you can keep backups indefinitely. When you
identify a backup that is worth keeping (e.g. a successful backup of a clean system
right after an upgrade), you can move it to the freezer. Elements in the freezer will
not be deleted by the automatic cleanup process.
Recovery Monitor – This tab will contain records for all recovery operations. Each
recovery record will contain a time stamp of the recovery operation, the backup it
was recovered from, and additional information. Recovery records are
automatically deleted when their backups are deleted.
In addition to the tabs, you have a logout link at the top right corner of the screen, and a top
panel of buttons:
Main – Brings you back to the main screen from wherever you are. It can also be
used to reload the whole page.
Manage AWS Accounts – Depending on the edition of CPM you subscribed to, you
can define one or more AWS accounts to work with. These accounts contain the
objects (instances, EBS volumes, RDS databases, Aurora clusters and Redshift
clusters) you may wish to back up. Each backup policy is associated with a single
AWS account.
Change Password – Changes the password for the logged-in user, whether it’s the
root user or a different one.
Notifications - Define notifications and alerts.
Manage Users – Depending on the CPM edition you are registered to, you may have
the ability to create additional users. By clicking this button you may create new
users, or manage existing ones, i.e. delete them, reset their passwords or download
a usage report. Only the root user may create and manage other users.
General Setting – Contains some settings you can control, including tag scan
settings, when to run cleanup, and how long to save deleted records and user audit
logs.
At the bottom of the screen you can find a few useful links:
To view the license agreement
To download the Thin Backup Agent.
To enable or disable sending anonymous usage reports.
To download the CPM logs as a tar ball (in case you need to send to our support
team).
To enter a new activation key. If any special permission is required in addition to the
default permissions of your CPM edition, N2W Software can issue you an activation
key.
To download a backup view or snapshot view raw report in CSV format
To download usage reports
To download user audit reports
To register the CPM instance account with N2W software: if you haven’t done this
already during configuration, it is recommended to do so, as it will allow us to
provide faster and better support.
To go to the “cpm patches” page to install patches
To send configurations to agents
3.2.2 Authentication
CPM supports three methods of authentication:
IAM User – Authentication using IAM credentials (access and secret keys) created for an
IAM user.
CPM Instance IAM Role – If an IAM role was assigned to the CPM server at launch
time, you can use that IAM role to manage backups in the same AWS account the
CPM server is in. Only the root/admin CPM user is allowed to use the IAM role.
Other users will not have this option available.
Assume Role – This type of authentication requires another AWS account already
configured in CPM. If you want to use one account to access another, you can
define a cross-account role in the target account and allow access from the first
one. The operation of using one account to take a role and accessing another
account is called “assume role”.
To allow account authentication using “assume role” in CPM, choose this option, and
then choose the account that will assume the role in the field “Assuming Account.”
The target account is identified by the 12-digit account number (with no hyphens)
you type into the “Account Number” field. In addition, type the role name in the
field “Role to Assume.” This field needs to contain the role name, not the full ARN
of the role. CPM can’t determine what the role name is, since it is defined in the
target account, which CPM has no access to yet.
The “External ID” field is optional, unless the cross-account role was created with
the “3rd party” option, in which case it is required as an additional identifier.
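To see how the account number and the role name relate to the full ARN of the role, consider the following sketch. It is only an illustration and not how CPM itself works internally. The function name role_arn and the role name in the commented example are hypothetical. The commented command uses the AWS CLI, which is assumed to be installed and configured with the credentials of the assuming account.

```shell
#!/bin/bash
# Hypothetical helper (not part of CPM): build the ARN of a cross-account role
# from the 12-digit account number and the role name.
role_arn() {
  local account role
  account=$1
  role=$2
  # The account number must be exactly 12 digits, with no hyphens.
  case $account in
    ""|*[!0-9]*)
      echo "account number must contain digits only" 1>&2
      return 1
      ;;
  esac
  if [ "${#account}" -ne 12 ]; then
    echo "account number must be 12 digits long" 1>&2
    return 1
  fi
  echo "arn:aws:iam::${account}:role/${role}"
}

# Optional manual check that the role can be assumed (left commented out):
# aws sts assume-role \
#   --role-arn "$(role_arn 123456789012 CPM_Backup_Role)" \
#   --role-session-name cpm-role-check \
#   --external-id "<external ID, only if the role requires one>"
```

If the commented command fails with an access error, the trust relationship of the role in the target account is a reasonable first thing to check.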
Figure 3-2
As the root user you are also able to add accounts for other managed users. If you are the
root user and have managed users defined, an additional select box will be added, allowing
you to select the user.
“Scan Resources” allows you to determine whether the current account will be included in
tag scans performed by the system. Once “Scan Resources” is set to “Enabled,” you may
choose in which regions to scan resources. By default CPM will scan all regions, but you
can disable any region which is not relevant to your deployment.
If this is a DR account, you choose whether this account is allowed to delete snapshots. If it
is not, CPM will not delete snapshots of this account when performing cleanup of outdated
backups. It will tag them instead. Not allowing CPM to delete snapshots of this account
implies that the IAM credentials given do not have the permission to delete snapshots.
You can add as many AWS accounts as your CPM edition permits.
4 Defining Backup Policies
The backbone of the CPM solution is the backup policy. A backup policy defines everything
about a logical group of backed-up objects. A policy defines:
4.1 Schedules
Schedules are the objects defining when to perform backup. Schedules are defined
separately from policies. A schedule can be associated with several policies. Multiple
schedules can be associated with the same policy.
4.1.1 Defining
To define a schedule click on the “Schedules” tab in the main screen, then click on “New
Schedule.” You need to enter a name and an optional description. The main field, defining
the behavior of the schedule, is “Repeats Every.” It defines the frequency of the backups this
schedule will launch. The possible units are months, weeks, days, hours, and minutes.
The other important field is “Start Time.” Start time determines when the schedule will start.
If you want a daily backup to run at 10:00 AM, you set “Repeats Every” to one day, and the
start time to 10:00 AM (the date can also be set, the default is the current day). If you want
an hourly backup to run at 17 minutes after the hour, you set “Repeats Every” to one hour,
and the start time to XX:17. All backup times are derived from the start time. Please note:
for weekly or monthly backups, the start time will also determine the day of week of the
backup schedule and not the days of week check-boxes.
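The way backup times follow from the start time can be illustrated with a short sketch. It assumes GNU date and a fixed interval given in seconds, so it covers minutes, hours, days and weeks but not months, and it does not model the days of week check-boxes or disabled times. The function name next_runs is hypothetical, and the sketch is not CPM's scheduler.

```shell
#!/bin/bash
# Hypothetical helper (not part of CPM): print the first COUNT run times of a
# schedule, given its start time (UTC) and its interval in seconds.
# Requires GNU date.
next_runs() {
  local start interval count start_epoch i
  start=$1
  interval=$2
  count=$3
  start_epoch=$(date -u -d "$start" +%s) || return 1
  i=0
  while [ "$i" -lt "$count" ]; do
    # Every run time is the start time plus a whole number of intervals.
    date -u -d "@$(( start_epoch + i * interval ))" '+%Y-%m-%d %H:%M'
    i=$(( i + 1 ))
  done
}
```

For a daily backup at 10:00, call next_runs with a start time at 10:00 and an interval of 86400. For an hourly backup at 17 minutes after the hour, use a start time at XX:17 and an interval of 3600.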
“End Time” is when the schedule will expire. By default it’s never, because typically
schedules are not temporary. Furthermore, you can define which week days the schedule
will be active on.
For the root/admin user, if you have created additional managed users, you will be able to
select to which user the schedule belongs.
database times are saved in UTC time zone (Greenwich). So, if, at a later stage, you start a
new CPM server instance, configure it to a different time zone, and use the same CPM data
volume as before, it will still perform backup at the same times as before.
Figure 4-1
You can define a disabled time when the finish time is earlier than the start time. The
meaning of disabling the schedule Mondays between 17:00 and 8:00 is that it will be
disabled every Monday at 17:00 until the next day at 8:00. The meaning of disabling
the schedule every day between 18:00 and 6:00 will be that every day the schedule
will be disabled until 6:00 and after 18:00.
Info 4-1
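The wrap-around rule for a daily disabled time can be sketched as a small shell function. This only illustrates the logic and is not CPM's implementation. The function names, the HH:MM argument format and the treatment of the exact boundary minutes are assumptions.

```shell
#!/bin/bash
# Hypothetical helpers (not part of CPM).

# Convert HH:MM to minutes since midnight.
to_minutes() {
  local h m
  h=${1%%:*}
  m=${1##*:}
  # Strip one leading zero so that values like "08" are not read as octal.
  h=${h#0}
  m=${m#0}
  echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# Usage: in_disabled_window NOW START FINISH (all in 24-hour HH:MM).
# Succeeds if NOW falls inside the disabled time.
in_disabled_window() {
  local now start finish
  now=$(to_minutes "$1")
  start=$(to_minutes "$2")
  finish=$(to_minutes "$3")
  if [ "$start" -le "$finish" ]; then
    # Ordinary window within one day, e.g. 09:00 to 17:00.
    [ "$now" -ge "$start" ] && [ "$now" -lt "$finish" ]
  else
    # Finish is earlier than start: the window wraps past midnight,
    # so it covers everything after START or before FINISH.
    [ "$now" -ge "$start" ] || [ "$now" -lt "$finish" ]
  fi
}
```

With a disabled time of 18:00 to 6:00, both 23:30 and 05:59 fall inside the window, while 12:00 does not.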
4.2 Policies
Policies are the main objects defining backups. With a policy you define what to back up,
how to back it up, and, by associating schedules, when to perform the backup.
As a user, you need to balance the amount of time you want to be able to go back
and recover from (RPO – Recovery Point Objective), and the cost of keeping more
snapshots. Sometimes you will want to trade off the frequency of backups, and
the number of generations. Consider what best suits your needs.
Info 4-2
Figure 4-2
The field “Auto Target Removal” specifies whether to automatically remove resources
that no longer exist. If you enable this removal and an instance is terminated or an EBS
volume is deleted, the next backup will detect that and remove it from the policy. Choose “yes
and alert” if you want the backup log to include a warning about such a removal.
For the root/admin user, if you have created additional managed users, you will be able to
select to which user the policy belongs.
After clicking “Apply,” you can see the new policy on the list of policies in the “Policies” tab.
Instances – This is the most common type. You can choose as many instances as you
wish for a policy (limited by the number of instances you’re licensed to use). For
each instance, you can define whether you back up all its attached volumes, some,
or none. Also, for each instance you can decide whether to take snapshots (default
for Linux), take snapshots with one initial AMI (default for Windows) or just create
AMIs.
EBS Volumes – If you wish to back up volumes, not depending on the instance they
are attached to, you can choose volumes directly. This can be useful for backing up
volumes that may be detached part of the time, or move around between instances
(e.g. cluster volumes).
RDS Databases – You can use CPM to back up RDS databases using snapshots. There
are advantages to using the automatic backup AWS offers. However, if you need
to use snapshots to back up RDS, or if you need to back up databases in sync with
instances, this option may be useful.
Aurora Clusters – Even though Aurora is part of the RDS service, Aurora is defined in
clusters rather than in instances, and so CPM treats them a bit differently. Use this
type of backup target to back up your Aurora clusters.
Redshift Clusters – You can use CPM to back up Redshift clusters. Similar to RDS,
there is an automatic backup function available, but using snapshots can give an
extra layer of protection.
From the Backup Targets screen you can click on “Add Instances,” “Add Volumes,” “Add RDS
Databases,” “Add Aurora Clusters,” or “Add Redshift Clusters” to add backup targets to the
policy. When adding backup targets, you see all the backup targets of the requested type
that reside in the current region, except the ones already in the policy. You can select
another region to see the objects in it. In case you have many objects, you have the ability to
filter, sort, or browse between pages. Furthermore, for each backup target, you can see the
number of policies it’s already in (“Policies” column). If the number is larger than zero, you
can click on it to see which policies it’s in. You can see the selection screen for instances in
Figure 4-3.
Figure 4-3
When you want to add an instance (or another type of backup target) to the policy, you
simply check the “Add” checkbox of that instance (you can check more than one), and then
click “Add Selected.” This operation will not close the popup window. It will remove the
selected objects from the list and add them to the policy’s backup targets. You can repeat
this operation as many times as you like, and click “Close” when you are done.
Figure 4-4
In this screen you can decide whether you need to back up all the volumes attached to this
instance, or include or exclude some of them. By default CPM will back up all the attached
storage of the instance, including volumes that are added over time.
There are several other options available from this screen, mainly “Backup Options,” which
lets you decide whether you want to take only snapshots (the default for Linux-based
instances), take an initial AMI and then snapshots (the default for Windows-based
instances), or just schedule AMI creation.
It can also be used by a script to log the operations it is performing and write
useful information. This output is captured, saved in the CPM database, and
can be viewed from the “Recovery Panel” screen. To turn this option on you
need to choose “Collect.” The default option is “Ignore,” which means the
output is not collected.
Note that the output of a script should typically be only a few lines. It’s okay if it is larger
than that. However, if it gets really big (MBs) it can affect the performance of CPM, and if it
gets even larger it can cause crashes in CPM processes. Please make sure the scripts don’t
output large amounts of data to stderr. If there is any risk of this, make sure the output is
redirected elsewhere.
Warning 4-2
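One way to keep the collected output small is to send detailed output to a file and write only a short summary to stderr. The following is a minimal sketch of that pattern. The function name run_quietly and the log file location are hypothetical.

```shell
#!/bin/bash
# Hypothetical pattern (not part of CPM): keep verbose output out of stderr.
LOG_FILE=${LOG_FILE:-/tmp/backup_script_detail.log}   # assumed location

# Run a command with both its stdout and stderr appended to the log file,
# so that nothing it prints reaches the collected script output.
run_quietly() {
  "$@" >> "$LOG_FILE" 2>&1
}

# Example use inside a backup script:
#   if run_quietly some_verbose_command; then
#     echo "step succeeded" 1>&2
#   else
#     echo "step failed, details in $LOG_FILE" 1>&2
#     exit 1
#   fi
```

Only the one-line summary is written to stderr, while the full details stay available in the log file for troubleshooting.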
makes sense to substantially extend the time until the next retry, so there is
a better chance the system will be more responsive.
Figure 4-5
5 Introduction to Consistent Backup
This guide by no means claims to be a comprehensive guide on how to create consistent
backups. It simply explains a few key concepts to help you use CPM correctly.
In the case of taking snapshots of multiple volumes (probably the most common case), we
would like all the volumes to have the exact same point in time. Unfortunately, AWS does
not currently support such an option. Therefore, the best CPM can offer is taking the
snapshots of multiple volumes in close succession (typically only split seconds between
them). In most cases it will not make a difference, but in cases where exact point in time
across volumes/disks is needed, only backup scripts or VSS can achieve this goal. If the
backup script of a backup policy flushes and locks all volumes in a synchronized manner,
snapshots of this policy will reflect an exact point in time. Using VSS achieves this goal, since
VSS by definition performs shadow copies of multiple volumes in a synchronized manner. By
freezing applications that use multiple volumes - like a database which has a volume for data
and a separate volume for transaction logs - you can also achieve the goal of backing up
multiple volumes at a single point in time.
5.4.1 Crash-consistent
Pros:
Does not require writing any scripts
Does not require installing agents in Windows servers
Does not affect the operation and performance of your instances and applications
Fastest
Cons:
5.4.2 Application-consistent
Pros:
Prepares the application for backup and therefore achieves a consistent state
May ensure one exact point in time across multiple volumes/disks
May automatically truncate database transaction logs
Cons:
Requires writing and maintaining backup scripts
Requires installing a CPM Thin Backup Agent for Windows Servers
May slightly affect the performance of your application, especially for the
freezing/flushing phase
Backup takes more resources and time
6 Windows Instances Backup
6.1 Introduction
From the point of view of the AWS infrastructure, there is not much difference between
backing up Linux/Unix instances or Windows instances. You can still run snapshots on EBS
volumes. However, there is one substantial difference regarding recovering instances.
In Unix/Linux instances we can back up system volumes (root devices), and later launch
instances based on the snapshot of the system volume. We can simply create an image
(AMI) based on the system volume snapshot and launch instances.
This option is currently not available for Windows servers. Although you can take snapshots
of the system volume of a Windows Server, there is no way to create a launchable image
(AMI) from that snapshot. Because of this limitation, CPM needs an AMI to start a recovery
of a Windows instance, but it can still make sure all the volumes, including the root device
(OS volume) will be from the point-in-time of the recovered backup. By default, CPM will
create an initial AMI when you start backing up a Windows instance. That AMI will be used
as the default when recovering this instance.
Figure 6-1
Once you enable the “Application-consistent backup” field, you’ll see all the fields that are
needed to configure application aware backup for that instance:
“Enable VSS on Agent” – By default this option is turned on. This means that VSS
quiescence will be activated for this policy. In case the agent represents a Windows
2003 instance, VSS will fail every time. You need to turn off this option and use only
backup scripts. If you have a Windows 2003 instance and you don’t need scripts,
there is no use installing an agent, so just perform backup without one.
“Volumes for shadow copies” – This option is used only if VSS is turned on. If you
leave this field empty, VSS will create shadow copies of all of the volumes of this
instance. If you want it to create shadows for only some of the volumes, you can type
in drive letters with commas between them, e.g. “c:,d:”. For more information about
VSS, see 6.3.
“Backup Scripts” – Whether to enable running backup scripts locally on the Windows
instance.
“Scripts Timeout” – Timeout value for the scripts, in seconds.
“Script Output” – Whether to capture the output of the script as a log. It will capture
anything the script prints to stderr. That log will be viewable from the
recovery panel screen.
Figure 6-2
To upgrade an existing policy to the new agent, you only need to click on the link “Upgrade
to multi agents.” The policy will be upgraded and will configure the agent instance. This
action is irreversible. Please make sure you install the 2.0.0 agent, since the old agent will
stop working.
To upgrade the agent, please uninstall the previous 1.8.0 agent first. Future versions of the
CPM agent will support upgrade rather than uninstall-install.
Figure 6-3
The fields are straightforward. You need the address of the CPM server, which must be
reachable from this instance. The port is 443 by default (agents
communicate with the CPM server using the HTTPS protocol). However, if you are using a
custom port for your CPM server, you will need to change the port here to the correct value.
After finishing the installation, the CPM agent will be a service in your Windows system. It
will run automatically and you will not need to deal with it, unless you need to change its
configuration.
6.2.5 Using the agent with an http proxy
If the Windows instance the agent is installed on can reach the CPM server only through a
proxy, CPM agent supports such a configuration. To do so, you will need to edit
“cpmagent.cfg” (see previous section) and add the following lines under the general section:
proxy_address=<dns name or ip address of the proxy server>
proxy_port=<port for the proxy (https)>
If your proxy server requires authentication you add the following two lines as well:
proxy_user=<proxy user name>
proxy_password=<proxy password>
For these changes to take effect, you will need to restart the CPM Agent service from the
service manager.
6.3.1 Introduction
VSS, or Volume Shadow Copy Service, is a backup infrastructure for Windows Servers. It is
beyond the scope of this guide to explain how VSS works (You can read more at
http://technet.microsoft.com/en-us/library/cc785914%28v=WS.10%29.aspx). However, it is
important to state that VSS is the standard for Windows application quiescence, and all
recent releases of many of the major applications that run on Windows use it, including
Microsoft Exchange, SQL Server, and SharePoint. It is also used by Windows versions of
products not developed by Microsoft, like Oracle.
CPM supports VSS for backup on Windows 2008 or 2012 Servers only. Trying to run VSS on
older Windows OSs will always fail. VSS is turned on by default for every Windows agent. For
unsupported OSs, you will need to disable it yourself. This can be done in the instance
configuration screen, see 6.2.1.
In a nutshell, any application that wishes to be “backup aware” has a component called “VSS
Writer,” e.g. SQL Server has its VSS writer, the Windows Registry has its VSS Writer, NTFS
(the file system) has its own writer, etc… Every vendor who would like to support copying
the actual backup data (or, in other words, making shadow copies) provides a component
called a “VSS Provider.” The operating system comes with a “System Provider,” which knows
how to make shadow copies to the local volumes. Storage hardware vendors have
specialized “Hardware Providers,” which know how to create shadow copies using their own
hardware snapshot technology. Components that initiate an actual backup are called “VSS
Requestors.”
When a requestor requests a shadow copy to be done, the writers flush and freeze their
applications. At the point of time of the shadow copy, all the applications and the file
systems are frozen. They all get thawed after the copy is started (copy-on-write mechanisms
keep the point in time consistent, not unlike EBS snapshots). When the backup is complete,
the writers get notified. They can then perform their own post-backup actions, knowing that
they have a consistent backup for the point in time of the shadow copy. As an example, Microsoft Exchange
automatically truncates its transaction logs when it gets notified that a backup is complete.
Figure 6-4
6.3.3 Configuring VSS
By default, VSS is enabled when a CPM Thin Backup Agent is associated with an instance in a
policy. So in many cases, you will not need to do anything. By default VSS will take shadow
copies of all the volumes. However, you may want to narrow it down. For example, since the
system volume (typically C:\) can’t be used to recover the instance in a regular scenario, you
may want to exclude it from the backup. In this case there is no use taking a shadow copy of
it; this will unnecessarily take up additional resources. To make shadow copies of only some
of the volumes, change the value of “Volumes for shadow copies” in the Instance and
Volume configuration screen. You need to type drive letters followed by a colon, and
separate volumes with a comma, e.g. “d:,e:,f:”.
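If you generate or review these values in bulk, a quick format check can catch typing mistakes before they reach the configuration screen. The sketch below accepts an empty value (meaning all volumes) or a comma-separated list of drive letters, each followed by a colon. The function name valid_shadow_volumes is hypothetical, and the check is stricter than CPM may be, for example it rejects spaces.

```shell
#!/bin/bash
# Hypothetical helper (not part of CPM): check the format of a
# "Volumes for shadow copies" value such as "d:,e:,f:".
valid_shadow_volumes() {
  local rest item
  rest=$1
  # An empty value means shadow copies of all volumes.
  [ -z "$rest" ] && return 0
  while :; do
    item=${rest%%,*}
    # Each item must be exactly one drive letter followed by a colon.
    case $item in
      [A-Za-z]:) ;;
      *) return 1 ;;
    esac
    # No comma left: this was the last item.
    [ "$item" = "$rest" ] && return 0
    rest=${rest#*,}
  done
}
```

For example, “d:,e:,f:” passes the check, while “d,e” and “d:, e:” do not.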
diskshadow utility from a command line window, and use it to try and create a shadow copy.
Any issue you have with VSS using CPM should also occur here. To learn how to use the
diskshadow utility, see its documentation at
http://technet.microsoft.com/en-us/library/cc772172%28v=ws.10%29.aspx
You may see failures in backup because VSS times out or is having issues. In that case the
backup will be in status “Backup Partially Successful.” Most times you will not notice it,
since CPM will retry the backup and the retry will succeed. If the problem repeats itself too
often, it may be worth checking that everything is working properly with your Windows server.
You can check the application log in the Windows Event Log once in a while. If you see VSS
errors reported frequently, you should look into it. Contact N2W Software support with any
questions.
If you have a strict requirement to recover the consistent shadow copy for the system
volume as well, it is possible to do so. Please follow these instructions:
Before reverting for other volumes, stop the instance; wait until it is in “stopped”
state.
Using the AWS Console, detach the EBS volume of the C: drive from the instance and
attach it to another Windows instance, but as an additional disk
Using the Windows “disk management” utility, make sure the disk is online and
exposed with a drive letter.
Go back to the process in the previous section, and revert to the snapshot of drive C
(it will now have a different drive letter). Since it’s now not a system volume, it is
possible to do so.
Detach the volume from the second Windows instance, re-attach to the original
instance using the original device (typically /dev/sda1), and turn the recovered
instance back on.
Shadow copy data is stored by default in the volume that is being shadowed.
However, in some cases it is stored on another volume. In order for you to be able to
recover, you need to make sure you also have the volume the shadow copy is on
included in the backup and the recovery operation. Furthermore, when you revert to
shadows you need to do it in the right order. If you revert a volume that contains
another volume’s shadow data, it will delete it, and you will not be able to revert that
other volume. Since this is a recovery operation, you can always start over if you
encounter this issue.
Warning 6-1
6.4.1 “before” script
This script runs before the backup begins. Typically this script is used to move applications
to backup mode. The “before” script typically leaves the system in a “frozen” state. The
system stays in this state for a very short while, until the snapshots of the policy start.
The name of the “before” script is “before_<policy name>.<ext>”.
When you enable backup scripts, CPM assumes you implemented all three scripts. Any
missing script will be interpreted as an error and will reflect in the backup status. Sometimes
you do not need all three (the “complete” script is often not needed). In cases like this, you
should still write a script that does nothing but exit with the code “0,” and the policy will
experience no errors.
Warning 6-2
6.4.4 Capturing the output of backup scripts
You can have the output of backup scripts collected and saved in the CPM Server. Please see
7.2.54.2.5.
7 Linux/Unix Instances Backup
Making application-consistent backup of Linux instances does not require any agent
installation. Since the CPM server is Linux based, backup scripts will run on it. Typically, such
a script would use SSH to connect to the backed-up instance and perform application
quiescence. However, this can also be done differently (e.g. using custom client software).
There is no parallel to VSS in Linux, so the only method available is by running backup
scripts.
7.2.1 General
Backup scripts should be placed in the path “/cpmdata/scripts.” If the policy belongs to a
CPM user other than the root user, scripts will be located in a subfolder named after the user
(e.g. /cpmdata/scripts/cpm_user1). This path resides on the data volume of CPM, and will
remain there even if you terminate the CPM server instance and wish to launch a new one.
Backup scripts will remain on the data volume, together with all other configuration data. As
“cpmuser,” you have read, write, and execute permissions in this folder.
All scripts need to exit with the code “0” when they succeed and “1” (or another non-zero
code) when they fail. All scripts have a base name (detailed for each script in the coming
sections), and may have any addition after the base name (e.g. before_policy1_v11.5.bash).
Scripts can be written in any programming language: shell scripts, Perl, Python, or even
binary executables. You only have to make sure they can be executed (and have the correct
permissions).
Note that having more than one script with the same base name can result in
unexpected behaviour. CPM does not guarantee which script it will run, or even that it
will run the same script for every backup. Please avoid such a situation.
Warning 7-1
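One way to catch the situation described in the warning is to count the files that share a base name before enabling scripts for a policy. This is a rough sketch only. It assumes base names of the form before_<policy>, after_<policy> and complete_<policy>, as in the file name example above, and the function name report_duplicate_scripts is hypothetical. Because it matches by prefix, a policy named "policy1" would also match files belonging to "policy10".

```shell
#!/bin/bash
# Print each base name that matches more than one file in the scripts folder.
# Assumption: base names are before_<policy>, after_<policy>, complete_<policy>.
# Usage: report_duplicate_scripts <scripts_dir> <policy_name>
report_duplicate_scripts() {
    local dir="$1" policy="$2" stage base count f
    for stage in before after complete; do
        base="${stage}_${policy}"
        count=0
        # Count every existing file whose name starts with the base name.
        for f in "$dir/$base"*; do
            [ -e "$f" ] && count=$((count + 1))
        done
        if [ "$count" -gt 1 ]; then
            echo "$base: $count scripts"
        fi
    done
}
```

Running report_duplicate_scripts /cpmdata/scripts MyPolicy prints nothing when each base name matches at most one file.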
There are three scripts for each policy:
7.2.7 Example backup scripts
As an example, we can look at a set of backup scripts that use ssh to connect to another
instance and freeze a MySQL Database. The “before” script will flush and freeze the
database, the “after” script will release it, and the “complete” script will truncate binary logs
older than the backup. Please note, these scripts are given as an example without
warranties. Please test and make sure scripts work in your environment and do what you
expect them to before actually using them in your production environment.
The scripts are written in “bash”:
before_MyPolicy.bash
#!/bin/bash
# Flush and lock the MySQL tables on the backed-up instance over SSH.
ssh -i /cpmdata/scripts/mysshkey.pem sshuser@ec2-host_name.compute-1.amazonaws.com "mysql -u root -p<MySQL root password> -e 'flush tables with read lock; flush logs;'"
if [ $? -gt 0 ]; then
    echo "Failed running mysql freeze" 1>&2
    exit 1
else
    echo "mysql freeze succeeded" 1>&2
fi
This script connects to another instance using ssh, and then runs a MySQL command.
Another approach would be to use a MySQL client on the CPM Server and then the SSH
connection won’t be necessary.
After that script is executed, the CPM Server starts the snapshots and then calls the next
script:
after_MyPolicy.bash
#!/bin/bash
# The first argument is 0 if the "before" script failed.
if [ "$1" -eq 0 ]; then
    echo "There was an issue running first script" 1>&2
fi
# Record the backup time on the instance, then release the table lock.
ssh -i /cpmdata/scripts/mysshkey.pem sshuser@ec2-host_name.compute-1.amazonaws.com "date +'%F %H:%M:%S' > sql_backup_time; mysql -u root -p<MySQL root password> -e 'unlock tables;'"
if [ $? -gt 0 ]; then
    echo "Failed running mysql unfreeze" 1>&2
    exit 1
else
    echo "mysql unfreeze succeeded" 1>&2
fi
This script checks the status in the first argument and then does two things: first it saves a
timestamp of the current time into a file, then it releases the lock on the MySQL tables. This
timestamp is at the exact point in time of the current backup, since it is taken while the
database is frozen. After that, CPM waits for all the snapshots to succeed, and when they do,
it runs the last script:
complete_MyPolicy.bash
#!/bin/bash
if [ "$1" -eq 1 ]; then
    cat /cpmdata/scripts/complete_sql_inner | ssh -i /cpmdata/scripts/mysshkey.pem sshuser@ec2-host_name.compute-1.amazonaws.com "cat > /tmp/complete_ssh; chmod 755 /tmp/complete_ssh; /tmp/complete_ssh"
    if [ $? -gt 0 ]; then
        echo "Failed running mysql truncate logs" 1>&2
        exit 1
    else
        echo "mysql truncate logs succeeded" 1>&2
    fi
else
    echo "There was an issue during backup - not truncating logs" 1>&2
fi
It calls an inner script, complete_sql_inner:
butime=`<sql_backup_time`
mysql -u root -p<MySQL root password> -e 'PURGE BINARY LOGS BEFORE "'"$butime"'"'
Together, these two scripts purge the binary logs, but only if the “complete”
script gets “1” as its argument, indicating success. They read the time from the timestamp
file and execute the purge command to purge logs earlier than that timestamp.
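All three example scripts repeat the same pattern: run a command, report the result on stderr, and exit with a non-zero code on failure. As a sketch only, that pattern could be factored into a small helper. The name run_step is hypothetical and is not something CPM provides.

```shell
#!/bin/bash
# Run a command, report the outcome on stderr, and return 0 on success
# or 1 on failure. run_step is a hypothetical helper name.
# Usage: run_step <label> <command> [args...]
run_step() {
    local label="$1"
    shift
    if "$@"; then
        echo "$label succeeded" 1>&2
        return 0
    else
        echo "Failed running $label" 1>&2
        return 1
    fi
}
```

A script would then call, for example, run_step "mysql freeze" followed by the ssh command, and exit with code 1 if the helper returns a non-zero status.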
the group of all user subfolders and scripts. Then if given “executable” permissions for the
group, “cpmuser” will be able to access and execute all scripts.
8 Additional Backup Topics
8.1 CPM in a VPC Environment
CPM supports working in a VPC environment. Let’s look at a few caveats:
If the CPM Server is in a VPC, it will need outbound access to the Internet (AWS
endpoints). For that, you will need to either attach an Elastic IP to it or enable a NAT
configuration. Furthermore, you will need to access it using HTTPS to manage it, and
possibly SSH as well, so some inbound access will need to be enabled. If you run
Linux backup scripts on it, it will also need network access to the backed-up
instances. If CPM backup agents need to connect, they will need access to it
(HTTPS) as well.
If a Linux backed up instance is in a VPC and backup scripts are enabled, it will need
to be able to get inward connection from the CPM Server.
If a Windows backed up instance is in a VPC and you need to install a Thin Backup
Agent, the agent will need outbound connectivity to the CPM Server.
If you disable a policy, you need to be aware that this policy will not perform backup until it is
enabled again. If you disable it when an instance is stopped, make sure you enable it again
when you need backup to resume.
Warning 8-1
8.3 Backing up independent volumes
Backing up independent volumes in a policy is done regardless of the volumes’ attachment state.
A volume can be attached to any instance or not attached at all, and the policy will still back
it up. If this policy is using backup scripts, these can be aware of the volume’s state. They
can, for instance, check which instance is the active node of a cluster and perform
application quiescence through it.
9 Performing Recovery
CPM offers several options for data recovery. Since all backup is based on AWS’s snapshot
technology, CPM can offer rapid recovery of instances, volumes, and databases. When you
click on “Recover” for a backup at a certain hour, you are directed to the recovery panel
screen. This screen will include the instances that were backed up with links to recover
them, and links to recover independent volumes and databases. It will also include the
outputs of backup scripts and VSS, if they exist. These outputs may be important as
reference during a recovery operation.
Also, in the recovery panel screen you may see a drop-down menu to choose whether to
perform the recovery in the original AWS region or to another region. This choice will be
available if this backup includes DR to another region.
If you have cross-account functionality enabled for your CPM license, you may see two other
drop-down menus. You will see a “Restore to Account” field, where you can choose to restore
the resources to another account. If you defined cross-account DR for this policy, you will
have the “Restore from Account” field to choose which account to perform recovery from.
All the choices about regions and accounts you make in the recovery panel apply to all
recovery operations you initiate from this screen.
Figure 9-1
We strongly recommend you perform recovery drills from time to time to make sure your
recovery scenarios work. It’s not recommended to try it for the first time when your servers
are down. For any policy, the policy screen shows when recovery was last
performed on it. This can help you track the last time you performed a recovery drill.
credentials that were used for backup will be used also for recovery. You can choose to
uncheck it, and fill in different credentials for recovery. This can be useful if you choose to
use IAM-created credentials for backups that do not have permissions for recovery. Please
see 14.4. When using custom credentials, CPM verifies that these credentials actually belong to
the recovery account. If they do not, the recovery operation will fail.
When you recover an instance, by default you recover it with all its configuration, tags, and
data, as they were at the time of the backup. However, you can change any of these
elements if you wish. You can change instance type, placement, architecture, user data, etc.
You can also choose how to recover the system itself. For Linux EBS-based instances, if you
have a snapshot of the boot device, you will, by default, use this snapshot to create the boot
device of the new instance. You can, however, choose to create the new instance from an
image: its original image, or a different one. For instance-store-based, you will only have the
image option. This means you can’t use the snapshot of the instance’s root device to launch
a new instance. For EBS-based Windows Servers, there is a limitation in AWS, prohibiting
launching a new instance from a snapshot (as opposed to from an AMI). CPM knows how to
overcome this limitation. You can recover an instance from a snapshot, but you also need an
AMI for the recovery process. By default, CPM will create an initial AMI for any Windows
instance it backs up. By default it will use that AMI for the recovery process, so typically you
don’t need to change anything to recover a Windows instance.
Your data EBS volumes will be recovered by default, to create a similar instance as the
source. However, you can choose to recover some or none of them. You can also choose to
enlarge their capacity, change their device name or iops value.
You can choose to preserve tags related to the instance and/or data volumes, or you can
choose not to.
The instance recovery screen is divided into “Basic Options” and “Advanced Options.” This
helps make the recovery process simpler.
o “De-Register after Recovery” – This is the default. The image will only be
used for this recovery operation and will be automatically de-registered at
the end. This option will not leave any images behind after the recovery is
complete.
o “Leave Registered after Recovery” – In this case the newly created image will
be left after recovery. This option is useful if you want to hold on to this
image to create future instances. The snapshots the image is based on will
not be deleted by the automatic retention process. However, if you want to
keep this image and use it in the future, you should move the whole backup
to the freezer (see 8.4).
o “Create AMI without Recovery” – This option creates and keeps the image,
but does not launch an instance from it. This is useful, if you want (for some
reason) to launch the instance/s from outside CPM. Again, if you wish to
keep using this image, you should move the backup to the freezer.
“Image ID” – This is only relevant if “Launch From” is set to “image,” or if you are
recovering a Windows instance. By default, this will contain the initial AMI that CPM
created, and if such an AMI does not exist, the default will be the original AMI ID
from which the backed-up instance was launched. You can type or paste a different
AMI ID here, but you can’t search AMIs from within CPM. You can search for it with
a different tool (like AWS Management Console).
“Instances to Launch” – Specifies how many instances to launch from the image. The
default is one, and it’s also the sensible choice for production servers. However, in a
clustered environment you may want to launch more than one. It is not guaranteed
that all the requested instances will launch. In the message at the end of the
recovery operation, you will see exactly how many instances were launched, and
their IDs.
“Key” – The key (or key pair) you want to launch the instance with. The default is the
key that the backed-up instance was created with. You can choose a different one
from the list. Keys are typically needed to connect to the instance using SSH (in Linux
instances), or to decrypt the Administrator password (in Windows instances).
“Instance volumes” – All data volumes (those included in the policy excluding the
boot device) are listed here. Their default configuration is the same as it was in the
backed-up instance at the time of the backup. You can uncheck “recover” to exclude
a volume, or change capacity (only to enlarge it), device and iops. You can also
decide to exclude any tags associated with the volume (like its name), or whether
the volume will be deleted on termination of the instance (for instances recovered
from a snapshot).
Figure 9-2
same zone as the backed-up instance. However, you can choose a different one
from the list.
VPC – This option is only visible if you chose “By VPC Subnet” in “Placement.” You
choose the VPC the instance is to be recovered to. By default it will contain the VPC
the original instance belonged to.
“VPC Subnet ID” – This option is only visible if you chose “By VPC Subnet” in
“Placement.” This will hold all the subnets in the currently selected VPC.
“VPC Assign IP” – This option is only visible if you chose “By VPC Subnet” in
“Placement.” If the backed-up instance was in a VPC subnet, the default value will
be the IP assigned to the original instance. Note that if that IP is still taken, it can fail
the recovery operation. You can type a different IP here. When you begin recovery,
CPM will verify the IP belongs to the chosen subnet. If you leave this field empty, an
IP address from the subnet will be automatically allocated for the new instance.
“Auto-assign Public IP”: Will let you choose whether to assign a public IP to the new
instance. This is for public subnets. By default it will behave as the subnet defines.
“Placement Group” - This option is only visible if you chose “By Placement Group” in
“Placement.” You can choose the placement group from the list.
“Security Groups” – You can choose which security groups will be applied with the
new instance. This is a multiple-choice field, which means you can choose more than
one. By default, the security groups of the backed-up instance will be chosen. Please
note that security groups for VPC instances are different than groups of non-VPC
instances. Every time you toggle the “Placement” option between “By Availability
Zone” and “By VPC Subnet,” the list of security groups will be updated, and the
previous checked items will not be saved. This field also has a filter to help you find
the security group that you need, in case you have many security groups defined.
“Enable User Data” – States whether to use user data for this instance launch. If
checked, another option appears: “User Data.”
“User Data” – The text of the user data. Special encoding or using a file as the source
is not currently supported from within CPM.
“Preserve Tags” – By default this option is checked. If checked, all the tags that were
associated with the backed-up instance at the time of the backup (like the instance’s
name) will also be associated with the new instance/s.
“Instance Type” – Choose the instance type of the new instance/s. By default the
instance type of the backed-up instance will be chosen. If you choose an instance
type that is incompatible with the image or placement method, the recovery
operation will fail.
“Shutdown Behavior” – By default it will have the value of the original instance. If
the recovered instance is instance-store-based, this option is not used. The choices
are:
o “stop” – if the instance is shut down, it will not be terminated and will just
move to “stopped” state.
o “terminate” – if the instance is shut down it will also be terminated.
“API Termination” – States whether terminating the new instance by API is enabled
or not. The default value will be as the backed-up instance.
“Kernel” – Will hold the kernel id of the backed-up instance. You can type or paste a
different one. However, you can’t search for a kernel ID from within CPM. Change
this option only if you know exactly which kernel you need. Choosing the wrong one
will result in a failure.
“RAM disk” - Will hold the RAM Disk id of the backed-up instance, if it had one. You
can type or paste a different one. However, you can’t search for a RAM Disk ID from
within CPM. Change this option only if you know exactly which RAM Disk you need.
Choosing the wrong one will result in a failure.
“Allow Monitoring” – Is checked if monitoring should be allowed for the new
instance. The default will be the value in the backed-up instance.
“Instance Profile ARN” – The ARN of the instance role (IAM Role) for the instance.
You can find the ARN by clicking on the Role name in IAM Management Console and
clicking on the “Summary” tab. The default will be the instance role of the backed-
up instance, if it had one.
“EBS Optimized” – Is checked to launch an EBS Optimized instance. The default will
be the value from the backed-up instance.
“Tenancy” – Lets you choose the tenancy option for this instance.
Figure 9-3
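The “VPC Assign IP” option above mentions that CPM verifies the IP belongs to the chosen subnet. If you want to run a similar check yourself before starting a recovery, the following sketch shows one way to test whether an IPv4 address falls inside a subnet's CIDR range. It only illustrates the idea and is not CPM's implementation. The function names are hypothetical, and the sketch does not account for addresses that AWS reserves inside each subnet.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    local IFS=.
    # Split the address into its four octets.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if the address lies inside the CIDR range, 1 otherwise.
# Usage: ip_in_subnet <ip> <cidr>
ip_in_subnet() {
    local ip="$1" cidr="$2"
    local net="${cidr%/*}" bits="${cidr#*/}"
    # Build the network mask from the prefix length.
    local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, ip_in_subnet 10.0.1.25 10.0.1.0/24 succeeds, while ip_in_subnet 10.0.2.25 10.0.1.0/24 fails. Both addresses are made up for illustration.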
To complete the recovery operation, click on “Recover Instance” and then approve. If there
are errors CPM detects in your choices, you will return to the recover instance screen with
error messages. Otherwise, you will be redirected back to the recovery panel screen, and a
message will be displayed regarding the success or failure of the operation.
start backing up the instance. If at that time, the AMI no longer existed, then CPM can’t do
anything about it. However, if the AMI gets deleted sometime after the instance started
backing up, CPM will remember the details of the original AMI.
Figure 9-4
When clicking on the “AMI Assistant” button in the instance recovery screen, you will see
these details. You will then be able to try to find similar AMIs. Clicking on “find exact
matches” will try to find AMIs whose properties are exactly like those of the
original. If that doesn’t turn up anything, you can click on “perform fuzzy search,” which will
take a bit longer and will try to find AMIs similar to the original. That will typically turn up
several AMIs, usually different versions/flavours of the same offering.
AMI Assistant can be useful for the following scenarios:
You want to recover an instance by launching it from an image, but the original AMI
is no longer available.
You want to recover an instance by launching it from an image, but you want to find
a newer version of the image (fuzzy search will help you there).
You are using DR (see 10) and you need to recover the instance in a different region:
You may want to find the matching AMI in the target region to use it to launch the
instance, or you may need its kernel ID or ram disk ID to launch the instance from a
snapshot.
9.3 Volume recovery
Volume recovery basically means creating EBS volumes out of snapshots. In CPM, you can
recover volumes that were part of an instance’s backup, or recover EBS volumes that were
added to a policy as independent volumes. The recovery process is basically the same.
To recover volumes belonging to an instance, simply click on “Volumes Only” next to an
instance backup in the recovery panel screen.
Figure 9-5
As you can see in Figure 9-5, the volume recovery screen is straightforward. The following
are the fields you can change:
“Recover” – Checked by default. Uncheck if you don’t want that volume recovered.
“Zone” – Availability zone. The default is the original zone of the backed-up volume.
“Capacity” – You can choose to enlarge the capacity of a volume. You can’t make it
smaller than the size of the original volume, which is also the default.
“Type” - Lets you choose the type of the EBS volume.
“IOPS” – Number of IOPS. This field is used only if the type of volume you chose is
“Provisioned IOPS SSD”. The default will be the setting from the original volume.
Values for IOPS should be at least 100, and the volume size needs to be at least 1/10
that number in GiB. E.g. if you want to create a 100 IOPS volume, its size needs to
be at least 10 GiB. If you do not abide by this rule, the recovery operation will fail,
and you will receive an error message.
“Device” – Which device it will be attached as. This is only used if you choose to
automatically attach the recovered volume to an instance. If the device is not free or
not correct the attach operation will fail.
“Preserve Tags” – Whether to associate the same tags (like the volume’s name) to
the recovered volume. Default is yes.
“Attach to Instance” – Choose whether to attach the newly recovered volume to an
instance. The list holds instances that are in the same availability zone as the
volume. Changing “Zone” will refresh the content of this list. This field also has a
filter, to allow finding the instance easily.
“Attach Behavior” – This applies to all the volumes you are recovering, if you choose
to attach them to an instance. You can choose from these three options:
o “Attach only if Device is Free” – This means that if the requested device is
already taken in the target instance, the attach operation will fail. You will
get a message saying the new volume was created, but was not attached.
o “Switch Attached Volumes” – This option will work only if the target
instance is in “stopped” state. If the instance is running, you will get an error
message. CPM will not try to forcefully detach volumes from a running
instance, since this can cause systems to crash.
o “Switch Attached Volumes and Delete Old Ones” – As the previous option,
this one will work only on stopped instances. This option will also delete the
old volumes that are detached from the instance.
If you choose “Switch Attached Volumes and Delete Old Ones,” please make sure you don’t
need the old volumes. CPM will delete them after detaching them from the target instance.
Warning 9-1
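The rule given for the “IOPS” field above (at least 100 IOPS, and a volume size in GiB of at least one tenth of the IOPS value) can be checked before you submit the recovery. This is a minimal sketch of that arithmetic. The function name check_iops is hypothetical, and the limits are the ones stated in this guide, which may differ from current AWS limits.

```shell
#!/bin/bash
# Return 0 if the IOPS value and volume size satisfy the rule in this guide.
# check_iops is a hypothetical helper name.
# Usage: check_iops <iops> <size_gib>
check_iops() {
    local iops="$1" size_gib="$2"
    if [ "$iops" -lt 100 ]; then
        echo "IOPS must be at least 100" 1>&2
        return 1
    fi
    # Size must be at least 1/10 of the IOPS value, i.e. size * 10 >= iops.
    if [ $((size_gib * 10)) -lt "$iops" ]; then
        echo "Volume size must be at least $(( (iops + 9) / 10 )) GiB for $iops IOPS" 1>&2
        return 1
    fi
    return 0
}
```

With these limits, check_iops 100 10 passes, while check_iops 100 9 and check_iops 50 100 both fail.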
As with other recovery screens, you can choose to use different AWS credentials for the
recovery operation. After clicking “Recover Volumes” and approving, if there was a logical
error in a field that CPM detected, you will be returned to the screen with an error
notification. If not, you will be redirected back to the recovery panel screen with a message
regarding the status of the operation.
To recover independent volumes, you simply click on the “Recover Independent Volumes”
button at the top right of the recovery panel screen. This button will only be available if
there are independent volumes in the current backup. After clicking, you will reach a similar
recover volumes screen as with instance volumes.
Figure 9-6
Figure 9-7
Clicking on it will bring you to the RDS Database Recovery screen, as seen in Figure 9-8.
Figure 9-8
In this screen you will see a list of all RDS databases in the current backup. You can change
the following options:
“Recover” – Uncheck to not recover the current database.
“Zone” – The availability zone of the database. By default it will be the zone of the
backed-up database, but this can be changed. Currently, recovering a database into
a VPC subnet is not supported by CPM. You can always recover from the snapshot
using AWS Management Console.
“DB Instance ID” – The default will be the ID of the original database. If the original
database still exists, the recovery operation will fail. You can type in a new ID to
recover a new database.
“DB Snapshot ID” – This is just a display field of the snapshot ID.
“DB Instance Class” – The default is the original class, but you can choose another.
“Port” – You can choose the port of the database. The default is the port of the
original backed-up database.
“Multi AZ” – Determines whether to launch the database in a multi AZ configuration
or not. The default will be the value from the original backed-up database.
“Subnet Group” – Determines whether to launch the database in a VPC subnet or
not, and to which subnet group. The default will be the value from the original
backed-up database. You can recover a database from outside a VPC to a VPC
subnet group, but the other way around is not supported and will return an error.
As in other types of recovery, you can choose to use different AWS credentials.
is in production real-life deployments, the cluster will be created in a multi-AZ deployment,
and the cluster will have reader and writer dbinstances.
When recovering an Aurora cluster, CPM will recover the dbcluster and then will create the
dbinstances for it.
You reach the Aurora Clusters Recovery screen through the recovery panel:
Figure 9-9
By clicking the “Recover Aurora Clusters” button, you’ll reach the following screen:
Figure 9-10
In this screen you’ll see a list of all Aurora clusters that were backed up. You can change the
following options:
“Recover” – Uncheck to not recover the current Aurora cluster.
“RDS Cluster ID” – The default will be the ID of the original cluster. If the original
cluster still exists, the recovery operation will fail, unless you change the ID.
“DB Instance ID” – The default will be the ID of the original instance. If the original
instance still exists, the recovery operation will fail. You can type in a new ID to
recover a new database. CPM will use this instance ID for the writer, and in the case
of Multi-AZ, it will create the reader with this name with “_reader” added at the end.
“DB Cluster Snapshot ID” – This is just a display field of the snapshot ID.
“Instance Type” – The type or class of the DB instance/s.
“Port” – You can choose the port of the database. The default is the port of the
original backed-up database.
“Zone” – The availability zone of the cluster in case of single AZ. If using a subnet
group, please leave as is.
“Subnet Group” – Determines whether to launch the cluster in a VPC subnet or not,
and to which subnet group. The default will be the value from the original backed-up
cluster.
“Publicly Access.” – Whether the cluster will be publicly accessible or not. The
default will be the one from the original backed-up instance.
Figure 9-11
Clicking on it will bring you to the Redshift Cluster Recovery screen, as seen in Figure 9-12.
Figure 9-12
In this screen you will see a list of all Redshift clusters in the current backup. You can change
the following options:
“Recover” – Uncheck to not recover the current cluster.
“Zone” – The availability zone of the cluster. By default it will be the zone of the backed-
up cluster, but this can be changed. Currently, recovering a cluster into a VPC subnet is
not supported by CPM. You can always recover from the snapshot using AWS
Management Console.
“Cluster ID” – The default will be the ID of the original cluster. If the original cluster still
exists, the recovery operation will fail. You can type in a new ID to recover a new cluster.
“Cluster Snapshot ID” – This is just a display field of the snapshot ID.
“Node Type” and “Nodes” – Only for display. Changing these fields is not supported by
AWS.
“Port” – You can choose the port of the cluster. The default is the port of the original
backed-up cluster.
“Subnet Group” – Determines whether to launch the cluster in a VPC subnet or not, and
to which subnet group. The default will be the value from the original backed-up cluster.
You can recover a cluster from outside a VPC to a VPC subnet group, but the other way
around is not supported and will return an error.
As in other types of recovery, you can choose to use different AWS credentials.
10 Disaster Recovery (DR)
10.1 Introduction
CPM’s DR (disaster recovery) solution allows you to recover your data and servers in case of
a disaster. A “disaster” doesn’t necessarily mean a horrible man-made or natural disaster,
although you’ll want to be prepared for that as well. DR will also help you recover your data
in case of an outage or malfunction, or for any other reason.
What does that mean in a cloud environment like EC2? Every EC2 region is divided into
availability zones which use separate infrastructure (power, networking etc…). So, when you
use EBS snapshots as CPM does, then by definition you will be able to recover your EC2
servers to other availability zones in case of an outage in one of the zones. CPM’s DR is
based on AWS’s ability to copy EBS snapshots between regions, and allows you the extended
ability to recover instances and EBS volumes in other regions. You may need this ability if
there is a full-scale outage in a whole region. But it can also be used for the ability to migrate
instances and data between regions and is not limited to the case of an outage or disaster. If
you use CPM to take RDS snapshots, those snapshots will also be copied and will be available
in other regions.
RDS Aurora Clusters: DR across regions of RDS Aurora Clusters is currently not supported.
We plan to add DR support in future releases.
Redshift Clusters: Currently CPM does not support DR of Redshift clusters. If you enable DR
on a policy containing Redshift clusters, they will be ignored at the DR stage. You can enable
copying Redshift snapshots between regions automatically by enabling cross-region
snapshots using the AWS Management Console.
10.2 Configuring DR
Setting up DR with CPM is easy. After defining a policy (or any time after a policy
has started to perform backups), you can click on the “DR” button under the “Configure” column
in the “Policies” tab of the main screen. This opens a simple popup screen:
Figure 10-1
can take a long time. CPM will wait until all copy operations are completed successfully
before declaring the DR status as “Completed.” As opposed to the backup process, which
allows only one backup of a policy to run at a time, DR processes are completely
independent. This means that if you have an hourly backup that runs DR each time, and DR
takes more than an hour to complete, DR of the next backup will begin before the
first one has completed. Although CPM can handle many DR processes in parallel, it is not
recommended to take it too far. AWS limits the number of copy operations that can run in
parallel to any given region, and too many processes can cause congestion, so DR may never
catch up. See 10.5.2 later in this chapter.
CPM will keep all information of the original snapshots and the copied snapshots and will
know how to recover instances and volumes in all relevant regions.
The automatic retention process that deletes old snapshots will also clean up the old
snapshots in other regions. When a regular backup is outside the retention window and its
snapshots are deleted, so will the DR snapshots that were copied to other regions.
10.5.1 Considerations
There are some fundamental differences between local backup and DR to other regions. It’s
important to understand the differences and their implications when planning your DR
solution. Let’s look at the differences between storing EBS snapshots locally and copying
them to other regions:
Copying between regions is transferring data over a WAN (Wide Area Network). It
means that it will be much slower than moving data locally. As you’d expect, a data
transfer from the U.S to Australia or Japan will take considerably more time than a
local copy.
AWS will charge you for the data transfer between regions. This can affect your AWS
costs, and the prices are different depending on the source region of the transfer.
For example, in March 2013, transferring data out of U.S. regions cost 0.02
USD/GiB, and the price climbed up to 0.16 USD/GiB out of the South America region.
Let’s take an extreme example: You have an instance with four 1 TiB EBS volumes attached to it.
The volumes are 75% full. There is an average of 3% daily change in data for all the volumes.
This brings the total size of the daily snapshots to around 100 GiB
(4 × 1,024 GiB × 0.75 × 0.03 ≈ 92 GiB). Locally you take 4
backups a day. In terms of cost and time, it will not make much of a difference if you take
one backup a day or four, which is true also for copying snapshots, since that operation is
incremental as well. Now you want a DR solution for this instance. Copying it every time will
copy around 100 GiB a day. You need to calculate the price of transferring 100 GiB a day and
storing it at the remote region on top of the local region.
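The arithmetic behind this example can be reproduced with a short script. This is a sketch with hypothetical function names. It covers only the transfer part of the calculation, using the March 2013 U.S. price of 0.02 USD/GiB quoted above. Storage at the remote region would have to be added separately, and current AWS prices may differ.

```shell
#!/bin/bash
# Estimate the data changed per day across a set of equally sized volumes.
# Usage: daily_change_gib <volumes> <size_gib> <percent_full> <percent_daily_change>
daily_change_gib() {
    awk -v n="$1" -v size="$2" -v full="$3" -v change="$4" \
        'BEGIN { printf "%.2f\n", n * size * (full / 100) * (change / 100) }'
}

# Estimate the daily inter-region transfer cost for that amount of data.
# Usage: daily_transfer_cost <gib_per_day> <usd_per_gib>
daily_transfer_cost() {
    awk -v gib="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gib * price }'
}
```

For the example above, daily_change_gib 4 1024 75 3 prints 92.16, and daily_transfer_cost 92.16 0.02 prints 1.84, so the transfer alone would come to a little under 2 USD a day at that price.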
you want to recover an instance from a week ago, you should always use the latest backup
of the “cpmdata” policy.
10.6 DR Recovery
DR recovery is similar to regular recovery with a few differences. First of all, when you click
on the “Recover” button for a backup that includes DR (DR is in “Completed” state), you get
the same Recovery Panel screen with the addition of a drop down list.
Figure 10-2
As you can see in Figure 10-2, the default is “Origin,” which will recover all the objects from
the original backup. It will perform the same recovery as a policy with no DR. When you choose
one of the target regions, it will display the objects and will recover them in the selected
region. We strongly recommend you perform recovery drills from time to time to be sure
your recovery scenario works. It’s not recommended to try it for the first time when your
servers are down. For any policy, the policy screen shows when DR
recovery was last performed on it. This can help you track when you last performed a recovery drill.
Key Pair: It’s easy to create key pairs with AWS Management Console. You should
have one ready so you won’t need to create it when you perform recovery.
Security Groups: In regular recovery CPM will remember the security groups of the
original instance and use them as default. In DR recovery CPM can’t choose for you.
You need to choose at least one, or the instance recovery screen will display an
error. Security groups are objects you own, and you can easily create them in AWS
Management Console. You should have them ready so you won’t need to create
them when you perform recovery.
Kernel ID: Linux instances need a kernel ID. If you are launching the instance from an
image, you can leave this field empty, and CPM will use the kernel ID specified in the
AMI. If you are recovering the instance from a root device snapshot, you need to
find a matching kernel ID in the target region. If you do not, a default kernel
will be used, and although the recovery operation will succeed and the instance will
show as running in AWS Management Console, it will most likely not work. AMI
Assistant can help you find a matching image in the target region (see 9.2.3). When
you find such an AMI, you can simply copy and paste its kernel ID from the AMI
Assistant window.
RAMDisk ID: Many instances don’t need a RAM disk at all, and this field can be left
empty. If you need it, you can use AMI Assistant the same way you do for the kernel ID.
If you’re not sure, use the AMI Assistant or start a local recovery and see if there’s a
value in the RAMDisk ID field.
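The points above can be turned into a small pre-flight checklist to run before a recovery drill. The sketch below is not part of CPM: the function and its parameters are hypothetical, and it only encodes the rules described in this section.

```python
# Hypothetical pre-flight check for a DR recovery of an EC2 instance.
# It only restates the rules from the text; it does not talk to CPM or AWS.

def dr_recovery_warnings(key_pair, security_groups, launch_from_image,
                         is_linux, kernel_id=None):
    """Return a list of human-readable warnings; an empty list means no issue found."""
    warnings = []
    if not key_pair:
        warnings.append("No key pair is prepared in the target region.")
    if not security_groups:
        warnings.append("At least one security group must be chosen for DR recovery.")
    if is_linux and not launch_from_image and not kernel_id:
        warnings.append("Recovering from a root device snapshot without a matching "
                        "kernel ID: a default kernel will be used and the instance "
                        "will most likely not work.")
    return warnings


if __name__ == "__main__":
    for line in dr_recovery_warnings(key_pair=None, security_groups=[],
                                     launch_from_image=False, is_linux=True):
        print("WARNING:", line)
```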
right-clicking on it and choosing “Create Volume from Snapshot.” You can give the
new volume a name so it will be easy to find later.
Launch a new CPM Server at the target region. You can use the “Your Software”
page to launch the AWS Marketplace AMI. Please wait until you see the instance in
“running” state.
As with the regular configuration of a CPM server, you connect to the newly created
instance using HTTPS and approve the SSL certificate exception. Assuming the
original instance still exists, CPM will come up in “recovery” mode, which means that
the new server will perform recovery and not backup. If you are running the BYOL
edition and need an activation key, most likely you do not have a valid key at that
time, and you don’t want to wait until you can acquire one from N2W Software. In that
case, register quickly for CPM Basic Edition (it is the cheapest). In step 2, use your
own username; you can type any password. In step 3, choose the volume you
just created for the CPM data volume. Afterwards, complete the configuration.
Now with a working CPM server you can perform any recovery you need at the
target (current) region. You find the backup you want to recover, click on “Recover,”
then choose the target region from the dropdown list. If your new server allows
backup (it can happen if you registered to a different edition or if the original one is
not accessible), please bear in mind that it can start to perform backups. If that’s not
what you want, it’s best to disable all policies before you start the recovery process.
You can recover all the backed up objects that are available in the region.
Figure 10-3
Furthermore, you can view DR snapshot ids and statuses in the snapshots screen of the
backup:
Figure 10-4
Every DR snapshot is displayed with region information and the IDs of both the original and
the copied snapshots.
If DR fails, you will not be able to use DR recovery. However, some of the snapshots may exist
and be recoverable. You can see them in the snapshots screen and, in case you need them,
you can recover from them manually.
If DR keeps failing because of timeouts, you may need to increase the timeout value for the
relevant policy. The default of 24 hours should be enough, but a very large amount of data
may take longer. Please bear in mind that you can copy only a limited number of snapshots
to a given region at one time (currently the number is 5). If CPM reaches the limit, it
waits for copy operations to finish before it continues with more of them. This is not a
problem in itself, but it can affect the time it takes to complete the DR process.
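To get a feeling for how the copy limit interacts with the timeout, the following sketch estimates how many consecutive rounds of copying are needed. It is a simplification and not CPM’s scheduling logic: it assumes every copy takes the same time and that copies run in rounds of at most 5. The function names and the per-copy duration are illustrative.

```python
# Simplified estimate of DR copy time under the per-region copy limit.
# Assumption: all copies take equally long and run in rounds of at most `limit`.

import math

MAX_CONCURRENT_COPIES = 5   # per target region, the current limit quoted in the text
DEFAULT_TIMEOUT_HOURS = 24  # default DR timeout quoted in the text


def copy_rounds(snapshot_count, limit=MAX_CONCURRENT_COPIES):
    """Number of consecutive rounds needed to copy all snapshots to one region."""
    if snapshot_count < 0 or limit <= 0:
        raise ValueError("snapshot_count must be >= 0 and limit must be > 0")
    return math.ceil(snapshot_count / limit)


def fits_in_timeout(snapshot_count, hours_per_copy,
                    timeout_hours=DEFAULT_TIMEOUT_HOURS,
                    limit=MAX_CONCURRENT_COPIES):
    """True if the estimated total copy time stays within the DR timeout."""
    return copy_rounds(snapshot_count, limit) * hours_per_copy <= timeout_hours


if __name__ == "__main__":
    print(copy_rounds(12))           # 12 snapshots need 3 rounds
    print(fits_in_timeout(12, 6))    # 3 rounds x 6 hours fits in 24 hours
    print(fits_in_timeout(12, 10))   # 3 rounds x 10 hours does not
```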
11 Cross-account DR, Backup and Recovery
11.1 Introduction
Enabled only for Advanced and Enterprise Editions, CPM’s cross-account functionality allows
you to automatically copy snapshots between AWS accounts. This works as part of the DR
module, and in concert with cross-region DR: you can copy snapshots between regions as
well as between accounts, in any combination of both.
In addition, CPM offers cross-account recovery: you can recover resources (e.g. EC2
instances) to a different AWS account whether you copied the snapshots to that account or
not.
This cross-account functionality is important for security reasons. The ability to copy
snapshots between accounts can prove crucial if your AWS credentials have been
compromised and there’s a risk of deletion of your production data as well as your snapshot
data. CPM utilizes the “snapshot share” option in AWS to enable copying snapshots across
accounts.
Cross-account functionality is currently supported only for EC2 instances and EBS volumes.
From version 2.1.0, cross-account DR is also supported for RDS instances (excluding Aurora).
Cross-account functionality is enabled for encrypted EBS volumes and instances with
encrypted EBS volumes. You, as the user, need to share the encryption key used for the
volumes or RDS instance with the other account; CPM will not do it for you.
In addition, CPM expects to find a key in the target account with the same alias as the
original key (or just uses the default key).
Figure 11-1
Cross-account fields will be available only if your CPM is licensed for cross-account
functionality. See the pricing and registration page on our website to see which CPM editions
include cross-account backup & recovery.
Once you set the “Cross Account DR” field to “Enabled” the other fields will become visible:
“To Account” – You need to choose to which account to copy the snapshots. This
account needs to be defined as a “DR Account” in the “Manage AWS Accounts”
screen.
“Keep Original Snapshots” – Whether to keep the snapshots both in the original and
the DR accounts or to delete the original snapshots once they are copied to the DR
account. This can save cost by not paying to store all snapshots twice.
11.5 Cross-account with cross-region
If you configure the backup policy to copy snapshots across accounts as well as across
regions, CPM combines the two: it copies the snapshots to the other account and to the other
regions. You can therefore end up copying snapshots to additional regions as well as to
another account. It is important to know exactly what you are copying, so that the cost of
these actions does not become too high.
12 File-level Recovery
In version 2.0.0, CPM introduced file-level recovery. CPM performs backup at the volume and
instance level, and specializes in instant recovery of volumes and complete instances.
However, in many cases a user may want to access specific files and folders rather than
recovering an entire volume.
In previous versions of CPM, you could recover a volume, attach it to an instance, mount it
and then access the data from within that instance. After completing the restore, assuming
the volume is no longer needed, the user needed to unmount, detach and delete the
volume.
CPM now automates this entire process. You can click on “Explore” (see image below) either
from the recovery panel screen for an instance, or from the volume recovery screen, to
recover a specific volume. CPM will open a new browser tab and after a short wait time, the
new browser tab will show a “File Explorer-like” view of the entire instance or a specific
volume. You’ll be able to browse, search for files, and download files and folders.
Figure 12-1
On the right-hand side of any file or folder there is a green download icon that will download
the file or folder. Folders are downloaded as uncompressed zip files.
Figure 12-2
To perform these operations, CPM needs to be able to use AWS credentials belonging to the
CPM server instance account, with sufficient permissions to create and attach volumes. By
default, CPM will use the same credentials used to initially configure the instance, but they
can be modified using the “General Settings” screen.
File-level recovery requires CPM to recover volumes in the background and attach them to
the CPM server. There are a few limitations:
Explore works only on simple volumes. LVM and Windows dynamic disks are
currently not supported. Additionally, disks defined with Microsoft Storage Spaces
are not supported.
Explore works only on snapshots taken in the same region the CPM server is in.
Explore for encrypted volumes works only in the same account the CPM Server
instance is in. Cross-account explore of encrypted volumes is not supported.
After you complete the recovery operation, please click on the “close” button so that all the
resources are cleaned up and you save cost. Even if you just close the tab, CPM will detect
the redundant resources and clean them up, but it is recommended to use the “close”
button.
can be notified what to do with this resource, and there is no need to use the GUI. Since
tagging is a basic functionality of AWS, it can be easily done via the API and scripts.
13.2.3 Setting backup options for EC2 Instances
When adding an instance to a policy (or creating a new policy from a template) you can make
a few decisions about the instance. For one, you can decide whether to create snapshots
only for this instance, snapshots with an initial AMI, or just to schedule AMI creation. If this
option is not set, CPM will assume the default: snapshots only for Linux and snapshots with
an initial AMI for Windows instances.
You do that by adding a backup option after the policy name. The backup option can be one
of four values: “only-snaps”, “initial-ami”, “only-amis” or “only-amis-reboot”.
For example, with an existing policy: “policy1#only-snaps”
Or for a new policy based on a template and setting AMI creation:
"my_new_policy:existing_policy#only-amis"
Please note that “only-amis” will create AMIs without rebooting the instance.
“only-amis-reboot” will create AMIs with a reboot.
Another option you can set is whether a Windows instance is backed up with “app-aware,”
i.e. it’s using a backup agent. It is used in the same way as the snapshot and AMI options,
with the backup option “app-aware.” When adding the “app-aware” option, the agent is set in
the default way: VSS is enabled and backup scripts are disabled. Additional configuration
needs to be done manually, and not with the tag.
You can also combine backup options: “policy1#initial-ami#app-aware”.
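If you generate tag values from scripts, it can help to check them before tagging resources. The sketch below parses a tag value according to the syntax described in this chapter (policy names separated by spaces, “new:existing” for a policy created from a template, and “#” before each backup option). It is an illustration only and not CPM’s own parser: the function names are hypothetical, and the rule that at most one of the four snapshot/AMI options may appear per policy is an assumption.

```python
# Illustrative parser for the tag value syntax described above.
# Not CPM's parser; names and error handling are assumptions.

SNAPSHOT_OPTIONS = {"only-snaps", "initial-ami", "only-amis", "only-amis-reboot"}
APP_AWARE = "app-aware"


def parse_policy_entry(entry):
    """Parse one entry such as 'new_policy:template#only-amis#app-aware'."""
    name_part, *options = entry.split("#")
    if ":" in name_part:
        policy, template = name_part.split(":", 1)
    else:
        policy, template = name_part, None
    if not policy or template == "":
        raise ValueError("missing policy or template name in %r" % entry)
    unknown = [o for o in options if o not in SNAPSHOT_OPTIONS and o != APP_AWARE]
    if unknown:
        raise ValueError("unknown backup option(s) %r in %r" % (unknown, entry))
    snapshot_options = [o for o in options if o in SNAPSHOT_OPTIONS]
    if len(snapshot_options) > 1:
        raise ValueError("more than one snapshot/AMI option in %r" % entry)
    return {
        "policy": policy,
        "template": template,
        "backup_option": snapshot_options[0] if snapshot_options else None,
        "app_aware": APP_AWARE in options,
    }


def parse_tag_value(value):
    """Parse a whole tag value: one or more entries separated by spaces."""
    return [parse_policy_entry(entry) for entry in value.split()]


if __name__ == "__main__":
    for item in parse_tag_value("policy1#initial-ami#app-aware "
                                "my_new_policy:existing_policy#only-amis"):
        print(item)
```

A value that uses ‘;’ instead of ‘:’ is not rejected by this sketch, because the whole string is then treated as a policy name; a stricter check on allowed characters would need the naming rules that CPM enforces.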
Figure 13-1
13.4.1 Pitfalls
There are potential issues you should try to avoid when managing your backup via tags. The
first is creating contradictions between the tag content and manual configuration.
If you tag a resource and it’s added to a policy, and later you remove it from the policy
manually, it may “come back” at the next tag scan. CPM tries to warn you about such
mistakes. Policy name changes can also affect tag scanning. If you rename a policy, the
policy name in the tag can become wrong. If you rename a policy, please make sure you
also correct any relevant tag values.
When you open a policy to edit it, and this policy was created by a tag scan, you will see
a message at the top of the dialog window: “* This policy was automatically added by
tag scan”. Please be aware that even if all the backup targets are removed, CPM will not
delete any policy on its own, since deletion of a policy will also delete all its data. If you have
a daily summary configured (see 15.6), any policies with no backup targets will be listed.
Another issue you should try to avoid can happen if the same AWS account is added as
multiple accounts in CPM. In that case, the same tags can be scanned multiple times, and
the behaviour can become unpredictable. We generally discourage this practice. It is better
to define an account once, and then allow other delegates (see 16.4) access to it. In any
case, if you added the same AWS account multiple times (even for different users), please
make sure only one of the accounts in CPM has “Scan Resources” enabled.
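If you keep your own inventory of the accounts defined in CPM, the last rule is easy to check automatically. The sketch below assumes a hypothetical list of dictionaries that you maintain yourself (CPM is not known to export data in this shape); it reports every AWS account that has “Scan Resources” enabled in more than one CPM account.

```python
# Hypothetical check over a hand-maintained inventory of CPM accounts.
# The dictionary keys are assumptions, not a CPM export format.

from collections import defaultdict


def duplicate_scan_accounts(accounts):
    """Map each AWS account ID to the CPM account names that scan it,
    keeping only AWS accounts scanned by more than one CPM account."""
    scanning = defaultdict(list)
    for account in accounts:
        if account["scan_resources"]:
            scanning[account["aws_account_id"]].append(account["name"])
    return {aws_id: names for aws_id, names in scanning.items() if len(names) > 1}


if __name__ == "__main__":
    inventory = [
        {"name": "main", "aws_account_id": "111111111111", "scan_resources": True},
        {"name": "main-copy", "aws_account_id": "111111111111", "scan_resources": True},
        {"name": "qa", "aws_account_id": "222222222222", "scan_resources": True},
    ]
    print(duplicate_scan_accounts(inventory))
```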
13.4.2 Troubleshooting
Sometimes you need to understand what happened during a tag scan, especially if the tag
scan did not behave as expected (e.g. you expected a policy to be created and it wasn’t). In
the “General Settings” screen you can view the log of the last tag scan, and you’ll be able to
see exactly what happened during this scan, and any problems (e.g. a problem parsing the tag
value) that were encountered. Furthermore, if the daily summary is enabled, any new scan
results from the last day will be listed in the summary.
• When listing multiple policy names, make sure they are separated by spaces.
• When creating a new policy, make sure you use ‘:’ and not ‘;’. The syntax is
“new_policy1:existing_policy1”.
• Use a valid name for the new policy or it won’t be created (an error message will be
added to the scan log).
• Make sure you use the correct names for existing/template policies.
• The resource scanning order is NOT defined, so use a policy name as
existing/template only if you are sure that the policy exists in CPM, either defined
manually or scanned previously.
14 Security Concerns and Best Practices
14.1 Introduction
Security is one of the main issues and barriers in decisions regarding moving business
applications and data to the cloud. The basic question is whether the cloud is as secure as
keeping your critical applications and data in your own data center. There is probably no one
simple answer to this question, as it depends on many factors.
Prominent cloud service providers like Amazon Web Services are investing a huge amount
of resources so that people and organizations can answer “yes” to the question in the
previous paragraph. AWS has introduced many features to enhance the security of its cloud.
Examples are elaborate authentication and authorization schemes, secure APIs, security
groups, IAM, Virtual Private Cloud (VPC), and more.
CPM strives to be as secure as the cloud it is in. It has many features that provide you with a
secure solution.
14.3 Best security practices for CPM
Implementing all or some of the following best practices depends on your company’s needs
and regulations. Some of the practices may make the day-to-day work with CPM a bit
cumbersome, so it is your decision whether to implement them or not.
14.3.2 Passwords
Create a strong password for the CPM server and make sure no one can access it. Change
passwords from time to time. Strong passwords should be impossible to guess. CPM does
not enforce any password rules. It is the user’s responsibility.
The permissions the IAM policy must have depend on what you want to do with them. For
more information about IAM, see IAM documentation:
http://aws.amazon.com/documentation/iam/
snapshots, but for recovery you will need to be able to create volumes, create instances and
create RDS databases. Plus, you will need to be able to attach and detach volumes and even
delete volumes. If your credentials fall into the wrong hands, recovery credentials can be
more harmful. If you use a backup-only IAM user or role, then you will need to type in ad-
hoc credentials when you perform a recovery operation, or else that operation will fail.
"rds:DescribeDBSubnetGroups",
"rds:ListTagsForResource",
"rds:CopyDBSnapshot",
"redshift:DescribeClusters"
],
"Sid": "Stmt1374237153000",
"Resource": [
"*"
],
"Effect": "Allow"
}
]
}
"*"
],
"Effect": "Allow"
}
]
}
14.4.2.4 Redshift
To add the ability to manage Redshift Cluster snapshots you need to either create a new
policy or add the following permissions to your backup policy:
{
    "Sid": "Stmt1425805298000",
    "Effect": "Allow",
    "Action": [
        "redshift:CopyClusterSnapshot",
        "redshift:CreateClusterSnapshot",
        "redshift:CreateTags",
        "redshift:DeleteClusterSnapshot",
        "redshift:DescribeClusterParameterGroups",
        "redshift:DescribeClusterParameters",
        "redshift:DescribeClusterSnapshots",
        "redshift:DescribeClusterSubnetGroups",
        "redshift:DescribeClusters",
        "redshift:DescribeTags",
        "redshift:RestoreFromClusterSnapshot"
    ],
    "Resource": [
        "*"
    ]
}
The last permission is used to recover Redshift clusters from
snapshots. You can add this specific permission to your recovery IAM
policy instead.
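If you prefer to keep the restore permission out of the backup policy, the two statements can be generated from one action list. The following Python sketch does that; the action names are the ones listed above, while the Sid values and the helper function are illustrative choices.

```python
# Split the Redshift permissions listed above into a backup statement and a
# recovery statement. The Sid values and helper names are illustrative.

import json

REDSHIFT_ACTIONS = [
    "redshift:CopyClusterSnapshot",
    "redshift:CreateClusterSnapshot",
    "redshift:CreateTags",
    "redshift:DeleteClusterSnapshot",
    "redshift:DescribeClusterParameterGroups",
    "redshift:DescribeClusterParameters",
    "redshift:DescribeClusterSnapshots",
    "redshift:DescribeClusterSubnetGroups",
    "redshift:DescribeClusters",
    "redshift:DescribeTags",
    "redshift:RestoreFromClusterSnapshot",
]
RECOVERY_ONLY_ACTION = "redshift:RestoreFromClusterSnapshot"


def redshift_statement(sid, actions):
    """Build one IAM policy statement allowing the given actions on all resources."""
    return {"Sid": sid, "Effect": "Allow", "Action": list(actions), "Resource": ["*"]}


backup_statement = redshift_statement(
    "RedshiftBackup",
    [action for action in REDSHIFT_ACTIONS if action != RECOVERY_ONLY_ACTION],
)
recovery_statement = redshift_statement("RedshiftRecovery", [RECOVERY_ONLY_ACTION])

if __name__ == "__main__":
    print(json.dumps(backup_statement, indent=4))
    print(json.dumps(recovery_statement, indent=4))
```

Each printed statement can then be pasted into the "Statement" list of the corresponding IAM policy.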
If you do not use CPM with RDS at all, you can omit all RDS permissions from your IAM policies
Info 14-1
15 Alerts, Notifications and Reporting
15.1 Introduction
CPM manages the backup operations of your EC2 servers. In order to notify you when
something is wrong and to integrate with your other cloud operations, CPM allows sending
alerts, notifications and even raw reporting data. So, if you have a NOC, are using external
monitoring tools or just want an email to be sent to the system administrator whenever a
failure occurs, CPM has an answer for that.
15.2 Alerts
What are alerts? Alerts are notifications about issues in your CPM backup solution.
Whenever a policy fails, in backup or DR, an alert is issued so you’ll know this policy is not
functioning properly. Later, when the policy succeeds, the alert is turned off or deleted, so
you’ll know that the issue is resolved. Alerts can be issued for failures in backup and DR, as
well as general system issues like license expiration (for relevant installations).
Figure 15-1
In order to configure this API call, you will need to go to the Notifications screen by clicking
the “Notifications” button at the top of any screen. In the notifications screen, click on the
“Configure API Authentication Key” link. In the popup screen you can enable or disable pull
alerts access and you can generate an authentication key by clicking “new authentication
key” (see Figure 15-1). Clicking that link while there is already an authentication key
assigned, will switch it to a new one. After enabling and setting the key, you can use the API
call to get all alerts:
https://<your CPM Server address>:<your port>/agentapi/get_cpm_alerts/
Let’s look at a simple Python example:
d:\tmp>python
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more
information.
>>> import urllib2, json
>>> server_address = 'ec2-54-228-126-14.compute-1.amazonaws.com'
>>> server_port = 443
>>> authkey = 'afb488681baf0132fe190315e87731f883a7dac548c08cf58ba0baddc7006132aa74f99ab07eff736477dca86b460a4b1a7bfe826e16fdbc'
>>> url = 'https://%s:%d/agentapi/get_cpm_alerts/' % (server_address, server_port)
>>> url
'https://ec2-54-228-126-14.compute-1.amazonaws.com:443/agentapi/get_cpm_alerts/'
>>> request = urllib2.Request (url)
>>> request.add_header("Authorization", authkey)
>>> handle = urllib2.urlopen (request)
>>> answer = json.load (handle)
>>> handle.close ()
>>> answer
[{u'category': u'Backup', u'message_body': u'Policy win_server (user:
root, account: main) - backup that started at 07/20/2013 09:00:00 AM
failed. Last successful backup was at 07/20/2013 08:00:00 AM',
u'severity': u'E', u'title': u'Policy win_server Backup Failure',
u'alert_time': u'2013-07-20 06:00:03', u'policy': {u'name':
u'win_server'}}, {u'category': u'Backup', u'message_body': u'Policy
web_servers (user: root, account: main) - backup that started at
07/20/2013 09:20:03 AM failed. Last successful backup was at
07/20/2013 08:30:00 AM', u'severity':u'E', u'title': u'Policy
web_servers Backup Failure', u'alert_time': u'2013-07-20 06:22:12',
u'policy': {u'name': u'web_servers'}}]
>>>
The JSON response is a list of alert objects, each containing the following fields: category,
title, message_body, severity, alert_time (time of the last failure) and policy.
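The session above uses Python 2. A Python 3 version of the same call could look like the sketch below. The URL path and the “Authorization” header come from the example above; the function names are illustrative, and treating severity “E” as an error is an assumption based on the sample output.

```python
# Python 3 sketch of the pull-alerts call shown above.
# Function names are illustrative; severity "E" is assumed to mark errors.

import json
import urllib.request


def build_alerts_request(server_address, server_port, authkey):
    """Prepare the HTTPS request for the get_cpm_alerts API call."""
    url = "https://%s:%d/agentapi/get_cpm_alerts/" % (server_address, server_port)
    return urllib.request.Request(url, headers={"Authorization": authkey})


def fetch_alerts(request, timeout=30):
    """Send the request and decode the JSON list of alert objects."""
    with urllib.request.urlopen(request, timeout=timeout) as handle:
        return json.load(handle)


def failed_policies(alerts):
    """Sorted names of policies that currently have an alert with severity 'E'."""
    return sorted({
        alert["policy"]["name"]
        for alert in alerts
        if alert.get("severity") == "E" and alert.get("policy")
    })


# Typical use (not executed here, since it needs a reachable CPM server):
#   request = build_alerts_request("<your CPM Server address>", 443, "<your key>")
#   print(failed_policies(fetch_alerts(request)))
```

If the CPM server still uses a self-signed SSL certificate, `urlopen` will refuse the connection by default; in that case you would need to pass a suitable `ssl.SSLContext` through its `context` parameter.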
15.4.1 Introduction
CPM can also push alerts to notify you of any malfunction or issue. CPM uses SNS (Simple
Notification Service) for this purpose. To use it, your account needs to have SNS enabled.
SNS can send push requests to email, http/s, SQS and even SMS (in some locations).
Basically, with SNS you create a topic, and for each topic there can be multiple subscribers,
of multiple types (email, email-json, http/s, SQS, SMS). Every time a notification is published
to a topic, all subscribers get notified. For more information about SNS, see
https://aws.amazon.com/sns/.
CPM uses SNS in a simple way. It can create the SNS topic for you and subscribe the official
user email (the one entered in the configuration process). If you want to add recipients, use
SMS, http or other, you can do that using the SNS Management console (part of the AWS
Management console). You have a link to this console in CPM’s notifications screen.
SNS can incur costs. For the small volume of messages CPM uses, it is usually free or the cost
is negligible. For SNS pricing see https://aws.amazon.com/sns/pricing/.
Figure 15-2
You can see the Notifications screen in Figure 15-2. In order to use SNS, you will need to
enter AWS account credentials for the SNS service. There is one notifications configuration
per user, but there can be multiple AWS accounts (where applicable). SNS credentials are
not tied to any of the backed up AWS accounts. You can choose a region, and type in
credentials, which can be regular credentials or those of an IAM user (see 14.4.2.3). To use
the CPM Server instance’s IAM role (only for the root user), simply type in “use_iam_role”
for both access and secret keys.
If you are the root (main) user, you can also choose whether to include or exclude alerts
about managed users (see 16.2).
SNS is used both for “push” alerts and for sending a daily summary.
If you are experiencing issues often (hopefully this won’t happen), it sometimes
reduces noise to get one summary a day. Furthermore, since backup is the second
line of defense, i.e. the production environment does not depend on it, some
people feel they don’t need to get an instant message on every backup issue that
occurs.
Even if there are no issues (hopefully this is the case most of the time), a daily
summary is sent, saying all is OK. This is a positive sign you get from the system once
a day. If something happens and CPM crashes altogether (and your monitoring
solution does not pick up on that), you will notice that the daily summaries stop.
Daily summaries contain a bit more than just alerts; they also contain a list of policies
which are disabled and policies that don’t have schedules assigned to them.
Although neither of these cases is an error, sometimes someone can leave a policy
disabled or without a schedule and forget about it, thinking that it continues to
perform backup, when actually it does nothing.
Figure 15-3
As seen in Figure 15-3, you set an SNS topic for the daily summary as well. If you have alerts
configured, you can choose to use the same SNS topic for summaries, create a new one, or
paste an ARN of your choosing. There is an advantage to using a separate topic, since
sometimes you want different recipients: it makes sense for a system admin to get alerts by
SMS, but to get the daily summary by email only. The display name of the topic also appears
in the message (in emails it appears as the sender name), so with separate topics it’s easier
to know which is which.
In addition, you can choose the hour at which the daily summary will be sent.
Start Time – Time the backup started
End Time – Time the backup ended
Is Retry – Says yes if this backup was a retry after failure, otherwise says no
Marked for Deletion – Says yes if this backup was marked for deletion. If it is “yes,”
the backup no longer appears in the backup monitor and is not recoverable.
Backup ID – ID of the backup the snapshot belongs to. Matches the same snapshots
in the previous report.
Region – AWS region
Type – Type of snapshot: EBS, RDS or EBS Copy – which is a DR copied snapshot
Volume/DB – AWS ID of the backed up EBS volume or RDS database
Instance – If this snapshot belongs to a backed up EC2 instance, the value will be the
AWS ID of that instance, otherwise it will contain the string: None
Snapshot ID – AWS ID of the snapshot
Succeeded – Yes or No
End Time – Time the snapshot ended (start time is the start time of the backup)
Deleted At – Time of deletion, or N/A, if the snapshot was not deleted yet
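Once downloaded, the snapshots report can be processed with a few lines of Python. The sketch below lists the snapshots that did not succeed. It assumes that the report is a CSV file whose header row uses exactly the field names listed above and that the “Succeeded” column contains “Yes” or “No”; the sample rows and IDs are made up for illustration.

```python
# Sketch: find failed snapshots in a downloaded snapshots report.
# Assumes a CSV header matching the field names above; sample data is made up.

import csv
import io

SAMPLE_REPORT = """\
Backup ID,Region,Type,Volume/DB,Instance,Snapshot ID,Succeeded,End Time,Deleted At
101,us-east-1,EBS,vol-11111111,i-aaaaaaaa,snap-00000001,Yes,2013-07-20 06:05:00,N/A
101,us-east-1,EBS,vol-22222222,i-aaaaaaaa,snap-00000002,No,2013-07-20 06:06:00,N/A
102,us-west-2,EBS Copy,vol-11111111,None,snap-00000003,Yes,2013-07-20 07:00:00,N/A
"""


def failed_snapshots(report_file):
    """Return the report rows (as dictionaries) whose 'Succeeded' value is 'No'."""
    reader = csv.DictReader(report_file)
    return [row for row in reader if row["Succeeded"].strip().lower() == "no"]


if __name__ == "__main__":
    # With a real report, replace io.StringIO(...) with open("<report file>", newline="").
    for row in failed_snapshots(io.StringIO(SAMPLE_REPORT)):
        print(row["Backup ID"], row["Region"], row["Volume/DB"], row["Snapshot ID"])
```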
15.8 Usage Reports
In addition to raw reports you can also download CSV usage reports. A usage report for a
user gives the number of AWS accounts and instances, and the amount of non-instance
storage, this user is consuming. This can be helpful for inter-user accounting. For each user,
there’s a link “usage report for current user.” For the root user, there’s also a link “usage
report for all users” which gives the breakdown of usage among all the users on the CPM
server.
16 CPM User Management
CPM is built for a multi-user environment. At the configuration stage you define a user
which is the root user. The root user can create additional users (depending on the edition
of CPM you are subscribed to). Additional users are helpful if you are a managed service
provider, in need of managing multiple customers from one CPM server or if you have
different users or departments in your organization, each managing their own AWS
resources. For instance, you may have a QA department, a Development Department and IT
department, each with their own AWS account/s.
To define additional users you need to click on the “Manage Users” button at the top of any
CPM screen (available only for the root/admin user).
Figure 16-1
There are two types of users you can define (you can also switch types after a user is
already created):
To create a managed user, click on the “Add New User” button in the “Manage Users”
screen, and set the type to “Managed.” If the root user does not want managed users to
log in at all, he/she should not supply any credentials to them.
Figure 16-2
Besides the user name, email, password and user type, the root user can set limitations on
the amount of resources this user is allowed to have. The limitations include number of
accounts, instances and volume of non-instance storage: independent EBS volumes and RDS
databases. If you leave these fields empty, there is no limitation on resources, except the
system level limitations that are derived from the CPM edition used.
When editing a user, the root user can modify email, type of user and limitations. The user
name cannot be modified once a user is created.
16.4 Delegates
Delegates are a special kind of user, managed via a separate screen. Delegates are
similar to IAM users in AWS: they are credentials used to log in and access another user’s
environment. That access is given with specific permissions. For each user, whether it’s the
root user, an independent user or a managed user, there is a “delegates” button that
redirects to the delegates screen for that user:
Figure 16-3
You can add as many delegates as needed for each user and also edit any delegate’s
settings:
Figure 16-4
The settings include the delegate name (cannot be modified once a delegate is created),
email, and permissions. In a separate button in the delegates screen, the root user can reset
passwords for delegates.
edit accounts and modify credentials. “Allow Backup Changes” will allow changing policies:
adding, removing and editing policies and schedule, as well as adding and removing backup
targets.
Allowing all permissions will allow the delegate everything the original user does except
notification settings. For delegates of the root/admin user, they won’t be able to change
notification settings, general settings or manage users.