
Commit 008a107

Lev Kokotov authored and gitbook-bot committed
GITBOOK-62: change request with no subject merged in GitBook
1 parent 0b9a526 commit 008a107

File tree: 3 files changed, +178 -0 lines changed

pgml-docs/docs/guides/SUMMARY.md

Lines changed: 2 additions & 0 deletions
@@ -58,6 +58,8 @@
 * [Pooler](deploying-postgresml/self-hosting/pooler.md)
 * [Building from source](deploying-postgresml/self-hosting/building-from-source.md)
 * [Replication](deploying-postgresml/self-hosting/replication.md)
+* [Backups](deploying-postgresml/self-hosting/backups.md)
+* [Running on EC2](deploying-postgresml/self-hosting/running-on-ec2.md)
 * [PgCat](pgcat.md)
 * [Benchmarks](benchmarks/README.md)
 * [PostgresML is 8-40x faster than Python HTTP microservices](benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md)

pgml-docs/docs/guides/deploying-postgresml/self-hosting/backups.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Backups

Regular backups are necessary for almost any kind of PostgreSQL deployment. Accidents happen even in development, and instead of losing data, you can restore from a backup and get back to a working state.

PostgresML backups work the same way as regular PostgreSQL database backups. PostgresML stores its data in regular Postgres tables, which are backed up together with the rest of your tables and schemas.

### Architecture

Postgres backups are composed of two (2) components: the Write-Ahead Log (WAL) archive and copies of the data files. The WAL archive stores every single write made to the database. The data file copies contain point-in-time snapshots of your databases, going back as far as the retention period of the backup repository.

Using the WAL and the data file copies together, Postgres can be restored to any point-in-time version of the database. This is a very powerful tool used for development and disaster recovery.

### Configure the archive

If you have followed the [Replication](replication.md) guide, you should have a working WAL archive. If not, take a look at it to get your archive configured, and come back to this guide once you have a working WAL archive.

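For reference, a minimal archiving setup, as covered in the Replication guide, might look like the sketch below. The bucket name, region, and data directory path are placeholders; adjust them to your own setup:

```
# /etc/pgbackrest.conf — example values only
[global]
repo1-type=s3
repo1-path=/backups
repo1-s3-bucket=my-backup-bucket
repo1-s3-endpoint=s3.amazonaws.com
repo1-s3-region=us-east-1

[main]
pg1-path=/var/lib/postgresql/14/main
```

together with `archive_mode = on` and `archive_command = 'pgbackrest --stanza=main archive-push %p'` in `postgresql.conf`.
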
### Take your first backup

Since we are already using pgBackRest for archiving WAL, we can continue to use it to take backups. pgBackRest can easily take full and incremental backups of pretty large database clusters. We've used it previously in production to back up terabytes of Postgres data on a weekly basis.

To take a backup using pgBackRest, you can simply run this command:

```bash
pgbackrest backup --stanza=main
```

Once the command completes, you'll have a full backup of your database cluster safely stored in your S3 bucket. If you'd like to see exactly what pgBackRest does to take a backup, you can add this option to the command above:

```
--log-level-console=debug
```

pgBackRest will log every single step it takes to produce a working backup.

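Since pgBackRest supports incremental and differential backups in addition to full ones, you can mix them to save time and storage on large clusters. A quick sketch using pgBackRest's standard `--type` option:

```bash
# Full backup (pgBackRest takes a full backup automatically if none exists yet)
pgbackrest backup --stanza=main --type=full

# Incremental backup: stores only the changes since the last backup
pgbackrest backup --stanza=main --type=incr
```
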
### Restoring from backup

When disaster strikes, or you would just like to travel back in time, you can restore your database from your latest backup with just a couple of commands.

#### Stop the PostgreSQL server

Restoring from backup will completely overwrite your existing database files. Therefore, don't do this unless you actually need to restore from backup.

To do so, first stop the PostgreSQL database server, if it's running:

```
sudo service postgresql stop
```

#### Restore the latest backup

Now that PostgreSQL is no longer running, you can restore the latest backup using pgBackRest:

```
pgbackrest restore --stanza=main --delta
```

The `--delta` option makes pgBackRest check every single file in the Postgres data directory and, if it's different, overwrite it with the one saved in the backup repository. This is a quick way to restore a backup when most of the database files have not been corrupted or modified.

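Because the WAL archive enables point-in-time recovery, you can also restore to a specific moment instead of the latest backup. A minimal sketch using pgBackRest's time-based recovery target, where the timestamp is a placeholder:

```bash
pgbackrest restore --stanza=main --delta \
  --type=time --target="2023-10-01 12:00:00+00"
```
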
#### Start the PostgreSQL server

Once the restore is complete, your PostgreSQL server is ready to start again. You can do so with:

```
sudo service postgresql start
```

This will start PostgreSQL and make it check its local data files for consistency. That check completes pretty quickly, after which Postgres will start downloading and re-applying Write-Ahead Log files from the archive. When that operation completes, your PostgreSQL database will start and you'll be able to connect and use it again.

Depending on how much data has been written to the archive since the last backup, the restore operation could take a bit of time. To minimize the time it takes for Postgres to start again, you can take more frequent backups, e.g. every 6 hours or every 2 hours. While costing more in storage and compute, this ensures that your database recovers from a disaster much quicker than it would have with just a daily backup.

### Managing backups

Backups can take up a lot of space over time, and some of them may no longer be needed. You can view which backups and WAL files are stored in your S3 bucket with:

```
pgbackrest info
```

#### Retention policy

For most production deployments, you don't need to, and shouldn't, retain more than a few backups. We would usually recommend keeping two (2) weeks of backups and WAL files, which should be enough time to notice that some data may be missing and needs to be restored.

If you run full backups once a day (which should be plenty), you can set your pgBackRest backup retention policy to 14 days by adding a couple of settings to your `/etc/pgbackrest.conf` file:

```
[global]
repo1-retention-full=14
repo1-retention-archive=14
```

This configuration ensures that you keep at least 14 backups and 14 backups' worth of WAL files. Because Postgres allows point-in-time recovery, you'll be able to restore your database to any version (up to millisecond precision) going back two (2) weeks.

#### Automating backups

Backups can be automated by running `pgbackrest backup --stanza=main` from a cron job. You can edit your cron with `crontab -e` and add a daily midnight run, ensuring that you have fresh backups every day. Make sure you're editing the crontab of the `postgres` user, since no other user will be allowed to back up Postgres or read the pgBackRest configuration file. A sketch of that entry is shown below.

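The daily midnight entry in the `postgres` user's crontab could look like this:

```
# m h dom mon dow command
0 0 * * * pgbackrest backup --stanza=main
```
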
### PostgresML considerations

Since PostgresML stores most of its data in regular Postgres tables, a PostgreSQL backup is a valid PostgresML backup. The only thing stored outside of Postgres is the Hugging Face LLM cache, which is stored directly on disk in `/var/lib/postgresql/.cache`. In case of a disaster, the cache will be lost, but that's fine: since it's only a cache, the next time a PostgresML `pgml.embed()` or `pgml.transform()` function is used, PostgresML will automatically repopulate all the necessary files in the cache from Hugging Face and resume normal operations.

#### Hugging Face cold starts

To avoid cold starts, it's reasonable to back up the entire contents of the cache to a separate S3 location. When restoring from backup, you can just use `aws s3 sync` to download everything that should be in the cache folder back onto the machine. Make sure to do so before you start PostgreSQL, in order to avoid a race condition with the Hugging Face library.

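A minimal sketch of that restore step, where the bucket name and prefix are placeholders:

```bash
# Run as the postgres user, before starting PostgreSQL,
# so the cache is warm on the first query
aws s3 sync s3://my-backup-bucket/hf-cache /var/lib/postgresql/.cache
```
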

pgml-docs/docs/guides/deploying-postgresml/self-hosting/running-on-ec2.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
# Running on EC2

AWS EC2 has been around for quite a while and requires no introduction. Running PostgresML on EC2 is very similar to any other cloud provider or on-prem deployment, but EC2 does provide a few additional features that allow PostgresML to scale into terabytes and beyond quite easily.

### Operating system

We're big fans of Ubuntu and use it in our Cloud. AWS provides its own Ubuntu images (called AMIs, or Amazon Machine Images) which work very well and come with all the standard tools needed to run a PostgreSQL server.

### Storage

The choice of storage is critical to scalable and performant AI database operations. PostgresML deals with large datasets and even larger models, so performant and durable storage is important.

EC2 provides two kinds of storage that can be used for running databases: EBS (Elastic Block Store) and ephemeral NVMe drives. NVMe storage is typically faster than EBS and provides much lower latency, but it lacks some of the durability guarantees one may want from a database deployment. We've run databases on both, but currently prefer to use EBS, because it allows us to take instant backups of our databases and to scale the storage of a database cluster independently from compute.

#### Choosing storage type

EBS has many different kinds of volumes, such as `gp2`, `gp3`, `io1`, `io2`, etc. The type of volume to use really depends on the cost/benefit analysis for the deployment in question. For example, if money is no object, running on `io2` would provide pretty great performance and durability guarantees. That being said, most deployments would be quite happy running on `gp3`.

#### Choosing the filesystem

The choice of the filesystem is a bit like getting married: you should really know what you're getting yourself into, and more often than not, your choice will stay with you for years to come. We've benchmarked and used many different filesystems in production, including ext4, ZFS, btrfs and NTFS. Our current filesystem of choice is ZFS, because of its high durability, consistency and reasonable performance guarantees.

### Backups

If you choose to use EBS for your database storage, special consideration should be given to backups. If you decide to use pgBackRest to back up your database, you needn't read any further; however, if you'd like to use EBS snapshots, there is a quick tip that could save you from problematic outcomes down the line.

EBS snapshots are a point-in-time copy of an EBS volume. That means that if you take a snapshot of an EBS volume and restore it, whatever you had on that volume at the time of the snapshot will be exactly the way you left it. However, if you take a snapshot while writing to the volume, that write may only be partially saved in the snapshot. This is because EBS snapshots are controlled by the EBS server, and the filesystem is not aware of its internal operations or that a snapshot is being taken at all. This is very similar to how snapshots work on hardware RAID volume managers.

If you don't pause writes to your filesystem before you take an EBS snapshot, you run the risk of losing some of your data or, in the worst case, corrupting your filesystem. That means, if you're using a filesystem like ext4, consider running `fsfreeze(8)` before taking an EBS snapshot.

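A minimal sketch of that sequence, assuming the database lives on a dedicated mount at `/var/lib/postgresql` and where the volume ID is a placeholder:

```bash
# Suspend writes so the snapshot is filesystem-consistent
sudo fsfreeze --freeze /var/lib/postgresql

# Take the EBS snapshot (volume ID is a placeholder)
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
  --description "postgres data snapshot"

# Resume writes as soon as the snapshot has been initiated
sudo fsfreeze --unfreeze /var/lib/postgresql
```
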
If you're like us and prefer ZFS, you don't need to do anything. ZFS is a copy-on-write filesystem that guarantees all writes made to it are atomic. Even if the EBS snapshot cuts it off mid-write, the filesystem will always be in a consistent state, although you may lose that last write that never fully made it into the snapshot.

#### Taking an EBS backup

You can use EBS snapshots for creating replicas and for disaster recovery. An EBS snapshot works much like `pg_basebackup`, except it's instantaneous. To ensure that your backup is easily restorable, make sure to first create the `/var/lib/postgresql/14/main/standby.signal` file and only then take the snapshot.

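A hedged sketch of that order of operations; the paths assume PostgreSQL 14 on Ubuntu, and the volume ID is a placeholder:

```bash
# Create standby.signal first, so a restored instance starts as a replica
sudo -u postgres touch /var/lib/postgresql/14/main/standby.signal

# Only then take the snapshot
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
  --description "postgres replica image"
```
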
This ensures that when you restore from that backup, Postgres does not automatically promote itself and start accepting writes. If that happens, you won't be able to use it as a replica without getting into `pg_rewind`.

Alternatively, you can disable the `postgresql` service by default, ensuring that Postgres does not start automatically on system boot.

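On a systemd-based distribution like Ubuntu, that could be as simple as:

```bash
sudo systemctl disable postgresql
```
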
#### pgBackRest

If you're using pgBackRest for backups and archiving, you can take advantage of EC2 IAM integration. Instead of saving AWS IAM keys and secrets in `/etc/pgbackrest.conf`, you can configure it to fetch temporary credentials from the EC2 API:

```
[global]
repo1-s3-key-type=auto
```

Make sure that your EC2 IAM role has sufficient permissions to access your WAL archive S3 bucket.

### Performance

A typical single-volume storage configuration is fine for low traffic databases. However, if you need additional performance, you have a few options. One option is to simply allocate more IOPS to your volume. That works, but it may be a bit costly at scale. Another option is to combine multiple EBS volumes into either a RAID0, for maximum throughput, or a RAIDZ1, for good throughput and reasonable durability guarantees.

ZFS supports both RAID0 and RAIDZ1 configurations. If you have, say, 4 volumes, you can set up a RAID0 pool with just a couple of commands:

```
zpool create tank /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
zfs create -o mountpoint=/var/lib/postgresql tank/pgdata
```

or a RAIDZ1 pool with 5 volumes:

```
zpool create tank raidz /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1
```

RAIDZ1 protects against single volume failure, allowing you to replace an EBS volume without taking your database offline or restoring from backup. Considering EBS durability guarantees and the additional redundancy provided by RAIDZ1, this is a reasonable configuration for systems that require good durability and performance guarantees.

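If a volume does fail, ZFS can swap it out online. A minimal sketch, where the failed and replacement device names are placeholders:

```bash
# Check pool health to identify the failed device
zpool status tank

# Replace the failed device with a newly attached EBS volume
zpool replace tank /dev/nvme2n1 /dev/nvme6n1
```
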
A RAID configuration with 4 volumes allows up to 4x the read throughput which, in EBS terms, can produce up to 600MBps without having to pay for additional IOPS.

