 <para>
  This chapter explains page-level compression and encryption in the
  <productname>&productname;</> database system.
 </para>

 <sect1 id="cfs-overview">
  <title>Why database compression/encryption may be useful</title>
  <para>
   Databases are used to store large amounts of text and duplicated information. This is why compression of most databases
   can be quite efficient and reduce the used storage size 3-5 times. &productname; compresses TOAST data, but small
   text fields which fit in a page are not compressed. Also, not only heap pages can be compressed: indexes on text keys
   or indexes with a large number of duplicate values are also good candidates for compression.
  </para>
  <para>
   &productname; works with disk data through a buffer pool which accumulates the most frequently used buffers.
   The interface between the buffer manager and the file system is the most natural place for performing compression.
   Buffers are stored on the disk in compressed form to reduce disk usage and minimize the amount of data to be read.
   The in-memory buffer pool contains uncompressed buffers, providing access to the records at the same speed as without
    <listitem>
     <para>
      When modified buffers are flushed from the buffer pool to the disk, they are written to random locations
      on the disk. The &productname; cache replacement algorithm decides which buffer to evict from the pool
      based on its access frequency, ignoring its location on the disk. So two subsequently written buffers can be
      located in completely different parts of the disk. For an HDD the seek time is quite large - about 10 msec, which corresponds
      to 100 random writes per second. The speed of sequential writes can be about 100MB/sec, which corresponds to
      10000 buffers per second (100 times faster). For an SSD the gap between sequential and random write speed is smaller,
      but sequential writes are still more efficient. How does this relate to data compression?
      The size of a buffer in &productname; is fixed (8kB by default), while the size of a compressed buffer depends on the content of the buffer,
      so an updated buffer cannot always fit in its old location on the disk. This is why we cannot access pages directly
      by their address. Instead we have to use a map which translates the logical address of a page to its physical location
      on the disk. Certainly this extra level of indirection adds overhead, but in most cases the map fits in memory,
      so a page lookup is nothing more than accessing an array element. The presence of this map also has a positive effect:
      we can now write updated pages sequentially, just updating their map entries.
      &productname; does a lot to avoid a "write storm" - intensive flushing of data to the disk when buffer pool space is
      exhausted. Compression allows the disk load to be reduced significantly.
     </para>
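     <para>
      The sketch below illustrates the idea of such a page map: compressed pages are appended to the data file
      sequentially, and an in-memory array maps each logical page number to the current physical offset and size of
      its compressed image. The structure and function names are illustrative assumptions, not the actual CFS code.
     </para>
     <programlisting><![CDATA[
/* Illustrative sketch of a logical-to-physical page map (not the real
 * CFS data structures).  Updated pages are appended sequentially and the
 * map entry of the logical page is redirected to the new location. */
#include <stdint.h>

typedef struct PageMapEntry
{
    uint64_t offset;    /* physical offset of the compressed page in the file */
    uint32_t size;      /* size of the compressed image in bytes, 0 = unused */
} PageMapEntry;

typedef struct PageMap
{
    PageMapEntry *entries;    /* one entry per logical page, kept in memory */
    uint64_t      file_tail;  /* next append position in the data file */
} PageMap;

/* Write path: append the compressed page at the tail of the file and
 * update the map entry, so all writes to the data file are sequential. */
static void
page_map_write(PageMap *map, uint32_t logical_page_no, uint32_t compressed_size)
{
    map->entries[logical_page_no].offset = map->file_tail;
    map->entries[logical_page_no].size   = compressed_size;
    map->file_tail += compressed_size;
}

/* Read path: a page lookup is just an array access. */
static PageMapEntry
page_map_lookup(const PageMap *map, uint32_t logical_page_no)
{
    return map->entries[logical_page_no];
}
]]></programlisting>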
    </listitem>
   Another useful feature which can be combined with compression is database encryption.
   Encryption allows you to protect your database from unintended access: if somebody steals your notebook or hard drive, or makes
   a copy of it, the thief will not be able to extract information from your database if it is encrypted.
   &productname; provides the contrib module pgcrypto, allowing you to encrypt particular types/columns.

   But a safer and more convenient way is to encrypt all the data in the database. Encryption can be combined with compression.
   Data should be stored on disk in encrypted form and decrypted when a page is loaded into the buffer pool.
  </para>
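  <para>
   On the write path the natural order of the two transformations is to compress first and encrypt the already
   compressed image, because encrypted data looks random and does not compress well. The following is a minimal sketch
   of this order, using zlib for compression and a toy XOR "cipher" as a stand-in for a real one; the names are
   illustrative, not the actual CFS functions.
  </para>
  <programlisting><![CDATA[
/* Write-path sketch: compress the page first, then encrypt the compressed
 * image.  zlib is used for compression; the XOR "cipher" is only a
 * placeholder showing the order of operations, not real cryptography. */
#include <string.h>
#include <zlib.h>

static void
toy_encrypt(unsigned char *data, size_t size, unsigned char key)
{
    for (size_t i = 0; i < size; i++)
        data[i] ^= key;              /* stand-in for a real block cipher */
}

static size_t
prepare_page_for_disk(const unsigned char *page, size_t page_size,
                      unsigned char *out, size_t out_size, unsigned char key)
{
    uLongf compressed_size = out_size;

    /* Compression must come first: already encrypted data would not compress. */
    if (compress2(out, &compressed_size, page, page_size, Z_BEST_SPEED) != Z_OK)
    {
        memcpy(out, page, page_size);        /* store uncompressed on failure */
        compressed_size = page_size;
    }

    toy_encrypt(out, compressed_size, key);  /* encrypt what goes to disk */
    return compressed_size;
}
]]></programlisting>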
  <para>
   Why do we need to perform compression/encryption in &productname; rather than use the corresponding features of the underlying file
   systems? The first answer is that there are not so many file systems supporting compression and encryption for all OSes.
   And even if such file systems are available, it is not always possible or convenient to install such a file system just
   to compress/protect your database. The second reason is that performing compression at the database level can be more efficient,
 </sect1>

 <sect1 id="cfs-implementation">
  <title>How compression/encryption are integrated in &productname;</title>

  <para>
   To improve the efficiency of disk IO, &productname; works with files through the buffer manager, which pins in memory the
   most frequently used pages. Each page has a fixed size (8kB by default). But if we compress a page, then
   its size will depend on its content. So an updated page can require more (or less) space than the original page,
   and we may not always be able to perform an in-place update of the page. Instead we have to locate new space for the page and somehow release
  </para>

  <para>
   &productname; stores a relation in a set of files, with the size of each file not exceeding 2GB. A separate page map is constructed for each file.
   Garbage collection in CFS is done by several background workers. The number of these workers and the pauses in their work can be
   configured by the database administrator. The workers split the work based on inode hash, so they do not conflict with each other.
   Each file is processed separately. The file is blocked for access at the time of garbage collection, but the complete relation is not
  </para>
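  <para>
   The sketch below shows one way independent workers can partition files without conflicts: each worker hashes the
   inode number of a file and processes the file only when the hash selects its own worker id. The function is an
   illustrative assumption, not the actual CFS garbage collector.
  </para>
  <programlisting><![CDATA[
/* Illustrative worker partitioning sketch (not the actual CFS code):
 * each garbage-collection worker only handles files whose inode hashes
 * to its own worker id, so no two workers ever touch the same file. */
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

static bool
file_belongs_to_worker(const char *path, int worker_id, int n_workers)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;                       /* skip files we cannot stat */

    /* A simple modulo hash of the inode number selects exactly one worker. */
    return (uint64_t) st.st_ino % (uint64_t) n_workers == (uint64_t) worker_id;
}
]]></programlisting>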
  <para>
   CFS can be built with one of several compression libraries: the built-in &productname; LZ, zlib, lz4, snappy, lzfse...
   But this is a build time choice: it is not currently possible to choose the compression algorithm dynamically.
   CFS stores information about the compression algorithm used in a tablespace and reports an error if &productname; is built with a different
   library.
  </para>
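  <para>
   Because the compression library is selected when CFS is compiled, the choice can be pictured as compile-time
   dispatch, as in the sketch below. The preprocessor symbols and the wrapper function are hypothetical and do not
   correspond to the real build options.
  </para>
  <programlisting><![CDATA[
/* Illustrative compile-time selection of the compression implementation
 * (macro and wrapper names are hypothetical, not the real CFS build flags). */
#include <stddef.h>

#if defined(USE_ZLIB_SKETCH)
#include <zlib.h>

static size_t
cfs_compress_sketch(void *dst, size_t dst_size, const void *src, size_t src_size)
{
    uLongf out_len = dst_size;

    if (compress2(dst, &out_len, src, src_size, Z_BEST_SPEED) != Z_OK)
        return 0;                 /* 0 means "store the page uncompressed" */
    return out_len;
}

#elif defined(USE_LZ4_SKETCH)
#include <lz4.h>

static size_t
cfs_compress_sketch(void *dst, size_t dst_size, const void *src, size_t src_size)
{
    int out_len = LZ4_compress_default(src, dst, (int) src_size, (int) dst_size);

    return out_len > 0 ? (size_t) out_len : 0;
}
#endif
]]></programlisting>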