Writing data
Your client requests PostgreSQL to write data after inserting, updating, or
deleting a row on a table. PostgreSQL needs to bring the page into memory (if
not there), dirty it, and eventually write the page to disk. To improve
performance, it usually writes the transaction first to the WAL log, and the
page gets written to disk by a separate process.
Transactions also complicate the equation, since PostgreSQL might need to
keep several different versions of the data in memory. This is done using
multi-version concurrency control (MVCC), described later.
Learn how PostgreSQL processes a write request below.
● Client issues write request.
● User backend checks for data in shared_buffers.
● If not, reads into shared_buffers.
● User backend dirties data in shared_buffers.
● User backend records transaction in WAL.
● On commit, walwriter commits wal_buffers to pg_xlog.
● bgwriter (clock sweep) or checkpoint (forced) writes dirtied buffers to disk.
Multi-version concurrency control
Given that several clients may access the same data simultaneously, database
systems need to enable concurrency control. Relational database systems
provide the concept of transactions, or units of work that are executed
atomically. This can change several rows in different tables, which makes
concurrency control even harder.
Many database systems use locks to implement concurrency control, which is
feasible, but can result in decreased concurrency as early transactions block
newer ones. PostgreSQL uses multi-version concurrency control (MVCC),
keeping several versions of the same data in memory when needed, and
providing the appropriate version to each transaction. This provides for
concurrent access, but leads to problems if you mix long-lived transactions
with lots of short ones.
Learn more about concurrency control in PostgreSQL below.
Why MVCC?
Maintains data consistency internally.
Enables each transaction to see a snapshot of data (a database version) as it
was some time ago while querying a database.
Prevents transactions from viewing inconsistent data.
Provides transaction isolation in concurrent transactions.
Ensures that readers do not block writers, and writers do not block readers.
Logical and physical layouts
PostgreSQL lays out many of its structures like a file system. Here we will cover
the most important logical structures, and how they might map to physical
structures. Some logical items, like tablespaces and users, do not map to a
particular physical structure.
Learn more about PostgreSQL layout structures below.
instance
–
● Ties to a directory (data directory)
● Corresponds to one TCP/IP port (for postmaster)
● One postmaster
● One set of shared_buffers
database
–
● Multiple databases per instance
● Each database maps to a directory under base in the instance’s data directory
(named by its oid, which you can find on the pg_database table)
schema/namespace
–
● Purely logical grouping of relations
● A schema only exists in catalogs, with no real physical structure
tablespace
–
● A directory on disk where PostgreSQL can store relations
● Used for tiering storage
user
–
● Can be assigned across databases
● Does not have physical structure (other than in catalogs)