0% found this document useful (0 votes)
6 views13 pages

Understanding The PostgreSQL Architecture - Severalnines

The document provides an overview of PostgreSQL architecture, detailing its components such as shared memory, process types, and database structure. It explains the roles of various processes including the Postmaster, background processes, and backend processes, as well as the significance of shared buffers and WAL buffers. Additionally, it discusses the creation of user databases and tablespaces, and highlights the importance of the MVCC model in PostgreSQL's concurrency control.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views13 pages

Understanding The PostgreSQL Architecture - Severalnines

The document provides an overview of PostgreSQL architecture, detailing its components such as shared memory, process types, and database structure. It explains the roles of various processes including the Postmaster, background processes, and backend processes, as well as the significance of shared buffers and WAL buffers. Additionally, it discusses the creation of user databases and tablespaces, and highlights the importance of the MVCC model in PostgreSQL's concurrency control.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

B LO G

Understanding the PostgreSQL


Architecture
Chitij Chauhan
Published: October 2, 2017
Last Updated: June 27, 2025

PostgreSQL is probably the most advanced database in the open source relational
database market. It was first released in 1989, and since then, there have been a lot
of enhancements. According to db-engines, it is the fourth most used database at the
time of writing.

In this blog, we will discuss PostgreSQL internals, its architecture, and how the various
components of PostgreSQL interact with one another. This will serve as a primer for
anyone interested in operationalizing a PostgreSQL database in a production
environment.

https://severalnines.com/blog/understanding-postgresql-architecture/ 2/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

PostgreSQL Architecture
The physical structure of PostgreSQL is very simple. It consists of shared memory
and a few background processes and data files. (See Figure 1-1)

Figure 1-1. PostgreSQL structure

Shared Memory
Shared Memory refers to the memory reserved for database caching and transaction
log caching. The most important elements in shared memory are Shared Buffer and
WAL buffers

Shared Buffer

The purpose of Shared Buffer is to minimize DISK IO. For this purpose, the following
principles must be met

You need to access very large (tens, hundreds of gigabytes) buffers quickly.
You should minimize contention when many users access it at the same time.
Frequently used blocks must be in the buffer for as long as possible

https://severalnines.com/blog/understanding-postgresql-architecture/ 3/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

WAL Buffer

The WAL buffer is a buffer that temporarily stores changes to the database. The
contents stored in the WAL buffer are written to the WAL file at a predetermined
point in time. From a backup and recovery point of view, WAL buffers and WAL files
are very important.

PostgreSQL Process Types


PostgreSQL has four process types.

Postmaster (Daemon) Process


Background Process
Backend Process
Client Process

Postmaster Process

The Postmaster process is the first process started when you start PostgreSQL. At
startup, performs recovery, initialize shared memory, and run background processes.
It also creates a backend process when there is a connection request from the client
process. (See Figure 1-2)

Figure 1-2. Process relationship diagram

https://severalnines.com/blog/understanding-postgresql-architecture/ 4/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

If you check the relationships between processes with the pstree command, you can
see that the Postmaster process is the parent process of all processes. (For clarity, I
added the process name and argument after the process ID)

Background Process

The list of background processes required for PostgreSQL operation are as follows.
(See Table 1-1)

Process Role

logger Write the error message to the log file.

When a checkpoint occurs, the dirty


checkpointer buffer is written to the file.

Periodically writes the dirty buffer to a


writer file.

wal writer Write the WAL buffer to the WAL file.

Fork autovacuum worker when


autovacuum is enabled.It is the
Autovacuum launcher responsibility of the autovacuum daemon
to carry vacuum operations on bloated
tables on demand

When in Archive.log mode, copy the WAL


archiver file to the specified directory.

https://severalnines.com/blog/understanding-postgresql-architecture/
DBMS t ti ti h i 5/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines
DBMS usage statistics such as session
execution information ( pg_stat_activity )
stats collector and table usage statistical information (
pg_stat_all_tables ) are collected.

Backend Process

The maximum number of backend processes is set by the max_connections


parameter, and the default value is 100. The backend process performs the query
request of the user process and then transmits the result. Some memory structures
are required for query execution, which is called local memory. The main parameters
associated with local memory are:

work_mem Space used for sorting, bitmap operations, hash joins, and merge joins.
The default setting is 4 MB.
Maintenance_work_mem Space used for Vacuum and CREATE INDEX . The default
setting is 64 MB.
Temp_buffers Space used for temporary tables. The default setting is 8 MB.

Client Process

Client Process refers to the background process that is assigned for every backend
user connection.Usually the postmaster process will fork a child process that is
dedicated to serve a user connection.

Database Structure
Here are some things that are important to know when attempting to understand the
database structure of PostgreSQL.

Items related to the database

PostgreSQL consists of several databases. This is called a database cluster.


When initdb () is executed, template0 , template1 , and postgres databases are
https://severalnines.com/blog/understanding-postgresql-architecture/ 6/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

created.
The template0 and template1 databases are template databases for user database
creation and contain the system catalog tables.
The list of tables in the template0 and template1 databases is the same
immediately after initdb (). However, the template1 database can create objects
that the user needs.
The user database is created by cloning the template1 database.

Items related to the tablespace

The pg_default and pg_global tablespaces are created immediately after initdb().
If you do not specify a tablespace at the time of table creation, it is stored in the
pg_dafault tablespace.
Tables managed at the database cluster level are stored in the pg_global
tablespace.
The physical location of the pg_default tablespace is $PGDATAbase.
The physical location of the pg_global tablespace is $PGDATAglobal.
One tablespace can be used by multiple databases. At this time, a database-
specific subdirectory is created in the table space directory.
Creating a user tablespace creates a symbolic link to the user tablespace in the
$PGDATAtblspc directory.

Items related to the table

There are three files per table.


One is a file for storing table data. The file name is the OID of the table.
One is a file to manage table free space. The file name is OID_fsm .
One is a file for managing the visibility of the table block. The file name is OID_vm
.
The index does not have a _vm file. That is, OID and OID_fsm are composed of two
files.

Other Things to Remember…


https://severalnines.com/blog/understanding-postgresql-architecture/ 7/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

The file name at the time of table and index creation is OID, and OID and
pg_class.relfilenode are the same at this point. However, when a rewrite operation (
Truncate , CLUSTER , Vacuum Full , REINDEX , etc.) is performed, the relfilenode
value of the affected object is changed, and the file name is also changed to the
relfilenode value. You can easily check the file location and name by using
pg_relation_filepath (‘< object name >’). template0, template1, postgres database

Running Tests
If you query the pg_database view after initdb() , you can see that the template0 ,
template1 , and postgres databases have been created.

Through the datistemplate column, you can see that the template0 and
template1 databases are database for template for user database creation.
The datlowconn column indicates whether the database can be accessed. Since
the template0 database can’t be accessed, the contents of the database can’t be
changed either.
The reason for providing two databases for the templateis that the template0
database is the initial state template and the template1 database is the template
added by the user.
The postgres database is the default database created using the template1
database. If you do not specify a database at connection time, you will be
connected to the postgres database.
The database is located under the $PGDATA/base directory. The directory name
is the database OID number.

https://severalnines.com/blog/understanding-postgresql-architecture/ 8/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

Create User Database


The user database is created by cloning the template1 database. To verify this, create
a user table T1 in the template1 database. After creating the mydb01 database,
check that the T1 table exists. (See Figure 1-3.)

Figure 1-3. Relationship between Template Database and User Database

pg_default tablespace

If you query pg_tablespace after initdb (), you can see that the pg_default and
pg_global tablespaces have been created.

https://severalnines.com/blog/understanding-postgresql-architecture/ 9/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

The location of the pg_default tablespace is $PGDATAbase. There is a subdirectory


by database OID in this directory. (See Figure 1-4)

Figure 1-4. Pg_default tablespace and database relationships from a physical configuration
perspective

pg_global tablespace

The pg_global tablespace is a tablespace for storing data to be managed at the


‘database cluster’ level.

For example, tables of the same type as the pg_database table provide the same
information whether they are accessed from any database. (See Figure 1-5)
The location of the pg_global tablespace is $PGDATAglobal.

https://severalnines.com/blog/understanding-postgresql-architecture/ 10/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

Figure 1-5. Relationship between pg_global tablespace and database

Create User Tablespace


postgres=# create tablespace myts01 location '/data01';

The pg_tablespace shows that the myts01 tablespace has been created.

Symbolic links in the $PGDATA/pg_tblspc directory point to tablespace directories.

Connect to the postgres and mydb01 databases and create the table.

https://severalnines.com/blog/understanding-postgresql-architecture/ 11/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

If you look up the /data01 directory after creating the table, you will see that the OID
directory for the postgres and mydb01 databases has been created and that there is
a file in each directory that has the same OID as the T1 table.

How to Change Tablespace Location

PostgreSQL specifies a directory when creating tablespace. Therefore, if the file


system where the directory is located is full, the data can no longer be stored. To
solve this problem, you can use the volume manager. However, if you can’t use the
volume manager, you can consider changing the tablespace location. The order of
operation is as follows.

https://severalnines.com/blog/understanding-postgresql-architecture/ 12/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

Note: Tablespaces are also very useful in environments that use partition tables.
Because you can use different tablespaces for each partition table, you can more
flexibly cope with file system capacity problems.

What is Vacuum?
Vacuum does the following:

Gathering table and index statistics


Reorganize the table
Clean up tables and index dead blocks
Frozen by record XID to prevent XID Wraparound

#1 and #2 are generally required for DBMS management. But #3 and #4 are
necessary because of the PostgreSQL MVCC feature.

The biggest difference I think is the MVCC model and the existence of a shared pool.
This is also considered a feature of PostgreSQL. (See Table 1-2)

Item ORACLE PostgreSQL

MVCC model UNDO Store previous

Implementation method Segment record within block

Shared Pool exists it does not exist


https://severalnines.com/blog/understanding-postgresql-architecture/ 13/19
7/10/25, 12:42 PM Understanding the PostgreSQL Architecture | Severalnines

Differences in the MVCC Model

To increase concurrency, you must follow the principle that “read operations do not
block write operations and write operations should not block read operations”. To
implement this principle, a Multi Version Concurrency Control (MVCC) is required.
Oracle uses UNDO segments to implement MVCC. On the other hand, PostgreSQL
uses a different way to store previous records in a block. It uses the transaction XID
and xmin and xmax pseudo columns for transaction row versioning.

Shared Pool

PostgreSQL does not provide a shared pool. This is somewhat embarrassing for
users familiar with Oracle. Shared Pool is a very important and essential component
in Oracle. PostgreSQL provides the ability to share SQL information at the process
level instead of the Shared Pool. In other words, if you execute the same SQL several
times in one process, it will hard-parse only once.

Wrapping up
By understanding the PostgreSQL architecture and concepts, you will have a good
grounding from which to inform your use of it in production workloads. But, you don’t
have to go it alone — modern tooling, like our very own ClusterControl (which you can
try free for 30 days), make deploying and running PostgreSQL clusters a lot simpler
in any environment.

If you aren’t ready to test ClusterControl, you can get a thorough understanding of
automating PostgreSQL operations with ClusterContol in this white paper. In the meantime,
don’t forget to join our monthly newsletter below and follow us on  Twitter or
 LinkedIn to stay up-to-date with the Postgres and open-source database world.

https://severalnines.com/blog/understanding-postgresql-architecture/ 14/19

You might also like