POSTGRESQL DATABASE ADMINISTRATION
By: Siddheshwar Soni, DataXorg Data Services.
PostgreSQL is an advanced, enterprise-class, open-source Object-Relational
Database Management System (ORDBMS). It is a relational database that
supports both SQL (relational) and JSON (non-relational) querying.
PostgreSQL has a robust feature set, including
Multi-Version Concurrency Control (MVCC),
Point-in-time recovery,
Tablespaces,
Asynchronous replication,
Nested transactions,
Online/hot backups,
A refined query planner/optimizer, and
Write-ahead logging. It supports international character sets, multi-byte
character encodings, and Unicode, and it is locale-aware for sorting, case
sensitivity, and formatting. PostgreSQL is highly scalable, both in the
quantity of data it can manage and in the number of concurrent users it can
accommodate.
Topics covered are:
• PostgreSQL Process and Memory Architecture
• Installation of PostgreSQL v12 - Windows and Linux
• Set Up PostgreSQL v12 Environment Variables - Windows and Linux
• PostgreSQL Page Layout
• Database Directory Layout
• PostgreSQL Configuration Files
• Cluster in PostgreSQL (Initdb, Start/Stop, Reload/Restart Cluster)
• Createdb/Dropdb, Createuser/Dropuser Utility
• Create Schema and Schema Search Path
• Psql Interface Commands
• Pg System Catalogs
• Working with PostgreSQL Objects
PostgreSQL Administration (Windows & Linux) 1
• CRUD Operations
• Table Inheritance
• Table Partitioning
• Tablespace in PostgreSQL
• Backup, Continuous Archiving and PITR (Point-in-Time-Recovery)
• Explain plan and Query Execution Cost
• Maintenance (Updating Planner Statistics, Vacuum, Routine Reindex &
Cluster)
• Data Fragmentation
• Transaction ID Wraparound, Vacuum Freeze, Autovacuum Daemon
• pgAdmin 4
• Uninstall PostgreSQL v12 on Windows and Linux
• Bonus Module
• Journey into PostgreSQL v13
• PostgreSQL v13 - New Features
What you’ll learn
• Understand PostgreSQL Process and Memory Architecture.
• Install PostgreSQL v12 on Windows and Linux.
• How to set up environment variables on Windows and Linux.
• PostgreSQL Directory Structure.
• PostgreSQL Configuration Files.
• How to use PSQL command line interface to connect to PostgreSQL.
• How to Create Database/User/Schema from utility and Psql commands.
• Learn how to use pg system catalogs.
• CRUD operations on PostgreSQL.
• Table Inheritance in PostgreSQL.
• Partitioning in PostgreSQL using Table Inheritance.
• Copy table from/to PostgreSQL.
• How to Create/Move/Drop Tablespace.
• Logical (pg_dump, pg_dumpall) and physical backup (offline, online).
• Continuous Archiving and PITR.
• What is Explain plan and how to calculate cost of query.
• Maintenance (Updating planner statistics, Vacuum, Routine Reindex,
Cluster).
• What is Transaction ID Wraparound failure and Vacuum Freeze.
• How to use the pgAdmin 4 interface.
• Uninstall PostgreSQL on Windows and Linux.
• Bonus Module
• Course Extra Documents
• Journey into PostgreSQL v13
• PostgreSQL v13 - New Features
Are there any course requirements or prerequisites?
• Basic Knowledge of Database Concepts
• Internet access is required to download PostgreSQL Server
• Working Knowledge on Windows and Linux
Who this course is for:
• Database Administrators
• Accidental DBAs
• Developers
• Architects and Analysts
• Students
• Anyone who is curious about PostgreSQL
Introduction to PostgreSQL
What is PostgreSQL
PostgreSQL is a free, open-source, object-relational database management
system. It supports object-oriented features such as classes, objects, and
inheritance, and developers can use classes and inheritance in their code.
PostgreSQL began as a research project in 1986, led by Michael Stonebraker
at the University of California, Berkeley.
PostgreSQL is cross-platform and runs on multiple operating systems,
including Linux, Windows, macOS, FreeBSD, etc.
PostgreSQL complies with the ACID properties (Atomicity, Consistency,
Isolation, and Durability), which ensure that a transaction completes on an
all-or-nothing basis. For example, if you are transferring money to a
friend and the server crashes mid-transfer, the database must either commit
(the money reaches your friend's account) or roll back (the money returns
to your account); it can never leave the transfer half-done.
PostgreSQL manages concurrency through Multi-Version Concurrency Control
(MVCC). For example, if one user is reading a row while another user is
updating the same row, the reader could otherwise see an inconsistent,
half-modified piece of data. The Isolation property prevents this: it
ensures that the two users do not interfere with each other and that each
sees a consistent version of the row. MVCC is the mechanism that takes care
of this.
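This behaviour can be observed from two concurrent psql sessions. A sketch
follows; the `accounts` table and its columns are hypothetical, used only
to illustrate the snapshot behaviour:

```sql
-- Session 1: begin a transaction and change a row, but do not commit yet
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Session 2 (a second psql window): still sees the OLD balance.
-- MVCC serves the reader a consistent snapshot instead of blocking it
-- or exposing the uncommitted change.
SELECT balance FROM accounts WHERE id = 1;

-- Session 1: publish the change
COMMIT;

-- Session 2: a fresh SELECT now sees the updated balance
SELECT balance FROM accounts WHERE id = 1;
```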
PostgreSQL Object Naming Conventions
Common Name            In PostgreSQL
Tables & Indexes       Relation
Row                    Tuple
Column                 Attribute
Data block             Page (on disk)
Page                   Buffer (when the block is in memory)
PostgreSQL Limits
Item                      Upper Limit
Database size             Unlimited
Number of databases       4,294,950,911
Relations per database    1,431,650,303
Relation size             32 TB
Rows per table            Limited by the number of tuples that can fit
                          onto 4,294,967,295 pages
Columns per table         1,600
Field size                1 GB
Identifier length         63 bytes
Indexes per table         Unlimited
Columns per index         32
Partition keys            32
What is Page
• A page is the smallest unit of data storage.
• Every table and index is stored as an array of pages (in sequence) of a
fixed size.
• The default page size in PostgreSQL is 8 KB.
• A different page size can be configured when compiling the server, but
the size cannot vary within the same database cluster.
• All pages are logically equivalent, and any row can be stored in any
page.
Difference between Page and Buffer
The data as stored on disk is called a page; when the data is read, it is
fetched into memory, and the copy loaded into memory is called a buffer.
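Both sizes can be checked from any session; a quick sketch, where the
values shown in the comments are the compile-time defaults:

```sql
-- Size of a disk page/block, fixed at compile time (8192 bytes by default)
SHOW block_size;

-- Total memory set aside for buffers, i.e. pages cached in RAM
SHOW shared_buffers;
```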
Understanding PostgreSQL Page Layout
• Page Header (24 bytes long) – keeps information about free space, page
size, and version.
When an insert is performed, the database checks the page header for
available space; depending on the availability, it stores the tuple in
this page or moves on to the next page.
• ItemIdData (4 bytes per item) – an array of (offset, length) pairs
pointing to the actual items, or tuples.
• Free Space – unallocated space. New item pointers are allocated from the
start of this area, and new items from the end.
• Tuple – the row itself.
• Special – holds information specific to the access method, e.g. for
B-tree indexes.
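This layout can be inspected on a live table with the pageinspect
extension (shipped in the contrib package). A sketch; the table name
`mytable` is a placeholder for any table of your own:

```sql
-- pageinspect exposes the raw page structure described above
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Page header of block 0: lower/upper bound the free space;
-- pagesize and version correspond to the header fields above
SELECT lower, upper, special, pagesize, version
FROM page_header(get_raw_page('mytable', 0));

-- One row per item pointer (the 4-byte ItemIdData entries)
SELECT lp, lp_off, lp_len
FROM heap_page_items(get_raw_page('mytable', 0));
```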
Installing PostgreSQL on Windows and Linux
Module Objective:
• Minimal System Requirement
• Installation of PostgreSQL 12 on Windows
• Setting Environment Variables on Windows
• Installation of PostgreSQL on Linux using Yum
• Setting Environment Variables on Linux
Hardware Requirements:
• 1 GHz processor
• 2 GB of RAM
• 512 MB of free disk space
Software Requirements:
• The user must have administrator privileges on the Windows system.
• Root or superuser access is required on the Linux system.
PostgreSQL installation on Windows 64 Bit
Step 1:
Navigate to www.postgresql.org/downloads
Step 2: Select operating system as Windows.
Step 3: Click on Download the installer
Step 4: Download version 12 from the available PostgreSQL releases for
Windows x86-64.
Step 5: Right click on the setup file and select “Run as Administrator”
Step 6: Navigate through the wizard by selecting appropriate options
(initially All).
Step 7: Wait for the installation screen to finish installation.
Step 8: Search for the psql interface in the Start menu and try to connect
to PostgreSQL.
(Stack Builder is a package manager offering additional tools used for
management, migration, replication, connectors, and more.)
PostgreSQL Setting ENV Variable on Windows 64 Bit
Step 1: Click on Start -> Search for This PC/My PC -> Right click on This
PC -> Select Properties -> Advanced -> Environment Variables.
Step 2: In the System variables section, locate Path.
Step 3: Select Path and click Edit.
Step 4: Click New, paste the location of the PostgreSQL bin folder on the
new line, and click OK.
Step 5: Click New in the System variables section and enter "PGDATA" as
the variable name and the location of the PostgreSQL data directory as the
variable value.
Step 6: Click OK in all open windows to save and exit the Environment
Variables dialog.
PostgreSQL installation on Linux
Step 1: Navigate to www.postgresql.org/downloads
Step 2: Select operating system as Linux.
Step 3: Select appropriate Linux Distribution (In my case Red Hat).
Step 4: Select the PostgreSQL YUM Repository link.
Step 5: Select version 12 (or the latest) from the available PostgreSQL
releases.
Step 6: Select "RHEL/CentOS 8 - x86_64".
Step 7: Download "pgdg-redhat-repo-42.0-11.noarch.rpm" from the list.
Step 8: Install the downloaded rpm with the following command on the Linux
box:
rpm -ivh pgdg-redhat-repo-42.0-11.noarch.rpm
Step 9: Disable the default postgresql module shipped with the OS:
dnf -qy module disable postgresql
Step 10: Type the following command to list the available postgresql
module streams:
yum module list postgresql
Step 11: Look for the postgresql 12 packages:
yum list postgresql12*
Step 12: Install the two packages from the postgresql12 list:
yum install postgresql12-server.x86_64 postgresql12-contrib.x86_64
Step 13: Verify that the installation completed successfully.
PostgreSQL Setting ENV Variable on Linux
Step 1: Log in as the postgres user on the Linux box.
Step 2: Edit the .bash_profile of the postgres user using the following
command:
vi .bash_profile
Step 3: Add the lines below to the existing .bash_profile:
PATH=$PATH:$HOME/bin
export PATH
export PATH=/usr/pgsql-12/bin:$PATH   ## adjust the path to match your OS
PGDATA=/var/lib/pgsql/12/data
export PGDATA
Step 4: Save and quit the .bash_profile file.
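Once the profile is saved, the settings can be verified from the shell. A
quick sanity check, assuming the paths used above:

```shell
# Re-read the profile in the current shell
source ~/.bash_profile

# psql should now resolve from /usr/pgsql-12/bin
which psql
psql --version

# PGDATA should point at the cluster's data directory
echo $PGDATA
```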
Note: On Windows, the installer creates and initializes the cluster
automatically after installation. On Linux, the cluster must be created
manually.
PostgreSQL Architecture
PostgreSQL Architecture Fundamentals
• PostgreSQL is a relational database management system with a
client-server architecture. Many clients/users request and receive
services from a host computer (the server); this is called the
client-server model. We run an application on our PC, send a request to a
centralized server, and the server responds to us.
• PostgreSQL uses a "process per user" client/server model. Each user is
given a dedicated server process on connection, which stays for the life
of the connection.
• PostgreSQL's set of processes and memory structures constitutes an
instance. On one server there can be any number of instances, and each
instance has its own memory structures and processes; memory structures
and processes cannot be shared between instances.
• Programs run by clients connect to the server instance and request read
and write operations.
• The default port of PostgreSQL is 5432. The port can be changed during
installation, or afterwards in the configuration files; a restart of the
service is required for the change to take effect.
Process and Memory Architecture
POSTMASTER is the first process started when PostgreSQL starts. It is
assigned the shared memory, which comprises the shared buffers, WAL
buffers, CLOG buffers, and others. The postmaster then starts the other
processes, i.e. the stats collector, checkpointer, WAL writer, bgwriter
(writer), logging collector, and autovacuum launcher.
How the PG architecture works: a user (client) sends a request from a
workstation to connect to the PG server. The request is picked up by the
postmaster, which validates the credentials and IP address against the
configuration files. On success, the postmaster starts a new process and
hands the connection over to it. That process attaches to the shared
buffers, and through it the user communicates with the database. The
connection is kept alive for as long as the user is logged in, and the
user can read from and write to the database.
Consider the scenario where a user sends a SELECT request: the data is
loaded from the data files into the shared buffers, and the results are
sent back to the client by the postgres process.
INSERT, UPDATE, and DELETE statements, which change data, form
transactions. For any change made by a connected user, the modification is
made in the shared buffers and a copy of the transaction is recorded in
the WAL buffers; when the user commits, the WAL writer is invoked and
writes all the changes from the WAL buffers to the WAL files.
The changed pages in the shared buffers (the actual data) are marked as
committed and remain in the shared buffers.
CHECKPOINT is a process invoked every 5 minutes by default (the interval
is configurable). It signals the BGWRITER (writer) to write all the
committed dirty buffers to the data files. Checkpoints ensure that the
data files stay in sync.
In production systems we generally enable an additional process, the
ARCHIVER. It keeps watching the WAL files: when a WAL file fills up, or a
switch to a new one occurs, the archiver copies the WAL file to the
archive location. These archive files make PITR (Point-in-Time Recovery)
possible and provide an additional level of protection. Archiving is
usually not enabled in test or development environments because it
consumes a lot of space.
CLOG Buffer: all committed transactions are marked in this buffer, which
lets the server check the commit status of every transaction.
STATS COLLECTOR: collects information about server activity and reports it
to the database dictionary.
AUTOVACUUM LAUNCHER: when enabled, it is responsible for running vacuum
workers on fragmented or bloated tables. Tables become fragmented or
bloated as a result of many update, insert, or delete operations.
LOGGING COLLECTOR: responsible for collecting the error logs. For example,
if the Postgres service fails to start, the cause or error can be found in
the error log files; the logging collector writes all log information to
the log files.
Postmaster – Supervisor process
• The postmaster is the first process to start in PostgreSQL.
• The postmaster acts as a supervisor process whose job is to monitor,
start, and restart certain processes if they die.
• The postmaster acts as a listener and receives new connection requests
from clients; for each connection request it accepts, it creates a new
background process.
• The postmaster is responsible for authentication and authorization of
all incoming requests. When a new connection request is received, it
validates the credentials and IP address against the configuration files,
then checks the grants on the database the user wants to access.
• The postmaster spawns a new process, called postgres, for each new
connection.
Checking the PostgreSQL – postmaster and other process status
Windows: go to the taskbar > right-click >> open Task Manager; all running
OS processes are displayed. Right-click a column header and add the
"Command Line" column; you can then see all the processes associated with
postgres. If one of these processes is killed, the postmaster restarts it.
Linux :
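On Linux the same check can be done from the shell; the parent of all the
other postgres processes is the postmaster. A sketch, assuming PGDATA is
set as in the environment-variable section:

```shell
# List all PostgreSQL processes; the parent of the others is the postmaster
ps -ef | grep postgres

# The postmaster PID is also recorded in the lock file in the data directory
head -1 $PGDATA/postmaster.pid
```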
Utility Processes
• Bgwriter/Writer: periodically writes dirty buffers to the data files.
• WAL writer: writes the WAL buffers to the WAL files when a commit
request is received.
• Checkpointer: a checkpoint is invoked every 5 minutes (default) or when
the max_wal_size value (postgresql.conf) is exceeded. The checkpointer
signals the bgwriter to sync all the committed buffers from the shared
buffer area to the data files.
• Autovacuum launcher: responsible for carrying out vacuum operations on
bloated tables (if enabled).
• Stats collector: responsible for collecting and reporting information
about server activity, then updating that information in the optimizer
dictionary (pg_catalog).
• Logwriter/Logger: writes the error messages to the log file.
• Archiver (optional): when in archive mode, copies the WAL files to the
specified directory.
Memory Segments of PostgreSQL
• Shared Buffers
• Wal Buffers
• Clog Buffers
• Work Memory
• Maintenance Work Memory
• Temp Buffers
Shared Buffer:
• Users cannot access the data files directly to read or write any data.
• Every select, insert, update, or delete on the data goes through the
shared buffer area.
• Data written or modified in this area is called "dirty data".
• Dirty data is written to the data files on physical disk by the
background writer process.
Shared buffers are controlled by the parameter shared_buffers (default
128MB) in the postgresql.conf file; the size can be changed as needed.
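For example, the value can be inspected and changed without editing the
file by hand. The 256MB figure below is only illustrative, and
shared_buffers requires a server restart to take effect:

```sql
-- Current value (128MB unless changed at install time)
SHOW shared_buffers;

-- Writes the new value to postgresql.auto.conf; effective after restart
ALTER SYSTEM SET shared_buffers = '256MB';
```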
WAL Buffer:
• The write-ahead log buffer is also called the "transaction log buffer".
• WAL data is metadata about changes to the actual data, sufficient to
reconstruct the actual data during database recovery operations.
• WAL data is written to a set of physical files in a persistent location
called "WAL segments" (or "checkpoint segments").
• WAL buffers are flushed from the buffer area to the WAL segments by the
WAL writer.
• The WAL buffer allocation is controlled by the wal_buffers parameter
(default -1, i.e. sized automatically) in postgresql.conf.
Clog and other buffers:
• CLOG stands for "commit log", and the CLOG buffers are an area of RAM
dedicated to holding commit log pages.
• The commit log records the commit status of all transactions and
indicates whether or not a transaction has been completed (committed).
• Work memory is memory reserved for a single sort or hash table
(parameter: work_mem, default 4MB, in postgresql.conf). When we run a
query with an ORDER BY or DISTINCT clause, such sort and merge-join
operations are performed in work memory.
• Maintenance work memory is allocated for maintenance work, e.g.
vacuuming and index rebuilds (parameter: maintenance_work_mem, default
64MB).
• Temp buffers are used to access temporary tables in a user session
during large sorts and hash operations (parameter: temp_buffers, default
8MB).
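These parameters can be viewed the same way, and work_mem can also be
raised for just the current session before a large sort. A sketch; the
64MB figure is illustrative:

```sql
SHOW work_mem;                -- per-sort/per-hash memory, default 4MB
SHOW maintenance_work_mem;    -- used by VACUUM, CREATE INDEX, etc.
SHOW temp_buffers;            -- per-session temporary-table buffers

-- Session-local override, handy before a big ORDER BY or index build
SET work_mem = '64MB';
```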
Physical Files:
• Data files: files used to store the data itself; they contain no
instructions or code to be executed.
• WAL files: write-ahead log files; all committed transactions are written
here first, before being written to the data files.
• Log files: all server messages, including stderr, csvlog, and syslog
output, are logged in the log files.
• Archive logs (optional): data from WAL segments is written to archive
log files, to be used for recovery purposes.
Database Clusters
Database Cluster:
• A database cluster is a collection of databases managed by a single
instance on a server.
• initdb creates a new PostgreSQL database cluster, making a new set of
directories for it.
• Creating a database cluster consists of creating the directories in
which the data is stored; we call this the "data directory".
• The storage area on disk must be initialized before any operation on the
database.
• Location of the data directory:
Linux: /var/lib/pgsql/data (or per version, e.g. /var/lib/pgsql/12/data)
Windows: C:\Program Files\PostgreSQL\12\data
Initdb Syntax :
• You must be logged in as the postgres user to execute the commands
below.
• There are two ways to initialize a database cluster:
Syntax:
initdb -D /usr/local/pgsql/data
pg_ctl -D /usr/local/pgsql/data initdb
-D = the data directory location.
-W = forces the superuser to provide a password before the database is
initialized.
Start\Stop Cluster:
• Start cluster syntax:
Linux: systemctl start postgresql-<version>
Windows: pg_ctl start -D "C:\Program Files\PostgreSQL\12\data"
• Stop cluster syntax:
Linux: systemctl stop postgresql-<version>
Windows: pg_ctl stop -D "C:\Program Files\PostgreSQL\12\data" -m
<shutdown mode>
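Whether the cluster is up can be confirmed at any time; a sketch using the
default Linux paths from earlier:

```shell
# Reports the postmaster PID and the command line it was started with
pg_ctl status -D /var/lib/pgsql/12/data

# Or via systemd on Linux
systemctl status postgresql-12
```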
Types of Shutdown:
Smart (SIGTERM): the server disallows new connections but lets existing
sessions finish their work normally; it shuts down only after all sessions
have ended.
Fast (SIGINT, the default): the server disallows new connections, aborts
current transactions, and exits gracefully.
Immediate (SIGQUIT): quits/aborts without a proper shutdown, which leads
to recovery from the WAL files on the next startup.
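The three modes map directly onto pg_ctl's -m option; a sketch with the
data directory path used earlier:

```shell
pg_ctl stop -D /var/lib/pgsql/12/data -m smart      # wait for sessions to finish
pg_ctl stop -D /var/lib/pgsql/12/data -m fast       # default: abort transactions, exit cleanly
pg_ctl stop -D /var/lib/pgsql/12/data -m immediate  # no clean shutdown; WAL recovery on next start
```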
Difference between Reload and Restart:
• When we make changes to server parameters, the configuration must be
reloaded for them to take effect.
• Reload simply re-reads the new configuration, without restarting the
service.
• Some configuration changes to server parameters do not take effect until
the service is restarted.
• Restart gracefully shuts down all activity, relinquishes the resources,
closes all open files, and starts again with the new configuration.
Reload\Restart Cluster:
Syntax for reload of cluster:
On Linux: systemctl reload postgresql-12
On Windows: pg_ctl reload
Syntax for restart of cluster:
On Linux: systemctl restart postgresql-12
On Windows: pg_ctl restart
From psql (any environment): SELECT pg_reload_conf();
pg_controldata
pg_controldata reports control information about the cluster.
Syntax: ./pg_controldata /var/lib/pgsql/11/data/
If the binary is not found, add its directory to the PATH:
PATH="/usr/lib/postgresql/13/bin/:$PATH"
Syntax: pg_controldata /var/lib/postgresql/13/main
Full output:
pg_control version number: 1300
Catalog version number: 202007201
Database system identifier: 7002519706457325072
Database cluster state: in production
pg_control last modified: Saturday 28 May 2022 02:38:38 PM
Latest checkpoint location: 4A/6E000110
Latest checkpoint's REDO location: 4A/6E0000D8
Latest checkpoint's REDO WAL file: 000000020000004A0000006E
Latest checkpoint's TimeLineID: 2
Latest checkpoint's PrevTimeLineID: 2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:94813
Latest checkpoint's NextOID: 35281
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 478
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 94813
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Saturday 28 May 2022 02:38:37 PM
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce:
b700f489ed82c1424c2f16ecfd38a56a2b2c48656d92b034100e78bbefc7afd
Database Directory Layout
Module Objective:
• Installation Directory Layout
• Overview of Installation Directory & Demo
• Database Directory Layout
• Overview of Database Directory & Demo
• Base Directory
• Overview of Base Directory & Demo
Installation Directory Layout:
• PostgreSQL is typically installed to /usr/local/pgsql or /var/lib/pgsql
on Linux.
• On Windows it is installed to C:\Program Files\PostgreSQL\<version
number>.
• bin -> programs (createdb, initdb, createuser, etc.)
• data -> data directory
• doc -> documentation
• include -> header files
• installer -> installer files
• scripts -> scripts such as runpsql and serverctl VBScript files
• share -> sample configuration files
• pgAdmin 4 -> pgAdmin files
Database (Data) Directory Layout:
Directory/File: Description
base: Subdirectory containing per-database subdirectories.
current_logfiles: File recording the log file(s) currently written to by
the logging collector.
global: Subdirectory containing cluster-wide tables, such as pg_database
and pg_tablespace.
pg_commit_ts: Subdirectory containing transaction commit timestamp data
(version 9.5 and later, when track_commit_timestamp is enabled).
pg_dynshmem: Subdirectory containing files used by the dynamic shared
memory subsystem.
pg_logical: Subdirectory containing status data for logical decoding.
pg_multixact: Subdirectory containing multitransaction status data (used
for shared row locks).
pg_notify: Subdirectory containing LISTEN/NOTIFY status data.
pg_replslot: Subdirectory containing replication slot data.
pg_serial: Subdirectory containing information about committed
serializable transactions.
log: All error logs are kept in this directory.
pg_snapshots: Subdirectory containing exported snapshots.
pg_stat: Subdirectory containing permanent files for the statistics
subsystem.
pg_stat_tmp: Subdirectory containing temporary files for the statistics
subsystem.
pg_subtrans: Subdirectory containing subtransaction status data.
pg_tblspc: Subdirectory containing symbolic links to tablespaces.
pg_twophase: Subdirectory containing state files for prepared
transactions.
pg_wal: Subdirectory containing WAL (Write-Ahead Log) files.
pg_xact: Subdirectory containing transaction commit status data
(transaction metadata logs).
pg_ident.conf: File in which user name maps are defined; a user name map
can be applied to map an operating system user name to a database user.
postgresql.auto.conf: File used for storing configuration parameters that
are set by ALTER SYSTEM.
postmaster.opts: File recording the command-line options the server was
last started with.
postmaster.pid: A lock file recording the current postmaster process ID
(PID), cluster data directory path, postmaster start timestamp, port
number, Unix-domain socket directory path (empty on Windows), first valid
listen_address (IP address or *, or empty if not listening on TCP), and
shared memory segment ID (this file is not present after server shutdown).
PG_VERSION: A file containing the major version number of PostgreSQL.
Base Directory:
• Contains the databases, represented as directories named after their
object identifier (OID), not after the database name. The
database-name-to-OID mapping can be found in the pg_database table.
• template1 always has OID 1.
• Syntax to find the OID of a database:
Syntax: select datname, oid from pg_database;
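The OIDs returned by that query match the directory names under base/,
which can be confirmed from the shell (assuming PGDATA is set):

```shell
# Each subdirectory of base/ is named after a database OID
ls $PGDATA/base
```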
PostgreSQL Configuration Files
postgresql.conf File
• The postgresql.conf file contains parameters that help configure and
manage the performance of the database server.
• initdb installs a default copy of postgresql.conf, usually located in
the data directory (on Debian/Ubuntu packages of version 13 it lives under
/etc).
• The file follows a one-parameter-per-line format.
• Parameters that require a restart are clearly marked in the file.
• Many parameters need a server restart to take effect.
Up to PostgreSQL version 12 the config file lives in the data directory;
on Debian/Ubuntu installations of version 13 the config files have been
moved to the /etc/postgresql/<version> folder.
We can query the path with the command below:
psql -U postgres -c 'SHOW config_file'
or, from the ubuntu user:
sudo -u postgres psql -c 'SHOW config_file'
The parameters in the postgresql.conf file can be checked from SQL without
opening the file, e.g.:
SHOW max_connections;
Or in more detail with the query below:
select name, setting, boot_val, sourcefile, pending_restart
from pg_settings where name = 'max_connections';
pending_restart (t = true, f = false) indicates that the parameter was
changed and the configuration reloaded (e.g. with select
pg_reload_conf();), but the parameter requires a restart, so the old value
still applies.
The pg_settings view can be described with the command \d pg_settings.
Another catalog view, pg_file_settings, gives us a summary useful for
planning a parameter change.
pg_settings View:
• The pg_settings view provides access to the run-time parameters of the
server.
• It is an alternative interface to the SHOW command.
• pg_file_settings provides a summary of the contents of the server's
configuration files.
• This view is helpful for checking in advance whether planned changes in
the configuration files will work.
• Each "name = value" entry appearing in the files has a corresponding
applied column.
postgresql.auto.conf file:
• This file holds settings made through the ALTER SYSTEM command.
• Settings in postgresql.auto.conf override the settings in
postgresql.conf.
• The ALTER SYSTEM command provides a SQL-accessible means of changing
global defaults.
• Syntax: ALTER SYSTEM SET configuration_parameter = 'value';
• Syntax to reset: ALTER SYSTEM RESET configuration_parameter;
• Syntax to reset all: ALTER SYSTEM RESET ALL;
• It makes no change to the main config file, postgresql.conf; it changes
postgresql.auto.conf. On restart or reload, the value from postgresql.conf
applies only if the same parameter is not present in postgresql.auto.conf.
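A concrete round trip might look like this; max_connections is just an
example parameter (note that it requires a restart, not merely a reload):

```sql
-- Recorded in postgresql.auto.conf, not in postgresql.conf
ALTER SYSTEM SET max_connections = 200;

-- Undo the override so postgresql.conf's value applies again
ALTER SYSTEM RESET max_connections;
```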
pg_ident.conf file:
• Configuration indicating which map to use for each individual
connection.
• User name maps are defined in the ident map file.
• The pg_ident.conf file is read on start-up; any change requires a pg_ctl
reload.
• The operating system user that initiated the connection might not be the
same as the database user.
• A user name map can be applied to map the operating system user name to
a database user.
• pg_ident.conf is used in conjunction with pg_hba.conf.
Sample :
# MAP IDENT POSTGRESQL_USERNAME
sales rmartin sales
sales jpenny sales
audit auditor sales
audit auditor postgres
The file shown above allows either of the system users rmartin or jpenny
to connect as the PostgreSQL sales user, and allows the system user named
auditor to connect to PostgreSQL as either sales or postgres.
pg_hba.conf File :
• Enables client authentication between the PostgreSQL server and the
client application.
• HBA means host-based authentication. The postmaster checks the
authorization in the pg_hba.conf configuration file when a new connection
arrives.
• When PostgreSQL receives a connection request, it checks the
pg_hba.conf file to verify that the machine from which the application is
requesting a connection has the right to connect to the specified
database.
• PostgreSQL rejects a connection if no matching entry is found in the
pg_hba.conf file.
Sample Of the pg_hba.conf File :
# TYPE  DATABASE      USER   ADDRESS        METHOD
# IPv4 local connections:
host    all           all    127.0.0.1/32   md5
# (/32 is a network mask)
# IPv6 local connections:
host    all           all    ::1/128        trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
host    replication   all    127.0.0.1/32   trust
host    replication   all    ::1/128        trust
TYPE
Host: used to specify remote hosts that are allowed to connect to the
PostgreSQL server. The postmaster must be listening on TCP/IP (started
with the -i option) for a host entry to work correctly.
Local: semantically the same as a host entry, except that no host needs to
be specified. A local entry covers client connections initiated from the
same machine the PostgreSQL server is operating on.
Hostssl: used to specify hosts (remote or local) that are allowed to
connect to the PostgreSQL server using SSL.
DATABASE
This is the database name that the specified host is allowed to connect to.
The database keyword has three possible values:
All: the connecting client can connect to any database the PostgreSQL
server is hosting.
Sameuser: the client can only connect to a database whose name matches
the client's authenticated user name.
Name: the client can only connect to the database specified by name.
IP ADDR/NET MASK
The ip_addr and netmask fields specify either a specific IP address, or
range of IP addresses, that are allowed to connect to the PostgreSQL
server.
A range is specified by describing an IP network with an associated
netmask (CIDR) suffix.
For a single IP address, the netmask suffix should be /32 (equivalent to a
netmask of 255.255.255.255).
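As a hedged illustration, the two forms look like this in the ADDRESS column of pg_hba.conf (the addresses are examples only):

```
host  all  all  192.168.12.10/32  md5   # exactly one host
host  all  all  192.168.12.0/24   md5   # the whole 192.168.12.x network
```

The /24 form admits any client whose address matches the first 24 bits of the network address.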
AUTHENTICATION
The Authentication method specifies the type of authentication the server
should use for a user trying to connect to PostgreSQL.
Trust: This method allows any user from the defined host to connect to a
PostgreSQL database, as any PostgreSQL user, without a password. With
this method you are trusting the host-based authentication and every user
on the specified host. This is dangerous if the specified host is not a
secure machine, or provides access to users unknown to you.
Reject : This method automatically denies access to PostgreSQL for that
host or user. This can be a prudent setting for sites that you know are never
allowed to connect to your database server.
Password: This method requires the connecting user to supply a password
that matches the password stored in the database.
Crypt: This method is similar to the password method, but the password is
not sent in clear text; it passes through a simple form of encryption. It is
not very secure, but it is better than the clear-text password method.
(Historical: modern releases use md5 or scram-sha-256 instead.)
Krb4, krb5: These methods specify Version 4 or 5 of the Kerberos
authentication system. (Modern releases use the gss method.)
Ident : This method specifies that an ident map should be used when a
host is requesting connections from a valid IP address listed in the
pg_hba.conf file. This method requires one option.
The required option may be either the special term sameuser, or a named
map that is defined within the pg_ident.conf file.
pg_hba modification
• Stop PostgreSQL on the source machine.
• Edit the pg_hba.conf file and add an entry for the client.
• Set the authentication method to trust or md5 (depending on
requirement).
• Edit/ensure the parameter listen_addresses = '*' (or a specific IP
address) in postgresql.conf.
• Start PostgreSQL on the source machine.
• Connect from the client with: psql -U postgres -h hostname
• Depending on the authentication method chosen, the client may or
may not be prompted for a password.
DATABASE - CREATE / DROP
Module Objective:
• Create database – Psql / createdb utility
• Drop database – Psql/ dropdb utility
• Create user – Psql/ createuser utility/ Interactive
• Drop user - Psql/ dropuser utility
• Privileges in PostgreSQL
• Grants and Revoke Access
• What is schema and its benefits?
• Create/ drop schema
• Schema Search Path
Create database Psql / createdb utility:
• Database is an organized collection of structured information, or
data, typically stored and accessed electronically from a computer
system.
• Syntax from psql: CREATE DATABASE databasename OWNER
ownername;
• To create a clone (i.e. a snapshot) of a database, add WITH
TEMPLATE <dbname_to_be_cloned> to the above syntax.
• Syntax from the command line: createdb <dbname>
• Syntax for createdb help: createdb --help
* The createdb utility is found in the bin folder.
Example :
from psql prompt: create database siddhu owner postgres;
With createdb utility on OS command prompt :
$ createdb siddhu -O postgres
Verifying the newly created database's OID on disk:
psql: select datname, oid from pg_database;
Then check the folder with that OID number in the data directory (base/<oid>).
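A hedged sketch of the clone syntax mentioned above (database names are illustrative; the template database must have no other active connections while it is being copied):

```sql
-- Clone the database "siddhu" into a new database "siddhu_snap"
CREATE DATABASE siddhu_snap WITH TEMPLATE siddhu OWNER postgres;
```

This copies the entire contents of the template database at the moment the command runs, which is why it is useful as a quick snapshot.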
Drop database – psql / dropdb utility:
We can't drop the database to which we are currently connected.
Example:
scott=# drop database scott;
ERROR: cannot drop the currently open database
Syntax from psql: DROP DATABASE siddhu;
Syntax from the command line: dropdb -U username siddhu
Syntax for dropdb help: dropdb --help
psql command to check current connection information: \conninfo
User in PostgreSQL:
• Database users and operating system users are completely separate.
• User names must be unique and must not start with pg_.
• The postgres superuser is created by default when PostgreSQL is installed.
• The postgres user has all privileges, with grant option.
• Only superusers, or users with the CREATEROLE privilege, can create a
user.
• Database users are global across the cluster.
Create user – Psql/ createuser utility/ Interactive
Syntax from psql : create user scott login superuser password 'welcome';
Syntax from command line : createuser <username>
Syntax for interactive user creation from command line :
Example (create user) :
createuser --interactive joe
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) y
Shall the new role be allowed to create more new roles? (y/n) y
Syntax for createuser help: createuser --help
The --interactive option prompts for the required parameters while creating the user.
(Note: the postgres OS user does not exist on Windows, so we have to
supply a user name and password when running the createuser command.
On Linux we can run createuser directly because we are already logged in
as the postgres user.)
postgres=# create user siddhu login superuser password 'outcome';
This creates a superuser. We can check user details with the meta-command
\du.
User creation with the --interactive switch, which lets us assign roles
during user creation:
$ createuser --interactive
Restrict a user from connecting to a particular database:
postgres=# revoke connect on database mydb from public;
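After revoking the PUBLIC grant, connect access can be granted back selectively. A hedged sketch (app_user is an illustrative role name):

```sql
REVOKE CONNECT ON DATABASE mydb FROM PUBLIC;
GRANT  CONNECT ON DATABASE mydb TO app_user;
```

With this pair of statements only app_user (and superusers) can open connections to mydb.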
Drop user - Psql/ dropuser utility:
Syntax from psql : drop user <username>
Syntax from command line : dropuser <username>
Dropping a user with objects or privileges will return an error.
Example :
postgres=# drop user test1;
ERROR: role "test1" cannot be dropped because some objects depend
on it
Reassign the user's objects and privileges to another user before dropping the user.
Example:
REASSIGN OWNED BY user TO postgres;
DROP ROLE username;
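A fuller sketch of the cleanup sequence (the role name test1 is illustrative). REASSIGN OWNED transfers object ownership, while DROP OWNED removes any remaining objects and privileges still tied to the role:

```sql
REASSIGN OWNED BY test1 TO postgres;  -- transfer owned objects
DROP OWNED BY test1;                  -- revoke remaining grants, drop leftovers
DROP ROLE test1;                      -- now the role can be dropped
```

Note that REASSIGN OWNED and DROP OWNED must be run in every database where the role owns objects, since roles are cluster-wide but objects are per-database.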
Privileges in PostgreSQL:
• Privilege is a right to execute a particular type of SQL statement, or a
right to access another user's object.
• There are two types of privileges – Cluster level and Object level
• Cluster-level privileges are granted by a superuser, either during
CREATE USER or by altering an existing user.
• Object-level privileges are granted by a superuser, the owner of
the object, or someone with grant privileges.
• Privileges allow a user to perform particular actions on a database
object, such as a table, view or sequence.
Grant Examples:
• Grant CONNECT to the database:
GRANT CONNECT ON DATABASE database_name TO username;
• Grant USAGE on schema:
GRANT USAGE ON SCHEMA schema_name TO username;
• Grant on all tables for DML statements: SELECT, INSERT,
UPDATE, DELETE
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES
IN SCHEMA schema_name TO username;
• Grant all privileges on all tables in the schema:
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA
schema_name TO username;
• Grant all privileges on all sequences in the schema:
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA
schema_name TO username;
• Grant permission to create database: cluster level
ALTER USER username CREATEDB;
• Make a user superuser:
ALTER USER myuser WITH SUPERUSER;
• Remove superuser status:
ALTER USER username WITH NOSUPERUSER;
• Column-level access:
GRANT SELECT (col1), UPDATE (col1) ON mytable TO user;
Revoke Examples:
• Revoke Delete/update privilege on table from user
REVOKE DELETE, UPDATE ON products FROM user;
• Revoke all privilege on table from user
REVOKE ALL ON products FROM user;
• Revoke select privilege on table from all users(Public)
REVOKE SELECT ON products FROM PUBLIC;
Schema & its Benefits
• A schema is a namespace that contains named objects (tables, data
types, functions, and operators).
• One database can have multiple schemas.
• Schemas help us separate data between different applications.
• They organize database objects into logical groups to make them more
manageable.
• Applications can be put into separate schemas so that their object
names cannot collide with the names of other objects.
• In this way, one database can be used by multiple users without them
interfering with each other.
Create & Drop Schema:
• Create Schema
CREATE schema <schema_name>;
• Create a schema for a user; the schema will also be named after the user:
CREATE SCHEMA AUTHORIZATION <username>;
• Create a schema named john that will be owned by brett:
CREATE SCHEMA IF NOT EXISTS john AUTHORIZATION brett;
• Drop a schema:
DROP SCHEMA <schema_name>;
NOTE: We cannot drop a schema if there are any objects associated with
it.
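To drop a schema together with everything it contains, CASCADE can be used. A hedged sketch (the schema name is illustrative):

```sql
-- Drops the schema and every object inside it in one statement
DROP SCHEMA john CASCADE;
```

Use this with care: every table, view and function inside the schema is dropped along with it.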
Schema Search Path:
The search path determines the order in which PostgreSQL searches
schemas to find objects.
SHOW search_path can be used to display the current search path.
Example:
postgres=# show search_path;
search_path
-----------------
"$user", public
(1 row)
By default, "$user" is a special value meaning: if a schema exists that
matches the current user (i.e. SELECT SESSION_USER;), search within
that schema first.
The session_user is normally the user who initiated the current database
connection, but superusers can change this setting with SET SESSION
AUTHORIZATION.
Search path can be set at session level , user level, database level and
cluster level
Example :
Test1=# SET search_path TO test1,public;
Test1=# \dt
List of relations
Schema | Name | Type | Owner
--------+---------+-------+----------
test1 | abc | table | test1
(1 row)
Database level setting : alter database mydb set search_path="$user",
public;
User Level Setting : alter role johnny set search_path = "$user", public;
Retrieve the search_path settings:
SELECT r.rolname, d.datname, rs.setconfig
FROM pg_db_role_setting rs
LEFT JOIN pg_roles r ON r.oid = rs.setrole
LEFT JOIN pg_database d ON d.oid = rs.setdatabase
WHERE r.rolname = 'myrole' OR d.datname = 'mydb';
PostgreSQL tablespace size
SELECT pg_size_pretty ( pg_tablespace_size ('pg_default') );
Query to Find all active sessions and queries:
SELECT pid, datname, usename, application_name, client_hostname
,client_port, backend_start, query_start, query, state FROM
pg_stat_activity WHERE state = 'active';
Queries to Find slow, long-running, and Blocked Queries
SELECT pid, usename, pg_stat_activity.query_start, now() -
pg_stat_activity.query_start AS query_time, query, state,
wait_event_type, wait_event FROM pg_stat_activity WHERE (now() -
pg_stat_activity.query_start) > interval '10 minutes';
Query to find blocking session
SELECT activity.pid, activity.usename, activity.query, blocking.pid
AS blocking_id, blocking.query AS blocking_query FROM
pg_stat_activity AS activity JOIN pg_stat_activity AS blocking ON
blocking.pid = ANY(pg_blocking_pids(activity.pid));
Viewing locks with table names and queries
select relname as relation_name, query, pg_locks.* from pg_locks join
pg_class on pg_locks.relation = pg_class.oid join pg_stat_activity on
pg_locks.pid = pg_stat_activity.pid;
Killing/cancelling a long running Postgres query:
You can find the pid using pg_stat_activity
Option #1 (graceful):
SELECT pg_cancel_backend(<PID>);
Option #2 (forceful):
SELECT pg_terminate_backend(<PID>);
Terminate all queries
If you want to terminate all running queries, the following statement can
be executed:
SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE state
= 'active' and pid <> pg_backend_pid();
Query to find high CPU usage and which query is causing it (requires the
pg_stat_statements extension; the total_plan_time column exists from
PostgreSQL 13 onwards):
select ((total_plan_time + total_exec_time) / 1000 / 3600) as total_hours,
((total_plan_time + total_exec_time) / 1000) as total_seconds,
((total_plan_time + total_exec_time) / calls) as avg_millis, calls
num_calls, query from pg_stat_statements order by 1 desc limit 10;
You can also use pg_activity utility to monitor the cpu usage.
Monitoring CPU and memory usage from Postgres
We'll create two database tables that will allow you to query CPU and
memory usage from within the database connection. This way your
applications can monitor the health of the servers without needing to worry
about another connection or another protocol.
You can run these commands on the master database and they will
propagate to all the slave databases as well.
First, load the file_fdw foreign data wrapper and create the foreign data server:
CREATE EXTENSION file_fdw;
CREATE SERVER fileserver FOREIGN DATA WRAPPER file_fdw;
Then we'll create the table that loads CPU loadavg from the /proc/loadavg
file:
CREATE FOREIGN TABLE loadavg
(one text, five text, fifteen text, scheduled text, pid text)
SERVER fileserver
OPTIONS (filename '/proc/loadavg', format 'text', delimiter ' ');
Creating the table that will let you query memory info is similar:
CREATE FOREIGN TABLE meminfo
(stat text, value text)
SERVER fileserver
OPTIONS (filename '/proc/meminfo', format 'csv', delimiter ':');
Now you can run SELECT queries to see the info!
postgres=# SELECT * FROM loadavg;
you can also query
SELECT * FROM meminfo;
List all database users.
select * from pg_user;
All database and their size, with/without indexes
select datname, pg_size_pretty(pg_database_size(datname))
from pg_database
order by pg_database_size(datname) desc;
Cache hit rates (the ratio should not be less than 0.99)
SELECT sum(heap_blks_read) as heap_read, sum(heap_blks_hit) as
heap_hit, (sum(heap_blks_hit) - sum(heap_blks_read)) /
sum(heap_blks_hit) as ratio
FROM pg_statio_user_tables;
Table index usage rates (should not be less than 0.99)
SELECT relname, 100 * idx_scan / (seq_scan + idx_scan)
percent_of_times_index_used, n_live_tup rows_in_table
FROM pg_stat_user_tables
WHERE seq_scan + idx_scan > 0 -- guard against division by zero
ORDER BY n_live_tup DESC;
How many indexes are in cache
SELECT sum(idx_blks_read) as idx_read, sum(idx_blks_hit) as idx_hit,
(sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) as ratio
FROM pg_statio_user_indexes;
Show unused indexes:
SELECT relname AS table_name, indexrelname AS index_name,
idx_scan, idx_tup_read, idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelname::regclass))
FROM pg_stat_all_indexes
WHERE schemaname = 'public'
AND idx_scan = 0
AND idx_tup_read = 0
AND idx_tup_fetch = 0
ORDER BY pg_relation_size(indexrelname::regclass) DESC;
Find cardinality of index:
SELECT relname, relkind, reltuples as cardinality, relpages
FROM pg_class
WHERE relname LIKE 'tableprefix%';
or
Find cardinality of index
SELECT schema_name,
object_name,
object_type,
cardinality,
pages
FROM (
SELECT pg_catalog.pg_namespace.nspname AS schema_name,
relname as object_name,
relkind as object_type,
reltuples as cardinality,
relpages as pages
FROM pg_catalog.pg_class
JOIN pg_catalog.pg_namespace ON relnamespace =
pg_catalog.pg_namespace.oid
)t
WHERE schema_name NOT LIKE 'pg_%'
and schema_name <> 'information_schema'
--and schema_name = '$schema_name'
--and object_name = '$object_name'
ORDER BY pages DESC, schema_name, object_name;
Cardinality is an indicator that refers to the uniqueness of all values in a
column. Low cardinality means a lot of duplicate values in that column.
For example, a column that stores the gender values has low cardinality. In
contrast, high cardinality means that there are many distinct values.
Show Table Bloats
with foo as (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+COUNT(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND
s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, (
SELECT
(SELECT current_setting('block_size')::NUMERIC) AS bs,
CASE WHEN SUBSTRING(v,12,3) IN ('8.0','8.1','8.2') THEN 27
ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
), rs as (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr
%ma END)))::NUMERIC AS datahdr,
(maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma
ELSE nullhdr%ma END))) AS nullhdr2
FROM foo
), sml as (
SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))
+nullhdr2+4))/(bs-20::FLOAT)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS
ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::FLOAT)),0) AS
iotta -- very rough approximation, assumes all cols
FROM rs
JOIN pg_class cc ON cc.relname = rs.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname
= rs.schemaname AND nn.nspname <> 'information_schema'
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
)
SELECT
current_database(), schemaname, tablename, /*reltuples::bigint,
relpages::bigint, otta,*/
ROUND((CASE WHEN otta=0 THEN 0.0 ELSE
sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-
otta)::BIGINT END AS wastedbytes,
iname, /*ituples::bigint, ipages::bigint, iotta,*/
ROUND((CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE
ipages::FLOAT/iotta END)::NUMERIC,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS
wastedibytes
FROM sml
ORDER BY wastedbytes DESC;
PSQL Commands:
Connect to psql
psql is a terminal-based front-end to PostgreSQL.
It lets users query PostgreSQL interactively and see the query results.
Connect to a specific database with user and password
Syntax: psql -d database -U user -W (-d = database, -U = user, -W =
prompt for password)
Connect to a database on a different host/machine
Syntax: psql -h host -d database -U user -W
Connect using SSL mode
Syntax: psql -U user -h host "dbname=db sslmode=require"
Psql Commands:
Switch connection to a new database
postgres=# \c test1
You are now connected to database "test1" as user "postgres".
List available databases
postgres=# \l
List available tables
postgres=# \dt
Describe a table
postgres=# \d table_name
List available schema (+ to get more info)
postgres=# \dn
List available functions(+ to get more info)
postgres=# \df
List available views(+ to get more info)
postgres=# \dv
List users and their roles(+ to get more info)
postgres=# \du
List available sequence(+ to get more info)
postgres=# \ds
Execute the previous command
postgres=# \g
Command history
postgres=# \s
Save Command History to file:
postgres=# \s filename
Get help on psql commands
postgres=# \?
Turn on/off query execution time
postgres=# \timing
Edit statements in editor
postgres=# \e
Edit Functions in editor
postgres=# \ef
Toggle output between non-aligned and aligned column mode.
postgres=# \a
Formats output to HTML format.
postgres=# \H
Connection Information
postgres=# \conninfo
Quit psql
postgres=# \q
Run sql statements from a file.
psql -d test1 -U test1 -f test1.sql ( command line)
Send the output to a file.
postgres=# \o <filename>
Save query buffer to filename.
postgres=# \w filename
Turn off auto commit on session level
\set AUTOCOMMIT off
PostgreSQL SYSTEM_CATALOGS
The system catalogs are the place where a relational database management
system stores schema metadata, such as information about tables and
columns, and internal bookkeeping information. PostgreSQL's system
catalogs are regular tables.
Name                     Description
pg_database              Stores general database info
pg_stat_database         Contains statistics information per database
pg_tablespace            Contains tablespace information
pg_operator              Contains all operator information
pg_available_extensions  Lists all available extensions
pg_shadow                List of all database users; pg_user is a
                         publicly readable view on pg_shadow that
                         blanks out the password field
pg_stats                 Planner statistics
pg_timezone_names        Time zone names
pg_locks                 Currently held locks
pg_tables                All tables in the database
pg_settings              Parameter settings
pg_user_mappings         All user mappings
pg_indexes               All indexes in the database
pg_views                 All views in the database
• Find Current Schema
postgres=# select current_Schema();
• Current User
postgres=# select current_user;
• Current Database
postgres=# select current_database();
• Current setting of any parameter in PostgreSQL
postgres=# select current_setting('max_parallel_workers');
• Current User process id
postgres=# select pg_backend_pid();
• Find Postmaster start time
postgres=# select pg_postmaster_start_time();
• PostgreSQL Version
postgres=# select version ();
• Backup is running or not
postgres=# select pg_is_in_backup();
• Date & Time in PostgreSQL with time zone:
postgres=# select now () as current;
• Current Date and Time without Timezone
postgres=# SELECT NOW ()::timestamp;
• Add 1 hour to existing date and time
postgres=# SELECT (NOW () + interval '1 hour') AS
an_hour_later;
• To Find next day date and time
postgres=# SELECT (NOW () + interval '1 day') AS
this_time_tomorrow;
• To deduct 2 hours and 30 minutes from current time
postgres=# SELECT now() - interval '2 hours 30 minutes' AS
two_hour_30_min_go;
TABLE INHERITANCE & PARTITIONING
Module Objectives:
• Table Inheritance
• Table Partitioning
• Copy Table
Table Inheritance:
• Table inheritance allows a child table to inherit all the columns of
a parent (master) table.
• A child table can have extra columns of its own in addition to the
inherited columns.
• A query on the master table references all rows of that table plus all
rows of its child tables.
• The ONLY keyword can be used to indicate that a query should apply
only to a particular table and not to any of its child tables.
• Any UPDATE or DELETE on the parent table without ONLY also
affects the records in the child tables.
Example:
Create Table:
create table orders(orderno serial, flightname varchar(100),boarding
varchar(100),status varchar(100),source varchar(100));
create table online_booking (price int) inherits(orders);
create table agent_booking (commision int) inherits(orders);
Insert Records:
insert into orders(flightname,boarding,status,source)
values('aircanada','xyz','ontime','employees');
insert into online_booking(flightname,boarding,status,source,price)
values('nippon','chn','ontime','website',5000);
insert into online_booking(flightname,boarding,status,source,price)
values('luftansa','chn','ontime','app',3000);
insert into agent_booking(flightname,boarding,status,source,commision)
values('etihad','aud','ontime','agent001',1000);
insert into agent_booking(flightname,boarding,status,source,commision)
values('emirates','dxb','ontime','agent007',1300);
Select Parent Table:
nano=# \set AUTOCOMMIT off
nano=# select * from orders;
orderno | flightname | boarding | status | source
---------+------------+----------+--------+-----------
1 | aircanada | xyz | ontime | employees
2 | nippon | chn | ontime | website
3 | luftansa | chn | ontime | app
5 | etihad | aud | ontime | agent001
6 | emirates | dxb | ontime | agent007
(5 rows)
Update Parent:
nano=# update orders set status='Cancelled';
UPDATE 5
nano=# select * from orders;
orderno | flightname | boarding | status | source
---------+------------+----------+-----------+-----------
1 | aircanada | xyz | Cancelled | employees
2 | nippon | chn | Cancelled | website
3 | luftansa | chn | Cancelled | app
5 | etihad | aud | Cancelled | agent001
6 | emirates | dxb | Cancelled | agent007
(5 rows)
nano=# rollback;
ROLLBACK
nano=# select * from orders;
orderno | flightname | boarding | status | source
---------+------------+----------+--------+-----------
1 | aircanada | xyz | ontime | employees
2 | nippon | chn | ontime | website
3 | luftansa | chn | ontime | app
5 | etihad | aud | ontime | agent001
6 | emirates | dxb | ontime | agent007
(5 rows)
nano=# update only orders set status='Cancelled';
UPDATE 1
nano=# select * from orders;
orderno | flightname | boarding | status | source
---------+------------+----------+-----------+-----------
1 | aircanada | xyz | Cancelled | employees
2 | nippon | chn | ontime | website
3 | luftansa | chn | ontime | app
5 | etihad | aud | ontime | agent001
6 | emirates | dxb | ontime | agent007
(5 rows)
Delete:
nano=# delete from orders;
DELETE 5
nano=# select * from orders;
orderno | flightname | boarding | status | source
---------+------------+----------+--------+--------
(0 rows)
nano=# rollback;
WARNING: there is no transaction in progress
ROLLBACK
nano=# select * from orders;
orderno | flightname | boarding | status | source
---------+------------+----------+--------+--------
(0 rows)
Drop Table:
nano=# drop table orders;
ERROR: cannot drop table orders because other objects depend on it
DETAIL: table online_booking depends on table orders
table agent_booking depends on table orders
HINT: Use DROP ... CASCADE to drop the dependent objects too.
nano=# drop table orders cascade;
NOTICE: drop cascades to 2 other objects
DETAIL: drop cascades to table online_booking
drop cascades to table agent_booking
DROP TABLE
Table Partitioning :
• Table partitioning means splitting a table into smaller pieces.
• Partitioning brings many performance benefits for tables that
hold large amounts of data.
• PostgreSQL allows table partitioning via table inheritance.
• Each partition is created as a child table of a single parent table.
• PostgreSQL implements range and list partitioning methods.
Example:
Create Table:
create table bookings(flightno varchar(200),flightname
varchar(200),booking_date timestamp);
create table jan_booking(check(booking_date >= date '2020-01-01' and
booking_date <= date '2020-01-31')) inherits(bookings);
create table feb_booking(check(booking_date >= date '2020-02-01' and
booking_date <= date '2020-02-29')) inherits(bookings);
nano=# \d+ bookings
                                   Table "public.bookings"
    Column    |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description
--------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 flightno     | character varying(200)      |           |          |         | extended |              |
 flightname   | character varying(200)      |           |          |         | extended |              |
 booking_date | timestamp without time zone |           |          |         | plain    |              |
Child tables: feb_booking,
              jan_booking
Access method: heap
Create Index:
nano=# create index booking_jan_idx on jan_booking using
btree(booking_date);
CREATE INDEX
nano=# create index booking_feb_idx on feb_booking using
btree(booking_date);
CREATE INDEX
Create Function:
create or replace function on_insert() returns trigger as $$
begin
  if (new.booking_date >= date '2020-01-01' and
      new.booking_date <= date '2020-01-31') then
    insert into jan_booking values(new.*);
  elsif (new.booking_date >= date '2020-02-01' and
         new.booking_date <= date '2020-02-29') then
    insert into feb_booking values(new.*);
  else
    raise exception 'Enter valid booking date';
  end if;
  return null;
end;
$$ LANGUAGE plpgsql;
Create Trigger:
CREATE TRIGGER booking_entry
BEFORE INSERT ON bookings
FOR EACH ROW
EXECUTE PROCEDURE on_insert();
Insert Records:
nano=# insert into bookings values('dxb102','emirates','2020-02-09');
INSERT 0 0
nano=# insert into bookings values('dxb103','emirates','2020-02-15');
INSERT 0 0
nano=# insert into bookings values('auh345','etihad','2020-01-10');
INSERT 0 0
nano=# select * from bookings;
flightno | flightname | booking_date
----------+------------+---------------------
dxb101 | etihad | 2020-01-25 00:00:00
auh345 | etihad | 2020-01-10 00:00:00
dxb102 | emirates | 2020-02-09 00:00:00
dxb103 | emirates | 2020-02-15 00:00:00
(4 rows)
nano=# select * from only bookings;
flightno | flightname | booking_date
----------+------------+--------------
(0 rows)
nano=# select * from jan_booking;
flightno | flightname | booking_date
----------+------------+---------------------
dxb101 | etihad | 2020-01-25 00:00:00
auh345 | etihad | 2020-01-10 00:00:00
(2 rows)
nano=# select * from feb_booking;
flightno | flightname | booking_date
----------+------------+---------------------
dxb102 | emirates | 2020-02-09 00:00:00
dxb103 | emirates | 2020-02-15 00:00:00
(2 rows)
Validate Function with wrong entry:
nano=# insert into bookings values('auh234','etihad','2020-03-12');
ERROR: Enter valid booking date
CONTEXT: PL/pgSQL function on_insert() line 8 at RAISE
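Since PostgreSQL 10, the same result can be achieved with declarative partitioning, which avoids the check constraints and the insert trigger entirely. A hedged sketch (table and partition names are illustrative, not from the example above):

```sql
CREATE TABLE bookings_p (
    flightno     varchar(200),
    flightname   varchar(200),
    booking_date timestamp
) PARTITION BY RANGE (booking_date);

-- Range partitions: the upper bound is exclusive
CREATE TABLE jan_bookings_p PARTITION OF bookings_p
    FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');
CREATE TABLE feb_bookings_p PARTITION OF bookings_p
    FOR VALUES FROM ('2020-02-01') TO ('2020-03-01');

-- Rows are routed to the correct partition automatically
INSERT INTO bookings_p VALUES ('dxb102', 'emirates', '2020-02-09');
```

An insert whose booking_date falls outside every partition's range fails with a "no partition of relation" error, playing the role of the RAISE EXCEPTION branch in the trigger version.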
COPY TABLE With or Without Data:
Copy Table is used to copy the structure of a table along with its data.
Unlike an inheritance child table, the copied table has no ongoing
relationship with the base table.
Syntax with data:
CREATE TABLE new_table AS TABLE existing_table;
Syntax without data:
CREATE TABLE new_table AS TABLE existing_table WITH NO
DATA;
Example:
Create Table:
nano=# create table train_bookings(trainno serial,trainname
varchar(200),destination varchar(100));
Insert Records:
nano=# insert into train_bookings(trainname,destination)
values('express','toronto');
INSERT 0 1
nano=# insert into train_bookings(trainname,destination)
values('semiexpress','montreal');
INSERT 0 1
nano=# insert into train_bookings(trainname,destination)
values('goods','calgary');
INSERT 0 1
nano=# select * from train_bookings;
trainno | trainname | destination
---------+-------------+-------------
1 | express | toronto
2 | semiexpress | montreal
3 | goods | calgary
(3 rows)
Create Table AS:
nano=# create table train_dest as table train_bookings;
SELECT 3
nano=# select * from train_dest;
trainno | trainname | destination
---------+-------------+-------------
1 | express | toronto
2 | semiexpress | montreal
3 | goods | calgary
(3 rows)
nano=# \d train_bookings
Table "public.train_bookings"
Column | Type | Collation | Nullable | Default
-------------+------------------------+-----------+----------
+-------------------------------------------------
trainno | integer | | not null |
nextval('train_bookings_trainno_seq'::regclass)
trainname | character varying(200) | | |
destination | character varying(100) | | |
nano=# \d train_dest
Table "public.train_dest"
Column | Type | Collation | Nullable | Default
-------------+------------------------+-----------+----------+---------
trainno | integer | | |
trainname | character varying(200) | | |
destination | character varying(100) | | |
Insert into Destination:
nano=# insert into train_dest (trainname,destination)
values('passenger','yukon');
INSERT 0 1
nano=# select * from train_bookings;
trainno | trainname | destination
---------+-------------+-------------
1 | express | toronto
2 | semiexpress | montreal
3 | goods | calgary
(3 rows)
nano=# select * from train_dest;
trainno | trainname | destination
---------+-------------+-------------
1 | express | toronto
2 | semiexpress | montreal
3 | goods | calgary
         | passenger   | yukon
(4 rows)
Note that trainno is blank for the new row: CREATE TABLE ... AS TABLE copies
the column definitions and data, but not the serial column's sequence default.
Drop Table:
nano=# drop table train_dest;
DROP TABLE
nano=# create table train_dest as table train_bookings with no data;
CREATE TABLE AS
nano=# select * from train_dest;
trainno | trainname | destination
---------+-----------+-------------
(0 rows)
TABLESPACES IN PostgreSQL
Module Objective:
• Tablespace & its advantages
• PostgreSQL default tablespaces
• Create tablespaces
• Move table from one tablespace to another
• Drop tablespaces
• Temporary tablespaces
Tablespace & its advantages
• PostgreSQL stores data logically in tablespaces and physically in
datafiles.
“A tablespace does not hold any data itself. The data is still located on
the disk. A tablespace just maps a logical name to the physical location
of the data on the disk; it is simply a pointer to where the data lives.
Tablespaces can be used to segregate data, e.g. data on drive A and
indexes on drive D.”
• PostgreSQL uses a tablespace to map a logical name to a physical
location on disk.
• Tablespaces allow the user to control the disk layout of PostgreSQL.
• Usage statistics of database objects can guide placement decisions to
optimize the performance of databases.
• Allocate data storage across devices to improve performance.
• Keep WAL files on fast media and archive data on slow media.
• On Linux, the postgres user must have permission to read and write on
the folder.
DEFAULT TABLESPACE:
• PostgreSQL comes with two out-of-the-box tablespaces, namely
pg_default and pg_global.
• The pg_default tablespace stores all user data.
• The pg_global tablespace stores all global data.
• pg_default is the default tablespace of the template1 and
template0 databases.
• All newly created databases use the pg_default tablespace, unless
overridden by a TABLESPACE clause in CREATE DATABASE.
• The default tablespaces are located inside the data directory.
DROP TABLESPACE:
Dropping a tablespace removes all references to it from the system
automatically. We cannot drop a tablespace which is not empty.
Find objects associated with the tablespace
Syntax : select * from pg_tables where tablespace = 'hrd';
Drop tablespace
Syntax : drop tablespace hrd;
Query the pg system catalog view to check that the tablespace is dropped.
Syntax : select * from pg_tablespace;
Examples:
• Syntax for creating a tablespace: (ensure the location exists)
create tablespace hrd location '/opt/app/hrd/';
• Syntax for creating a table on the newly created tablespace
create table test1(studid int, stuname varchar(50)) tablespace hrd;
• Query to find which tablespace the table belongs to
select * from pg_tables where tablespace='hrd';
or
select * from pg_tables where tablename='test1';
MOVE TABLE BETWEEN TABLESPACES:
• Move tables from one tablespace to another
Syntax: alter table test1 set tablespace pg_default;
• Check whether the table is moved successfully to another tablespace
Syntax: select * from pg_tables where tablename='test1';
• Find physical location of the table
Syntax: select pg_relation_filepath('test1');
• Find physical location of the tablespace
Syntax: postgres=# \db
Temporary tablespace:
• Temporary tables and indexes are created by PostgreSQL when
it needs to hold large datasets temporarily to complete a
query, e.g. sorting.
• A temporary tablespace does not store persistent data; no
files are left behind when we shut down the database.
How to create a temporary tablespace
• Syntax : CREATE TABLESPACE temp01 OWNER ownername
LOCATION '/opt/app/hrd/';
• Set temp_tablespaces = 'temp01' in postgresql.conf and reload the
configuration.
• PG will automatically create a subfolder in the above location when a
temp table is created.
• When we shut down the database, the temp files are deleted
automatically.
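The configuration step above can be sketched as shell commands. The conf file here is a throwaway copy created under /tmp so the commands are safe to run anywhere; on a real cluster you would append to $PGDATA/postgresql.conf and then reload (pg_ctl reload, or select pg_reload_conf(); from psql).

```shell
# Throwaway stand-in for $PGDATA/postgresql.conf -- a demo path, not a real cluster.
conf="$(mktemp -d)/postgresql.conf"
touch "$conf"

# Point temporary objects (sort spills, temp tables) at the temp01 tablespace.
echo "temp_tablespaces = 'temp01'" >> "$conf"

# Confirm the setting landed in the file.
grep temp_tablespaces "$conf"
```

On a real cluster the tablespace temp01 must already exist (CREATE TABLESPACE as shown above) before the setting takes effect.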
BACKUPS & RESTORES
• A backup is a copy of data taken from the database which can be used to
reconstruct it in case of a failure.
• Backups can be divided into logical backups and physical
backups.
• Logical backups are simple, textual representations of the
data in the databases.
• These text statements can be used to recreate a postgres cluster,
database or table.
• Physical backups are backups of the physical files used in storing
and recovering the database, such as datafiles, WAL files and archive
files.
• Physical backups are further divided into online backups and offline
backups.
Logical Backup:
• Logical backups are simple, textual representations of the data
in the databases.
• They support various output formats: plain text (default), tar and
custom binary format.
• SQL dumps create a consistent copy of the database as of the time of
execution.
• Small databases are perfect candidates for logical backups.
• The pg_dump (for one db) and pg_dumpall (all dbs in a cluster)
utilities are used to perform logical dumps.
• pg_dump --help displays the options which can be used to
customize dumps.
pg_dump :
Back up a single database from a postgres instance
Syntax: pg_dump test1 > /var/lib/pgsql/12/backups/test1backup
pg_dumpall -- extract a PostgreSQL database cluster into a script file
Syntax: pg_dumpall -U postgres > /var/lib/pgsql/12/backups/clusterall.sql
We can use any standard editor to view the extracted file (vi or Notepad).
pg_dumpall --help displays the options which can be used to customize
dumps.
Compressed & Split Dumps:
Dumps grow rapidly when dealing with large databases.
We can use any standard compression utility, such as gzip, to compress the
dump (Linux only).
Syntax : pg_dump test1 | gzip > /var/lib/pgsql/11/backups/test1backup.gz
We can split the dump into smaller chunks of a desirable size for easy
maintenance.
Syntax :
pg_dump test1 | split -b 1k - /var/lib/pgsql/11/backups/test1backup
Compressed & split backup
$ pg_dump -h localhost -U postgres -W -d mydb | gzip | split -b 100m - mydb.sql.gz
Restore database using the psql interface:
• Data restore is the process of copying backup data from secondary
storage and restoring it to its original location or a new location.
• A restore is performed to return data that has been lost, stolen or
damaged to its original condition, or to move data to a new location.
• Restoring a database from a pg_dump backup
Syntax : psql -U test1 -d test1 < /var/lib/pgsql/11/backups/test1backup
*NOTE : We have to create an empty database with the same name before
the restore.
pg_restore
pg_restore is a utility for restoring a PostgreSQL database from an archive
created by pg_dump in one of the non-plain-text formats.
pg_restore is a useful utility used to restore a database or a single table.
Syntax to create a custom-format archive file for pg_restore:
# pg_dump -Fc mydb > db.dump
Syntax for pg_restore:
# pg_restore -d test1 db.dump
We can restore a single table from a full pg_dump file without restoring
the entire database.
This scenario is really helpful when we lose a particular table accidentally
or due to user mistake.
Syntax:
$ pg_restore -t <tablename> -d <DBNAME> pathtodump.dump
List contents / objects in a dump (non-plain-text formats only)
$ pg_restore -l <dump file name>
File System backup – Offline mode :
• The database server must be shut down in order to get a usable
backup.
• The database server must be shutdown before restoring the data.
• Partial restore or Single table restore not possible.
• This approach is suitable only for complete backup or complete
restoration of the entire database cluster.
• A “consistent snapshot” of the data directory is considered a better
approach than a plain file-system-level backup.
• Syntax : tar -cvzf backup.tar /usr/local/pgsql/data
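The tar backup and restore can be exercised end to end on a dummy directory. Everything below uses invented /tmp paths; on a real system the target would be the cluster's data directory, with the server stopped for both the backup and the restore.

```shell
demo=$(mktemp -d)                           # throwaway workspace
mkdir -p "$demo/data/base"
echo "hello" > "$demo/data/base/16384"      # stand-in for a datafile

tar -czf "$demo/backup.tar" -C "$demo" data # offline backup of the "data" directory
rm -rf "$demo/data"                         # simulate losing the cluster
tar -xzf "$demo/backup.tar" -C "$demo"      # whole-cluster restore

cat "$demo/data/base/16384"                 # prints: hello
```

Note that the restore is all-or-nothing: the entire directory comes back, which matches the point above that single-table restores are not possible with this method.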
(Screenshots of the Windows and Linux commands omitted.)
Restore split backup files :
$ cat clusterallbkp* | psql -h localhost -U postgres -W -d mydb
Restore compressed split files
$ cat mydb.sql.gz* | gunzip | psql -h localhost -U postgres -W -d mydb
Convert a binary dump to plain text
$ pg_restore binbkp.dump -f plainbkp.sql
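The split-and-reassemble mechanics can be tried without a database at all. The sketch below substitutes a generated text file for the pg_dump output (file names are invented for the demo); the restore side relies on cat expanding the chunk names in order (mydb.sql.gzaa, mydb.sql.gzab, ...).

```shell
cd "$(mktemp -d)"                                # throwaway workspace
seq 1 5000 > dump.sql                            # stand-in for pg_dump output

# Backup side: compress, then split into 1 KB chunks named mydb.sql.gz*
cat dump.sql | gzip | split -b 1k - mydb.sql.gz

# Restore side: reassemble the chunks in name order and decompress
cat mydb.sql.gz* | gunzip > restored.sql

cmp dump.sql restored.sql && echo "round trip OK"
```

With a real database, replace `cat dump.sql` with the pg_dump command and pipe the final gunzip output into psql instead of a file.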
Continuous Archiving and PITR :
• Continuous Archiving is the process of archiving Write Ahead Log
(WAL) files.
• Point-in-Time-Recovery(PITR) refers to PostgreSQL’s ability to start
from the restore of a full backup and apply archived WAL files up to
a specified timestamp.
Steps to setup up Continuous Archiving and PITR
Setting Up WAL Archiving
Making a Base Backup
Making a Base Backup Using the Low Level API
Recovering Using a Continuous Archive Backup
First we check whether archiving is on or off. The command below is run
in psql.
#show archive_mode;
It returns on or off.
If it is off, we need to turn it on by editing postgresql.conf. Before
doing this, ensure that the postgres cluster is shut down (#pg_ctl stop).
Check the cluster status from the OS shell with pg_ctl status.
If the cluster is stopped, we can set the parameters below in the
postgresql.conf file.
1. wal_level = replica ('archive' is the pre-9.6 name for this level)
2. archive_mode = on
3. archive_command = 'copy "%p" "c:\\archive_location\\%f"'  ## for Windows
   archive_command = 'cp -i %p /opt/archivedir/%f'           ## for Linux
archive_command is the most important parameter; it copies completed WAL
files from one location to another safe location, i.e. the archive directory.
After the parameters are set, start the cluster: pg_ctl start.
Check the destination location for files; if there are none, we can do a test
by firing the below commands in psql:
# select pg_start_backup('test1');  -- we are telling postgresql that a backup
has started.
Check the archive folder.
# select pg_stop_backup();
Check that the files were copied to the archive folder.
To stop archiving, comment out (#) the parameter in postgresql.conf.
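What %p and %f expand to is the part people trip over, so here is a small mock (throwaway /tmp paths, not a real cluster) of what PostgreSQL effectively runs for one finished WAL segment on Linux: %p is the segment's path relative to the data directory, and %f is its bare file name.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/data/pg_wal" "$demo/archive"
echo "wal-bytes" > "$demo/data/pg_wal/000000010000000000000001"

cd "$demo/data"                        # archive_command runs relative to the data directory
p="pg_wal/000000010000000000000001"    # what PostgreSQL substitutes for %p
f="000000010000000000000001"           # what PostgreSQL substitutes for %f

cp -i "$p" "$demo/archive/$f"          # the archive_command itself
ls "$demo/archive"                     # the segment is now archived
```

The -i flag matters: the command should fail rather than overwrite an existing archive file, and PostgreSQL will retry failed segments.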
Online backup using the low-level API :
• Ensure that WAL archiving is enabled and working.
• Select either the exclusive or the non-exclusive low-level API backup.
• As a superuser, or a user with EXECUTE privileges, issue the command
“select pg_start_backup('label', false, false);”
• By default pg_start_backup spreads the checkpoint I/O out, so it takes
some time to finish.
• If we want control to return faster, use “select
pg_start_backup('label', true);” which performs an immediate checkpoint.
• Use a file system utility to back up the data directory.
• Run “select pg_stop_backup();” to stop the backup mode.
• All WAL segments will then have been archived, forming a complete set
of backup files.
However, archive files alone will not help us restore a database in case
of a failure. The core of any backup and restore strategy is the full backup,
or base backup. Archive files work in conjunction with the full backup in
order to restore a database up to a certain time.
For example:
I took a full backup (base backup) on Sunday, and my system
crashed on Monday at 10 am.
Now, in order to recover my system up to 10 am on Monday, I have to
restore my full backup along with all the archive files up to Monday 10 am.
If I don't have a full backup, then all the archive files are meaningless;
I cannot restore my system.
So the base backup is very critical and important.
In this section, we are going to learn how to take a full online backup,
or full online base backup. This is the preferred way of backing up any
production system or other important system. In this type of backup, the
system is online and available for use; users are connected and doing
their transactions while the backup happens in the background.
This type of backup is preferred because in real life we cannot
afford downtime: system availability is a huge criterion, and
the business expects the system to be available 24x7.
We cannot shut down a system just to take a backup.
There are two ways to take an online backup.
One is through the low-level API, which is what we are going to discuss
now.
The other is using a tool called pg_basebackup.
How to take a low-level API backup :
psql # select pg_start_backup('backup2', false, false);
backup2 – the backup label.
The first false tells PG that it can take its time over the checkpoint I/O
rather than returning control to the user immediately.
The second false tells PG that we want a non-exclusive backup.
Now run the command
psql # \! tar cvzf backup2.tar "c:\program files\postgresql\12\data"  ## Windows
psql # \! tar cvzf backup2.tar /var/lib/pgsql/12/data  ## Linux
It will take the online backup into the backup2.tar file.
In the image above we can see a permission-denied error: that log
file is in use. When we stop the backup, it will automatically be copied to
the archive folder.
Stop the backup:
psql # select pg_stop_backup('f');  -- 'f' (false) means non-exclusive backup;
otherwise 't'.
You can use this information when you are doing a point in time recovery.
You can specify the stop back up time and say that I want to recover my
database.
Now we have the complete online backup with the last WAL files. If we
want to restore it, we can untar backup2.tar, copy the WAL files into the
folder, and start the cluster.
pg_basebackup:
• pg_basebackup is used to take base backups of a running
PostgreSQL database cluster (online).
• This backup can be used for PITR or replication.
• It automatically puts the database into backup mode and takes it out
again.
• Backups are always taken of the entire database cluster and cannot be
used for a single database or object.
Syntax: pg_basebackup -D <backup directory location>
There is no direct command for incremental or differential backups in
PostgreSQL, but it can be managed. If the databases are small, we ideally
prefer taking a full backup, i.e. a full pg_basebackup on a daily basis.
For a larger database this is not practical, because the backup may take
hours to complete.
So the other scenario is: we take a full backup on Sunday, and we have all
the WAL files which are created and archived after it.
On Monday, by end of business (five or five thirty), I go ahead and take a
backup of all the archive files generated since the last backup, up to the
end of Monday.
I repeat the same on Tuesday, Wednesday, Thursday, Friday and Saturday.
Then the full backup plus all the archive files for that week go onto tape,
so I have a full backup and all the archives for that particular week.
Then I flush everything out, and on Sunday the routine starts again.
Below is a pg_basebackup command.
C:/> pg_basebackup -h localhost -U postgres -p 5432 -D "C:\\mybasebackup\" -z -P -Ft -Xs
-h : host
-U : user
-p : port
-D : destination folder
-z : compress with gzip
-P : show progress
-Ft : format tar
-Xs : also stream and include the WAL generated during the
backup itself.
Observe the base backup and WAL files while the backup is running.
For uncompressed plain format :
#pg_basebackup -U postgres -D "C:\\mybasebackup\" -P -Fp -Xs  ## Windows
#pg_basebackup -D /opt/mybasebackup/ -P -Fp -Xs  ## Linux
PITR and Recovery Configuration:
• Point-in-Time-Recovery (PITR) refers to PostgreSQL's ability to
restore and recover a postgresql cluster up to a specified point.
For example, suppose there is a disk failure, the data gets corrupted,
or a database is dropped accidentally. It becomes the job of the DBA to
recover and restore the database up to the time of the failure. In order to
do that, I need a strong and reliable backup strategy which will help
me restore my database.
• Restore Command : This command is used to retrieve archived
segments of the WAL files.
Example (postgresql.conf):
restore_command = 'cp /mnt/server/archivedir/%f "%p"'  # Linux
restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
recovery_target = immediate (this parameter specifies that the
recovery should end as soon as a consistent state is reached).
recovery_target_lsn – This parameter specifies the Log Sequence
Number (LSN) of the write-ahead log location up to which recovery will
proceed; we can instruct PG to restore up to the specified LSN generated
during log generation.
To see the current WAL LSN, run the command below.
psql # select pg_current_wal_lsn(), pg_wal_file_name(pg_current_wal_lsn());
• recovery_target_name = This parameter specifies the named
restore point (created with pg_create_restore_point) to which recovery
will proceed.
• recovery_target_time = This parameter specifies the timestamp up
to which recovery will proceed. (Used in the vast majority of cases.)
• recovery_target_xid = This parameter specifies the transaction ID
up to which recovery will proceed.
• recovery_target_inclusive = Specifies whether to stop the recovery
just after the recovery target is reached (on) or just before it
(off). Default is on. Note : lsn, name, time and xid are mutually
exclusive; use only one of them at a time.
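Putting the pieces together, a v12-style recovery setup might look like the fragment below. The file is a throwaway stand-in for postgresql.conf (the archive directory and timestamp are invented for the sketch); remember to pick exactly one recovery_target_* setting.

```shell
pgdata=$(mktemp -d)                     # stand-in for the real $PGDATA

cat >> "$pgdata/postgresql.conf" <<'EOF'
restore_command = 'cp /opt/archivedir/%f "%p"'
recovery_target_time = '2021-01-01 10:00:00'
recovery_target_inclusive = on
EOF

touch "$pgdata/recovery.signal"         # v12+: empty marker file triggers recovery on start
grep -c "recovery_target" "$pgdata/postgresql.conf"   # prints: 2
```

On a real cluster, starting the server with these settings and the recovery.signal file in place begins the recovery described in the steps below.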
Steps to perform PITR
• Stop the server, if it's running (#pg_ctl stop).
• Copy the whole old cluster data directory and any tablespaces to a
temporary location.
• Remove all directories and files from the cluster data directory.
• Restore the database files from your file system backup or base
backup (copy, don't move).
• Remove old files present in the pg_wal directory of the file system
backup.
• If there are any unarchived WAL segment files recovered from the
crashed cluster, copy them into the pg_wal directory.
• Set the recovery configuration settings in postgresql.conf and create a
file named recovery.signal in the cluster data directory. Before version 12
there was a file called recovery.conf: all the recovery parameters were
set in it, and whenever PG started a recovery it would first look for
recovery.conf; if the file was available, it would read the parameters
and start the recovery. From version 12, all these parameters are
included in postgresql.conf, and only an empty recovery.signal file
needs to be in the data directory. When PG starts, it checks the data
folder for recovery.signal; if it finds the file, it goes ahead and
starts the recovery.
• Temporarily modify pg_hba.conf to prevent ordinary users from
connecting. Stop / block all the connections.
• Start the server (pg_ctl start) . The server will go into recovery mode
and proceed to read through the archived WAL files it needs.
• Upon completion of the recovery process, the server will remove
recovery.signal.
• Inspect the contents of the database to ensure the recovered database
is in the desired state.
• Modify pg_hba.conf to allow users to connect to the database.
How to recover database using pg_basebackup & how to perform
PITR using pg_basebackup
Prerequisite:
Ensure Archive Mode is turned on and archiving is happening.
How to recover database using online pg_basebackup:
Example:
Step 1:
Create a table and insert a few records, then update or delete (perform
some transactions which can be archived).
I have created a table named test1 and performed a few operations on it,
like insert and update.
Currently my table has 5 rows.
Step 2:
Verify archive log folder whether all wal files are archived.
Step 3:
Perform a log switch to archive the current log (e.g. select pg_switch_wal();).
As you can see, the current log is archived as well.
Step 4:
Take a pg_basebackup of the entire cluster using the below command.
This will generate 2 tar files:
one with the entire data, and the other with the WAL files (pg_wal.tar).
$ pg_basebackup -Ft -D var/pgtmp  ## example syntax
Step 5:
Stop the cluster and delete the data folder (we are going to mimic a crash
here where we lost our data folder).
I removed the data folder using rm -rf, and you can see that there is no
data folder in the location.
Step 6: Now we are going to restore the data folder using the backup
which we took.
First, create a data folder in the same location as the one we removed.
Second, move into the data folder and create a pg_wal folder.
Let us start with the restore of the data folder and WAL files using the
pg_basebackup which we took.
This will create all the data inside the data folder. Go into the data
folder and check whether all the folders and files are there.
Now let us restore the WAL files from the WAL backup.
You can see that the WAL files are restored in the pg_wal folder.
Step 7: Now we need to ensure that the database is consistent and tell our
database server to copy files from our archive location to the WAL file
location. For this we need to edit the postgresql.conf file.
Add the following entry in postgresql.conf:
restore_command = 'cp /var/lib/pgsql/12/archive_logs/%f %p'
Step 8: Start the cluster. (Remember, all this while the cluster was down.)
You may get an error like this when you start (permission error on the data
directory).
Just change the permissions on the data directory to 700
and start the cluster again.
The cluster started successfully. Let us check whether the table test1 with
5 records exists or not.
So we have successfully deleted and restored our database using
pg_basebackup.
PITR
Let us now try a PITR.
Step 1: We will use the same table to perform PITR. I will add a few
more rows to the existing table to generate archives.
Switch the current archive log.
Now check the archive log folder to see whether we got any new archives.
We can see that there are many new archives generated in the
archive log folder.
Step 2:
Take a fresh pg_basebackup.
Step 3:
Now insert a few more rows in the test1 table after that backup. Before, my
row count was 10; now it is 15.
My task is to restore the database to the point when the table had 10 records.
Step 4:
Now I will mimic a crash by deleting my data directory.
I have removed my data directory.
Step 5:
Restore the database from the backup which we took in step 2.
Make sure we create the data and pg_wal folders before we start the restore.
Start the restore operation.
ONLY THE DATA DIRECTORY SHOULD BE RESTORED. DON'T
RESTORE THE PG_WAL TAR.
The recovery configuration will guide the point-in-time recovery.
Step 6:
I have to recover my database up to the point of 10 records, so I will
check the archive logs which were generated by that time.
I am going to recover my database up to 16:24, just before I took my
pg_basebackup. There are a few additional archive files at 16:30 and 16:32
which hold the new 5 records; I don't want those.
Step 7:
Create an empty recovery.signal file inside the data folder,
and add the following two entries (restore_command and
recovery_target_time) to the postgresql.conf file.
Save the file using :wq!. These two parameters are very important for
PITR. If we don't add them, all the archive files will be
applied, and instead of 10 rows we will get 15 rows.
Step 8:
Now start the cluster and it will recover the database up to the specified
time. Ensure to change the permissions on the data folder to 700.
Step 9:
Start the cluster.
Step 10:
Check how many rows are there in table test1.
Hence we have successfully restored our database using point-in-time
recovery.
Maintenance and its options
• All databases require some kind of maintenance tasks to be
performed regularly to achieve optimum performance.
• Maintenance tasks are ongoing and repetitive, and are ideally
automated and scheduled from cron scripts (Linux) or Task
Scheduler (Windows).
• PostgreSQL provides the following maintenance options:
- Updating Planner Statistics/Analyze
- Vacuum
- Routine Reindexing
- Clustering
Updating Planner Statistics:
• The PostgreSQL query planner relies on statistical information about
the contents of tables in order to generate good plans for queries.
Whenever we execute a query, it travels a path in order to get the
results, and that path determines how fast and how effectively the
results can be returned. If the stats about the table (row count, size,
etc.) are not accurate, the optimizer cannot generate good plans and
query execution may take more time than usual, because the optimizer
is not aware of how many rows the table actually holds.
• These stats are gathered by the ANALYZE command, which keeps the
stats up-to-date about the current state of the table.
• ANALYZE collects information about size, row count,
average row size and row sampling information.
• Inaccurate or stale stats can mislead the optimizer into choosing plans
which degrade database performance.
• Tables with heavy update/delete activity need to be analyzed on a
regular basis to ensure optimal performance.
• We can run ANALYZE automatically by enabling the
autovacuum daemon, or run the command manually.
Commands to enable or disable autovacuum for a table, and to check its status:
# alter table table_name set (autovacuum_enabled = false);  -- to disable
# alter table table_name set (autovacuum_enabled = true);   -- to enable
# show autovacuum;  -- the cluster-wide setting from postgresql.conf
Command to check last analyze and vacuums:
#select * from pg_stat_all_tables where schemaname='public';
#SELECT schemaname, relname, last_analyze FROM pg_stat_all_tables
WHERE relname = 'a';
#SELECT schemaname, relname, last_vacuum, last_autovacuum,
vacuum_count, autovacuum_count FROM pg_stat_user_tables;
Note: pg_stat_all_tables includes system catalog tables as well, while
pg_stat_user_tables contains only non-system tables.
Check the autovacuum process at the OS level:
#ps -axww | grep autovacuum
Check the logs:
#cat /var/log/postgresql/postgresql-13-main.log
Note: if the database was ever shut down uncleanly, all the stats are
reset. You can check pg_stat_database. They can also be reset manually,
either for the whole database or for individual tables.
#select datname, stats_reset from pg_stat_database;
Some Monitoring SQLs
-- ##Sessions and details
SELECT
pid,
datname,
usename,
application_name,
client_addr,
pg_catalog.to_char(backend_start, 'YYYY-MM-DD HH24:MI:SS TZ')
AS backend_start,
state,
wait_event_type || ': ' || wait_event AS wait_event,
pg_catalog.pg_blocking_pids(pid) AS blocking_pids,
query,
pg_catalog.to_char(state_change, 'YYYY-MM-DD HH24:MI:SS TZ')
AS state_change,
pg_catalog.to_char(query_start, 'YYYY-MM-DD HH24:MI:SS TZ') AS
query_start,
backend_type,
CASE WHEN state = 'active' THEN ROUND((extract(epoch from
now() - query_start) / 60)::numeric, 2) ELSE 0 END AS active_since
FROM
pg_catalog.pg_stat_activity
ORDER BY pid;
SELECT relname, relpages FROM pg_class ORDER BY relpages DESC;
--#Database size
SELECT pg_size_pretty(pg_database_size('uidservices'));
SELECT datname as db_name,
pg_size_pretty(pg_database_size(datname)) as db_usage FROM
pg_database
order by 2 ;
create extension pageinspect;
--# get raw page values
SELECT t_xmin,
t_xmax,
tuple_data_split('adv_txn_log_012021'::regclass,
t_data,
t_infomask,
t_infomask2, t_bits) FROM
heap_page_items(get_raw_page('adv_txn_log_012021', 0));
select txid_current();
-- #Live and dead tuples; if dead tuples > 50% of live tuples, the table
-- needs to be vacuumed.
select schemaname,
relname,
pg_size_pretty(pg_relation_size(schemaname|| '.' || relname)) as size,
n_live_tup,
n_dead_tup,
CASE WHEN n_live_tup > 0 THEN round((n_dead_tup::float /
n_live_tup::float)::numeric, 4) END AS dead_tup_ratio,
last_autovacuum,
last_autoanalyze
from pg_stat_user_tables
order by dead_tup_ratio desc NULLS LAST;
-- db-level vacuum requirement: if consumed_txid_pct > 70%, vacuum is
-- needed
SELECT
datname,
age(datfrozenxid) AS frozen_xid_age,
ROUND(
100 *(
age(datfrozenxid)/ 2147483647.0 :: float
)
) consumed_txid_pct,
current_setting('autovacuum_freeze_max_age'):: int - age(datfrozenxid)
AS remaining_aggressive_vacuum
FROM
pg_database
WHERE
datname NOT IN (
'template0', 'template1'
);
-- Connect to the database and run the following query to list tables that
-- are currently processed by the autovacuum daemon:
SELECT p.pid,
p.datname,
p.query,
p.backend_type,
a.phase,
a.heap_blks_scanned / a.heap_blks_total::float * 100 AS "% scanned",
a.heap_blks_vacuumed / a.heap_blks_total::float * 100 AS "% vacuumed",
pg_size_pretty(pg_table_size(a.relid)) AS "table size",
pg_size_pretty(pg_indexes_size(a.relid)) AS "indexes size",
pg_get_userbyid(c.relowner) AS owner
FROM pg_stat_activity p
JOIN pg_stat_progress_vacuum a ON a.pid = p.pid
JOIN pg_class c ON c.oid = a.relid
WHERE p.query LIKE 'autovacuum%';
SELECT
*,
relid :: regclass,
heap_blks_scanned / heap_blks_total :: float * 100 "% scanned",
heap_blks_vacuumed / heap_blks_total :: float * 100 "% vacuumed"
FROM
pg_stat_progress_vacuum;
-- top 10 tables by transaction ID utilization; if consumed_txid_pct > 70%,
-- vacuum is needed
SELECT c.relname AS table_name,
age(c.relfrozenxid) AS frozen_xid_age,
ROUND(100 * (age(c.relfrozenxid) / 2147483647.0)) AS
consumed_txid_pct,
pg_size_pretty(pg_total_relation_size(c.oid)) AS table_size
FROM pg_class c
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE c.relkind IN ('r', 't', 'm')
AND n.nspname NOT IN ('pg_toast')
ORDER BY 2 DESC
LIMIT 10;
--## table extra size (bloat estimate)
SELECT current_database(), schemaname, tblname, bs*tblpages AS
real_size,
(tblpages-est_tblpages)*bs AS extra_size,
CASE WHEN tblpages > 0 AND tblpages - est_tblpages > 0
THEN 100 * (tblpages - est_tblpages)/tblpages::float
ELSE 0
END AS extra_pct, fillfactor,
CASE WHEN tblpages - est_tblpages_ff > 0
THEN (tblpages-est_tblpages_ff)*bs
ELSE 0
END AS bloat_size,
CASE WHEN tblpages > 0 AND tblpages - est_tblpages_ff > 0
THEN 100 * (tblpages - est_tblpages_ff)/tblpages::float
ELSE 0
END AS bloat_pct, is_na
-- , tpl_hdr_size, tpl_data_size, (pst).free_percent + (pst).dead_tuple_percent AS real_frag -- (DEBUG INFO)
FROM (
SELECT ceil( reltuples / ( (bs-page_hdr)/tpl_size ) ) + ceil( toasttuples / 4
) AS est_tblpages,
ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) +
ceil( toasttuples / 4 ) AS est_tblpages_ff,
tblpages, fillfactor, bs, tblid, schemaname, tblname, heappages,
toastpages, is_na
-- , tpl_hdr_size, tpl_data_size, pgstattuple(tblid) AS pst -- (DEBUG INFO)
FROM (
SELECT
( 4 + tpl_hdr_size + tpl_data_size + (2*ma)
- CASE WHEN tpl_hdr_size%ma = 0 THEN ma ELSE tpl_hdr_size%ma END
- CASE WHEN ceil(tpl_data_size)::int%ma = 0 THEN ma ELSE ceil(tpl_data_size)::int%ma END
) AS tpl_size, bs - page_hdr AS size_per_block, (heappages +
toastpages) AS tblpages, heappages,
toastpages, reltuples, toasttuples, bs, page_hdr, tblid, schemaname,
tblname, fillfactor, is_na
-- , tpl_hdr_size, tpl_data_size
FROM (
SELECT
tbl.oid AS tblid, ns.nspname AS schemaname, tbl.relname AS
tblname, tbl.reltuples,
tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
coalesce(toast.reltuples, 0) AS toasttuples,
coalesce(substring(
array_to_string(tbl.reloptions, ' ')
FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor,
current_setting('block_size')::numeric AS bs,
CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
24 AS page_hdr,
23 + CASE WHEN MAX(coalesce(s.null_frac,0)) > 0 THEN ( 7 +
count(s.attname) ) / 8 ELSE 0::int END
+ CASE WHEN bool_or(att.attname = 'oid' and att.attnum < 0)
THEN 4 ELSE 0 END AS tpl_hdr_size,
sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 0) ) AS
tpl_data_size,
bool_or(att.atttypid = 'pg_catalog.name'::regtype)
OR sum(CASE WHEN att.attnum > 0 THEN 1 ELSE 0 END) <>
count(s.attname) AS is_na
FROM pg_attribute AS att
JOIN pg_class AS tbl ON att.attrelid = tbl.oid
JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats AS s ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname AND s.inherited=false AND
s.attname=att.attname
LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
WHERE NOT att.attisdropped
AND tbl.relkind in ('r','m')
GROUP BY 1,2,3,4,5,6,7,8,9,10
ORDER BY 2,3
) AS s
) AS s2
) AS s3
-- WHERE NOT is_na
-- AND tblpages*((pst).free_percent + (pst).dead_tuple_percent)::float4/100 >= 1
ORDER BY schemaname, tblname;
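To make the estimation arithmetic concrete, here is a small Python sketch of the est_tblpages / extra_pct calculation from the query above. The block size and page header match the query's constants; the tuple size, tuple count, and page count are invented example values.

```python
import math

# Hypothetical inputs: bs and page_hdr match the query's constants;
# tpl_size, reltuples and tblpages are invented example values.
bs = 8192            # block_size
page_hdr = 24        # per-page header, as in the query
tpl_size = 100       # estimated average tuple size (hypothetical)
reltuples = 100_000  # live tuples (hypothetical)
tblpages = 1_400     # actual pages on disk (hypothetical)

# est_tblpages: pages the data should need at 100% fill
est_tblpages = math.ceil(reltuples / ((bs - page_hdr) / tpl_size))
extra_pct = 100 * (tblpages - est_tblpages) / tblpages

print(est_tblpages, extra_pct)  # 1225 pages expected, 12.5% extra
```

A table whose on-disk page count far exceeds this estimate is carrying bloat.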
-- #index usage
SELECT
t.schemaname,
t.tablename,
c.reltuples::bigint AS num_rows,
pg_size_pretty(pg_relation_size(c.oid)) AS table_size,
psai.indexrelname AS index_name,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END AS "unique",
psai.idx_scan AS number_of_scans,
psai.idx_tup_read AS tuples_read,
psai.idx_tup_fetch AS tuples_fetched
FROM
pg_tables t
LEFT JOIN pg_class c ON t.tablename = c.relname
LEFT JOIN pg_index i ON c.oid = i.indrelid
LEFT JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
t.schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2;
--#txid before wraparround
SELECT
oid::regclass::text AS table,
age(relfrozenxid) AS xid_age,
mxid_age(relminmxid) AS mxid_age,
least(
(SELECT setting::int
FROM pg_settings
WHERE name = 'autovacuum_freeze_max_age') -
age(relfrozenxid),
(SELECT setting::int
FROM pg_settings
WHERE name = 'autovacuum_multixact_freeze_max_age') -
mxid_age(relminmxid)
) AS tx_before_wraparound_vacuum,
pg_size_pretty(pg_total_relation_size(oid)) AS size,
pg_stat_get_last_autovacuum_time(oid) AS last_autovacuum
FROM pg_class
WHERE relfrozenxid != 0
AND oid > 1
ORDER BY tx_before_wraparound_vacuum;
--## index bloat calculating superuser
SELECT current_database(), nspname AS schemaname, tblname,
idxname, bs*(relpages)::bigint AS real_size,
bs*(relpages-est_pages)::bigint AS extra_size,
100 * (relpages-est_pages)::float / relpages AS extra_pct,
fillfactor,
CASE WHEN relpages > est_pages_ff
THEN bs*(relpages-est_pages_ff)
ELSE 0
END AS bloat_size,
100 * (relpages-est_pages_ff)::float / relpages AS bloat_pct,
is_na
-- , 100-(pst).avg_leaf_density AS pst_avg_bloat, est_pages, index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples, relpages -- (DEBUG INFO)
FROM (
SELECT coalesce(1 +
ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)
), 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
) AS est_pages,
coalesce(1 +
ceil(reltuples/floor((bs-pageopqdata-pagehdr)*fillfactor/(100*(4+nulldatahdrwidth)::float))), 0
) AS est_pages_ff,
bs, nspname, tblname, idxname, relpages, fillfactor, is_na
-- , pgstatindex(idxoid) AS pst, index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples -- (DEBUG INFO)
FROM (
SELECT maxalign, bs, nspname, tblname, idxname, reltuples,
relpages, idxoid, fillfactor,
( index_tuple_hdr_bm +
maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
ELSE index_tuple_hdr_bm%maxalign
END
+ nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
WHEN nulldatawidth = 0 THEN 0
WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
ELSE nulldatawidth::integer%maxalign
END
)::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na
-- , index_tuple_hdr_bm, nulldatawidth -- (DEBUG INFO)
FROM (
SELECT n.nspname, ct.relname AS tblname, i.idxname, i.reltuples,
i.relpages,
i.idxoid, i.fillfactor, current_setting('block_size')::numeric AS bs,
CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
ELSE 4
END AS maxalign,
/* per page header, fixed size: 20 for 7.X, 24 for others */
24 AS pagehdr,
/* per page btree opaque data */
16 AS pageopqdata,
/* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
CASE WHEN max(coalesce(s.stanullfrac,0)) = 0
THEN 2 -- IndexTupleData size
ELSE 2 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ((max num fields per index + 8 - 1) / 8)
END AS index_tuple_hdr_bm,
/* data len: we remove null values to save space, using the fractional part from stats */
sum( (1-coalesce(s.stanullfrac, 0)) * coalesce(s.stawidth, 1024))
AS nulldatawidth,
max( CASE WHEN a.atttypid = 'pg_catalog.name'::regtype
THEN 1 ELSE 0 END ) > 0 AS is_na
FROM (
SELECT idxname, reltuples, relpages, tbloid, idxoid, fillfactor,
CASE WHEN indkey[i]=0 THEN idxoid ELSE tbloid END AS
att_rel,
CASE WHEN indkey[i]=0 THEN i ELSE indkey[i] END AS
att_pos
FROM (
SELECT idxname, reltuples, relpages, tbloid, idxoid, fillfactor,
indkey, generate_series(1,indnatts) AS i
FROM (
SELECT ci.relname AS idxname, ci.reltuples, ci.relpages,
i.indrelid AS tbloid,
i.indexrelid AS idxoid,
coalesce(substring(
array_to_string(ci.reloptions, ' ')
from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor,
i.indnatts,
string_to_array(textin(int2vectorout(i.indkey)),' ')::int[] AS
indkey
FROM pg_index i
JOIN pg_class ci ON ci.oid=i.indexrelid
WHERE ci.relam=(SELECT oid FROM pg_am WHERE
amname = 'btree')
AND ci.relpages > 0
) AS idx_data
) AS idx_data_cross
)i
JOIN pg_attribute a ON a.attrelid = i.att_rel
AND a.attnum = i.att_pos
JOIN pg_statistic s ON s.starelid = i.att_rel
AND s.staattnum = i.att_pos
JOIN pg_class ct ON ct.oid = i.tbloid
JOIN pg_namespace n ON ct.relnamespace = n.oid
GROUP BY 1,2,3,4,5,6,7,8,9,10
) AS rows_data_stats
) AS rows_hdr_pdg_stats
) AS relation_stats
ORDER BY nspname, tblname, idxname;
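The page-estimation step in this query can be illustrated with a short Python sketch. The block size, page header, and opaque-data constants match those in the query, while nulldatahdrwidth and reltuples are invented example values.

```python
import math

bs = 8192              # block_size
pagehdr = 24           # per-page header (as in the query)
pageopqdata = 16       # per-page btree opaque data (as in the query)
nulldatahdrwidth = 16  # avg index tuple size incl. header (hypothetical)
reltuples = 100_000    # index entries (hypothetical)

# 4 bytes per entry is the ItemIdData line pointer; the extra +1 page
# accounts for the btree metapage, as in the query's est_pages formula.
tuples_per_page = math.floor((bs - pageopqdata - pagehdr) / (4 + nulldatahdrwidth))
est_pages = 1 + math.ceil(reltuples / tuples_per_page)

print(tuples_per_page, est_pages)  # 407 tuples per page, 247 pages estimated
```

An index with far more pages on disk than this estimate is bloated and a candidate for REINDEX.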
Explain plan and Query Execution Cost:
The EXPLAIN statement displays the execution plan chosen by the
optimizer for SELECT, UPDATE, INSERT, and DELETE statements, which
means that whenever we execute a query, it goes through a certain path in
order to produce the output. When we want to know how much cost a query
will take and which path it will traverse to get the output, we can use
EXPLAIN to see what the query is doing.
Example : explain select count(*) from august;
Aggregate (cost=1.12..1.14 rows=1 width=8)
-> Seq Scan on august (cost=0.00..1.10 rows=10 width=4)
(2 rows)
Cost of Query execution:
Cost = number of pages * seq_page_cost + number of rows * cpu_tuple_cost
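Using PostgreSQL's default cost parameters (seq_page_cost = 1.0, cpu_tuple_cost = 0.01), the Seq Scan total cost from the earlier example plan (cost=0.00..1.10 rows=10) can be reproduced with this formula:

```python
# Reproduce the Seq Scan total cost from the example plan above.
seq_page_cost = 1.0    # PostgreSQL default
cpu_tuple_cost = 0.01  # PostgreSQL default

pages, rows = 1, 10    # a 10-row table fits in a single 8 kB page
total_cost = pages * seq_page_cost + rows * cpu_tuple_cost
print(total_cost)  # 1.1
```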
Example explain plan:
As seen above, to display all records from table tel_directory the planner
goes through a sequential scan, with a startup cost of 0.00 and a total cost
of 82.00. Below is the example with a filter applied.
Explain with index:
As you can see in the image above, after creating the index the cost is
reduced and the access path also changes. This way we can check whether
an index is being used effectively or not.
Data Fragmentation:
• Fragmentation is often called bloat in PostgreSQL.
• PostgreSQL in line with Multiversion concurrency control (MVCC)
does not UPDATE in place or DELETE a row directly from the disk.
• These rows are marked as old versions.
• As old row versions become obsolete, they keep piling up. This causes
fragmentation and bloating in the table.
• Tables or Indexes become bigger than their actual size.
How to detect bloat:
For a routine check we can use the pgstattuple extension module.
# CREATE EXTENSION pgstattuple; -- Run this command to install the extension.
The following SQL shows that table txn_log is heavily fragmented, with
98.71% free space and a tuple_percent of only 0.09%.
#postgres=> SELECT * FROM pgstattuple('txn_log');
  table_len  | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space  | free_percent
-------------+-------------+-----------+---------------+------------------+----------------+--------------------+-------------+--------------
 10459709440 |       58988 |   9583432 |          0.09 |             7876 |        1430999 |               0.01 | 10324502496 |        98.71
(1 row)
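The percentage columns in this output are straightforward ratios over table_len; a quick Python check against the numbers above:

```python
# Recompute tuple_percent and free_percent from the pgstattuple output above.
table_len = 10_459_709_440
tuple_len = 9_583_432
free_space = 10_324_502_496

tuple_percent = round(100 * tuple_len / table_len, 2)
free_percent = round(100 * free_space / table_len, 2)
print(tuple_percent, free_percent)  # 0.09 98.71
```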
postgres=> select count(*) from txn_log;
count
-------
58988
(1 row)
pgstattuple output columns:
Column             | Type   | Description
-------------------+--------+--------------------------------------
table_len          | bigint | Physical relation length in bytes
tuple_count        | bigint | Number of live tuples
tuple_len          | bigint | Total length of live tuples in bytes
tuple_percent      | float8 | Percentage of live tuples
dead_tuple_count   | bigint | Number of dead tuples
dead_tuple_len     | bigint | Total length of dead tuples in bytes
dead_tuple_percent | float8 | Percentage of dead tuples
free_space         | bigint | Total free space in bytes
free_percent       | float8 | Percentage of free space
Use pgstatindex to get index bloat.
#SELECT * FROM pgstatindex('idx_customers');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       3 |          3 | 1030766592 |         25480 |            835 |     101582 |           0 |         23408 |            77.48 |              67.07
(1 row)
pgstatindex output columns:
Column             | Type    | Description
-------------------+---------+------------------------------------------
version            | integer | Btree version number
tree_level         | integer | Tree level of the root page
index_size         | bigint  | Total index size in bytes
root_block_no      | bigint  | Location of root block
internal_pages     | bigint  | Number of "internal" (upper-level) pages
leaf_pages         | bigint  | Number of leaf pages
empty_pages        | bigint  | Number of empty pages
deleted_pages      | bigint  | Number of deleted pages
avg_leaf_density   | float8  | Average density of leaf pages
leaf_fragmentation | float8  | Leaf page fragmentation
Query with simplified output :
SELECT pg_size_pretty(pg_relation_size('txn_req_log_022022')) as
table_size, pg_size_pretty
(pg_relation_size('activity_req_log_pkey_txn_req_log_022022')) as
index_size, (pgstattuple ('txn_req_log_022022')).dead_tuple_percent;
Query to get tuples ratio:
select schemaname,
relname,
pg_size_pretty(pg_relation_size(schemaname|| '.' || relname)) as size,
n_live_tup,
n_dead_tup,
CASE WHEN n_live_tup > 0 THEN round((n_dead_tup::float / n_live_tup::float)::numeric, 4) END AS dead_tup_ratio,
last_autovacuum,
last_autoanalyze
from pg_stat_user_tables
order by dead_tup_ratio desc NULLS LAST;
Vacuum:
Whenever we update or delete a record, that record doesn't get
deleted from the disk; it is marked as an old version and sits in
the same table, occupying space. PostgreSQL maintains both the
past image and the latest image of a row in its own table. It means
UNDO is maintained within each table, and this is done through
versioning. Now we may get a hint that every row of a PostgreSQL
table has a version number, and that is absolutely correct. In order to
understand how these versions are maintained within each table, you
should understand the hidden columns of a table (especially xmin) in
PostgreSQL.
When you describe a table, you would only see the columns you have
added. However, if you look at all the columns of the table in pg_attribute,
you should see several hidden columns.
# SELECT attname, format_type (atttypid, atttypmod) FROM pg_attribute
WHERE attrelid::regclass::text='scott.employee' ORDER BY attnum;
attname | format_type
----------+------------------------
tableoid | oid
cmax | cid
xmax | xid
cmin | cid
xmin | xid
ctid | tid
emp_id | integer
emp_name | character varying(100)
dept_id | integer
(9 rows)
tableoid : Contains the Object ID of the table that contains this row. Used
by queries that select from inheritance hierarchies.
xmin : The transaction ID(xid) of the inserting transaction for this row
version. Upon update, a new row version is inserted. Let’s see the
following log to understand the xmin more.
# select txid_current();
txid_current
--------------
646 – Remember this
(1 row)
dataxorg =# INSERT into scott.employee VALUES (9,'avi',9);
INSERT 0 1
# select xmin,xmax,cmin,cmax,* from scott.employee where emp_id = 9;
xmin | xmax | cmin | cmax | emp_id | emp_name | dept_id
------+------+------+------+--------+----------+---------
647 | 0 | 0 | 0 | 9 | avi | 9
(1 row)
As you see in the above log, the transaction ID was 646 for the command
=> select txid_current(). Thus, the immediate INSERT statement got a
transaction ID 647. Hence, the record was assigned an xmin of 647. This
means, no running transaction ID that has started before the ID 647, can
see this row. In other words, already running transactions with txid less
than 647 cannot see the row inserted by txid 647.
With the above example, you should now understand that every tuple has
an xmin that is assigned the txid that inserted it.
xmax : This value is 0 if the row version has not been deleted. Before the
DELETE is committed, the xmax of the row version changes to the ID of
the transaction that has issued the DELETE. Let’s observe the following
log to understand that better.
Before the Delete
------------------
# select xmin,xmax,cmin,cmax,* from scott.employee where emp_id = 10;
xmin | xmax | cmin | cmax | emp_id | emp_name | dept_id
------+------+------+------+--------+----------+---------
649 | 0 | 0 | 0 | 10 | avi | 10
After the Delete
------------------
# select xmin,xmax,cmin,cmax,* from scott.employee where emp_id = 10;
xmin | xmax | cmin | cmax | emp_id | emp_name | dept_id
------+------+------+------+--------+----------+---------
649 | 655 | 0 | 0 | 10 | avi | 10
(1 row)
As you see in the above logs, the xmax value changed to the transaction
ID that has issued the delete. If you have issued a ROLLBACK, or if the
transaction got aborted, xmax remains at the transaction ID that tried to
DELETE it (which is 655) in this case.
When you check the count after DELETE, you would not see the records
that have been DELETED. To see any row versions that exist in the
table but are not visible, we have an extension called pageinspect. The
pageinspect module provides functions that allow you to inspect the
contents of database pages at a low level, which is useful for debugging
purposes. Let’s create this extension to see the older row versions those
have been deleted.
# SELECT t_xmin, t_xmax, tuple_data_split('scott.employee'::regclass,
t_data, t_infomask, t_infomask2, t_bits) FROM
heap_page_items(get_raw_page('scott.employee', 0));
t_xmin | t_xmax | tuple_data_split
--------+--------+---------------------------------------------
668 | 0 | {"\\x01000000","\\x09617669","\\x01000000"}
668 | 0 | {"\\x02000000","\\x09617669","\\x01000000"}
668 | 0 | {"\\x03000000","\\x09617669","\\x01000000"}
668 | 0 | {"\\x04000000","\\x09617669","\\x01000000"}
668 | 0 | {"\\x05000000","\\x09617669","\\x01000000"}
668 | 669 | {"\\x06000000","\\x09617669","\\x01000000"}
668 | 669 | {"\\x07000000","\\x09617669","\\x01000000"}
668 | 669 | {"\\x08000000","\\x09617669","\\x01000000"}
668 | 669 | {"\\x09000000","\\x09617669","\\x01000000"}
668 | 669 | {"\\x0a000000","\\x09617669","\\x01000000"}
(10 rows)
We could still see 10 records in the table even after deleting 5 records from
it. Also, you can observe here that t_xmax is set to the transaction ID that
has deleted them. These deleted records are retained in the same table to
serve any of the older transactions that are still accessing them.
An UPDATE in PostgreSQL would perform an insert and a delete. Hence,
all the records being UPDATED have been deleted and inserted back with
the new value. Deleted records have non-zero t_xmax value.
Records for which you see a non-zero value for t_xmax may be required
by the previous transactions to ensure consistency based on appropriate
isolation levels.
cmax : The command identifier within the deleting transaction or zero. (As
per the documentation). However, both cmin and cmax are always the
same as per the PostgreSQL source code.
cmin : The command identifier within the inserting transaction. You could
see the cmin of the 3 insert statements starting with 0.
Every such record that has been deleted but is still taking some space is
called a dead tuple. Once there is no dependency on those dead tuples with
the already running transactions, the dead tuples are no longer needed.
Thus, PostgreSQL runs VACUUM on such Tables. VACUUM reclaims the
storage occupied by these dead tuples. The space occupied by these dead
tuples may be referred to as Bloat. VACUUM scans the pages for dead
tuples and marks them to the free space map (FSM). Each relation apart
from hash indexes has an FSM stored in a separate file called
<relation_oid>_fsm.
ctid : The physical location of the row version within its table. Note that
although the ctid can be used to locate the row version very quickly, a
row's ctid will change if it is updated or moved by VACUUM FULL.
Therefore ctid is useless as a long-term row identifier. A primary key
should be used to identify logical rows.
There is a utility or an option called vacuum that is provided in
postgreSQL.
• VACUUM reclaims storage occupied by dead tuples/rows.
• Tuples that are deleted or obsoleted by an update are not physically
removed from their table; they remain present until a VACUUM is
done.
• There are two types of vacuum in PostgreSQL: VACUUM and
VACUUM FULL.
Vacuum:
• Plain VACUUM (without FULL) simply reclaims space and makes
it available for re-use.
• Deleted and obsolete tuples are removed when vacuum is done.
• Extra free space is not returned to the operating system; it's just
kept available for re-use in the object itself.
• No exclusive lock on table.
• Frequently updated tables are good candidates for vacuuming.
• VACUUM does an additional task. All the rows that are inserted
and successfully committed in the past are marked as frozen,
which indicates that they are visible to all the current and future
transactions.
• VACUUM does not usually return space to the filesystem unless
the dead tuples are beyond the high water mark. If there are
pages with no more live tuples after the high water mark (blank
pages at the end), those trailing pages can be truncated and given
back to the disk by VACUUM.
Vacuum Full
• VACUUM FULL rewrites the entire contents of the table into a new
disk file with no extra space.
• Unused space to be returned to the operating system.
• Compacts tables and reclaims more space.
• Takes much longer than a regular vacuum and places an exclusive
lock on the tables, so this activity should be done in non-business hours.
• Full vacuum needs extra disk space, since it writes a new copy of
the table and doesn't release the old copy until the operation is
complete.
• vacuumdb is a utility for cleaning a PostgreSQL database.
vacuumdb will also generate internal statistics used by the
PostgreSQL query optimizer.
• Syntax : vacuumdb -f --dbname=mydb --verbose
VACUUM cannot run inside a transaction, and so cannot run in a function
or a procedure. Generate the vacuum statements for the list of tables to be
processed with the psql CLI and run them with the \gexec internal command.
select format('vacuum %s.%s', schemaname, tablename)
from (select schemaname, tablename
from pg_tables where tablename like '%102020%'
) lt;
\gexec
Determine Transaction ID Utilization
SELECT
datname,
age(datfrozenxid) AS frozen_xid_age,
ROUND(
100 *(
age(datfrozenxid)/ 2146483647.0 :: float
)
) consumed_txid_pct,
current_setting('autovacuum_freeze_max_age'):: int - age(datfrozenxid)
AS remaining_aggressive_vacuum
FROM
pg_database
WHERE
datname NOT IN (
'cloudsqladmin', 'template0', 'template1'
);
A value of >80% in the consumed_txid_pct column indicates that tables in
that database are in need of vacuuming to recycle transaction IDs.
Run the following query to list tables that are currently processed by the
autovacuum daemon:
SELECT p.pid,
p.datname,
p.query,
p.backend_type,
a.phase,
a.heap_blks_scanned / a.heap_blks_total::float * 100 AS "% scanned",
a.heap_blks_vacuumed / a.heap_blks_total::float * 100 AS "%
vacuumed",
pg_size_pretty(pg_table_size(a.relid)) AS "table size",
pg_size_pretty(pg_indexes_size(a.relid)) AS "indexes size",
pg_get_userbyid(c.relowner) AS owner
FROM pg_stat_activity p
JOIN pg_stat_progress_vacuum a ON a.pid = p.pid
JOIN pg_class c ON c.oid = a.relid
WHERE p.query LIKE 'autovacuum%';
Output:
-[ RECORD 2 ]+-------------------------------------------
pid | 286964
datname | test_db
query | autovacuum: VACUUM public.my_table
backend_type | autovacuum worker
phase | vacuuming indexes
% scanned | 100
% vacuumed | 0
table size | XX
indexes size | XX
owner | test_user
-[ RECORD 3 ]+-------------------------------------------
pid | 271948
datname | test_db
query | autovacuum: VACUUM ANALYZE public.sample1
backend_type | autovacuum worker
Each record in the output corresponds to one autovacuum operation
running in the database. Review the output and identify any records where
the vacuum is in the “vacuuming indexes” phase as shown in the phase
field. This indicates an operation that can potentially be sped up by
canceling the autovacuum and performing a manual vacuum instead.
If there are multiple tables eligible for manual vacuuming, focus on the
largest ones first. The larger the table, the longer the autovacuum process
can take. Therefore, applying this procedure on the largest tables first can
produce the biggest gains.
Cancel the autovacuum backend, then run an optimized manual vacuum:
SELECT
pg_cancel_backend(286964)
FROM
pg_stat_progress_vacuum
WHERE
relid = 'public.pgbench_accounts' :: regclass;
VACUUM (
TRUNCATE off, INDEX_CLEANUP false,
VERBOSE, FREEZE
) public.my_table;
Monitor the vacuum Process
SELECT
*,
relid :: regclass,
heap_blks_scanned / heap_blks_total :: float * 100 "% scanned",
heap_blks_vacuumed / heap_blks_total :: float * 100 "% vacuumed"
FROM
pg_stat_progress_vacuum;
After the vacuum completes, you can optionally reindex the table. Our
optimized VACUUM command contained the INDEX_CLEANUP false
clause, which skips the index optimization stage. Bypassing index
optimization doesn’t cause any immediate issues, but if you frequently
vacuum the same tables with INDEX_CLEANUP false, it can lead to
index bloat in the long term. You may want to REINDEX your table
periodically if index bloat becomes a problem.
REINDEX (VERBOSE) TABLE CONCURRENTLY public.my_table;
(To recreate all the indexes of a table, you use the TABLE keyword and specify
the name of the table:)
Recheck the consumed_txid_pct.
You can obtain TXID information at table level to identify tables that still
need vacuuming. SQL for top 10 tables ordered by transaction ID
utilization.
SELECT c.relname AS table_name,
age(c.relfrozenxid) AS frozen_xid_age,
ROUND(100 * (age(c.relfrozenxid) / 2146483647.0)) AS consumed_txid_pct,
pg_size_pretty(pg_total_relation_size(c.oid)) AS table_size
FROM pg_class c
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE c.relkind IN ('r', 't', 'm')
AND n.nspname NOT IN ('pg_toast')
ORDER BY 2 DESC
LIMIT 10;
Transaction ID Wraparound Failures
Every time we insert, update or delete, a unique transaction ID
number is assigned. This ID gets incremented automatically: as you
perform a transaction, a number is assigned to it and the counter
increments.
PostgreSQL stores this transaction information about each and every row
in the system, and this information is used to determine whether a row
is visible to other transactions.
Assume a system runs for a long time without any kind of
maintenance, with no vacuuming happening at all. When the transaction
counter reaches anywhere close to 4 billion transactions, it suffers
something called transaction ID wraparound: the transaction ID to be
assigned wraps back to zero, because there is no ID left to assign
after four billion transactions. Once the counter wraps around and
starts again from the beginning, PostgreSQL will not allow new
transactions. This is to maintain the sanity of the database.
What would happen after four billion transactions is that rows you
deleted in the past would become visible again, and the old values of
records you updated would reappear. The database would enter an
inconsistent state. To avoid this, we have to vacuum regularly.
• Multiversion concurrency control (MCC or MVCC), is a
Concurrency Control method commonly used by DBMS to provide
concurrent access to the database.
• MVCC depends on transaction ID numbers.
• Transaction IDs have limited size (32 bits)
• A cluster that runs for a long time (more than 4 billion transactions)
would suffer transaction ID wraparound.
• The XID (transaction ID) wraps around to zero, i.e. starts again from
the first transaction ID value. This is a catastrophic failure for the
server.
• Transactions that were in the past appear to be in the future, which
means their output becomes invisible.
• To avoid this situation it is necessary to vacuum every table in every
database at least once every two billion transactions.
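The numbers behind these bullets come straight from 32-bit arithmetic:

```python
# 32-bit transaction IDs give about 4.29 billion values in total; at any
# moment roughly half of the XID circle (about 2.1 billion XIDs) can lie
# "in the past" of a given transaction, which is why every table must be
# vacuumed at least once every two billion transactions.
total_xids = 2 ** 32
past_horizon = total_xids // 2

print(total_xids)    # 4294967296
print(past_horizon)  # 2147483648
```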
Query to know current status of transaction ID
#select
datname,age(datfrozenxid),current_setting('autovacuum_freeze_max_age')
from pg_database order by 2 desc;
  datname  |  age   | current_setting
-----------+--------+-----------------
 iptr_prod | 94,638 | 200000000
 template0 | 94,638 | 200000000
 postgres  | 94,638 | 200000000
 dvdrental | 94,638 | 200000000
 auakua    | 94,638 | 200000000
 template1 | 94,638 | 200000000
 sdrh_tmp  |    231 | 200000000
 vbts      |     96 | 200000000
#select txid_current();
txid_current
-----------------
95117
Check tables for last vacuum time, live and dead tuples:
SELECT schemaname, relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup
/ (n_live_tup
* current_setting('autovacuum_vacuum_scale_factor')::float8
+ current_setting('autovacuum_vacuum_threshold')::float8)
DESC
LIMIT 10;
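The ordering expression in this query mirrors autovacuum's own trigger condition: a table is vacuumed when n_dead_tup exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. With the default settings (50 and 0.2) the trigger point can be computed directly:

```python
# Compute the dead-tuple count at which autovacuum vacuums a table,
# using the default autovacuum settings.
autovacuum_vacuum_threshold = 50      # default
autovacuum_vacuum_scale_factor = 0.2  # default

def vacuum_trigger_point(reltuples: int) -> float:
    """Dead tuples needed before autovacuum vacuums the table."""
    return (autovacuum_vacuum_threshold
            + autovacuum_vacuum_scale_factor * reltuples)

print(vacuum_trigger_point(10_000))  # 2050.0
```

Tables whose dead-tuple count is closest to (or past) this point sort to the top of the query above.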
--#db_level wraparound risk
SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
, (age(datfrozenxid)::numeric/1000000000*100)::numeric(4,2) as
WRAPAROUND_RISK
FROM pg_database
ORDER BY 2 DESC;
--Table level wraparound risk.
select c.oid::regclass as table_name,
greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as "TXID age",
(greatest(age(c.relfrozenxid),age(t.relfrozenxid))::numeric/
1000000000*100)::numeric(4,2) as "% WRAPAROUND RISK"
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm')
ORDER BY 2 DESC;
select name, setting
FROM pg_settings
WHERE name ~ 'vacuum'
AND name ~'_age$'
ORDER BY 1 ASC;
Vacuum Freeze :
• Vacuum freeze is a special kind of vacuum, which marks rows as
frozen.
• Vacuum Freeze marks a table's contents with a very special
transaction timestamp that tells postgres that it does not need to be
vacuumed, ever.
• Postgres reserves a special XID called FrozenTransactionId.
• FrozenTransactionId is always considered older than any normal XID.
• vacuum_freeze_min_age controls how old an XID value has to be
before it's replaced with FrozenXID.
• VACUUM normally skips pages that don't have any dead row
versions, but those pages might still have row versions with old XID
values.
• To ensure all old XIDs have been replaced by FrozenXID, a scan of
the whole table is needed; vacuum_freeze_table_age controls when
VACUUM does this whole-table scan.
• The Debian package's configuration is quite odd, putting the
configuration files in /etc/postgresql instead of the data area. The
following example starts the standalone backend:
$ /usr/lib/postgresql/13/bin/postgres --single -D
/var/lib/postgresql/13/main/base/
--config-file=/etc/postgresql/13/main/postgresql.conf
PostgreSQL stand-alone backend will start.
The settings below make anti-wraparound autovacuum run more often,
so that individual runs are smaller. You can further improve matters for a
table by setting vacuum_freeze_min_age to 0, so that all rows are frozen
when autovacuum runs.
autovacuum_freeze_max_age = 100000000,
autovacuum_multixact_freeze_max_age = 100000000,
vacuum_freeze_min_age = 0
It can be set on the table too:
ALTER TABLE tab SET (
autovacuum_freeze_max_age = 100000000,
autovacuum_multixact_freeze_max_age = 100000000,
vacuum_freeze_min_age = 0
);
To see how far each table is from transaction ID wraparound:
# SELECT
oid::regclass::text AS table,
age(relfrozenxid) AS xid_age,
mxid_age(relminmxid) AS mxid_age,
least(
(SELECT setting::int
FROM pg_settings
WHERE name = 'autovacuum_freeze_max_age') -
age(relfrozenxid),
(SELECT setting::int
FROM pg_settings
WHERE name = 'autovacuum_multixact_freeze_max_age') -
mxid_age(relminmxid)
) AS tx_before_wraparound_vacuum,
pg_size_pretty(pg_total_relation_size(oid)) AS size,
pg_stat_get_last_autovacuum_time(oid) AS last_autovacuum
FROM pg_class
WHERE relfrozenxid != 0
AND oid > 16384
ORDER BY tx_before_wraparound_vacuum;
Routine Reindexing:
• Insert, update and delete operations fragment the index over a
period of time.
• A Fragmented index will have pages where logical order based on
key value differs from the physical ordering inside the data file.
• Heavily fragmented indexes can degrade query performance because
additional I/O is required to locate data to which the index points.
• REINDEX rebuilds an index using the data stored in the index's table
and eliminates empty space between pages.
• Syntax : reindex index <index_name>;
Detect index fragmentation:
#select * from pgstatindex('idxtmp_cdt');
Check the leaf_fragmentation value to get the % of fragmentation.
Run #reindex index idxtmp_cdt; -- to bring fragmentation back down to 0.
Cluster a Table:
When data is being inserted into a table we cannot restrict it to
be entered in a particular order, so the data is scattered across multiple
pages. All of these pages have to be read back when we search the data
or total a count, because the rows were not entered in a particular
sequence. The CLUSTER command is the option to arrange the data
physically on the disk according to an index.
• CLUSTER instructs PostgreSQL to cluster the table specified by
table_name based on the index specified by index_name.
• When a table is clustered, it is physically reordered based on the
index information.
• Clustering is a one-time operation: when the table is subsequently
updated, the changes are not clustered.
• An ACCESS EXCLUSIVE lock is acquired on the table while it is being
clustered.
• Clustering lowers disk access and speeds up queries that access a
range of indexed values, because there are fewer pages to search.
• CLUSTER should not be executed during peak hours in a production
environment.
• Syntax for cluster:
• CLUSTER table_name USING index_name;
• The next time, you can execute just CLUSTER table_name, because
PostgreSQL remembers which index the table was clustered on.
PostgreSQL does not have a direct implementation of a clustered index
like Microsoft SQL Server; instead it provides the one-time CLUSTER
command, which has a similar effect.
Example cluster table:
Create a table and insert data without ordering. We then need an ORDER
BY clause to fetch the rows sorted. We created an index on id. By
clustering the table according to that index, we can get ordered results
even without an ORDER BY clause.
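The screenshots are not reproduced here, but the demo can be sketched as follows (table and column names are illustrative, not the ones from the original example):

```sql
-- a table populated in no particular order
CREATE TABLE emp (id int, name text);
INSERT INTO emp VALUES (3, 'c'), (1, 'a'), (4, 'd'), (2, 'b');

-- index on id, then physically reorder the table on that index
CREATE INDEX emp_id_idx ON emp (id);
CLUSTER emp USING emp_id_idx;

-- a sequential scan now returns rows in index order
SELECT * FROM emp;

-- later runs can omit the index name; the choice is remembered
CLUSTER emp;
```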
Autovacuum Daemon:
• The autovacuum feature automates the execution of the
VACUUM and ANALYZE commands.
• Autovacuum checks for tables that have had a large number of
inserted, updated, or deleted tuples, based on statistics collection.
• The autovacuum launcher is in charge of starting autovacuum worker
processes for all databases.
• The launcher distributes the work across time, attempting to start
one worker in each database every autovacuum_naptime seconds
(the sleep time).
• Workers check for inserts, updates, and deletes and execute VACUUM
and ANALYZE if needed.
• View autovacuum settings:
# SELECT * FROM pg_settings WHERE name LIKE '%autovacuum%';
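Autovacuum behaviour can also be tuned per table through storage parameters, as in the freeze example earlier. A sketch (the table name tmp is illustrative):

```sql
-- vacuum tmp after ~5% of its rows change instead of the default 20%
ALTER TABLE tmp SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);

-- confirm when (auto)vacuum and (auto)analyze last ran on tmp
SELECT relname, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'tmp';
```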
Uninstalling PostgreSQL on Windows:
PostgreSQL can be uninstalled from Windows by executing the uninstall
utility found in the PostgreSQL installation folder.
The uninstall popup asks whether to uninstall individual components
(pgAdmin, Stack Builder, or library files) as required, or the entire
application.
Choosing "Entire Application" and clicking OK starts the uninstall
process, which removes the entire application from disk except the data
directory. The data directory can be removed manually.
Uninstall PostgreSQL from Linux
Check whether PostgreSQL is running: # systemctl status postgresql-12
If it is running, stop it: # systemctl stop postgresql-12
Now uninstall (RHEL/CentOS): # yum remove postgresql12*
and remove the data folder manually.
Ubuntu: # sudo apt-get --purge remove postgresql
# sudo apt-get purge postgresql*
# sudo apt-get --purge remove postgresql postgresql-doc
postgresql-common
# dpkg -l | grep postgres
# sudo rm -rf /var/lib/postgresql/
# sudo rm -rf /var/log/postgresql/
# sudo rm -rf /etc/postgresql/
Upgrading PostgreSQL
What is Upgrade
• Upgrading database from one PostgreSQL release to a newer one.
• PostgreSQL version numbers consist of a major and a minor version
number. Ex: 10.1. (major is 10 , 1 is minor)
• In major releases of PostgreSQL, the internal data storage format is
subject to change.
• Minor releases never change the internal storage format and are
always compatible with earlier and later minor releases of the same
major version number.
• Before PostgreSQL version 10.0, version numbers consisted of three
numbers. Ex: 9.5.6 (9.5 is the major version, 6 is the minor).
REASON FOR UPGRADE
• Security Fixes
• Enhanced Features
• Resolved Bugs and Other Issues
• Reduced Costs by performance improvement in newer version
• End of Support
UPGRADE SOLUTIONS
• Upgrading data via pg_dumpall.
This is the traditional method: simple and effective to use, but it
takes a lot of time, so it is best suited to smaller databases. It
requires downtime and adequate additional disk space to hold both the
old and new copies of the dump.
• Upgrading data via pg_upgrade.
This is the faster upgrade utility, used for major version upgrades.
• Upgrading data via replication.
UPGRADING VIA PG_DUMPALL
1. Take a backup of the entire cluster.
Ensure all applications are closed and all connections are stopped
before the backup, to obtain a consistent copy.
Connections can be blocked in pg_hba.conf using the reject authentication
method.
Windows> pg_dumpall -U postgres > c:\bkpfolder_old\bkp1.sql
Linux # pg_dumpall > /opt/bkpfolder/bkp1.sql
pg_dumpall will ask for the password for each database; if the database
count is large, a ~/.pgpass password file (or the PGPASSWORD environment
variable) can be used to avoid entering the password for each one.
2. After the successful backup, stop the old cluster:
# pg_ctl -D <datafolder path> stop
3. Rename the old PG directory.
4. Install and start new version and check proper installation.
5. Stop new cluster and restore the configuration changes in new
postgresql.conf and pg_hba.conf from old one.
6. After making the changes in configuration file, start the cluster.
7. Delete the "CREATE ROLE postgres ..." line from the dump. (Optional)
8. Restore the dump
C:\>psql -U postgres -f C:\bkpfolder_old\bkp1.sql
# psql -f /opt/bkpfolder/bkp1.sql
9. On a successful restore, the old PostgreSQL version can be
uninstalled as per the process described in the uninstall section of
this course.
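On Linux, the whole sequence might look like this (paths are the ones used above; directory names and service handling are assumptions to adapt to your layout):

```shell
# 1. consistent dump of the old cluster, as the postgres user
pg_dumpall > /opt/bkpfolder/bkp1.sql

# 2. stop the old cluster
pg_ctl -D <datafolder path> stop

# 3-6. rename the old directory, install and verify the new
#      version, copy your postgresql.conf / pg_hba.conf changes
#      across, then start the new cluster

# 8. restore the dump into the new cluster
psql -f /opt/bkpfolder/bkp1.sql
```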
UPGRADE VIA PG_UPGRADE (FORMERLY PG_MIGRATOR)
• pg_upgrade (formerly called pg_migrator) allows data stored in
PostgreSQL data files to be upgraded to a later PostgreSQL major
version without a data dump/reload.
• It is primarily used for major PostgreSQL version upgrades.
• Major PostgreSQL releases regularly add new features that often
change the layout of the system tables, but the internal data storage
format rarely changes.
• pg_upgrade performs rapid upgrades by creating new system tables
and simply reusing the old user data files.
• pg_upgrade does its best to make sure the old and new clusters are
binary-compatible.
PG_UPGRADE
Syntax:
The pg_upgrade utility is more efficient than pg_dumpall; it has a
compatibility-check option (--check) to verify compatibility between the
old and new versions, and it is faster than pg_dumpall.
/usr/pgsql-12/bin/pg_upgrade --old-bindir=/usr/pgsql-10/bin
--new-bindir=/usr/pgsql-12/bin --old-datadir=/var/lib/pgsql/10/data
--new-datadir=/var/lib/pgsql/12/data [--link | --clone]
Options:
-b bindir
--old-bindir=bindir
The old PostgreSQL executable directory; environment variable
PGBINOLD
-B bindir
--new-bindir=bindir
The new PostgreSQL executable directory; environment variable
PGBINNEW
-c
--check
check clusters only, don't change any data
-d configdir
--old-datadir=configdir
the old database cluster configuration directory; environment
variable PGDATAOLD
-D configdir
--new-datadir=configdir
the new database cluster configuration directory; environment variable
PGDATANEW
-k
--link
Use hard links instead of copying files to the new cluster. This is the
fastest upgrade method: the data files remain in the same location, and
the new cluster links to the old cluster's files. The drawback is that
you cannot go back to the old version once the new cluster has been
started.
--clone
Use efficient file cloning (reflinks, on file systems that support it)
instead of copying files to the new cluster. You can go back to the old
version because the old files are left untouched. The default mode, a
plain copy, works on any file system but requires sufficient disk space
to hold both the old and new copies.
-?
--help
show help, then exit
See the practical example on next page.
• Check whether PostgreSQL 10 is running or not and also find the
location of data and bin directory.
Data Location: /var/lib/pgsql/10/data
Bin location: /usr/pgsql-10/bin
• Back up the data (depending on the backup strategy used for the
existing PostgreSQL), as the postgres user:
pg_basebackup -D /var/lib/pgsql/10/backups
or
pg_dumpall > /var/lib/pgsql/10/backups/clusterall.sql
3) Install PostgreSQL 12 with yum, as root or as an admin user:
yum install postgresql12-server.x86_64 postgresql12-contrib.x86_64
4) Check the locations of the newly installed PostgreSQL 12 and the old
PostgreSQL 10.
You should see both version 10 and version 12 under /var/lib/pgsql/ and
/usr/.
5) Initialize the PostgreSQL 12 cluster by navigating to the
/usr/pgsql-12/bin directory and running pg_ctl as the postgres user:
./pg_ctl -D /var/lib/pgsql/12/data initdb
6) Check the PostgreSQL 12 data directory (/var/lib/pgsql/12/data) for
the newly populated files.
7) Change to /tmp, or any directory where you have permission to write
log files, and execute the command below to check whether the new and
old data directories are compatible:
/usr/pgsql-12/bin/pg_upgrade --old-bindir=/usr/pgsql-10/bin
--new-bindir=/usr/pgsql-12/bin --old-datadir=/var/lib/pgsql/10/data
--new-datadir=/var/lib/pgsql/12/data --check
This test should pass without any error and report that the clusters are
compatible. (Ensure you run this from a directory where the postgres
user has write permission, such as /tmp.)
8) Ensure the application is down and no connections can be made to
PostgreSQL. If needed, block connections from the pg_hba.conf file.
9) Back up the data again (depending on the backup strategy used for
the existing PostgreSQL), as the postgres user:
pg_basebackup -D /var/lib/pgsql/10/backups
or
pg_dumpall > /var/lib/pgsql/10/backups/clusterall.sql
10) Stop PostgreSQL 10 running on the server.
[postgres@dv-pc-post bin]$ cd /usr/pgsql-10/bin
[postgres@dv-pc-post bin]$ ./pg_ctl -D /var/lib/pgsql/10/data stop
waiting for server to shut down.... done
server stopped
[postgres@dv-pc-post bin]$ ./pg_ctl -D /var/lib/pgsql/10/data status
pg_ctl: no server running
11) Navigate to the /tmp folder and execute the command below as the
postgres user:
/usr/pgsql-12/bin/pg_upgrade --old-bindir=/usr/pgsql-10/bin
--new-bindir=/usr/pgsql-12/bin --old-datadir=/var/lib/pgsql/10/data
--new-datadir=/var/lib/pgsql/12/data
12) Change the port to 5432 in the postgresql.conf file under
/var/lib/pgsql/12/data/ (optional).
Change the port to 5433 in the postgresql.conf file under
/var/lib/pgsql/10/data/ (optional).
13) Now start the new PostgreSQL 12 instance.
cd /usr/pgsql-12/bin/
./pg_ctl -D /var/lib/pgsql/12/data start
14) Check the version by logging in with psql.
15) There will be two scripts in the /tmp folder:
delete_old_cluster.sh and analyze_new_cluster.sh.
16) Execute the script below, logged in as the postgres user:
./analyze_new_cluster.sh
17) Check that the tables, views, and other objects are present in the
newly upgraded database.
18) Uninstall the old PostgreSQL 10 software and the old PostgreSQL 10
directory. (Purely your choice whether to drop it right away or keep it
for some time.)
19) Run ./delete_old_cluster.sh to remove the old cluster.
Introduction to PostgreSQL 13:
• PostgreSQL 13 was released on 2020-09-24 with performance
improvement features.
• It includes significant improvements to its indexing and lookup
system that benefit large databases.
• Space savings and performance gains from de-duplication of B-tree
index entries
• Improved performance for queries that use aggregates or partitioned
tables.
• Better query planning when using extended statistics.
• Parallelized vacuuming of indexes
• Incremental sorting
B-Tree Deduplication
• Duplicate values are merged into a single posting list for each
value, so each key value appears only once.
Ex: Before: 'Key A' (1,1), 'Key A' (1,2), 'Key A' (1,3)
Now: 'Key A' (1,1)(1,2)(1,3)
• Deduplication results in a smaller index size for indexes with
repeating entries.
• RAM is used efficiently when the index is cached in shared buffers.
• Improved performance for queries that use index scans.
• Index bloat and routine index vacuum overhead are reduced.
• Users upgrading with pg_upgrade will need to run
REINDEX to make an existing index use this feature.
Deduplication example, before and after:
In version 12, the index size is 240 kB. In version 13, the index size
is 88 kB for the same scenario, because all rows contain the same
duplicate data and the entries are merged.
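The before/after screenshots can be recreated with a sketch like this (table and index names are illustrative):

```sql
-- 10,000 rows that all carry the same key value
CREATE TABLE dedup_demo AS
SELECT 1 AS val FROM generate_series(1, 10000);

CREATE INDEX dedup_demo_idx ON dedup_demo (val);

-- roughly 240 kB on version 12; roughly 88 kB on version 13,
-- where the duplicate entries are merged into posting lists
SELECT pg_size_pretty(pg_relation_size('dedup_demo_idx'));
```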
Incremental Sorting
Incremental sorting accelerates sorting when data from earlier parts of
a query is already sorted on a prefix of the requested sort keys.
Example: with an index on c1, if you need the dataset sorted by c1, c2,
incremental sort can help because it does not sort the whole dataset;
instead it sorts the individual groups that share the same value of c1.
Incremental sort is extremely helpful when you have a LIMIT clause.
Comparing versions 12 and 13, the cost is dramatically reduced for the
same query in version 13.
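The c1/c2 scenario can be sketched like this (table and index names are illustrative):

```sql
CREATE TABLE t (c1 int, c2 int);
INSERT INTO t SELECT i % 100, i FROM generate_series(1, 100000) AS i;
CREATE INDEX t_c1_idx ON t (c1);
ANALYZE t;

-- on version 13 the plan can contain an Incremental Sort node:
-- rows arrive from the index already ordered by c1, so only each
-- group of equal c1 values has to be sorted by c2, which is cheap
-- when a LIMIT stops the scan early
EXPLAIN SELECT * FROM t ORDER BY c1, c2 LIMIT 10;
```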
Parallel Vacuum
VACUUM reclaims storage occupied by dead tuples
Tuples that are deleted or obsoleted by an update are not physically
removed from their table; they remain present until a VACUUM is done.
The max_parallel_maintenance_workers and min_parallel_index_scan_size
parameters govern parallel vacuum.
The degree of parallelization is either specified by the user or
determined based on the number of indexes that the table has.
Autovacuum is now also triggered for append-only (insert-only) tables.
Syntax: VACUUM (PARALLEL 2, VERBOSE) <table_name>;
Backup Manifests and pg_verifybackup in PostgreSQL 13
PostgreSQL 13 introduces two useful features to enhance automated
validation of physical backups: backup manifests and a new tool called
pg_verifybackup. A brief overview of each follows.
Backup manifests
A JSON listing of the contents of a physical backup taken with
pg_basebackup.
pg_verifybackup
A tool to verify that the contents of a backup match the given manifest.
pg_verifybackup also expects to find the WAL files from the start to the
end of the backup; if they were deliberately excluded from the backup,
verification of the WAL files can be skipped with the -n option:
# pg_verifybackup -n ~/backups/1588374735/
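A typical round trip, reusing the backup path from the example above (here -X none, which excludes WAL from the backup, is what makes -n necessary):

```shell
# take a base backup without WAL; on v13 pg_basebackup writes a
# backup_manifest file into the target directory by default
pg_basebackup -D ~/backups/1588374735/ -X none

# verify the backup against its manifest, skipping WAL
# verification because the WAL files were not included
pg_verifybackup -n ~/backups/1588374735/
```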
Trusted Extensions
Trusted extensions can be installed without superuser privileges by any
user who has the CREATE privilege on the database.
Ex: plperl, pgcrypto, and ltree.
Drop Database:
DROP DATABASE dbname WITH (FORCE);
In version 12, if a user is connected to a database, PostgreSQL will not
allow you to drop that database until all user sessions are killed. If
multiple users are connected to the database, killing each and every
session becomes a big task. WITH (FORCE), introduced in version 13, does
this for you.
EXPLAIN tracks WAL usage:
Ex: EXPLAIN (ANALYZE, WAL, COSTS OFF) UPDATE t1 SET id = 1 WHERE id = 1;
Whenever we do a bulk insert, update, or delete, those changes are
logged in the WAL files. With this feature we can see how much data, in
bytes and records, a statement wrote to the WAL files.
SYSTEM VIEWS:
• pg_stat_activity now reports a parallel worker's leader process.
• pg_stat_progress_basebackup reports the progress of streaming
base backups.
• pg_stat_progress_analyze reports ANALYZE progress.
• pg_shmem_allocations displays shared memory usage.
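For example, the first and last of these can be queried directly (leader_pid is NULL for ordinary backends):

```sql
-- parallel workers report the PID of their leader backend
SELECT pid, leader_pid, state, query
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';

-- top shared memory allocations
SELECT name, allocated_size
FROM pg_shmem_allocations
ORDER BY allocated_size DESC
LIMIT 5;
```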