Reaching 1 billion rows / second
Hans-Jürgen Schönig
www.postgresql-support.de
Reaching a milestone
Goal
- Processing 1 billion rows / second
- Show a path to even more scalability
- Silence the “scalability” discussion at some point
- See where the limitations are
- Do it WITHOUT commercial tools, warehousing tools, etc.
Traditional PostgreSQL limitations
- Traditionally:
  - We could only use 1 CPU core per query
  - Scaling was possible by running more than one query at a time
  - This was usually hard to do
PL/Proxy: The traditional way to do it
- PL/Proxy is a stored procedure language used to scale out to shards
- It worked nicely for OLTP workloads
- It is somewhat usable for analytics
- It requires a LOT of manual work
On the app level
- Scaling can also be done on the application level:
  - A lot of manual work
  - Not cool enough
  - Needs a lot of development
  - Why use a database if work is still manual?
- Solving things on the app level is certainly not an option
The 1 billion row challenge
Coming up with a data structure
- We tried to keep it simple:

node=# \d t_demo
       Table "public.t_demo"
 Column |  Type   | Collation | Nullable
--------+---------+-----------+----------
 id     | serial  |           | not null
 grp    | integer |           |
 data   | real    |           |
Indexes:
    "idx_id" btree (id)
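
Test data matching this schema can be generated with something along these lines (a sketch; the exact row counts used in the talk varied per node):

```sql
-- hypothetical setup matching the schema above
CREATE TABLE t_demo (
    id   serial,
    grp  integer,
    data real
);

-- e.g. 100 million rows per node, 10 groups (0-9), random payload
INSERT INTO t_demo (grp, data)
SELECT i % 10, random()
FROM generate_series(1, 100000000) AS i;

CREATE INDEX idx_id ON t_demo (id);
```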
The query
SELECT grp, count(data)
FROM t_demo
GROUP BY 1;
Single server performance
Tweaking a simple server
- The main questions are:
  - How much can we expect from a single server?
  - How well does it scale with many CPUs?
  - How far can we get?
PostgreSQL parallelism
- Parallel queries were added in PostgreSQL 9.6
- The executor can already do a lot, but it is by far not feature complete yet
- The number of workers is normally determined by the PostgreSQL optimizer
- We do not want that: we want ALL cores to be at work
Adjusting CPU core usage
- Usually the number of worker processes per scan is derived from the size of the table:

test=# SHOW min_parallel_relation_size ;
 min_parallel_relation_size
----------------------------
 8MB
(1 row)

- One worker is added every time the table size triples: 8 MB of data buys the first worker, 24 MB the second, 72 MB the third, and so on
Overruling the planner
- We could never have enough data to make PostgreSQL go for 16 or 32 cores, even if the value is set to a couple of kilobytes
- The default mechanism can be overruled:

test=# ALTER TABLE t_demo
           SET (parallel_workers = 32);
ALTER TABLE
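
Keep in mind that the planner also caps the worker count with max_parallel_workers_per_gather (the default in 9.6 is only 2), so that limit has to be raised as well. A sketch with illustrative values:

```sql
-- allow up to 32 workers per Gather node
SET max_parallel_workers_per_gather = 32;

-- optionally make parallel plans cheaper in the planner's eyes
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
```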
Making full use of cores
- How well does PostgreSQL scale on a single box?
- For the next test we assume that I/O is not an issue: if I/O does not keep up, CPU does not make a difference
- Make sure that data can be read fast enough
- Observation: 1 SSD might not be enough to feed a modern Intel chip
Single node scalability (1)
Single node scalability (2)
- We used a 16-core box here
- As you can see, the query scales up nicely
- Beyond 16 cores hyperthreading kicks in
- We managed to gain around 18% more that way
Single node scalability (3)
- On a single Google VM we could reach close to 40 million rows / second
- For many workloads this is already more than enough
- Rows / second will of course depend on the type of query
Moving on to many nodes
The basic system architecture (1)
- We want to shard data to as many nodes as needed
- For the demo: place 100 million rows on each node
- We do so to eliminate the I/O bottleneck
- If I/O does become an issue, we can always compensate by using more servers
- Use parallel queries on each shard
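
In the 9.6 era such a layout can be wired up by hand with postgres_fdw and table inheritance. A rough sketch of the coordinator-side setup (server names, hostnames, and credentials are made up for illustration):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- one foreign server per shard (hostname is a placeholder)
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.internal', dbname 'node');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'postgres');

-- empty parent table on the coordinator
CREATE TABLE t_demo (id integer, grp integer, data real);

-- each shard's data shows up as a child foreign table
CREATE FOREIGN TABLE t_demo_shard1 ()
    INHERITS (t_demo) SERVER shard1
    OPTIONS (table_name 't_demo');
```

Repeat the server / foreign-table pair once per shard; a scan of t_demo then appends all children.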
Testing with two nodes (1)
explain SELECT grp, COUNT(data) FROM t_demo GROUP BY 1;

Finalize HashAggregate
  Group Key: t_demo.grp
  ->  Append
        ->  Foreign Scan (partial aggregate)
        ->  Foreign Scan (partial aggregate)
        ->  Partial HashAggregate
              Group Key: t_demo.grp
              ->  Seq Scan on t_demo
Testing with two nodes (2)
- Throughput doubles as long as the partial results stay small
- The planner pushes the aggregation down nicely
- Linear increases are necessary to scale to 1 billion rows / second
Preconditions to make it work (1)
- postgres_fdw uses cursors on the remote side
- cursor_tuple_fraction has to be set to 1 to improve the planning process
- Set fetch_size to a large value
- That is the easy part
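
Concretely, this might look as follows (values are illustrative; fetch_size is a postgres_fdw option that can be set per server or per foreign table, and "shard1" is a placeholder name):

```sql
-- on each shard: plan cursors for the full result set,
-- not just the first few rows
ALTER SYSTEM SET cursor_tuple_fraction = 1.0;
SELECT pg_reload_conf();

-- on the coordinator: fetch big batches per round trip
-- (the postgres_fdw default is only 100 rows)
ALTER SERVER shard1
    OPTIONS (ADD fetch_size '100000');
```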
Preconditions to make it work (2)
- We have to make sure that all remote database servers work at the same time
- This requires “parallel append and async fetching”
- All queries are sent to the nodes in parallel, and data can be fetched in parallel
- We cannot afford to wait for each node to complete if we want to scale in a linear way
Preconditions to make it work (3)
- This could not have been done without substantial work on PostgreSQL itself in recent releases
- Traditionally, joins and aggregation had to be done BEFORE results could be combined across nodes
- That is a showstopper for distributed aggregation, because all the data would have to be fetched from the remote host before aggregating
- Without this change the test would not have been possible
Preconditions to make it work (4)
- The easy tasks:
  - Aggregates have to be implemented to handle partial results coming from the shards
  - The code is simple and available as an extension
  - For the test we implemented a handful of aggregates
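
Since 9.6, CREATE AGGREGATE accepts a COMBINEFUNC, which is the building block for merging partial states. The extension from the talk is not shown here, but a minimal illustrative aggregate for an integer sum could look like this:

```sql
-- partial-aggregation-aware sum over integers:
-- int4pl adds two int4 values, so it can serve both as
-- the transition function and as the combine function
CREATE AGGREGATE my_sum (int4) (
    SFUNC       = int4pl,
    STYPE       = int4,
    COMBINEFUNC = int4pl,
    PARALLEL    = SAFE
);
```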
Parallel execution on shards is now possible
- Dissect the aggregation
- Send partial queries to the shards in parallel
- Perform parallel execution on each shard
- Add up the partial results on the main node
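
Conceptually, the coordinator turns the original query into something like the following (written out by hand here for illustration; in reality the planner produces the equivalent plan, and the shard table names are placeholders):

```sql
-- each shard returns (grp, partial_count);
-- the coordinator sums the partials per group
SELECT grp, sum(partial_count) AS count
FROM (
    SELECT grp, count(data) AS partial_count
    FROM t_demo_shard1 GROUP BY 1
    UNION ALL
    SELECT grp, count(data) AS partial_count
    FROM t_demo_shard2 GROUP BY 1
) AS partials
GROUP BY grp;
```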
Final results
node=# SELECT grp, count(data) FROM t_demo GROUP BY 1;
 grp |   count
-----+-----------
   0 | 320000000
   1 | 320000000
 ...
   9 | 320000000
(10 rows)

Planning time: 0.955 ms
Execution time: 2910.367 ms

That is 3.2 billion rows aggregated in 2.9 seconds, or roughly 1.1 billion rows / second.
Hardware used
- We used 32 boxes (16 cores each) on Google
- Data was in memory
- Adding more servers is EASY
- Price tag: the staggering amount of EUR 28.14 (for development, testing, and running the test)
A look at PostgreSQL 10.0
- A lot more parallelism will be available
- Many executor nodes will enjoy parallel execution
- PostgreSQL 10.0 will be a giant leap forward
More complex plans
- ROLLUP / CUBE / GROUPING SETS have to wait for 10.0
- A patch for that has been seen on the mailing list
- Be careful with complex intermediate results
- Avoid sorting large amounts of data
- Some things are just harder on large data sets
Future ideas: JIT compilation
- JIT compilation will allow us to do the same work with fewer CPUs
- It will significantly improve throughput
- Some project teams are working on that
Future ideas: “Deeper execution”
- So far only one “stage” of execution is used
- Nothing stops us from building “trees” of servers
- More complex operations can be done
- The infrastructure is in place
Future things: Column stores
- Column stores will bring a real boost
- Vectorization can speed things up drastically
- Many commercial vendors already do that
- GPUs may also be useful
Finally
- Any questions?
Contact us
Cybertec Schönig & Schönig GmbH
Hans-Jürgen Schönig
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
www.postgresql-support.de
Follow us on Twitter: @PostgresSupport