Reaching 1 billion rows / second
Hans-Jürgen Schönig
www.postgresql-support.de
Reaching a milestone
Goal
- Processing 1 billion rows / second
- Show a path to even more scalability
- Silence the “scalability” discussion at some point
- See where the limitations are
- Do it WITHOUT commercial tools, warehousing tools, etc.
Traditional PostgreSQL limitations
- Traditionally:
  - We could only use 1 CPU core per query
  - Scaling was possible by running more than one query at a time
  - This was usually hard to do
PL/Proxy: The traditional way to do it
- PL/Proxy is a stored procedure language used to scale out to shards
- It worked nicely for OLTP workloads
- It is somewhat usable for analytics
- It requires a LOT of manual work
On the app level
- Scaling can also be done on the application level:
  - A lot of manual work
  - Not cool enough
  - Needs a lot of development
  - Why use a database if work is still manual?
- Solving things on the app level is certainly not an option
The 1 billion row challenge
Coming up with a data structure
- We tried to keep it simple:

node=# \d t_demo
       Table "public.t_demo"
 Column |  Type   | Collation | Nullable
--------+---------+-----------+----------
 id     | serial  |           | not null
 grp    | integer |           |
 data   | real    |           |
Indexes:
    "idx_id" btree (id)
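
Test data matching this schema can be generated with something along these lines (a sketch; the exact row counts used in the talk varied per node):

```sql
-- hypothetical setup matching the schema above
CREATE TABLE t_demo (
    id   serial,
    grp  integer,
    data real
);

-- e.g. 100 million rows per node, 10 groups (0-9), random payload
INSERT INTO t_demo (grp, data)
SELECT i % 10, random()
FROM generate_series(1, 100000000) AS i;

CREATE INDEX idx_id ON t_demo (id);
```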
The query
SELECT grp, count(data)
FROM t_demo
GROUP BY 1;
Single server performance
Tweaking a simple server
- The main questions are:
  - How much can we expect from a single server?
  - How well does it scale with many CPUs?
  - How far can we get?
PostgreSQL parallelism
- Parallel queries were added in PostgreSQL 9.6
- The executor can already do a lot, but it is by far not feature complete yet
- The number of workers is normally determined by the PostgreSQL optimizer
- We do not want that: we want ALL cores to be at work
Adjusting CPU core usage
- Usually the number of worker processes per scan is derived from the size of the table:

test=# SHOW min_parallel_relation_size ;
 min_parallel_relation_size
----------------------------
 8MB
(1 row)

- One worker is added every time the table size triples: 8 MB of data buys the first worker, 24 MB the second, 72 MB the third, and so on
Overruling the planner
- We could never have enough data to make PostgreSQL go for 16 or 32 cores, even if the value is set to a couple of kilobytes
- The default mechanism can be overruled:

test=# ALTER TABLE t_demo
           SET (parallel_workers = 32);
ALTER TABLE
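
Keep in mind that the planner also caps the worker count with max_parallel_workers_per_gather (the default in 9.6 is only 2), so that limit has to be raised as well. A sketch with illustrative values:

```sql
-- allow up to 32 workers per Gather node
SET max_parallel_workers_per_gather = 32;

-- optionally make parallel plans cheaper in the planner's eyes
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
```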
Making full use of cores
- How well does PostgreSQL scale on a single box?
- For the next test we assume that I/O is not an issue: if I/O does not keep up, CPU does not make a difference
- Make sure that data can be read fast enough
- Observation: 1 SSD might not be enough to feed a modern Intel chip
Single node scalability (1)
Single node scalability (2)
- We used a 16-core box here
- As you can see, the query scales up nicely
- Beyond 16 cores hyperthreading kicks in
- We managed to gain around 18% more that way
Single node scalability (3)
- On a single Google VM we could reach close to 40 million rows / second
- For many workloads this is already more than enough
- Rows / second will of course depend on the type of query
Moving on to many nodes
The basic system architecture (1)
- We want to shard data to as many nodes as needed
- For the demo: place 100 million rows on each node
- We do so to eliminate the I/O bottleneck
- If I/O does become an issue, we can always compensate by using more servers
- Use parallel queries on each shard
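
In the 9.6 era such a layout can be wired up by hand with postgres_fdw and table inheritance. A rough sketch of the coordinator-side setup (server names, hostnames, and credentials are made up for illustration):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- one foreign server per shard (hostname is a placeholder)
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.internal', dbname 'node');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'postgres');

-- empty parent table on the coordinator
CREATE TABLE t_demo (id integer, grp integer, data real);

-- each shard's data shows up as a child foreign table
CREATE FOREIGN TABLE t_demo_shard1 ()
    INHERITS (t_demo) SERVER shard1
    OPTIONS (table_name 't_demo');
```

Repeat the server / foreign-table pair once per shard; a scan of t_demo then appends all children.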
Testing with two nodes (1)
explain SELECT grp, COUNT(data) FROM t_demo GROUP BY 1;

Finalize HashAggregate
  Group Key: t_demo.grp
  ->  Append
        ->  Foreign Scan (partial aggregate)
        ->  Foreign Scan (partial aggregate)
        ->  Partial HashAggregate
              Group Key: t_demo.grp
              ->  Seq Scan on t_demo
Testing with two nodes (2)
- Throughput doubles as long as the partial results stay small
- The planner pushes the aggregation down nicely
- Linear increases are necessary to scale to 1 billion rows / second
Preconditions to make it work (1)
- postgres_fdw uses cursors on the remote side
- cursor_tuple_fraction has to be set to 1 to improve the planning process
- Set fetch_size to a large value
- That is the easy part
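
Concretely, this might look as follows (values are illustrative; fetch_size is a postgres_fdw option that can be set per server or per foreign table, and "shard1" is a placeholder name):

```sql
-- on each shard: plan cursors for the full result set,
-- not just the first few rows
ALTER SYSTEM SET cursor_tuple_fraction = 1.0;
SELECT pg_reload_conf();

-- on the coordinator: fetch big batches per round trip
-- (the postgres_fdw default is only 100 rows)
ALTER SERVER shard1
    OPTIONS (ADD fetch_size '100000');
```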
Preconditions to make it work (2)
- We have to make sure that all remote database servers work at the same time
- This requires “parallel append and async fetching”
- All queries are sent to the nodes in parallel, and data can be fetched in parallel
- We cannot afford to wait for each node to complete if we want to scale in a linear way
Preconditions to make it work (3)
- This could not have been done without substantial work on PostgreSQL itself in recent releases
- Traditionally, joins and aggregation had to be done BEFORE results could be combined across nodes
- That is a showstopper for distributed aggregation, because all the data would have to be fetched from the remote host before aggregating
- Without this change the test would not have been possible
Preconditions to make it work (4)
- The easy tasks:
  - Aggregates have to be implemented to handle partial results coming from the shards
  - The code is simple and available as an extension
  - For the test we implemented a handful of aggregates
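
Since 9.6, CREATE AGGREGATE accepts a COMBINEFUNC, which is the building block for merging partial states. The extension from the talk is not shown here, but a minimal illustrative aggregate for an integer sum could look like this:

```sql
-- partial-aggregation-aware sum over integers:
-- int4pl adds two int4 values, so it can serve both as
-- the transition function and as the combine function
CREATE AGGREGATE my_sum (int4) (
    SFUNC       = int4pl,
    STYPE       = int4,
    COMBINEFUNC = int4pl,
    PARALLEL    = SAFE
);
```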
Parallel execution on shards is now possible
- Dissect the aggregation
- Send partial queries to the shards in parallel
- Perform parallel execution on each shard
- Add up the partial results on the main node
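
Conceptually, the coordinator turns the original query into something like the following (written out by hand here for illustration; in reality the planner produces the equivalent plan, and the shard table names are placeholders):

```sql
-- each shard returns (grp, partial_count);
-- the coordinator sums the partials per group
SELECT grp, sum(partial_count) AS count
FROM (
    SELECT grp, count(data) AS partial_count
    FROM t_demo_shard1 GROUP BY 1
    UNION ALL
    SELECT grp, count(data) AS partial_count
    FROM t_demo_shard2 GROUP BY 1
) AS partials
GROUP BY grp;
```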
Final results
node=# SELECT grp, count(data) FROM t_demo GROUP BY 1;
 grp |   count
-----+-----------
   0 | 320000000
   1 | 320000000
 ...
   9 | 320000000
(10 rows)

Planning time: 0.955 ms
Execution time: 2910.367 ms

That is 3.2 billion rows aggregated in 2.9 seconds, or roughly 1.1 billion rows / second.
Hardware used
- We used 32 boxes (16 cores each) on Google
- Data was in memory
- Adding more servers is EASY
- Price tag: the staggering amount of EUR 28.14 (for development, testing, and running the test)
A look at PostgreSQL 10.0
- A lot more parallelism will be available
- Many executor nodes will enjoy parallel execution
- PostgreSQL 10.0 will be a giant leap forward
More complex plans
- ROLLUP / CUBE / GROUPING SETS have to wait for 10.0
- A patch for that has been seen on the mailing list
- Be careful with complex intermediate results
- Avoid sorting large amounts of data
- Some things are just harder on large data sets
Future ideas: JIT compilation
- JIT compilation will allow us to do the same work with fewer CPUs
- It will significantly improve throughput
- Some project teams are working on that
Future ideas: “Deeper execution”
- So far only one “stage” of execution is used
- Nothing stops us from building “trees” of servers
- More complex operations can be done
- The infrastructure is in place
Future things: Column stores
- Column stores will bring a real boost
- Vectorization can speed things up drastically
- Many commercial vendors already do that
- GPUs may also be useful
Finally
- Any questions?
Contact us
Cybertec Schönig & Schönig GmbH
Hans-Jürgen Schönig
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
www.postgresql-support.de
Follow us on Twitter: @PostgresSupport