2013-08-06 - Postgres Tips and Tricks

Postgres Tips and Tricks
By Lloyd Albin
5/1/2013, 6/11/2013
Why does my query run slow
SELECT max(l1.upload_timestamp) AS LastUploadTimestamp,
lcase(l1.network) as network,
l1.filename, count(l1.file_id) AS NumberUploads
FROM lab_upload_log AS l1
GROUP BY lcase(l1.network), l1.filename
DISTINCT and GROUP BY give the same results, but internally

DISTINCT is faster.
SELECT DISTINCT max(l1.upload_timestamp) AS LastUploadTimestamp,

SELECT max(l1.upload_timestamp) AS LastUploadTimestamp,
CREATE INDEX lab_upload_log_idx ON lab_data_ops.lab_upload_log

USING btree (network, filename);
Create a functional index for the field
CREATE INDEX lab_upload_log_idx2 ON lab_data_ops.lab_upload_log

USING btree (lower(network), filename);
Create an index for the field if you are going to use the field in a WHERE clause
or a GROUP BY clause.
SELECT ucase(final_lab_upload_log.lab) AS lab,
count(final_lab_upload_log.filename) AS "# Unique Files",
max(x.NumUploads) AS "Total # Uploads“
FROM final_lab_upload_log
LEFT JOIN (
SELECT count(lab_upload_log.filename) AS NumUploads,
ucase(lab_upload_log.lab) AS lab
FROM lab_upload_log
WHERE lcase(lab_upload_log.network)='vtn‘
AND lab_upload_log.upload_timestamp > (curdate() - 30)
GROUP BY ucase(lab_upload_log.lab)) AS x
ON x.lab = final_lab_upload_log.lab
WHERE final_lab_upload_log.LastUploadTimestamp > (curdate() - 30)
GROUP BY ucase(final_lab_upload_log.lab)
Remove doubling of the ucase. The doubling stops the index from being
used.
SELECT ucase(final_lab_upload_log.lab) AS lab,

…
GROUP BY ucase(final_lab_upload_log.lab)
How much will this really benefit me?
The previous queries when used in production run in:
5 seconds (individually)
But in reality, we have three of these types of queries and so they take over:
15 seconds combined
After the changes on the three previous slides:
30-100 milliseconds per each individual query

<200 milliseconds for the 3 queries together
How to generate this table
Lab Upload Log
Today Last 7 Days Last 30 Days
Lab # Unique Total # Files # Unique Total # Files # Unique Total # Files
Files Uploaded Files Uploaded Files Uploaded
FH 2 2 55 56
HV 198 202
SA 10 10 54 54
Total 0 0 12 12 307 312
The better way to do this would be to write a single query to

generate
This this table.by 3 separate queries and then merged together via javascript.
table is generated
SELECT a.lab, c.u_files, c.files, b.u_files, b.files, a.u_files, a.files

FROM a LEFT JOIN b ON a.lab = b.lab LEFT JOIN c ON a.lab = b.lab
Reduce Code
Don’t write the same query in two different places. This
causes you to have to maintain twice as much code and
makes it so that when people are updating the code, that
they may only catch one instance of the code.
A good example of this problem is when you create a user

defined query in atlas on a folder and then create a
module with the same query and you attach the module
to other folders. The user defined query should be
removed and the module attached to that folder.
How much is the savings worth
Where did my NULL’s go?
v t
SELECT * FROM table
Source Table WHERE v != 398 100 100
v t 500
100 100 v t
SELECT * FROM table
398 398 100 100
WHERE t != ‘398’
500 null
null
v t
SELECT * FROM table
100 100
WHERE v != 398
OR v IS NULL 500
null
When you do comparisons in Postgres NULL’s are automatically

removed unless you specifically ask for them.
The normal way to get current job
SELECT d.* DISTINCT ON with ORDER BY and
FROM ( GROUP BY with MIN or MAX can give
SELECT b.emplid, the same results, but internally
b.empl_rcd, DISTINCT ON is faster and is easier to
b.effdt,
max(b.effseq) AS effseq
read the code.
FROM (
SELECT ps_job.emplid,
ps_job.empl_rcd,
max(ps_job.effdt) AS effdt
FROM finance_feeds.ps_job
GROUP BY ps_job.emplid,
ps_job.empl_rcd
)a
LEFT JOIN finance_feeds.ps_job b ON a.emplid::text =
b.emplid::text AND a.empl_rcd = b.empl_rcd AND a.effdt = b.effdt
GROUP BY b.emplid,
b.empl_rcd,
b.effdt
)c
LEFT JOIN finance_feeds.ps_job d ON c.emplid::text = d.emplid::text AND
c.empl_rcd = d.empl_rcd AND c.effdt = d.effdt AND c.effseq = d.effseq
WHERE d.empl_status::text <> 'T' ::text;
-- 833 rows returned (execution time: 32 ms; total time: 93 ms) -- 14,719 rows in finance_feeds.ps_job
Fast way to get current job
SELECT * FROM
(
SELECT DISTINCT ON (emplid, empl_rcd) *
FROM finance_feeds.ps_job
ORDER BY emplid, empl_rcd, effdt DESC, effseq DESC
)a
WHERE a.empl_status::text <> 'T' ::text;
-- 833 rows returned (execution time: 31 ms; total time: 31 ms)

-- 14,719 rows in finance_feeds.ps_job
Much simpler code using DISTINCT ON with ORDER BY instead of

GROUP BY with MAX.
Query Plan Difference
Original Query
LabKey does not
support DISTINCT
ON at this time.
https://www.labkey.o
rg/wiki/home/Docu
mentation/page.view
DISTINCT ON Query ?name=labkeySql
Query plan is almost ½ the speed and only scans the table once instead of three times.
How to compare two queries
There is a great command called EXCEPT. This will compare the results of two queries
and tell you what is in the first query that is not in the second query.
SELECT * FROM view_a

EXCEPT
SELECT * FROM view_b
SELECT * FROM (SELECT * FROM view_a)) a

EXCEPT
SELECT * FROM (SELECT * FROM view_b)) b
This will show you all lines in view_a that are not in view_b. To find out all lines
on view_b that are not in view_a, reverse the two queries. If you are comparing
two complex query statements, wrap them in a simple SELECT statement so
that the EXCEPT will not get confused.
There is a great command called EXCEPT. This will compare the results of two queries
and tell you what is in the first query that is not in the second query.
SELECT * FROM (
SELECT 1
UNION
SELECT 2
)a
EXCEPT
SELECT * FROM (
SELECT 1
UNION
SELECT 3 ?Column?
)b
2
Normally you will also want to reverse the two queries so that you can check the results going
the other direction. This way you have two sets of results, what is extra in each query.
SELECT * FROM (
SELECT 1
UNION
SELECT 3
)b
EXCEPT
SELECT * FROM (
SELECT 1
UNION ?Column?
SELECT 2
3
)a
If you want only the records that match, then you want to use INTERSECT.
SELECT * FROM (
SELECT 1
UNION
SELECT 3
)b
INTERSECT
SELECT * FROM (
SELECT 1
UNION
SELECT 2 ?Column?
)a
1
Finding the slow line in your query
EXPLAIN will show you the query plan, and this by itself is helpful, but even more
helpful is the EXPLAIN (ANALYZE, BUFFERS) which compares the query plan to what
actually happened when the query was run. Also use http://explain.depesz.com/
Nested Loop Left Join (cost=305819.49..449850.69 rows=1 width=572) (actual

time=1871.328..9512784.289 rows=10983 loops=1)
Filter: ((NOT (hashed SubPlan 42)) AND ((SubPlan 38) IS NOT NULL))
Rows Removed by Filter: 74
Buffers: shared hit=2412568803 read=99
1) Estimate rows=1 vers Actual rows=10983. When you have big difference
between these numbers, this is a sign of a problem. This can be caused by
not having enough statistics, not having an index, etc.
2) Actual time=1871.328..9912784.289. This means that this row started 1.87
seconds into the query and took the difference of the two times, 2.64
hours, to complete
Number of months between dates
date_trunc(‘month’, date) -- First day of the month
date + ‘1 day’::interval -- Converts last day of the month to the first day of next month
age(date, date) -- The difference between two timestamps as an interval
date_part(‘year’, date/interval) -- returns only the year portion of the date/interval
SELECT
(date_part('year', age(max(b.earnenddate)::timestamp + interval '1 day',
date_trunc('month',min(b.earnenddate)::date)::timestamp))*12 +
date_part('month', age(max(b.earnenddate)::timestamp + interval '1 day',
date_trunc('month',min(b.earnenddate)::date)::timestamp)))::integer
AS elapsed_months
FROM (
SELECT '06/30/2012' AS earnenddate
UNION
SELECT '12/31/2013' AS earnenddate
) b; elapsed_months
19
Creating a Table from a View
CREATE TABLE schema.table AS
SELECT * FROM schema.view;
This allows you to create a table without having to look up all the field
names and types to first generate a table and then fill it with the results
of the view.
SELECT * INTO schema.table

FROM schema.view;
The PostgreSQL usage of SELECT INTO to represent table creation is

historical. It is best to use CREATE TABLE AS for this purpose in new
code.
Update a Table from a View
BEGIN;
TRUNCATE schema.table;
INSERT INTO schema.table SELECT * FROM schema.view;
COMMIT;
This allows you to take the results of a view and append them to an
existing table.
You may wish to TRUNCATE schema.table before adding the new data.
What order is my data in?
When you don’t use an ORDER BY clause, your data is in physical order of insert and
update.
CREATE TEMP TABLE test (

key test
key SERIAL,
test BOOLEAN 1 f
); 2 f
3 f
INSERT INTO test VALUES (1,FALSE);
INSERT INTO test VALUES (2,FALSE);
INSERT INTO test VALUES (3,FALSE); deleted key test
1 f
SELECT * FROM test; 2 f
3 f
What order is my data in?
UPDATE test SET test = TRUE WHERE key = 2;
deleted key test

1 f
X 2 f
3 f
2 t
SELECT * FROM test;
key test
1 f
3 f
2 t
ORDER BY field
Normally you use ORDER BY field_name, but you can also use ORDER BY field_number.
I have found this to sometimes be useful when unioning one or more sets of data
together.
SELECT 'test5' AS test1

test1
UNION ALL
test2
SELECT 'test2' AS test2
ORDER BY 1 test5
Notes:
ORDER BY test1 will work.
ORDER BY test2 will not work.
UNION ALL gives you all rows from each SELECT and runs much faster.
UNION only gives you DISTINCT rows between the two tables and ordered
TRUNCATE vers DELETE
TRUNCATE is normally the best way to go because it removes all the data within the
table(s) quickly and by specifying more than one table, deals automatically with
foreign key dependencies. Delete can take a long time depending on the foreign key
dependencies, etc.
TRUNCATE is not MVCC-save, so after truncation, the table will appear empty to all
concurrent transactions, even if they are using a snapshot taken before the truncation
occurred. DELETE does not have this issue.
RESTART IDENTITY
Automatically restart sequences owned by columns of the truncated table(s).
test1
test2
test5
pg_stat_activity 9.1-
SELECT * FROM pg_stat_activity;
datid datname procpid usesysid username application_name

487377201 main 18133 10 postgres EMS SQL Manager for
PostgreSQL
client_addr client_port backend_start xact_start query_start
10.6.106.202 64241 2013-06-24 2013-06-24 2013-06-24
15:10:13.139305- 15:22:32.100781- 15:22:32.100781-
07 07 07
waiting current_query
False SELECT * FROM pg_stat_activity
This will show you what queries that you currently have running on a server. As
user Postgres, you will see all queries running on a server. If there is no query
running, you will see <IDLE>.
Postgres Tips & Tricks
POSTGRES 9.2+
pg_stat_activity 9.2+
SELECT * FROM pg_stat_activity;
datid datname procpid usesysid username application_name

487377201 main 18133 10 postgres EMS SQL Manager for
PostgreSQL
client_addr client_port backend_start xact_start query_start
10.6.106.202 64241 2013-06-24 2013-06-24 2013-06-24
15:10:13.139305- 15:22:32.100781- 15:22:32.100781-
07 07 07
waiting current_query
False <idle>: SELECT * FROM pg_stat_activity
This will show you what queries that you currently have running on a server. As
user Postgres, you will see all queries running on a server. They will also say
<idle> or <active>. The <idle> connections show you the last query executed.
Killing your own backend’s
SELECT pg_cancel_backend(pid);
This cancels your current command and leaves your connection open
for your next command. If you are in the middle of a transaction, the
transaction will be aborted once you try and COMMIT your transaction.
It will also complain about every line failing until you try and COMMIT.
SELECT pg_terminate_backend(pid);
This cancels your current command and closes your connection. If you
are in the middle of a transaction, the transaction will be aborted
instantly.
If all the backends you see are gone and you are still getting the open
connections when trying to drop your database, contact a dba.
Postgres Tips & Tricks
POSTGRES 9.3+
Materialized Views
CREATE MATERIALIZED VIEW schema.materialized_view AS
SELECT * FROM schema.table;
SELECT * FROM schema.materialized_view;
REFRESH MATERIALIZED VIEW schema.materialized_view;
This is basically a simple melding of a TABLE and VIEW into a single

entity. When you create the MATERIALIZED VIEW is populates the
underlying TABLE. Every time you use the MATERIALIZED VIEW it
returns you the data in the TABLE. To update the data in the TABLE, you
need to run the REFRESH MATERIALIZED VIEW command.
Foreign Data Tables
CREATE EXTENSION postgres_fdw;
CREATE SERVER db_main FOREIGN DATA WRAPPER postgres_fdw

OPTIONS (host 'db.scharp.org', dbname 'main', port '5432');
CREATE USER MAPPING FOR postgres SERVER db_main

OPTIONS (user 'webservices', password 'password');
CREATE FOREIGN TABLE ist.webservices_token (

"time" TIMESTAMP WITHOUT TIME ZONE DEFAULT now() NOT NULL,
token TEXT NOT NULL
The Postgres DBA’s should take care of
)
the EXTENSION, SERVER and USER
SERVER db_main; MAPPING’s. The developer can then
create the FOREIGN TABLES. For each
SELECT * FROM ist.webservices_token; user that wants to use the FOREIGN
TABLE, there must be a USER MAPPING
created by a DBA.

2013-08-06 - Postgres Tips and Tricks

Uploaded by

Copyright:

Available Formats

2013-08-06 - Postgres Tips and Tricks

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

2013-08-06 - Postgres Tips and Tricks

Uploaded by

Copyright:

Available Formats

Postgres Tips and Tricks

DISTINCT and GROUP BY give the same results, but internally

SELECT DISTINCT max(l1.upload_timestamp) AS LastUploadTimestamp,

CREATE INDEX lab_upload_log_idx ON lab_data_ops.lab_upload_log

CREATE INDEX lab_upload_log_idx2 ON lab_data_ops.lab_upload_log

SELECT ucase(final_lab_upload_log.lab) AS lab,

After the changes on the three previous slides:

30-100 milliseconds per each individual query

Total 0 0 12 12 307 312

The better way to do this would be to write a single query to

SELECT a.lab, c.u_files, c.files, b.u_files, b.files, a.u_files, a.files

A good example of this problem is when you create a user

When you do comparisons in Postgres NULL’s are automatically

-- 833 rows returned (execution time: 31 ms; total time: 31 ms)

Much simpler code using DISTINCT ON with ORDER BY instead of

SELECT * FROM view_a

SELECT * FROM (SELECT * FROM view_a)) a

Nested Loop Left Join (cost=305819.49..449850.69 rows=1 width=572) (actual

SELECT * INTO schema.table

The PostgreSQL usage of SELECT INTO to represent table creation is

INSERT INTO schema.table SELECT * FROM schema.view;

CREATE TEMP TABLE test (

deleted key test

SELECT * FROM test;

SELECT 'test5' AS test1

datid datname procpid usesysid username application_name

datid datname procpid usesysid username application_name

SELECT * FROM schema.materialized_view;

REFRESH MATERIALIZED VIEW schema.materialized_view;

This is basically a simple melding of a TABLE and VIEW into a single

CREATE SERVER db_main FOREIGN DATA WRAPPER postgres_fdw

CREATE USER MAPPING FOR postgres SERVER db_main

CREATE FOREIGN TABLE ist.webservices_token (

You might also like