Why hack on Postgres?
Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Introduction to Hacking PostgreSQL
With lots of code review!
Gavin Sherry
gavin@alcove.com.au
Alcove Systems Engineering
January 14, 2007
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 1 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Outline
1 Why hack on Postgres?
2 Development Environment
3 PostgreSQL Architecture
4 Conventions, and all that
5 Submitting a patch
6 Some real code
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 2 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Databases are cool
Everyone uses them
There are only a small number – begging for diversification
Many open problems to be solved
Example: Storage capacity doubles each year, bandwidth remains
constant and latency goes up!
You can solve lots of the cool problems
Database guys get the babes (or guys (or what ever))
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 3 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Everyone wants database hackers
The number of hard core database kernel developers in the world
is low
Demand is high
Employers are cool: Google, MSR, CERN, NASA, JPL, Los
Alamos, heaps of startups. . .
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 4 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
So, why PostgreSQL?
Most database technologies come from “System/R” or
“Berkeley/INGRES/POSTGRES”
Many high end databases are based on Postgres
Netezza
ParACCEL
Greenplum DB
Streaming technologies like Telegraph CQ and StreamBase
...
Other database technologies came out of PostgreSQL
Informix
Illustra
It’s BSD, you can do what you want
Hacking on PostgreSQL is easier than hacking on any other large,
mature source base I’ve looked at
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 5 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Outline
1 Why hack on Postgres?
2 Development Environment
3 PostgreSQL Architecture
4 Conventions, and all that
5 Submitting a patch
6 Some real code
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 6 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Development Environment
The basics
Most of the Postgres developers
use Unix
You’ll need to know C
C is easy
Code reading is hard, but worth
getting good at
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 7 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Development Tools
The Basics
gcc, Bison, Flex, CVS, autotools,
gdb
Configure flags:
--enable-depend
--enable-debug
--enable-cassert
CFLAGS=-O0
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 8 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Development tools cont.
tags or cscope are essential
“What is the definition of this function/type?”
“What are all the call-sites of this function?”
src/tools/make [ce]tags
ccache and distcc are useful, especially on slower machines
valgrind can be useful for debugging memory errors
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 9 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Text Editor
Coding
Use a proper text editor
Follow the style of the surrounding
code (Basically, Allman BSD)
If you don’t follow the style, it will
just slow down the review
process
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 10 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Outline
1 Why hack on Postgres?
2 Development Environment
3 PostgreSQL Architecture
4 Conventions, and all that
5 Submitting a patch
6 Some real code
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 11 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
PostgreSQL Architecture
Five main components:
1 The parser - parse the query string
2 The rewriter - apply rewrite rules
3 The optimizer - determine an efficient query plan
4 The executor - execute a query plan
5 The utility processor - process DDL like CREATE TABLE
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 12 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
PostgreSQL Architecture
Five main components:
1 The parser - parse the query string
2 The rewriter - apply rewrite rules
3 The optimizer - determine an efficient query plan
4 The executor - execute a query plan
5 The utility processor - process DDL like CREATE TABLE
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 12 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
PostgreSQL Architecture
Five main components:
1 The parser - parse the query string
2 The rewriter - apply rewrite rules
3 The optimizer - determine an efficient query plan
4 The executor - execute a query plan
5 The utility processor - process DDL like CREATE TABLE
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 12 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
PostgreSQL Architecture
Five main components:
1 The parser - parse the query string
2 The rewriter - apply rewrite rules
3 The optimizer - determine an efficient query plan
4 The executor - execute a query plan
5 The utility processor - process DDL like CREATE TABLE
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 12 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
PostgreSQL Architecture
Five main components:
1 The parser - parse the query string
2 The rewriter - apply rewrite rules
3 The optimizer - determine an efficient query plan
4 The executor - execute a query plan
5 The utility processor - process DDL like CREATE TABLE
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 12 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Architecture Diagram
Postgres backend
PostgresMain()
PARSE:
Parse query string
pg_parse_query()
ANALYZE:
Semantic analysis of query, transform to Query node
parse_analyze()
REWRITE: UTILITY PROCESSOR:
Apply rewrite rules Execute DDL
pg_rewrite_queries() PortalRun() -> ProcessUtility()
PLAN:
Produce a query plan
pg_plan_queries()
EXECUTOR:
Execute DML
PortalRun() -> ExecutePlan()
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 13 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
The Parser
Lex and parse the query string submitted by the user
parser/gram.y has the guts; entry point is parser/parser.c
Produces a “raw parsetree”: a linked list of parse nodes
Parse nodes are defined in include/nodes/parsenodes.h
There is usually a simple mapping between grammar productions
and parse node structure
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 14 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Semantic Analysis
In the parser itself, only syntactic analysis is done
Next comes semantic analysis
parser/analyze.c and related code under parser/
Semantic analysis SELECT a, b, c FROM
Resolve column references t1, t2, t3 WHERE...
Verify that target schemas, tables and
columns exist
Check that the types used in expressions are
consistent
In general, check for errors that are
impossible or difficult to detect in the parser
itself
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 15 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Rewriter, Planner
The analysis phase produces a Query, which is the query’s parse
tree
The rewriter applies rewrite rules: view definitions and ordinary
rules. Input is a Query, output is zero or more Querys
The planner takes a Query and produces a Plan, which encodes
how the query ought to be executed
Only needed for “optimizable” statements (INSERT, DELETE,
SELECT, UPDATE)
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 16 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Executor, Utility Processor
DDL statements are “executed” via the utility processor, which
basically just calls the appropriate function for each different kind
of DDL statement
processUtility() in tcop/utility.c; the implementation of
the DDL statements is in commands/
Optimizeable statements are processed via the Executor: given a
Plan, it executes the plan and produces any resulting tuples
executor/; entry point is in execMain.c
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 17 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Outline
1 Why hack on Postgres?
2 Development Environment
3 PostgreSQL Architecture
4 Conventions, and all that
5 Submitting a patch
6 Some real code
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 18 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Common Idioms: Nodes
Postgres uses a very simple object system with support for single
inheritance. The root of the class hierarchy is Node:
typedef struct typedef struct typedef struct
{ { {
NodeTag type; NodeTag type; Parent parent;
} Node; int a_field; int b_field;
} Parent; } Child;
This relies on a C trick: you can treat a Child * like a Parent *
since their initial fields are the same
The first field of any Node is a NodeTag, which can be used to
determine a Node’s specific type at runtime
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 19 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Nodes, Cont.
Important things to keep in mind
Create a new Node: makeNode()
Run-time type testing via the IsA() macro
Test if two nodes are equal: equal()
Deep copy a node: copyObject()
Serialise a node to text: nodeToString()
Deserialise a node from text: stringToNode()
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 20 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Nodes, Cont.
Important things to keep in mind
Create a new Node: makeNode()
Run-time type testing via the IsA() macro
Test if two nodes are equal: equal()
Deep copy a node: copyObject()
Serialise a node to text: nodeToString()
Deserialise a node from text: stringToNode()
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 20 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Nodes, Cont.
Important things to keep in mind
Create a new Node: makeNode()
Run-time type testing via the IsA() macro
Test if two nodes are equal: equal()
Deep copy a node: copyObject()
Serialise a node to text: nodeToString()
Deserialise a node from text: stringToNode()
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 20 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Nodes, Cont.
Important things to keep in mind
Create a new Node: makeNode()
Run-time type testing via the IsA() macro
Test if two nodes are equal: equal()
Deep copy a node: copyObject()
Serialise a node to text: nodeToString()
Deserialise a node from text: stringToNode()
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 20 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Nodes, Cont.
Important things to keep in mind
Create a new Node: makeNode()
Run-time type testing via the IsA() macro
Test if two nodes are equal: equal()
Deep copy a node: copyObject()
Serialise a node to text: nodeToString()
Deserialise a node from text: stringToNode()
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 20 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Nodes, Cont.
Important things to keep in mind
Create a new Node: makeNode()
Run-time type testing via the IsA() macro
Test if two nodes are equal: equal()
Deep copy a node: copyObject()
Serialise a node to text: nodeToString()
Deserialise a node from text: stringToNode()
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 20 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Nodes: Hints
When you modify a node or add a new node, remember to update
nodes/equalfuncs.c
nodes/copyfuncs.c
You may have to update nodes/outfuncs.c if your Node is to
be serialised/deserialised
Grepping for references to the node’s type can be helpful to make
sure you don’t forget to update anything
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 21 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Memory Management
Postgres uses hierarchical,
region-based memory management,
and it absolutely rocks
backend/util/mmgr
Memory is allocated via palloc()
All allocations occur inside a
memory context
Default memory context:
CurrentMemoryContext
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 22 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Memory Management, cont.
Allocations can be freed individually via pfree()
When a memory context is reset, all allocations in the context are
released
Resetting contexts is both faster and less error-prone than
releasing individual allocations
Contexts are arranged in a tree; deleting/resetting a context
deletes/resets its child contexts
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 23 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Memory Management Conventions
You should sometimes pfree() your allocations
If the context is short lived, who cares
Calculate just how much memory you might consume in a worst
case
Be aware of the memory allocation assumptions made by
functions you call
Memory leaks, per se, are rare in the backend
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 24 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Error Handling
Most errors reported by ereport() or elog()
ereport() is for user-visible errors, and allows more fields to be
specified (SQLSTATE, detail, hint, etc.)
Implemented via longjmp(3)
elog(ERROR) walks back up the stack
The top-level error handler aborts the current transaction and
resets the transaction’s memory context
Releases all resources held by the transaction, including files, locks,
memory, and buffer pins
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 25 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Error Handling, Cont.
Custom error handlers can be
defined via PG TRY() Example
Assert(BufferIsValid(buf);
Think about error handling! PG_TRY();
{
Never ignore the return values of ...
}
system calls PG_CATCH();
{
Should your function return an error ...
if (...)
code, or ereport() on failure? PG_RE_THROW();
else
Probably ereport() – think ...
about why. . . and why not }
PG_END_TRY();
Use assertions (Assert) liberally to
detect programming errors, but
never errors the user might
encounter
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 26 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Outline
1 Why hack on Postgres?
2 Development Environment
3 PostgreSQL Architecture
4 Conventions, and all that
5 Submitting a patch
6 Some real code
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 27 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Your First Patch
Step 1: Research and preparation
Is your new feature actually useful?
Does it just scratch your itch, or
Is it of general value?
Does it belong in the backend?
Does the SQL standard define
similar or equivalent functionality?
What about Oracle, DB2, . . . ?
Has someone suggested this idea in
the past?
Search the archives and TODO list
Most ideas are bad
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 28 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Sending A Proposal
Step 2: Send a proposal for your feature to pgsql-hackers
Patches that appear without prior discussion risk wasting your time
Discuss your proposed syntax and behavior
Consider corner cases, and how the feature will relate to other
parts of PostgreSQL (consistency is good)
Will any system catalog changes be needed?
Backward-compatibility?
Try to reach a consensus with -hackers on how the feature
ought to behave
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 29 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Implementation
Step 3: Implement the patch
A general strategy is to look at how similar parts of the system
function
Don’t copy and paste (IMHO)
Instead, read through similar sections of code to try to understand
how they work, and the APIs they are using
Implement (just) what you need, refactoring the existed APIs if
required
Ask for implementation advice as needed (-hackers or IRC)
Consider posting work-in-progress versions of the patch
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 30 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Thoroughness
Step 4: Update tools
For example, if you’ve modified DDL syntax, update psql’s tab
completion
Add pg dump support if necessary
Step 5: Testing
Make sure the existing regression tests don’t fail
No compiler warnings
Add new regression tests for the new feature
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 31 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Thoroughness
Step 4: Update tools
For example, if you’ve modified DDL syntax, update psql’s tab
completion
Add pg dump support if necessary
Step 5: Testing
Make sure the existing regression tests don’t fail
No compiler warnings
Add new regression tests for the new feature
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 31 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Submitting The Patch
Step 6: Update documentation
make check in doc/src/sgml does a syntax check that is
faster than building the whole SGML docs
Check documentation changes visually in a browser
Step 7: Submit the patch
Use context diff format: diff -c
Review every hunk of the patch
Is this hunk necessary? Are there errors?
Does it needlessly change whitespace or existing code?
Work with a code reviewer to make any necessary changes
If your patch falls through the cracks, be persistent
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 32 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Submitting The Patch
Step 6: Update documentation
make check in doc/src/sgml does a syntax check that is
faster than building the whole SGML docs
Check documentation changes visually in a browser
Step 7: Submit the patch
Use context diff format: diff -c
Review every hunk of the patch
Is this hunk necessary? Are there errors?
Does it needlessly change whitespace or existing code?
Work with a code reviewer to make any necessary changes
If your patch falls through the cracks, be persistent
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 32 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Outline
1 Why hack on Postgres?
2 Development Environment
3 PostgreSQL Architecture
4 Conventions, and all that
5 Submitting a patch
6 Some real code
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 33 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Time for some actual code
We’ll look at a patch I
“prepared earlier”
It’s against PostgreSQL CVS
HEAD
You can see it here:
http://alcove.com.au/when_clause.patch
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 34 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
WHEN Clause
We’ll be walking you through the implementation of the WHEN
clause for CREATE TRIGGER
You can see a patch at http://neilconway.org/talks/hacking/when_clause.patch
Defined by SQL:2003, implemented by Oracle and others
Optional clause; when the WHEN expression evaluates to false
(or NULL), the associated trigger is not fired
In the WHEN clause, OLD and NEW tuples can be referenced:
In UPDATE and DELETE triggers, OLD is the tuple being replaced
In UPDATE and INSERT triggers, NEW is the tuple being added
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 35 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
WHEN Clause Considerations
Syntax is easy: defined by SQL spec
Syntax
CREATE TRIGGER name { BEFORE | AFTER }
{ event [ OR . . . ] }
ON t a b l e [ FOR [ EACH ] { ROW | STATEMENT } ]
[ WHEN ( expr ) ]
EXECUTE PROCEDURE funcname ( arguments )
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 36 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
WHEN Clause Considerations, cont.
Behavioral questions:
Should we allow WHEN clause for statement-level triggers? (SQL
spec doesn’t specify)
What subset of SQL should we allow? Aggregate functions,
subqueries, . . . ?
No backward-compat concerns
Obviously needs to be in the backend
Useful for at least SQL-spec compliance
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 37 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Implementation Outline
1 Add support for the new syntax to the parser
2 Update the CREATE TRIGGER parsenode
3 Add support for WHEN clause to analysis phase
4 Add new field to pg trigger system catalog, containing the
WHEN clause
5 Modify implementation of CREATE TRIGGER to add the WHEN
clause to the new pg trigger row
6 Add support for the WHEN clause when firing triggers in the
executor (most of the difficulty is here)
7 Update pg dump/psql to support the WHEN clause
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 38 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Implementation Outline
1 Add support for the new syntax to the parser
2 Update the CREATE TRIGGER parsenode
3 Add support for WHEN clause to analysis phase
4 Add new field to pg trigger system catalog, containing the
WHEN clause
5 Modify implementation of CREATE TRIGGER to add the WHEN
clause to the new pg trigger row
6 Add support for the WHEN clause when firing triggers in the
executor (most of the difficulty is here)
7 Update pg dump/psql to support the WHEN clause
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 38 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Implementation Outline
1 Add support for the new syntax to the parser
2 Update the CREATE TRIGGER parsenode
3 Add support for WHEN clause to analysis phase
4 Add new field to pg trigger system catalog, containing the
WHEN clause
5 Modify implementation of CREATE TRIGGER to add the WHEN
clause to the new pg trigger row
6 Add support for the WHEN clause when firing triggers in the
executor (most of the difficulty is here)
7 Update pg dump/psql to support the WHEN clause
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 38 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Implementation Outline
1 Add support for the new syntax to the parser
2 Update the CREATE TRIGGER parsenode
3 Add support for WHEN clause to analysis phase
4 Add new field to pg trigger system catalog, containing the
WHEN clause
5 Modify implementation of CREATE TRIGGER to add the WHEN
clause to the new pg trigger row
6 Add support for the WHEN clause when firing triggers in the
executor (most of the difficulty is here)
7 Update pg dump/psql to support the WHEN clause
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 38 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Implementation Outline
1 Add support for the new syntax to the parser
2 Update the CREATE TRIGGER parsenode
3 Add support for WHEN clause to analysis phase
4 Add new field to pg trigger system catalog, containing the
WHEN clause
5 Modify implementation of CREATE TRIGGER to add the WHEN
clause to the new pg trigger row
6 Add support for the WHEN clause when firing triggers in the
executor (most of the difficulty is here)
7 Update pg dump/psql to support the WHEN clause
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 38 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Implementation Outline
1 Add support for the new syntax to the parser
2 Update the CREATE TRIGGER parsenode
3 Add support for WHEN clause to analysis phase
4 Add new field to pg trigger system catalog, containing the
WHEN clause
5 Modify implementation of CREATE TRIGGER to add the WHEN
clause to the new pg trigger row
6 Add support for the WHEN clause when firing triggers in the
executor (most of the difficulty is here)
7 Update pg dump/psql to support the WHEN clause
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 38 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Implementation Outline
1 Add support for the new syntax to the parser
2 Update the CREATE TRIGGER parsenode
3 Add support for WHEN clause to analysis phase
4 Add new field to pg trigger system catalog, containing the
WHEN clause
5 Modify implementation of CREATE TRIGGER to add the WHEN
clause to the new pg trigger row
6 Add support for the WHEN clause when firing triggers in the
executor (most of the difficulty is here)
7 Update pg dump/psql to support the WHEN clause
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 38 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Implementation outline cont.
8 Lose much sleep
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 39 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Parser Changes
Trivial, as it turns out — see page 2
Grammar code
1 CreateTrigStmt :
2 CREATE TRIGGER name T r i g g e r A c t i o n T i m e T r i g g e r E v e n t s ON
3 q u a l i f i e d n a m e TriggerForSpec TriggerWhen EXECUTE PROCEDURE
4 func name ’ ( ’ TriggerFuncArgs ’ ) ’
5 {
6 C r e a t e T r i g S t m t ∗ n = makeNode ( C r e a t e T r i g S t m t ) ;
7 /∗ ... ∗/
8 n−>when = $10 ;
9 $$ = ( Node ∗ ) n ;
10 }
11
12 TriggerWhen :
13 WHEN ’ ( ’ a e x p r ’) ’ { $$ = $3 ; }
14 | / ∗EMPTY∗ / { $$ = NULL ; }
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 40 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Parsenode Changes
The definition of the CreateTrigStmt parse node is closely
derived from the syntax of CREATE TRIGGER
Add a new field to the struct to stash the WHEN clause
Be sure to update equalfuncs.c and copyfuncs.c
See pages 3 and 4 of handout
Next: update the analysis phase. How do we parse WHEN clauses
like OLD.a <> NEW.a?
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 41 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Expressions In Postgres
The WHEN clause is a boolean expression
An expression is a tree of Expr nodes
There are Expr subclasses for the different kinds of expressions:
function calls, operator invocations, constants, variables, etc.
ExecEvalExpr() evaluates an expression by recursing through
this tree. For example:
A function is evaluated by first evaluating its arguments, then calling
the function itself
A constant value is trivial to evaluate
See pages 4 through 6 of handout
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 42 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Variable Expressions
In an expression like t.a > 10, t.a is a range variable, colloquially
known as a table column
Represented by the Var expression type
How are range variables implemented?
Var.varno identifies the variable’s table (t above)
Var.varattno is the attribute number of the variable’s column
varno is an index into the expression’s range table
The range table is the set of relations that can be referenced in
expressions — each Query has an associated range table
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 43 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Analysis Phase
The analysis phase is where we lookup identifiers; therefore,
during the analysis phase, we need to add range table entries for
the NEW and OLD relations
Other analysis phase work is straightforward:
Exclusive-lock the target relation
Disallow subqueries and aggregates in the WHEN clause
See pages 4 through 6 of the handout
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 44 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
System Catalogs
The format of the system catalogs is defined by header files, in the
src/include/catalog directory
These files are normal C headers, with some special macros
These macros are pre-processed for bootstrapping (initdb)
Nice effect: access to system catalog fields is the same as
accessing a C struct
A compiled copy of the backend depends upon the exact definition
of the system catalogs
If you modify the system catalog format, bump the catalog version
to force initdb
See pages 6 and 7
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 45 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
System Catalog Changes
Triggers are stored in the pg trigger catalog
To add support for WHEN, we add a new field to
FormData pg trigger in pg trigger.h
Add tgqual field, which stores a serialized version of the WHEN
expression tree
Review: nodeToString() serializes a Node
We can use stringToNode() to reconstruct the expression tree
when needed
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 46 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
CREATE TRIGGER Changes
CREATE TRIGGER needs to store the textual representation of the
WHEN clause in the new row it inserts into pg trigger
Also reject WHEN clause for statement-level triggers here
Also create a dependency between the elements of the WHEN
expression and the trigger
If the WHEN clause references column a of the table, DROP COLUMN
a should be disallowed (without cascade)
See page 8
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 47 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
TriggerDesc Updates
The Relation struct contains metadata about an opened
relation: the relation’s pg class row, a description of the format
of its tuples, associated indexes, associated triggers, etc.
Stored in the relcache
See pages 7 and 8
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 48 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
TriggerDesc Updates, cont.
Trigger information is stored in a subsidiary struct, TriggerDesc,
which itself contains a Trigger struct for each trigger on the
relation
Add a field to Trigger to store the WHEN clause
Fill it in when TriggerDesc constructed
Remember to update support functions!
FreeTriggerDesc(), CopyTriggerDesc(),
equalTriggerDescs()
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 49 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Executor Changes
The guts of the required changes are in the executor
We need to evaluate the WHEN clause before we fire a row-level
trigger
To do that, we need to:
Preprocess the WHEN clause to get it ready to be evaluated
Teach the executor to be able to evaluate expressions referencing
the NEW and OLD relations
See pages 9 through 12
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 50 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
OLD and NEW in Executor
Review: ExecEvalExpr() evaluates expression trees
To do so, it uses an ExprContext
All info needed to evaluate an expression
To evaluate an expression, you find an appropriate ExprContext,
setup the necessary information, then use ExecEvalExpr()
The executor keeps a “per-tuple ExprContext” that we can use: it is
reset for each tuple that is output
See pages 10
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 51 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Evaluating Variable Expressions
ExecEvalVar() is called by ExecEvalExpr() to evaluate Var
expressions:
ExecEvalVar change
switch (variable->varno)
{
case INNER: /* get the tuple from the inner node */
slot = econtext->ecxt_innertuple;
break;
case OUTER: /* get the tuple from the outer node */
slot = econtext->ecxt_outertuple;
break;
default: /* get the tuple from the relation being scanned */
slot = econtext->ecxt_scantuple;
break;
}
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 52 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Evaluating Variables
Note that the varno is ignored, except for the special INNER and
OUTER varnos
The code assumes that the caller will insert the current tuple into
the ExprContext’s “scan tuple” slot before calling ExecEvalExpr
This won’t work for us: the WHEN expression could reference two
different tuples (OLD and NEW)
How can we solve this?
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 53 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Solution
Add two more special varnos, TRIG OLD VARNO and
TRIG NEW VARNO
In the analysis phase, rewrite the varnos in the expression so that
references to the special relations are assigned the right varno
Machinery for this exists: ChangeVarNodes walks an expression
tree, changing varno x → y in every node of the tree
Add two new slots to ExprContext to hold the OLD and NEW
tuples, and setup these slots before calling ExecEvalExpr
In ExecEvalVar, add two more special-cases for the two special
varnos, fetching from the appropriate slots of the ExprContext
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 54 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Checking The Qualification
Before firing triggers, check the WHEN clause
For BEFORE triggers, this is easy. Add code to invoke
ExecQual() to:
ExecBRDeleteTriggers()
ExecBRInsertTriggers()
ExecBRUpdateTriggers()
Use the current executor instance to get per-tuple ExprContext;
try to avoid overhead by preparing WHEN expression the first time
the trigger is fired for this command
See pages 10 and 11
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 55 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
AFTER Trigger Support
Unfortunately, supporting AFTER triggers is not so easy
AfterTriggerSaveEvent() enqueues a trigger to be invoked
later, such as at the end of the current query
We can’t check the WHEN condition here
Instead, we need to check the WHEN condition when the saved
events are fired — but we won’t necessarily have an executor
instance to use!
Should just be a Small Matter of Programming
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 56 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Subqueries in WHEN clause
Subqueries in the WHEN clause would be convenient
Unfortunately, they’re hard to implement
We’d have to run the full-fledged query planner on the expression
Postgres has the infrastructure to do this, it’s just a matter of using
it
All the other code we’ve written is prepared to handle subqueries
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 57 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
psql Support
psql’s \d command includes the definitions of the triggers on a
table. How do we get it to include the WHEN clause?
psql gets trigger definitions by calling the backend function
pg get triggerdef(), so we need to update it
There is already machinery for pretty-printing expressions as SQL
text, so we can reuse all that
One hurdle: tgqual may contain the special TRIG OLD VARNO
and TRIG NEW VARNO varnos, which the expression printing code
doesn’t understand
Quick hack: use ChangeVarNodes() to switch back to original
varnos
See pages 12 and 13
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 58 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
pg dump Support
We need to update pg dump to dump WHEN clause
pg dump reconstructs the CREATE TRIGGER command for a
trigger by examining the trigger’s pg trigger row
For WHEN, this isn’t so easy:
tgqual references TG OLD VARNO and TG NEW VARNO, so there is
no easy way to reconstruct tgqual in a client app
Change pg dump to use pg get triggerdef() to send a
fully-formed CREATE TRIGGER to the client
See pages 12 and 13
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 59 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Regression Tests
Invoked by make check
Run out of src/test/regress
Put tests in sql/triggers.sql
Reflect changes in expected/triggers.out
See pages 13 and 14
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 60 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Documentation
Documentation is in DocBook SGML
Located in docs/src/sgml
SQL command reference in ref/create trigger.sgml
Be sure to add an example
See page 14 and 15
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 61 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
TODO Items
As implemented, the patch has some deficiencies:
No support for AFTER triggers
No support for subqueries in the WHEN clause
Leaks the when field in FreeTriggerDesc()
setup trigger quals() does some redundant work
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 62 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Where to go from here
Sign up to -hackers, -patches and -committers
Review incoming patches, particularly those from active hackers,
learn from them
Review the todo list: http://developer.postgresql.org
Choose some small project (denoted by % on the official TODO
list)
Read the code as much as possible
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 63 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Where to go from here
Sign up to -hackers, -patches and -committers
Review incoming patches, particularly those from active hackers,
learn from them
Review the todo list: http://developer.postgresql.org
Choose some small project (denoted by % on the official TODO
list)
Read the code as much as possible
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 63 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Where to go from here
Sign up to -hackers, -patches and -committers
Review incoming patches, particularly those from active hackers,
learn from them
Review the todo list: http://developer.postgresql.org
Choose some small project (denoted by % on the official TODO
list)
Read the code as much as possible
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 63 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Where to go from here
Sign up to -hackers, -patches and -committers
Review incoming patches, particularly those from active hackers,
learn from them
Review the todo list: http://developer.postgresql.org
Choose some small project (denoted by % on the official TODO
list)
Read the code as much as possible
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 63 / 63
Why hack on Postgres? Development Environment PostgreSQL Architecture Conventions, and all that Submitting a pat
Where to go from here
Sign up to -hackers, -patches and -committers
Review incoming patches, particularly those from active hackers,
learn from them
Review the todo list: http://developer.postgresql.org
Choose some small project (denoted by % on the official TODO
list)
Read the code as much as possible
Gavin Sherry (Alcove) Introduction to Hacking PostgreSQL January 14, 2007 63 / 63