Skip to content

WindowFunctionsByDefault

Mike Bayer edited this page Feb 20, 2014 · 2 revisions

WindowFunctionsByDefault

note: - See also http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery for a more ad-hoc version of this technique.

note: - this recipe requires at least SQLAlchemy version 0.6.2.

SQL 2003 introduces the concept of a "window function", that is a function which allows criteria against the current result row in the result returned by a SELECT statement. Per http://troels.arvin.dk/db/rdbms/#select-limit-offset, the window function ROW_NUMBER() is used as the SQL standard method for paginating result sets.

Oracle and SQL Server have supported window functions for many years. Postgresql 8.4 supports them, although testing reveals that PG's OFFSET construct, while having inherent performance issues, is still dramatically faster than the ROW_NUMBER() approach.

SQLAlchemy's current behavior (0.6.1) does not use window functions for LIMIT/OFFSET, instead using the database specific keywords provided, and on Oracle we currently use a scheme recommended by a prominent Oracle engineer based on the special ROWNUM variable (http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html).

The recipe below allows one to replace all database-dependent LIMIT/OFFSET behavior with a single method that renders a ROW_NUMBER(). It's based on our Oracle LIMIT/OFFSET implementation previous to version 0.5, and modifies the compilation of the Select construct such that LIMIT / OFFSET are detected and converted into a subquery with ROW_NUMBER() OVER (ORDER BY col ASC|DESC). The ORDER BY is required as it's not very useful to paginate results without sorting them.

#!python

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Select
from sqlalchemy import exc, sql

@compiles(Select)
def compile_select(element, compiler, **kw):
    if not getattr(element, '_window_visit', None):
        if element._limit is not None or element._offset is not None:
            limit, offset = element._limit, element._offset

            # copy the element so that we can modify it
            element = element._generate()
            element._window_visit = True
            element._limit = element._offset = None
            
            orderby = compiler.process(element._order_by_clause) 
            if not orderby:
                raise exc.CompileError("ORDER BY is required with LIMIT/OFFSET")
                
            element = element.column(
                                    sql.literal_column("ROW_NUMBER() OVER (ORDER BY %s)" % orderby
                                    ).label("rownum")).order_by(None) 
            
            limitselect = sql.select([c for c in element.alias().c if c.key != 'rownum'])
            limitselect._window_visit = True
            limitselect._is_wrapper = True
            
            if offset is not None: 
                limitselect.append_whereclause(sql.column("rownum") > offset) 
                if limit is not None: 
                    limitselect.append_whereclause(
                                    sql.column("rownum") <= 
                                    (limit + offset)
                                )
            else: 
                limitselect.append_whereclause(sql.column("rownum") <= limit) 
        
            element = limitselect
    
    kw['iswrapper'] = getattr(element, '_is_wrapper', False)
    
    return compiler.visit_select(element, **kw)

if __name__ == '__main__':
    from sqlalchemy.sql import table, column, select
    
    t1 = table('t1', column('c1'), column('c2'), column('c3'))
    
    s = select([t1]).order_by(t1.c.c2).limit(3).offset(5)
    print s

Variants of the above are doable as well, such as only running the window function when OFFSET is present, or when ORDER BY is present, otherwise falling back to default behavior. The example below illustrates how to render the window function only if ORDER BY is present:

#!python

@compiles(Select)
def compile_select(element, compiler, **kw):
    if not getattr(element, '_window_visit', None):
        if element._limit is not None or element._offset is not None:
            limit, offset = element._limit, element._offset

            orderby = compiler.process(element._order_by_clause) 
            if orderby:
                element = element._generate()
                element._window_visit = True
                element._limit = element._offset = None

                element = element.column(
                                    sql.literal_column("ROW_NUMBER() OVER (ORDER BY %s)" % orderby
                                    ).label("rownum")).order_by(None) 

                limitselect = sql.select([c for c in element.alias().c if c.key != 'rownum'])
                limitselect._window_visit = True
                limitselect._is_wrapper = True

                if offset is not None: 
                    limitselect.append_whereclause(sql.column("rownum") > offset) 
                    if limit is not None: 
                        limitselect.append_whereclause(
                                        sql.column("rownum") <= 
                                        (limit + offset)
                                    )
                else: 
                    limitselect.append_whereclause(sql.column("rownum") <= limit) 

                element = limitselect

    kw['iswrapper'] = getattr(element, '_is_wrapper', False)

    return compiler.visit_select(element, **kw)
Clone this wiki locally