Skip to content

[DNM] An experimental branch to try to push down aggregate functions #22

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: REL_11_CARTO
Choose a base branch
from

Conversation

rafatower
Copy link

What this does:

  • it adds a ton of traces, that are more suitable to follow certain
    aspects of the code that raw gdb on the foreign_expr_walker
    recursive exploration of the parse tree.

  • it modifies the code related to checking of collations. I don't
    really understand if it is a bug or if there's actually a conflict
    with collations and it is unsafe to push, but I tend to think it is
    indeed a bug as no collation should affect it.

  • it allows for Row Expressions node push downs (T_RowExpr) in a very
    hacky way. Chances are that to do it properly it needs to continue
    with a recursive exploration of the tree from that node. This was
    preventing the whole aggregate function ST_AsMVt to be pushed down
    to the foreign server.

  • it provides a crappy implementation of deparseRowExpr. It makes the
    planner happy to ship some SQL but it is not really valid and fails
    with an error on the remote end. It shall not confuse the
    representation in the EXPLAIN of ROW() with the SQL ROW() row
    constructor.

In short this just half-works, and we really need both halfs for it to
do something interesting. Nevertheless I think it is good to share, to
give an idea of the type of work to accomplish to finish this and
similar cases.

What this does:

- it adds a ton of traces, that are more suitable to follow certain
  aspects of the code that raw gdb on the foreign_expr_walker
  recursive exploration of the parse tree.

- it modifies the code related to checking of collations. I don't
  really understand if it is a bug or if there's actually a conflict
  with collations and it is unsafe to push, but I tend to think it is
  indeed a bug as no collation should affect it.

- it allows for Row Expressions node push downs (T_RowExpr) in a very
  hacky way. Chances are that to do it properly it needs to continue
  with a recursive exploration of the tree from that node. This was
  preventing the whole aggregate function ST_AsMVt to be pushed down
  to the foreign server.

- it provides a crappy implementation of deparseRowExpr. It makes the
  planner happy to ship some SQL but it is not really valid and fails
  with an error on the remote end. It shall not confuse the
  representation in the EXPLAIN of ROW() with the SQL ROW() row
  constructor.

In short this just half-works, and we really need both halfs for it to
do something interesting. Nevertheless I think it is good to share, to
give an idea of the type of work to accomplish to finish this and
similar cases.
@rafatower
Copy link
Author

Here an example without the patch:

explain (verbose) SELECT ((SELECT ST_AsMVT(geometries, 'layer0', 2048, 'the_geom_webmercator')
                        FROM (
                            SELECT ST_AsMVTGeom(
                            the_geom_webmercator,
                            ST_MakeEnvelope(-13638811.830980570083, 4539747.98391318832, -13629027.891360067522, 4549531.923533690881, 3857),
                            2048,
                            8,
                            true
                        ) as the_geom_webmercator, "st_type", "cartodb_id"
                            FROM (
                SELECT
                    "st_type","cartodb_id","the_geom_webmercator"
                FROM
                    (SELECT "st_type","cartodb_id","the_geom_webmercator" FROM cdb_fdw_example.sf_stclines ) _cdb_epoch_transformation ) AS cdbq
                            WHERE the_geom_webmercator && ST_Expand(ST_MakeEnvelope(-13638811.830980570083, 4539747.98391318832, -13629027.891360067522, 4549531.923533690881, 3857), 38.21851414258813)
                        ) AS geometries
                        )) AS st_asmvt;

and the resulting query plan:

                                                                                                                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                                                                                                                               
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=19118.66..19118.67 rows=1 width=32)
   Output: $0
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=19118.65..19118.66 rows=1 width=32)
           Output: st_asmvt(ROW(st_asmvtgeom(sf_stclines.the_geom_webmercator, 'BOX(-13638811.8309806 4539747.98391319,-13629027.8913601 4549531.92353369)'::box2d, 2048, 8, true), sf_stclines.st_type, sf_stclines.cartodb_id), 'layer0'::text, 2048, 'the_geom_webmercator'::text)
           ->  Foreign Scan on cdb_fdw_example.sf_stclines  (cost=229.49..1937.53 rows=3254 width=39)
                 Output: sf_stclines.cartodb_id, sf_stclines.the_geom, sf_stclines.the_geom_webmercator, sf_stclines.zip_code, sf_stclines.district, sf_stclines.street, sf_stclines.st_type, sf_stclines.lf_fadd, sf_stclines.lf_toadd, sf_stclines.rt_fadd, sf_stclines.rt_toadd, sf_stclines.f_node_cnn, sf_stclines.t_node_cnn, sf_stclines.cnn, sf_stclines.jurisdicti, sf_stclines.nhood, sf_stclines.layer, sf_stclines.cnntext, sf_stclines.streetname, sf_stclines.classcode, sf_stclines.street_gc, sf_stclines.oneway, sf_stclines.created_at, sf_stclines.updated_at, sf_stclines.accepted
                 Remote SQL: SELECT cartodb_id, the_geom_webmercator, st_type FROM carto_lite.sf_stclines WHERE ((the_geom_webmercator OPERATOR(public.&&) '0103000020110F00000100000005000000F075954198036AC1474CFC704F515141F075954198036AC1CC4F1789F05A51412DF487B5C7FE69C1CC4F1789F05A51412DF487B5C7FE69C1474CFC704F515141F075954198036AC1474CFC704F515141'::public.geometry))
(8 rows)

so the st_asmvt aggregate is executed in the local db.

The same query with the patch:

                                                                         QUERY PLAN                                                                                                                                                                                                                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=6045.74..6045.75 rows=1 width=32)
   Output: $0
   InitPlan 1 (returns $0)
     ->  Foreign Scan  (cost=6045.71..6045.74 rows=1 width=32)
           Output: (st_asmvt(ROW(st_asmvtgeom(sf_stclines.the_geom_webmercator, 'BOX(-13638811.8309806 4539747.98391319,-13629027.8913601 4549531.92353369)'::box2d, 2048, 8, true), sf_stclines.st_type, sf_stclines.cartodb_id), 'layer0'::text, 2048, 'the_geom_webmercator'::text))
           Relations: Aggregate on (cdb_fdw_example.sf_stclines)
           Remote SQL: SELECT public.st_asmvt((public.st_asmvtgeom(the_geom_webmercator, 'BOX(-13638811.8309806 4539747.98391319,-13629027.8913601 4549531.92353369)'::public.box2d, 2048, 8, true), st_type, cartodb_id), 'layer0'::text, 2048, 'the_geom_webmercator'::text) FROM carto_lite.sf_stclines WHERE ((the_geom_webmercator OPERATOR(public.&&) '0103000020110F00000100000005000000F075954198036AC1474CFC704F515141F075954198036AC1CC4F1789F05A51412DF487B5C7FE69C1CC4F1789F05A51412DF487B5C7FE69C1474CFC704F515141F075954198036AC1474CFC704F515141'::public.geometry))
(7 rows)

so, the whole thing is sent to the foreign end, which is what I originally intended. However, the row expression is not "deparsed" correctly, producing an error:

ERROR:  parse_column_keys: no geometry column found
CONTEXT:  remote SQL command: SELECT public.st_asmvt((public.st_asmvtgeom(the_geom_webmercator, 'BOX(-13638811.8309806 4539747.98391319,-13629027.8913601 4549531.92353369)'::public.box2d, 2048, 8, true), st_type, cartodb_id), 'layer0'::text, 2048, 'the_geom_webmercator'::text) FROM carto_lite.sf_stclines WHERE ((the_geom_webmercator OPERATOR(public.&&) '0103000020110F00000100000005000000F075954198036AC1474CFC704F515141F075954198036AC1CC4F1789F05A51412DF487B5C7FE69C1CC4F1789F05A51412DF487B5C7FE69C1474CFC704F515141F075954198036AC1474CFC704F515141'::public.geometry))

indeed the query shipped to the foreign end is not correct because of the way the RowExpr is "deparsed":

SELECT public.st_asmvt((public.st_asmvtgeom(the_geom_webmercator, 'BOX(-13638811.8309806 4539747.98391319,-13629027.8913601 4549531.92353369)'::public.box2d, 2048, 8, true), st_type, cartodb_id), 'layer0'::text, 2048, 'the_geom_webmercator'::text) FROM carto_lite.sf_stclines WHERE ((the_geom_webmercator OPERATOR(public.&&) '0103000020110F00000100000005000000F075954198036AC1474CFC704F515141F075954198036AC1CC4F1789F05A51412DF487B5C7FE69C1CC4F1789F05A51412DF487B5C7FE69C1474CFC704F515141F075954198036AC1474CFC704F515141'::public.geometry));
ERROR:  parse_column_keys: no geometry column found

bool first;
ListCell *lc;

appendStringInfoString(buf, "(");
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originally it read like appendStringInfoString(buf, "ROW(");. I read somewhere that ROW is "syntax noise" when there are more than one columns in the row and forgot to get it back to its original, generic form.

@rafatower
Copy link
Author

The structure of a RowExpr is defined here: https://github.com/CartoDB/postgres/blob/REL_11_CARTO/src/include/nodes/primnodes.h#L1002

@rafatower
Copy link
Author

The transformation from what it gets and what it should be, done manually should be something like this:

--ROW(public.st_asmvtgeom(the_geom_webmercator, 'BOX(-9862211.13746658 5165920.11962535,-9705668.10353854 5322463.15355339)'::public.box2d, 2048, 8, true), pct_higher_edu, cartodb_id)
SELECT public.st_asmvt(some_arbitrary_alias, 'layer0'::text, 2048, 'the_geom_webmercator'::text)
FROM (
  SELECT public.st_asmvtgeom(the_geom_webmercator, 'BOX(-9862211.13746658 5165920.11962535,-9705668.10353854 5322463.15355339)'::public.box2d, 2048, 8, true) the_geom_webmercator, pct_higher_edu, cartodb_id
  FROM carto_lite.higher_edu_by_county WHERE ((the_geom_webmercator OPERATOR(public.&&) '0103000020110F00000100000005000000413647D4D4CF62C137C5E5273FB45341413647D4D4CF62C17DFD95A9504E54411F1A6F13CC8262C17DFD95A9504E54411F1A6F13CC8262C137C5E5273FB45341413647D4D4CF62C137C5E5273FB45341'::public.geometry))
) some_arbitrary_alias;

I think it is doable, but not trivial at all.

@rafatower
Copy link
Author

A small trick I learned today from https://wiki.postgresql.org/wiki/Developer_FAQ#Why_do_we_use_Node_and_List_to_make_data_structures.3F

(gdb) b deparseRowExpr
Breakpoint 1 at 0x7f9de9bac5cd: file /home/rtorre/src/postgres/pg11_build/../contrib/postgres_fdw/deparse.c, line 2432.
(gdb) c
Continuing.

Breakpoint 1, deparseRowExpr (node=0x55d7464fdbb0, context=0x7ffd98c26b50) at /home/rtorre/src/postgres/pg11_build/../contrib/postgres_fdw/deparse.c:2432
2432		StringInfo	buf = context->buf;
(gdb) call print(node)
(gdb) call pprint(node)
(gdb) 

call print(node) output in DB logs:

{ROW :args ({FUNCEXPR :funcid 2008870 :funcresulttype 54065 :funcretset false                                                                                                                               
:funcvariadic false :funcformat 0 :funccollid 0 :inputcollid 0 :args ({VAR                                                                                                                                  
:varno 4 :varattno 3 :vartype 54065 :vartypmod -1 :varcollid 0 :varlevelsup 0                                                                                                                               
:varnoold 4 :varoattno 3 :location 474} {CONST :consttype 54096 :consttypmod                                                                                                                                
-1 :constcollid 0 :constlen 65 :constbyval false :constisnull false :location                                                                                                                               
-1 :constvalue 65 [ 0 51 -62 -104 -3 127 0 0 81 32 102 100 -120 -49 98 -63 15                                                                                                                               
48 80 -125 24 -125 98 -63 24 -15 -89 7 -40 -76 83 65 -100 -47 -45 -55 -73 77                                                                                                                                
84 65 -56 -113 79 70 -41 85 0 0 32 51 -62 -104 -3 127 0 0 0 0 0 0 0 0 0 0 104                                                                                                                               
]} {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval                                                                                                                              
true :constisnull false :location 265 :constvalue 4 [ 0 8 0 0 0 0 0 0 ]}                                                                                                                                    
{CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval                                                                                                                                 
true :constisnull false :location 275 :constvalue 4 [ 8 0 0 0 0 0 0 0 ]}                                                                                                                                    
{CONST :consttype 16 :consttypmod -1 :constcollid 0 :constlen 1 :constbyval                                                                                                                                 
true :constisnull false :location 282 :constvalue 1 [ 1 0 0 0 0 0 0 0 ]})                                                                                                                                   
:location 102} {VAR :varno 4 :varattno 16 :vartype 701 :vartypmod -1                                                                                                                                        
:varcollid 0 :varlevelsup 0 :varnoold 4 :varoattno 16 :location 444} {VAR                                                                                                                                   
:varno 4 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0                                                                                                                                  
:varnoold 4 :varoattno 1 :location 461}) :row_typeid 2249 :row_format 2                                                                                                                                     
:colnames ("the_geom_webmercator" "pct_higher_edu" "cartodb_id") :location 34}

call pprint(node) output (pretty print, truncated):

   {ROW                                                                                                                                                                                                     
   :args (                                                                                                                                                                                                  
      {FUNCEXPR                                                                                                                                                                                             
      :funcid 2008870                                                                                                                                                                                       
      :funcresulttype 54065                                                                                                                                                                                 
      :funcretset false                                                                                                                                                                                     
      :funcvariadic false                                                                                                                                                                                   
      :funcformat 0                                                                                                                                                                                         
      :funccollid 0                                                                                                                                                                                         
      :inputcollid 0                                                                                                                                                                                        
      :args (                                                                                                                                                                                               
         {VAR                                                                                                                                                                                               
         :varno 4                                                                                                                                                                                           
         :varattno 3                                                                                                                                                                                        
         :vartype 54065  
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant