Skip to content

Reflect index's column operator class on PostgreSQL #12504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dlax
Copy link
Contributor

@dlax dlax commented Apr 7, 2025

Fill the postgresql_ops key of PostgreSQL's dialect_options returned by get_multi_indexes() with a mapping from column names to the operator class, if it's not the default for respective data type.

As we need to join on pg_catalog.pg_opclass, the table definition is added to postgresql.pg_catalog.

Fixes #8664.

@dlax dlax marked this pull request as ready for review April 7, 2025 14:11
@CaselIT
Copy link
Member

CaselIT commented Apr 7, 2025

Have yet to look, but it looks good.

Are there noticeable performances differences?
You can use the script https://github.com/sqlalchemy/sqlalchemy/blob/main/test/perf/many_table_reflection.py

@dlax
Copy link
Contributor Author

dlax commented Apr 8, 2025

Are there noticeable performances differences?

On main, commit d0bc9b2:

$ python test/perf/many_table_reflection.py --db 'postgresql+psycopg:///sqlaperf' --test indexes --sqlstats
Generating 250 using engine Engine(postgresql+psycopg:///sqlaperf) in schema default
Meta info {'tables': 250, 'columns': 32564, 'index': 959, 'check_con': 84, 'foreign_keys_con': 336, 'unique_con': 945, 'identity': 29}
Running create_tables ...... done
Reflected table number 250 in schema default
starting stats for subsequent tests
Running reflect_indexes ...... done
Running drop_tables ...... done
{'populate-meta': 0.7698001861572266,
 'create-tables': 3.912214756011963,
 'get_table_names': 0.012355327606201172,
 'get_indexes': 0.7168705463409424,
 'get_multi_indexes': 0.06203150749206543,
 'drop-tables': 0.7829418182373047}
--------------------------------------------------
total number of queries: 502. Total time 0.5161380767822266
--------------------------------------------------
Query times: total_t=0.3885214328765869, max_t=0.03788638114929199, min_t=0.001031637191772461, avg_t=0.0015478941548868004, median_t=0.0012700557708740234  Number of calls: 251
SELECT pg_catalog.pg_index.indrelid, pg_catalog.pg_class.relname, pg_catalog.pg_index.indisunique, pg_catalog.pg_constraint.conrelid IS NOT NULL AS has_constraint, pg_catalog.pg_index.indoption, pg_catalog.pg_class.reloptions, pg_catalog.pg_am.amname, CASE WHEN (pg_catalog.pg_index.indpred IS NOT NULL) THEN pg_catalog.pg_get_expr(pg_catalog.pg_index.indpred, pg_catalog.pg_index.indrelid) END AS filter_definition, pg_catalog.pg_index.indnkeyatts, pg_catalog.pg_index.indnullsnotdistinct, idx_cols.elements, idx_cols.elements_is_expr
FROM pg_catalog.pg_index JOIN pg_catalog.pg_class ON pg_catalog.pg_index.indexrelid = pg_catalog.pg_class.oid JOIN pg_catalog.pg_am ON pg_catalog.pg_class.relam = pg_catalog.pg_am.oid LEFT OUTER JOIN (SELECT idx_attr.indexrelid AS indexrelid, min(idx_attr.indrelid) AS min_1, array_agg(idx_attr.element ORDER BY idx_attr.ord) AS elements, array_agg(idx_attr.is_expr ORDER BY idx_attr.ord) AS elements_is_expr
FROM (SELECT idx.indexrelid AS indexrelid, idx.indrelid AS indrelid, idx.ord AS ord, CASE WHEN (idx.attnum = %(attnum_1)s::INTEGER) THEN pg_catalog.pg_get_indexdef(idx.indexrelid, idx.ord + %(ord_1)s::INTEGER, %(pg_get_indexdef_1)s) ELSE CAST(pg_catalog.pg_attribute.attname AS TEXT) END AS element, idx.attnum = %(attnum_2)s::INTEGER AS is_expr
FROM (SELECT pg_catalog.pg_index.indexrelid AS indexrelid, pg_catalog.pg_index.indrelid AS indrelid, unnest(pg_catalog.pg_index.indkey) AS attnum, generate_subscripts(pg_catalog.pg_index.indkey, %(generate_subscripts_1)s::INTEGER) AS ord
FROM pg_catalog.pg_index
WHERE NOT pg_catalog.pg_index.indisprimary AND pg_catalog.pg_index.indrelid IN (__[POSTCOMPILE_oids])) AS idx LEFT OUTER JOIN pg_catalog.pg_attribute ON pg_catalog.pg_attribute.attnum = idx.attnum AND pg_catalog.pg_attribute.attrelid = idx.indrelid
WHERE idx.indrelid IN (__[POSTCOMPILE_oids])) AS idx_attr GROUP BY idx_attr.indexrelid) AS idx_cols ON pg_catalog.pg_index.indexrelid = idx_cols.indexrelid LEFT OUTER JOIN pg_catalog.pg_constraint ON pg_catalog.pg_index.indrelid = pg_catalog.pg_constraint.conrelid AND pg_catalog.pg_index.indexrelid = pg_catalog.pg_constraint.conindid AND pg_catalog.pg_constraint.contype = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR])
WHERE pg_catalog.pg_index.indrelid IN (__[POSTCOMPILE_oids]) AND NOT pg_catalog.pg_index.indisprimary ORDER BY pg_catalog.pg_index.indrelid, pg_catalog.pg_class.relname

Query times: total_t=0.12353086471557617, max_t=0.0009763240814208984, min_t=0.00039458274841308594, avg_t=0.0004941234588623047, median_t=0.0004544258117675781  Number of calls: 250
SELECT pg_catalog.pg_class.oid, pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR, %(param_4)s::VARCHAR, %(param_5)s::VARCHAR]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s::VARCHAR AND pg_catalog.pg_class.relname IN (__[POSTCOMPILE_filter_names])

Query times: total_t=0.0040857791900634766, max_t=0.0040857791900634766, min_t=0.0040857791900634766, avg_t=0.0040857791900634766, median_t=0.0040857791900634766  Number of calls: 1
SELECT pg_catalog.pg_class.oid, pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR]) AND pg_catalog.pg_class.relpersistence != %(relpersistence_1)s::VARCHAR AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s::VARCHAR

On this branch:

$ python test/perf/many_table_reflection.py --db 'postgresql+psycopg:///sqlaperf' --test indexes --sqlstats
Generating 250 using engine Engine(postgresql+psycopg:///sqlaperf) in schema default
Meta info {'tables': 250, 'columns': 32939, 'index': 984, 'check_con': 84, 'foreign_keys_con': 358, 'unique_con': 941, 'identity': 23}
Running create_tables ...... done
Reflected table number 250 in schema default
starting stats for subsequent tests
Running reflect_indexes ...... done
Running drop_tables ...... done
{'populate-meta': 0.7974131107330322,
 'create-tables': 4.5137574672698975,
 'get_table_names': 0.014792442321777344,
 'get_indexes': 1.0036499500274658,
 'get_multi_indexes': 0.048918724060058594,
 'drop-tables': 0.776090145111084}
--------------------------------------------------
total number of queries: 502. Total time 0.6737213134765625
--------------------------------------------------
Query times: total_t=0.5131850242614746, max_t=0.027825593948364258, min_t=0.0011928081512451172, avg_t=0.0020445618496473094, median_t=0.0017914772033691406  Number of calls: 251
SELECT pg_catalog.pg_index.indrelid, pg_catalog.pg_class.relname, pg_catalog.pg_index.indisunique, pg_catalog.pg_constraint.conrelid IS NOT NULL AS has_constraint, pg_catalog.pg_index.indoption, pg_catalog.pg_class.reloptions, pg_catalog.pg_am.amname, CASE WHEN (pg_catalog.pg_index.indpred IS NOT NULL) THEN pg_catalog.pg_get_expr(pg_catalog.pg_index.indpred, pg_catalog.pg_index.indrelid) END AS filter_definition, pg_catalog.pg_index.indnkeyatts, pg_catalog.pg_index.indnullsnotdistinct, idx_cols.elements_opclass, idx_cols.elements_opdefault, idx_cols.elements, idx_cols.elements_is_expr
FROM pg_catalog.pg_index JOIN pg_catalog.pg_class ON pg_catalog.pg_index.indexrelid = pg_catalog.pg_class.oid JOIN pg_catalog.pg_am ON pg_catalog.pg_class.relam = pg_catalog.pg_am.oid LEFT OUTER JOIN (SELECT idx_attr.indexrelid AS indexrelid, min(idx_attr.indrelid) AS min_1, array_agg(idx_attr.element ORDER BY idx_attr.ord) AS elements, array_agg(idx_attr.is_expr ORDER BY idx_attr.ord) AS elements_is_expr, array_agg(idx_attr.opcname ORDER BY idx_attr.ord) AS elements_opclass, array_agg(idx_attr.opcdefault ORDER BY idx_attr.ord) AS elements_opdefault
FROM (SELECT idx.indexrelid AS indexrelid, idx.indrelid AS indrelid, idx.ord AS ord, CASE WHEN (idx.attnum = %(attnum_1)s::INTEGER) THEN pg_catalog.pg_get_indexdef(idx.indexrelid, idx.ord + %(ord_1)s::INTEGER, %(pg_get_indexdef_1)s) ELSE CAST(pg_catalog.pg_attribute.attname AS TEXT) END AS element, idx.attnum = %(attnum_2)s::INTEGER AS is_expr, pg_catalog.pg_opclass.opcname AS opcname, pg_catalog.pg_opclass.opcdefault AS opcdefault
FROM (SELECT pg_catalog.pg_index.indexrelid AS indexrelid, pg_catalog.pg_index.indrelid AS indrelid, unnest(pg_catalog.pg_index.indkey) AS attnum, unnest(pg_catalog.pg_index.indclass) AS att_opclass, generate_subscripts(pg_catalog.pg_index.indkey, %(generate_subscripts_1)s::INTEGER) AS ord
FROM pg_catalog.pg_index
WHERE NOT pg_catalog.pg_index.indisprimary AND pg_catalog.pg_index.indrelid IN (__[POSTCOMPILE_oids])) AS idx LEFT OUTER JOIN pg_catalog.pg_attribute ON pg_catalog.pg_attribute.attnum = idx.attnum AND pg_catalog.pg_attribute.attrelid = idx.indrelid LEFT OUTER JOIN pg_catalog.pg_opclass ON pg_catalog.pg_opclass.oid = idx.att_opclass
WHERE idx.indrelid IN (__[POSTCOMPILE_oids])) AS idx_attr GROUP BY idx_attr.indexrelid) AS idx_cols ON pg_catalog.pg_index.indexrelid = idx_cols.indexrelid LEFT OUTER JOIN pg_catalog.pg_constraint ON pg_catalog.pg_index.indrelid = pg_catalog.pg_constraint.conrelid AND pg_catalog.pg_index.indexrelid = pg_catalog.pg_constraint.conindid AND pg_catalog.pg_constraint.contype = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR])
WHERE pg_catalog.pg_index.indrelid IN (__[POSTCOMPILE_oids]) AND NOT pg_catalog.pg_index.indisprimary ORDER BY pg_catalog.pg_index.indrelid, pg_catalog.pg_class.relname

Query times: total_t=0.15839672088623047, max_t=0.0021893978118896484, min_t=0.000438690185546875, avg_t=0.0006335868835449219, median_t=0.0005793571472167969  Number of calls: 250
SELECT pg_catalog.pg_class.oid, pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR, %(param_4)s::VARCHAR, %(param_5)s::VARCHAR]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s::VARCHAR AND pg_catalog.pg_class.relname IN (__[POSTCOMPILE_filter_names])

Query times: total_t=0.002139568328857422, max_t=0.002139568328857422, min_t=0.002139568328857422, avg_t=0.002139568328857422, median_t=0.002139568328857422  Number of calls: 1
SELECT pg_catalog.pg_class.oid, pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR]) AND pg_catalog.pg_class.relpersistence != %(relpersistence_1)s::VARCHAR AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s::VARCHAR

Fill the `postgresql_ops` key of PostgreSQL's `dialect_options` returned
by get_multi_indexes() with a mapping from column names to the operator
class, if it's not the default for respective data type.

As we need to join on ``pg_catalog.pg_opclass``, the table definition is
added to ``postgresql.pg_catalog``.

Fixes sqlalchemy#8664.
@CaselIT
Copy link
Member

CaselIT commented Apr 8, 2025

I'll look if it's possible to improve the time of the single getindex a bit, but overall it seems mostly ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Postgres: Reflection of indexes with operators
2 participants