Skip to content

INTPYTHON-635 Slow joins for ManyToMany fields #309

@skonstant

Description

@skonstant

I am running into a tricky issue, the joins are slow for my ManyToMany fields.

observation.subscopes.all()

Generates this:

[{ '$lookup': { 'from': 'orm_observation_subscopes', 'let': { 'parent__field__0': '$_id' }, 'pipeline': [{ '$match': { '$expr': { '$and': [{ '$eq': ['$$parent__field__0', '$subscope_id'] }] } } }], 'as': 'orm_observation_subscopes' } }, { '$unwind': '$orm_observation_subscopes' }, { '$match': { '$expr': { '$eq': ['$orm_observation_subscopes.observation_id', ObjectId('6836e1278c5bb0a20fa1aed1')] } } }, { '$facet': { 'group': [{ '$group': { '__count': { '$sum': { '$cond': { 'if': { '$in': [{ '$type': { '$literal': True } }, ['missing', 'null'] ] }, 'then': None, 'else': 1 } } }, '_id': None } }] } }, { '$addFields': { '__count': { '$getField': { 'input': { '$arrayElemAt': ['$group', 0] }, 'field': '__count' } }, '_id': { '$getField': { 'input': { '$arrayElemAt': ['$group', 0] }, 'field': '_id' } } } }, { '$project': { '__count': { '$ifNull': ['$__count', { '$literal': 0 }] } } }]

And this takes 3 seconds because the "pivot table" is very large, even though the join is on a specific "observation id", a field that is indexed.

If I move the match inside the lookup, the same qery takes milliseconds.

[{ '$lookup': { 'from': 'orm_observation_subscopes', 'let': { 'parent__field__0': '$_id' }, 'pipeline': [{ '$match': { '$expr': { '$and': [{ '$eq': ['$$parent__field__0', '$subscope_id'] },{ '$eq': ['$observation_id', ObjectId('6836e1718c5bb0a20fab6467')] }] } } }], 'as': 'orm_observation_subscopes' } }, { '$unwind': '$orm_observation_subscopes' }, { '$limit': 21 }]

I took a quick look at the code that generates the lookup and at first sight I was not clear where to make the change, I'll dig a bit deeper but perhaps you already know where to look to implement the fix.

This is particularly penalizing, as we use a few foreign keys and many to manys on fairly large collections, and would like to keep it this way.

It looks like the fix in the query is simple, it is probably much more complex in the generator.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions