Polyglot database support for relationships? #36
TL;DR: it would be awesome if a dataloader supported bulk querying where only the source ID is known, so the batch loader could use the list of source IDs to query for the target objects.
If you have N source IDs, then in order to have efficient batch reading you MUST be able to perform one query that takes a list of ids. In your case your neo4j query needs to take N ids as input. I don't know neo4j syntax, but maybe something like this:
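(A guess at the Cypher; the Document label and id property are assumptions.)

```cypher
MATCH (d:Document)
WHERE d.id IN $ids
RETURN d.id, d
```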
I am not sure if neo4j can handle that, but think of it like the IN operator in SQL. Otherwise, as you say, for N source objects you will get 1+N neo4j queries. graphql + java-dataloader can only "resolve" fields at specific levels, since in graphql the objects from level 1 feed the values in level 2. So if you have N objects at level 1, then you MUST be able to batch load N level-2 values; if you can't, then it's not batch loading. I am sorry I don't know more about neo4j syntax to do this.
This is exactly what dataloader does. It takes a list of source IDs and asks that you provide a batch loader function that can return the same sized list of target objects. If this is not what you meant by that statement, can you explain more, please?
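For concreteness, a minimal sketch of that contract with java-dataloader (Document and the fetchInKeyOrder bulk-fetch helper are hypothetical):

```java
import org.dataloader.BatchLoader;
import org.dataloader.DataLoader;

import java.util.concurrent.CompletableFuture;

// Given N keys, the batch loader must return N values, in the same order.
BatchLoader<String, Document> batchLoader = keys ->
        CompletableFuture.supplyAsync(() -> fetchInKeyOrder(keys)); // hypothetical single bulk query

DataLoader<String, Document> documentLoader = DataLoader.newDataLoader(batchLoader);

// Callers ask for keys one at a time; dispatch() coalesces them into one batch call.
CompletableFuture<Document> docA = documentLoader.load("id-1");
CompletableFuture<Document> docB = documentLoader.load("id-2");
documentLoader.dispatch();
```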
I was looking more at your explanation; let me see if I can guess what you mean.
I think you need a BatchLoader that does 2 steps: first, one neo4j query that takes all N source ids and returns the related document ids; second, one mongo query that takes those document ids and returns the documents. (See the sketch below.)
If your neo4j / mongo db queries can't take multiple ids as input, then indeed you can't batch.
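A rough sketch of that two-step batch loader; Document and the findRelatedIdsInNeo4j / findDocumentsInMongo helpers are hypothetical, each standing in for a single bulk query:

```java
import org.dataloader.BatchLoader;

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

BatchLoader<String, List<Document>> relatedDocuments = sourceIds ->
        CompletableFuture.supplyAsync(() -> {
            // Step 1: one neo4j query maps all N source ids to their related document ids.
            Map<String, List<String>> relatedIdsBySource = findRelatedIdsInNeo4j(sourceIds);

            // Step 2: one mongo query fetches every related document in bulk.
            Set<String> allRelatedIds = relatedIdsBySource.values().stream()
                    .flatMap(List::stream)
                    .collect(Collectors.toCollection(LinkedHashSet::new));
            Map<String, Document> documentsById = findDocumentsInMongo(allRelatedIds);

            // Reassemble one List<Document> per source id, in the order the keys arrived.
            List<List<Document>> results = new ArrayList<>(sourceIds.size());
            for (String sourceId : sourceIds) {
                List<Document> docs = relatedIdsBySource
                        .getOrDefault(sourceId, Collections.emptyList()).stream()
                        .map(documentsById::get)
                        .collect(Collectors.toList());
                results.add(docs);
            }
            return results;
        });
```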
I'm sorry my question wasn't clear. I actually know the queries I need to write. Cypher (neo4j's query language) actually does support a SQL-esque IN operator. I'm actually unclear as to how to use the dataloader to provide the information I need. From what I understand, you call a dataloader's load() method with the ID of the object you want loaded.
I guess this question really boils down to: is it OK to pass not the ID of the sub-object to load, but the ID of the source object to load sub-objects for? I'm not sure what happens under the covers, so I can't assume this works correctly with regard to caching, etc. That is the answer I'm looking for, I suppose.
Can you please lay out a graphql query so I can explain it in terms of that? I will make one up and use it, but if you show your query that would help. Imagine you have orders and order items, and this query:
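(An illustrative query, using the field names from the explanation that follows.)

```graphql
{
  orders {
    id
    orderItems {
      id
      relatedItems {
        id
      }
    }
  }
}
```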
You would have a normal loader behind the orders field. You would have a dataloader batch function behind the orderItems field, sketched below.
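Its shape might be something like this sketch (OrderItem and the loadItemsGroupedByOrderId helper are hypothetical):

```java
import org.dataloader.BatchLoader;

import java.util.List;
import java.util.concurrent.CompletableFuture;

// One key (an order id) maps to one value that is itself a list of items,
// so the batch result as a whole is a List<List<OrderItem>>.
BatchLoader<Long, List<OrderItem>> orderItemsLoader = orderIds ->
        CompletableFuture.supplyAsync(() -> loadItemsGroupedByOrderId(orderIds));
```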
Notice how it's a list of lists of order items, e.g. List<List<OrderItem>>. It's your job to take the set of order ids (the source objects), do one query, and break the result up into a list of lists, one per order id. In SQL this might be "select * from orderitems where orderid in (:orderIds) order by orderid". You would then break that flat list of records into a list of lists of order items by traversing the result set and creating a new list whenever the orderid changes (see the sketch at the end of this comment).

And then the same on the relatedItems field. This would be another dataloader, but again it will be called with N order-item keys and it needs to return N values. If those N values are lists, then you need to break them back up on "source key" boundaries. If you don't do this breaking down of a single query into keys, then you will indeed run into N+1 queries by issuing one call per key to get the list of targets.

In terms of caching, the key is ALWAYS the cache key. If the dataloader has seen keyX before and caching is turned on, it will return the previous value for keyX, and keyX will NOT be presented to the batch loader.

I hope this helps.
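To make that regrouping step concrete, a sketch (an OrderItem type with a getOrderId() accessor is assumed):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Regroup a flat result set into one list per requested key, preserving the
// incoming key order, with an empty list for keys that matched no rows; the
// batch loader must return exactly one value per key.
static List<List<OrderItem>> groupByOrderId(List<Long> orderIds, List<OrderItem> flatRows) {
    Map<Long, List<OrderItem>> byOrderId = new HashMap<>();
    for (OrderItem row : flatRows) {
        byOrderId.computeIfAbsent(row.getOrderId(), k -> new ArrayList<>()).add(row);
    }
    List<List<OrderItem>> grouped = new ArrayList<>(orderIds.size());
    for (Long orderId : orderIds) {
        grouped.add(byOrderId.getOrDefault(orderId, Collections.emptyList()));
    }
    return grouped;
}
```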
Admittedly, this isn't so much an issue as it is a request for guidance. It might turn into a feature request.
I'm working on a graphql schema that will source information from a Mongo database and a neo4j database. The documents I'm storing in Mongo are rather large (up to 2MB in some cases), and those documents are highly related. It was easiest to provide traversal between objects via a graph database, and we're not terribly concerned about most of the schema of the documents in Mongo, so the tech makes sense for our use case.
The dataloader library is a big help in optimizing our db queries to Mongo to load documents, but I'm currently at a loss as to how to use neo4j to query for which documents I need to load. The general order of operations looks something like this, for our data:
1. Load a source document from Mongo by its ID.
2. Run a neo4j query with that source ID to find the IDs of related documents.
3. Load those related documents from Mongo by the IDs the neo4j query returned.
The dataloader library makes the process easy if you can derive the IDs of related documents from a source object, but in my case I'm not able to determine that without a neo4j query. The best I know how to do at the moment is to run a neo4j query for each individual source object in its datafetcher, meaning I still end up with roughly an N+1 problem.
Any guidance on how to optimize for bulk loading in this scenario?