Improving Workspace/Template/User search queries with an in-memory search index #12706
Emyrk started this conversation in Feature Requests.
Relevant issues: has-agent:connected #6439
Status quo
The workspace search has become an increasing headache to support. Template & user search have essentially been ignored.
Problem
Using Postgres as our search engine creates massive queries of increasing complexity. Our GetWorkspaces() query is currently 313 lines long to support 11 search params. These params are also inconsistent in their implementation:
Regex vs exact
- name:<wrk_name> is a substring match.
- template:<tpl_name> is an exact match.
- owner:<usr_name> is an exact match.
List vs singular
- id:<wrk_id1>,<wrk_id2> id:<wrk_id3> allows lists.
- name:<wrk_name> accepts only one name.
- template:<tpl_name> accepts only one name.
- owner:<usr_name> accepts only one name.
- status:<status> accepts only one status.
Solution
Switch to an in-memory index solution like Bleve, which would let us search all fields with all the bells and whistles.
See Bleve's query language here: https://blevesearch.com/docs/Query-String-Query/
Upsides
We no longer need to write code to search each field. Today, every new search field is an increasingly large task given the size of the query.
We gain the ability to search by any field, using more complex search queries.
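To make the first upside concrete, here is a toy, standard-library-only inverted index in Go. It is not Bleve and not Bleve's API; it only illustrates that once documents are indexed generically, every field (name, owner, template, status, or any new one) gets the same semantics, here case-insensitive exact match plus comma-separated lists, without field-specific query code. The Doc, Index, and Search names are hypothetical, and a real engine such as Bleve adds tokenization, richer match types (see the query language linked above), and persistence on top of this idea.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Doc is a hypothetical flattened search document; every field is a string.
type Doc struct {
	ID     string
	Fields map[string]string
}

// Index is a toy inverted index: field -> lowercased value -> set of doc IDs.
type Index struct {
	postings map[string]map[string]map[string]struct{}
}

func NewIndex() *Index {
	return &Index{postings: map[string]map[string]map[string]struct{}{}}
}

// Add indexes every field of d generically; a new field needs no new code.
func (ix *Index) Add(d Doc) {
	for field, value := range d.Fields {
		byValue := ix.postings[field]
		if byValue == nil {
			byValue = map[string]map[string]struct{}{}
			ix.postings[field] = byValue
		}
		v := strings.ToLower(value)
		if byValue[v] == nil {
			byValue[v] = map[string]struct{}{}
		}
		byValue[v][d.ID] = struct{}{}
	}
}

// Search parses space-separated field:value terms, e.g. "owner:alice template:docker,k8s".
// Terms are ANDed together and comma-separated values inside one term are ORed,
// so every field supports lists with the same case-insensitive exact-match rule.
func (ix *Index) Search(q string) []string {
	var result map[string]struct{}
	for _, term := range strings.Fields(q) {
		field, values, ok := strings.Cut(term, ":")
		if !ok {
			continue // bare words are ignored in this sketch
		}
		matched := map[string]struct{}{}
		for _, v := range strings.Split(values, ",") {
			for id := range ix.postings[field][strings.ToLower(v)] {
				matched[id] = struct{}{}
			}
		}
		if result == nil {
			result = matched
			continue
		}
		for id := range result { // intersect with the earlier terms
			if _, ok := matched[id]; !ok {
				delete(result, id)
			}
		}
	}
	ids := make([]string, 0, len(result))
	for id := range result {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	ix := NewIndex()
	ix.Add(Doc{ID: "w1", Fields: map[string]string{"name": "dev", "owner": "alice", "template": "docker"}})
	ix.Add(Doc{ID: "w2", Fields: map[string]string{"name": "prod", "owner": "bob", "template": "k8s"}})
	ix.Add(Doc{ID: "w3", Fields: map[string]string{"name": "test", "owner": "alice", "template": "k8s"}})
	fmt.Println(ix.Search("owner:alice template:docker,k8s")) // [w1 w3]
	fmt.Println(ix.Search("owner:alice,bob template:k8s"))    // [w2 w3]
}
```

Note that `owner:` and `template:` accept lists here with no extra code, which is exactly the inconsistency the current SQL query has to handle field by field.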
Downsides
Increased memory and disk usage.
101 workspaces as JSON seem to produce an index of about 1.6MB, so 101,000 workspaces could in theory take 1.6GB if the size scales linearly. The index is stored in a BoltDB database by default; other KV store options are listed here: https://blevesearch.com/docs/Blevex/. More research is needed to determine the actual impact.
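The 1.6GB figure is a linear extrapolation from the 101-workspace sample: 1.6MB / 101 is roughly 15.8KB per workspace, multiplied by 101,000 workspaces. The small helper below only makes that arithmetic explicit. Its name is hypothetical, it treats MB/GB as decimal units, and linear growth is an assumption a real index may not follow (shared terms and per-store overhead can push it either way).

```go
package main

import "fmt"

// estimateIndexBytes linearly extrapolates an index size measured on a sample.
// Assumption: size grows linearly with document count; real indexes may not.
func estimateIndexBytes(sampleBytes, sampleDocs, targetDocs int64) int64 {
	return sampleBytes * targetDocs / sampleDocs
}

func main() {
	// 1.6MB (decimal) measured for 101 workspaces, per the numbers above.
	const sampleBytes, sampleDocs = 1_600_000, 101
	perDoc := float64(sampleBytes) / sampleDocs
	total := estimateIndexBytes(sampleBytes, sampleDocs, 101_000)
	fmt.Printf("~%.1f KB per workspace, ~%.1f GB for 101,000 workspaces\n",
		perDoc/1e3, float64(total)/1e9)
	// prints: ~15.8 KB per workspace, ~1.6 GB for 101,000 workspaces
}
```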
Technical implementation
Each coderd would likely maintain its own index for workspaces, templates, and users. Each time a workspace, template, or user is added or updated, we would need to update the search index. In an HA environment, we might have to use the pubsub to ensure that updates on one coderd are picked up by the others.
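One possible shape for that HA sync, sketched in Go under loud assumptions: the Pubsub interface, the event name, and the JSON payload are hypothetical stand-ins, not coderd's actual pubsub API, and memPubsub is an in-process fake of the shared pubsub. Each replica subscribes once at startup and applies every upsert or delete to its local index. The replica that made the change also receives its own event, so all replicas go through a single update path.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Pubsub is a hypothetical minimal interface; coderd's real pubsub may differ.
type Pubsub interface {
	Publish(event string, msg []byte) error
	Subscribe(event string, fn func(msg []byte)) error
}

// memPubsub synchronously delivers each message to every subscriber (in-process fake).
type memPubsub struct {
	mu   sync.Mutex
	subs map[string][]func([]byte)
}

func newMemPubsub() *memPubsub { return &memPubsub{subs: map[string][]func([]byte){}} }

func (p *memPubsub) Publish(event string, msg []byte) error {
	p.mu.Lock()
	fns := append([]func([]byte){}, p.subs[event]...) // copy so callbacks run unlocked
	p.mu.Unlock()
	for _, fn := range fns {
		fn(msg)
	}
	return nil
}

func (p *memPubsub) Subscribe(event string, fn func([]byte)) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.subs[event] = append(p.subs[event], fn)
	return nil
}

// indexEvent is a hypothetical payload published after a workspace row changes.
type indexEvent struct {
	ID     string            `json:"id"`
	Fields map[string]string `json:"fields,omitempty"`
	Delete bool              `json:"delete,omitempty"`
}

const workspaceIndexEvent = "workspace_index_update" // hypothetical channel name

// replica stands in for one coderd's local search index.
type replica struct {
	mu   sync.Mutex
	docs map[string]map[string]string
}

func newReplica(ps Pubsub) (*replica, error) {
	r := &replica{docs: map[string]map[string]string{}}
	err := ps.Subscribe(workspaceIndexEvent, func(msg []byte) {
		var ev indexEvent
		if json.Unmarshal(msg, &ev) != nil {
			return // a real implementation would log and schedule a reindex
		}
		r.mu.Lock()
		defer r.mu.Unlock()
		if ev.Delete {
			delete(r.docs, ev.ID)
		} else {
			r.docs[ev.ID] = ev.Fields
		}
	})
	return r, err
}

func (r *replica) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.docs)
}

// publishUpdate would be called after the database write commits.
func publishUpdate(ps Pubsub, ev indexEvent) error {
	msg, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	return ps.Publish(workspaceIndexEvent, msg)
}

func main() {
	ps := newMemPubsub()
	a, _ := newReplica(ps)
	b, _ := newReplica(ps)
	_ = publishUpdate(ps, indexEvent{ID: "w1", Fields: map[string]string{"owner": "alice"}})
	_ = publishUpdate(ps, indexEvent{ID: "w2", Fields: map[string]string{"owner": "bob"}})
	_ = publishUpdate(ps, indexEvent{ID: "w1", Delete: true})
	fmt.Println(a.count(), b.count()) // 1 1
}
```

Pubsub messages can be missed (for example across a reconnect), so a real design would probably also need a full reindex from the database on startup or periodically; this sketch does not attempt that.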
With the index in place, a workspace search would first query the index. The resulting workspace IDs would then be filtered by RBAC, and the remaining IDs used to fetch the workspaces from the database.
Final thoughts
I think this would be a cool feature and would add a lot of power to search, but I am unsure whether it justifies the cost. Our search has some issues today, but it seems to work well enough. Adding another database, even one that lives only on disk, is not cheap.
I made this a GitHub discussion to see whether there is enough value to justify looking into it.
Audit log
Doing this for the audit log would be amazing, but I'm not sure how large that index would get. That set of data is unbounded.