Improving Workspace/Template/User search queries with in memory search index #12706

Emyrk · 2024-03-21T16:09:40Z

Emyrk
Mar 21, 2024
Collaborator

Relevant issues:

Status quo

The workspace search has become an increasing headache to support. Template & user search have essentially been ignored.

Problem

Using Postgres as our search engine leads to very large queries of increasing complexity. Our GetWorkspaces() query is currently 313 lines long to support 11 search params.

These params are inconsistent in their implementation.

Substring vs exact

name:<wrk_name> is a substring match.
template:<tpl_name> is an exact match.
owner:<usr_name> is an exact match.

List vs singular

id:<wrk_id1>,<wrk_id2> id:<wrk_id3> allows lists
name:<wrk_name> is only 1 name
template:<tpl_name> is only 1 name
owner:<usr_name> is only 1 name
status:<status> is only 1 status
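
To make the inconsistency concrete, here is a rough Python sketch of what uniform handling could look like: every key accepts a comma-separated list and may be repeated, the way id: already does. The `parse_filters` helper is hypothetical and is not how coderd parses queries today.

```python
from collections import defaultdict


def parse_filters(query: str) -> dict[str, list[str]]:
    """Parse 'key:v1,v2 key:v3' into {'key': ['v1', 'v2', 'v3']}.

    Hypothetical helper: every key accepts comma-separated values and
    may be repeated, mirroring how id: behaves today.
    """
    filters: dict[str, list[str]] = defaultdict(list)
    for token in query.split():
        key, sep, raw = token.partition(":")
        if not sep or not key or not raw:
            raise ValueError(f"malformed filter: {token!r}")
        # Drop empty entries produced by stray commas such as "a,,b".
        filters[key].extend(v for v in raw.split(",") if v)
    return dict(filters)


parsed = parse_filters("id:a1,a2 id:a3 owner:alice,bob status:running")
print(parsed)
```

With a single rule for all keys, owner:alice,bob would work the same way as id:a1,a2 instead of needing its own special case in the SQL.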

Solution

Switch to an in-memory index solution like Bleve. We can search all fields with all the bells and whistles.

Exact vs fuzzy vs regex
Conjunctive (AND) or disjunctive (OR) queries
Negated queries

See Bleve's query language here: https://blevesearch.com/docs/Query-String-Query/
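
As a rough illustration of required, optional, and negated clauses, here is a toy Python matcher over in-memory documents. It borrows the `+field:value` / `-field:value` prefix style from the query string syntax linked above, but it is not Bleve: it only does exact field matches, and its rule for optional clauses (at least one must hit when nothing is required) is this sketch's own assumption. The sample workspaces are made up.

```python
def parse(query):
    """Split a query into (must, should, must_not) lists of (field, value)."""
    must, should, must_not = [], [], []
    for token in query.split():
        bucket = should
        if token[0] == "+":
            bucket, token = must, token[1:]
        elif token[0] == "-":
            bucket, token = must_not, token[1:]
        field, _, value = token.partition(":")
        bucket.append((field, value))
    return must, should, must_not


def matches(doc, query):
    must, should, must_not = parse(query)

    def hit(clause):
        field, value = clause
        return doc.get(field) == value  # exact match only in this toy

    if any(hit(c) for c in must_not):
        return False
    if not all(hit(c) for c in must):
        return False
    # Toy rule: with no required clauses, at least one optional clause must hit.
    return bool(must) or not should or any(hit(c) for c in should)


# Made-up sample data.
WORKSPACES = [
    {"name": "dev", "owner": "alice", "status": "running"},
    {"name": "ci", "owner": "alice", "status": "stopped"},
    {"name": "docs", "owner": "bob", "status": "running"},
]


def search(query):
    return [w["name"] for w in WORKSPACES if matches(w, query)]


print(search("+owner:alice -status:stopped"))  # AND plus negation
print(search("owner:bob name:ci"))             # OR of two optional clauses
```

The point is that AND, OR, and NOT fall out of one generic evaluator instead of one hand-written SQL branch per search param.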

Upsides

We no longer need to write code to search our fields. Today, each new search field becomes an increasingly large task given the size of the query.

We get the ability to search by any field with more complex search queries.

Downsides

Increased memory + disk space.

Indexing 101 workspaces as JSON seems to create an index of about 1.6MB. If that scales linearly, 101,000 workspaces could in theory take about 1.6GB. This is stored in a BoltDB database by default. KV store options can be seen here: https://blevesearch.com/docs/Blevex/. This would need more research to see exactly what the impact is.
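
The back-of-the-envelope math, as a small Python sketch. It assumes index size grows linearly with workspace count, which is unverified; real growth could be better or worse.

```python
sample_workspaces = 101
sample_index_bytes = 1.6e6  # ~1.6MB observed for the 101-workspace sample

bytes_per_workspace = sample_index_bytes / sample_workspaces


def estimated_index_bytes(workspace_count: int) -> float:
    """Linear extrapolation from the sample; an assumption, not a measurement."""
    return bytes_per_workspace * workspace_count


print(f"{bytes_per_workspace / 1e3:.1f} KB per workspace")
print(f"{estimated_index_bytes(101_000) / 1e9:.1f} GB for 101,000 workspaces")
```

That works out to roughly 16KB per workspace, so the per-document overhead is the number worth measuring first.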

Technical implementation

Each coderd would likely maintain its own index for workspaces, templates, and users. Each time a workspace/template/user is added or updated, we would need to update the search index. In an HA environment, we might have to use the pubsub to ensure updates on 1 coderd get picked up by the rest.
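
A toy Python sketch of the HA idea: each replica keeps its own in-memory index and applies every change event it hears. The `SearchIndex` and `PubSub` classes and the event shape are hypothetical stand-ins, not coderd's actual pubsub API.

```python
class SearchIndex:
    """Stand-in for one coderd replica's in-memory index."""

    def __init__(self):
        self.docs = {}

    def apply(self, event):
        kind, doc_id, doc = event
        if kind == "upsert":
            self.docs[doc_id] = doc
        elif kind == "delete":
            # Ignore deletes for documents this replica never saw.
            self.docs.pop(doc_id, None)


class PubSub:
    """Minimal in-process fan-out; a real deployment would cross processes."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for callback in self.subscribers:
            callback(event)


pubsub = PubSub()
replica_a, replica_b = SearchIndex(), SearchIndex()
pubsub.subscribe(replica_a.apply)
pubsub.subscribe(replica_b.apply)

# A write handled by any replica is published so every index converges.
pubsub.publish(("upsert", "ws-1", {"name": "dev", "owner": "alice"}))
pubsub.publish(("upsert", "ws-2", {"name": "ci", "owner": "bob"}))
pubsub.publish(("delete", "ws-2", None))

print(replica_a.docs == replica_b.docs)
```

This ignores missed messages and replica startup; a real version would also need a way to rebuild the index from the database.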

With the index, when a user does a workspace search, we would first search the index. The matching workspace IDs would then be filtered by RBAC, and the remaining IDs would be used to fetch the workspaces from the database.
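
The flow above, as a toy Python sketch. The index, the RBAC rule (owners can read their own workspaces, "admin" can read everything), and the database are all made-up stand-ins used only to show the order of the three steps.

```python
# Made-up stand-ins: indexed fields per workspace ID, and full database rows.
INDEX = {
    "ws-1": {"name": "dev", "owner": "alice"},
    "ws-2": {"name": "dev", "owner": "bob"},
    "ws-3": {"name": "ci", "owner": "alice"},
}
DATABASE = {ws_id: {"id": ws_id, **fields} for ws_id, fields in INDEX.items()}


def search_index(field, value):
    """Step 1: the index returns matching workspace IDs only."""
    return [ws_id for ws_id, doc in INDEX.items() if doc.get(field) == value]


def can_read(user, ws_id):
    """Step 2: toy RBAC check; a real check would go through the authorizer."""
    return user == "admin" or INDEX[ws_id]["owner"] == user


def search_workspaces(user, field, value):
    ids = search_index(field, value)
    allowed = [ws_id for ws_id in ids if can_read(user, ws_id)]
    # Step 3: fetch full rows from the database by workspace ID.
    return [DATABASE[ws_id] for ws_id in allowed]


print(search_workspaces("alice", "name", "dev"))
print(search_workspaces("admin", "name", "dev"))
```

One consequence of filtering after the index search is that pagination has to happen after RBAC, or pages can come back short.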

Final thoughts

I think this would be a cool feature, and the search function would gain a lot of power, but I am unsure if it justifies the cost. Today, our search feature has some issues, but it seems to work well enough. The cost of adding another database, even if just on disk, is not cheap.

Made this a GitHub discussion to see if there is enough value to justify looking into it.

Audit log

Doing this for the audit log would be amazing, but I'm not sure how large that index would get. That set of data is unbounded.
