Package hnsw
implements Hierarchical Navigable Small World graphs in Go. You
can read up about how they work here. In essence,
they allow for fast approximate nearest neighbor searches with high-dimensional
vector data.
This package can be thought of as an in-memory alternative to your favorite vector database (e.g. Pinecone, Weaviate). It implements just the essential operations:
Operation | Complexity | Description |
---|---|---|
Insert | Insert a vector into the graph | |
Delete | Delete a vector from the graph | |
Search | Search for the nearest neighbors of a vector | |
Lookup | Retrieve a vector by ID |
Note
Complexities are approximate where
go get github.com/coder/hnsw@main
g := hnsw.NewGraph[hnsw.Vector]()
g.Add(
hnsw.MakeVector("1", []float32{1, 1, 1}),
hnsw.MakeVector("2", []float32{1, -1, 0.999}),
hnsw.MakeVector("3", []float32{1, 0, -0.5}),
)
neighbors := g.Search(
[]float32{0.5, 0.5, 0.5},
1,
)
fmt.Printf("best friend: %v\n", neighbors[0].Embedding())
// Output: best friend: [1 1 1]
By and large the greatest effect you can have on the performance of the graph is reducing the dimensionality of your data. At 1536 dimensions (OpenAI default), 70% of the query process under default parameters is spent in the distance function.
If you're struggling with slowness / latency, consider:
- Reducing dimensionality
- Increasing
$M$
And, if you're struggling with excess memory usage, consider:
- Reducing
$M$ a.k.aGraph.M
(the maximum number of neighbors each node can have) - Reducing
$m_L$ a.k.aGraph.Ml
(the level generation parameter)
While all graph operations are in-memory, hnsw
provides facilities for loading/saving from persistent storage.
For an io.Reader
/io.Writer
interface use Graph.Export
and Graph.Import
.
If you're storing within a filesystem, you can use the more convenient SavedGraph
instead:
path := "some.graph"
g1, err := LoadSavedGraph[hnsw.Vector](path)
if err != nil {
panic(err)
}
// Insert some vectors
for i := 0; i < 128; i++ {
g1.Add(MakeVector(strconv.Itoa(i), []float32{float32(i)}))
}
// Save to disk
err = g1.Save()
if err != nil {
panic(err)
}
// Later...
// g2 is a copy of g1
g2, err := LoadSavedGraph[Vector](path)
if err != nil {
panic(err)
}
See more: