╔═══════════════════════════════════════╗
║ _ _ ____ ║
║ | \ | | | __ ) __ _ ___ ___ ║
║ | \| |_____| |_ \ / _` / __|/ _ \ ║
║ | |\ |_____| |_) | (_| \__ \ __/ ║
║ |_| \_| |____/ \__,_|___/\___| ║
║ ║
╚═══════════════════════════════════════╝
🚀 NBase is a high-performance vector database for efficient similarity search, designed for machine learning embeddings and neural search applications.
- 📦 Enterprise-Grade Storage
- Store and manage millions of high-dimensional vectors
- Built for production workloads
- Automatic backup and recovery
- 🎯 State-of-the-Art Algorithms
- 🕸️ HNSW (Hierarchical Navigable Small World)
- Logarithmic search time complexity
- Optimized graph structure
- 🎲 LSH (Locality-Sensitive Hashing)
- Ultra-fast similarity search
- Configurable hash functions
- 📊 Smart Partitioning
- Distributed search capabilities
- Automatic load balancing
- 📐 Flexible Dimensionality
- Support for any vector dimension
- Dynamic dimension handling
- 🗜️ Intelligent Compression
- Advanced vector compression
- Minimal quality loss
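To give a feel for how Locality-Sensitive Hashing groups similar vectors, here is a minimal sketch of one common LSH family, random-hyperplane hashing for cosine similarity. It is not NBase's implementation, and every name in it (`mulberry32`, `makeHyperplanes`, `hashVector`) is hypothetical and only for illustration.

```javascript
// Sketch only: random-hyperplane LSH. None of these names are part of the NBase API.

// Small seeded PRNG so the example is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build numBits random hyperplanes, each with dim components in [-1, 1).
function makeHyperplanes(numBits, dim, seed) {
  const rand = mulberry32(seed);
  return Array.from({ length: numBits }, () =>
    Array.from({ length: dim }, () => rand() * 2 - 1)
  );
}

// One bit per hyperplane: which side of the plane the vector falls on.
function hashVector(vector, hyperplanes) {
  return hyperplanes
    .map((plane) => {
      const dot = plane.reduce((sum, w, i) => sum + w * vector[i], 0);
      return dot >= 0 ? '1' : '0';
    })
    .join('');
}

const planes = makeHyperplanes(8, 4, 42);
const original = [0.1, 0.2, 0.3, 0.4];
const scaled = original.map((x) => x * 2); // same direction, different length

// Vectors pointing in the same direction land in the same bucket.
console.log(hashVector(original, planes) === hashVector(scaled, planes)); // true
```

Vectors that share a bucket key become candidates for an exact distance check, which is what lets this kind of index skip most of the dataset.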
npm i @n2flowjs/nbase
Clone the repository and install dependencies:
git clone https://github.com/N2FlowJS/nbase.git
cd nbase
npm install
Build the project:
npm run build
Run tests:
npm test
const { Database } = require('@n2flowjs/nbase');
// Initialize the database
const db = new Database({
vectorSize: 1536, // OpenAI's text-embedding-ada-002 size
indexing: {
buildOnStart: true
}
});
// CommonJS has no top-level await, so run the async calls inside a function
async function main() {
  // Add vectors (arrays are truncated here for brevity)
  await db.addVector('doc1', [0.1, 0.2, /* ... */], { title: 'Document 1' });
  await db.addVector('doc2', [0.3, 0.4, /* ... */], { title: 'Document 2' });
  // Search for similar vectors
  const results = await db.search([0.15, 0.25, /* ... */], {
    k: 5,
    includeMetadata: true,
    useHNSW: true
  });
  console.log(results);
  // [
  //   { id: 'doc1', dist: 0.12, metadata: { title: 'Document 1' } },
  //   { id: 'doc2', dist: 0.45, metadata: { title: 'Document 2' } },
  //   ...
  // ]
}
main().catch(console.error);
The Database class is the main interface for interacting with NBase.
const db = new Database(options);
- vectorSize: Default size of vectors (default: 1536)
- clustering: Options for vector clustering
- partitioning: Options for database partitioning
- indexing: Options for index creation (HNSW, LSH)
- persistence: Options for saving/loading the database
- monitoring: Options for performance monitoring
- addVector(id, vector, metadata?): Add a vector to the database
- bulkAdd(vectors): Add multiple vectors in one operation
- findNearest(query, k, options): Find k nearest neighbors
- search(query, options): Alias for findNearest
- deleteVector(id): Delete a vector
- getVector(id): Retrieve a vector
- getMetadata(id): Retrieve metadata for a vector
- updateMetadata(id, data): Update metadata for a vector
- extractRelationships(threshold, options): Find relationships between vectors within partitions
- buildIndexes(): Build search indexes
- save(): Save the database to disk
- close(): Close the database and release resources
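To make the semantics of adding vectors and finding the k nearest neighbors concrete, here is a small in-memory stand-in that scans every stored vector. It is a sketch only: `TinyVectorStore` is a hypothetical class, it uses Euclidean distance by assumption, and it says nothing about how NBase stores or indexes data internally.

```javascript
// Sketch only: a brute-force stand-in, not NBase's Database class.
class TinyVectorStore {
  constructor() {
    this.items = new Map(); // id -> { vector, metadata }
  }

  addVector(id, vector, metadata = {}) {
    this.items.set(id, { vector, metadata });
  }

  deleteVector(id) {
    return this.items.delete(id);
  }

  // Exact k-nearest-neighbor search: score every vector, sort, keep the top k.
  findNearest(query, k) {
    const scored = [];
    for (const [id, { vector, metadata }] of this.items) {
      let sum = 0;
      for (let i = 0; i < query.length; i++) {
        const diff = query[i] - vector[i];
        sum += diff * diff;
      }
      scored.push({ id, dist: Math.sqrt(sum), metadata });
    }
    return scored.sort((x, y) => x.dist - y.dist).slice(0, k);
  }
}

const store = new TinyVectorStore();
store.addVector('doc1', [0.1, 0.2], { title: 'Document 1' });
store.addVector('doc2', [0.3, 0.4], { title: 'Document 2' });
store.addVector('doc3', [0.9, 0.9], { title: 'Document 3' });

const nearest = store.findNearest([0.15, 0.25], 2);
console.log(nearest.map((r) => r.id)); // [ 'doc1', 'doc2' ]
```

A full scan like this costs time proportional to the number of stored vectors, which is the baseline that approximate indexes such as HNSW and LSH aim to beat.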
const results = await db.search(queryVector, {
k: 10, // Number of results to return
filter: (id) => true, // Function to filter results
includeMetadata: true, // Include metadata in results
distanceMetric: 'cosine', // Distance metric to use
useHNSW: true, // Use HNSW index for search
rerank: false, // Rerank results for diversity
rerankingMethod: 'diversity', // Method for reranking
partitionIds: ['p1', 'p2'], // Specific partitions to search
efSearch: 100, // HNSW search parameter
});
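For readers unfamiliar with the 'cosine' metric, the sketch below shows one common definition of cosine distance: 1 minus the cosine similarity. Whether NBase computes its `dist` values exactly this way is an assumption, and `cosineDistance` is a hypothetical helper, not a library function.

```javascript
// Sketch only: cosine distance as 1 - cosine similarity (assumed definition).
function cosineDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine distance is undefined for zero vectors');
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineDistance([1, 0], [1, 0]));  // 0 (same direction)
console.log(cosineDistance([1, 0], [0, 1]));  // 1 (orthogonal)
console.log(cosineDistance([1, 0], [-1, 0])); // 2 (opposite direction)
```

Under this definition, smaller values mean more similar vectors, and vector length does not affect the result.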
For best performance:
- Choose the right index: HNSW provides the best search performance for most use cases
- Adjust efSearch: Higher values improve recall at the cost of speed
- Use partitioning: For large datasets, enable partitioning to reduce memory usage
- Filter wisely: Complex filters may slow down search
- Dimension reduction: Consider reducing vector dimensions if possible
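When tuning efSearch, it helps to measure recall: the fraction of the true nearest neighbors that the approximate search actually returned. The helper below is a minimal sketch for doing that, assuming you have result IDs from an approximate search and from an exact search over the same query. `recallAtK` is a hypothetical name, not an NBase function.

```javascript
// Sketch only: recall of an approximate result list against the exact one.
function recallAtK(approxIds, exactIds) {
  const exact = new Set(exactIds);
  if (exact.size === 0) return 1; // nothing to find, so nothing was missed
  const found = new Set(approxIds.filter((id) => exact.has(id)));
  return found.size / exact.size;
}

const exactIds = ['a', 'b', 'c', 'd'];  // e.g. from a search without the index
const approxIds = ['a', 'c', 'd', 'x']; // e.g. from a search with the index

console.log(recallAtK(approxIds, exactIds)); // 0.75
```

Averaging this value over a set of representative queries at several efSearch settings shows where extra search effort stops paying off.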
NBase includes a built-in HTTP server:
const { Server } = require('@n2flowjs/nbase');
const server = new Server({ port: 1307 });
server.start();
- POST /vectors: Add a vector
- GET /vectors/:id: Get a vector
- DELETE /vectors/:id: Delete a vector
- POST /search: Search for similar vectors
- GET /health: Check server health
- POST /search/metadata: Search with metadata filtering
- POST /search/relationships: Extract relationships between vectors
- POST /search/communities: Find communities (clusters) of vectors based on a distance threshold across loaded partitions
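One way to picture threshold-based communities is as connected components: link any two vectors whose distance is at or below the threshold, then group everything that is reachable through those links. The sketch below does this with a small union-find structure and Euclidean distance. It is an illustration under those assumptions only; `findCommunities` is a hypothetical function, and the algorithm and metric NBase uses may differ.

```javascript
// Sketch only: communities as connected components of a distance-threshold graph.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// vectors: plain object mapping id -> numeric array
function findCommunities(vectors, threshold) {
  const ids = Object.keys(vectors);
  const parent = new Map(ids.map((id) => [id, id]));

  // Union-find "find" with path halving.
  const find = (id) => {
    while (parent.get(id) !== id) {
      parent.set(id, parent.get(parent.get(id)));
      id = parent.get(id);
    }
    return id;
  };

  // Link every pair that is close enough.
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      if (euclidean(vectors[ids[i]], vectors[ids[j]]) <= threshold) {
        parent.set(find(ids[i]), find(ids[j]));
      }
    }
  }

  // Group ids by their root.
  const groups = new Map();
  for (const id of ids) {
    const root = find(id);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(id);
  }
  return [...groups.values()];
}

const communities = findCommunities(
  { a: [0, 0], b: [0.1, 0], c: [5, 5], d: [5.1, 5], e: [10, 10] },
  0.5
);
console.log(communities); // [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
```

The pairwise loop is quadratic in the number of vectors, so a real system would normally restrict comparisons to candidates from an index or a partition.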
For more advanced usage examples, check the examples directory in the repository.
Comprehensive benchmark results and analysis can be found in the Benchmarks Guide.
Key performance highlights:
- HNSW Search: Up to 5.86x faster than standard search
- Bulk Operations: 320x more efficient than single operations
- Scale tested: 50,000+ vectors across multiple partitions
Benchmark scenarios:
| Operation Type | Time (ms) | Speedup Factor |
|---|---|---|
| Standard Search | 37.01 | 1.00x |
| HNSW Search | 39.12 | 0.95x |
| HNSW Search (After Reload) | 4.24 | 8.73x |
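The speedup factors in the table are consistent with dividing the standard search time by each measured time. The short sketch below reproduces that arithmetic from the table's own numbers; `speedup` and `formatSpeedup` are hypothetical helper names.

```javascript
// Sketch only: speedup = baseline time / measured time, using the table's values.
function speedup(baselineMs, measuredMs) {
  return baselineMs / measuredMs;
}

function formatSpeedup(factor) {
  return factor.toFixed(2) + 'x';
}

const baselineMs = 37.01; // Standard Search

console.log(formatSpeedup(speedup(baselineMs, 37.01))); // 1.00x
console.log(formatSpeedup(speedup(baselineMs, 39.12))); // 0.95x
console.log(formatSpeedup(speedup(baselineMs, 4.24)));  // 8.73x
```

A factor below 1.00x means the operation was slower than the baseline in that run.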
Contributions are welcome! Please feel free to submit a Pull Request or open an issue.
This project is licensed under the MIT License - see the LICENSE file for details.