Below is quite a nice hack, which creates a mapping from strings to double-precision float vectors and back. Each string is identified by an integer stored in the vector, and a double-precision float can represent every integer exactly only up to 2^53, so 2^53 strings is also the maximum size of a dataset. This is not a serious limitation for most datasets, since 2^53 one-character strings already correspond to about 8 PiB of data at one byte per string.
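The 2^53 bound is easy to check in plain Python. This is only a small sanity check, not part of the original snippet, and the variable names are made up for illustration:

```python
# Largest n such that every integer in [0, n] has an exact float64 representation.
max_exact = 2 ** 53

# Below the limit, neighbouring integers stay distinct after conversion to float.
assert float(max_exact - 1) != float(max_exact)

# At the limit, two different integers collapse onto the same float,
# so two different strings would end up with the same identifier.
assert float(max_exact) == float(max_exact + 1)

# Size of 2^53 one-byte strings, expressed in pebibytes (1 PiB = 2^50 bytes).
pebibytes = max_exact / 2 ** 50
print(pebibytes)
```
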

$ pip install scikit-learn python-Levenshtein
$ python
>>> from typing import Optional
>>> import Levenshtein
>>> import numpy as np
>>> from sklearn.metrics import DistanceMetric
>>> from sklearn.neighbors import BallTree
>>> 
>>> texts = ["Some string", "Some other string", "Yet another string"]
>>> query = "Smeo srting"
>>> knn = 2
>>> 
>>> def text_to_vector(text
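The transcript above is cut off, so here is a minimal sketch of how the same idea could be wired up. It is not the original code. It assumes that each string is represented by a one-element vector holding its index, and that `BallTree` is given a Python callback through `metric="pyfunc"`. The names `edit_distance`, `lookup` and `index_distance` are hypothetical, and a plain dynamic-programming edit distance stands in for python-Levenshtein so that the sketch only needs NumPy and scikit-learn:

```python
import numpy as np
from sklearn.neighbors import BallTree

texts = ["Some string", "Some other string", "Yet another string"]
query = "Smeo srting"
knn = 2


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (stand-in for python-Levenshtein)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b), # substitution or match
            ))
        previous = current
    return previous[-1]


# Index -> string table. The query is stored right after the dataset,
# so it can be passed to the tree as an index as well.
lookup = texts + [query]


def index_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Decode both one-element vectors back to strings and compare the strings."""
    return float(edit_distance(lookup[int(u[0])], lookup[int(v[0])]))


# One row per string; the single column holds the string's index as a float64.
X = np.arange(len(texts), dtype=np.float64).reshape(-1, 1)
tree = BallTree(X, metric="pyfunc", func=index_distance)

query_vector = np.array([[float(len(texts))]])
distances, indices = tree.query(query_vector, k=knn)
neighbours = [texts[i] for i in indices[0]]
print(neighbours, distances[0])
```

Because the distance is computed in a Python callback, every comparison goes back through the interpreter, so this is much slower than the built-in metrics. The tree still helps by reducing how many comparisons are needed.
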

Answer selected by jjerphan