Vector Quantization Techniques for Approximate Nearest Neighbor Search on Large-Scale Datasets

Author:

Ozan, Ezgi Can

Description:

The technological developments of the last twenty years are leading the world to a new era. The invention of the internet, mobile phones and smart devices has resulted in an exponential increase in data. As data grows every day, finding similar patterns or matching samples to a query is no longer a simple task because of its computational cost and storage limitations. Special signal processing techniques are required to handle the growth in data, as simply adding more and more computers cannot keep up. Nearest neighbor search, also called similarity search, proximity search or near item search, is the problem of finding the item that is nearest or most similar to a query according to a distance or similarity measure. When the reference set is very large, or the distance or similarity calculation is complex, performing the nearest neighbor search can be computationally demanding. Considering today’s ever-growing datasets, where the cardinality of samples also keeps increasing, growing interest in approximate methods has emerged in the research community. Vector Quantization for Approximate Nearest Neighbor Search (VQ for ANN) has proven to be one of the most efficient and successful methods targeting this problem. It proposes to compress vectors into binary strings and to approximate the distances between vectors using look-up tables. With this approach, the approximation of distances is very fast, while the storage space requirement of the dataset is minimized thanks to the extreme compression levels. The distance approximation performance of VQ for ANN has been shown to be sufficiently good for retrieval and classification tasks, demonstrating that VQ for ANN techniques can be a good replacement for exact distance calculation methods. This thesis contributes to the VQ for ANN literature by proposing five advanced techniques, which aim to provide fast and efficient approximate nearest neighbor search on very large-scale datasets.
The proposed methods can be divided into two groups. The first ...
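
The general idea described above (compressing vectors into short codes and approximating distances with look-up tables) can be illustrated with a minimal sketch. It uses plain product quantization, a well-known VQ for ANN scheme, and is not one of the five techniques proposed in the thesis. All function names, the toy Gaussian data, and the parameters (m = 4 subspaces, k = 16 centroids per subspace) are assumptions chosen for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def split(vec, m):
    """Cut a vector into m equal-length sub-vectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def train_codebooks(data, m, k):
    """Learn one codebook of k centroids per subspace."""
    return [kmeans([split(v, m)[j] for v in data], k, seed=j) for j in range(m)]

def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def distance_tables(query, codebooks):
    """Once per query: squared distance from each query sub-vector to every centroid."""
    return [[sq_dist(sub, cen) for cen in cb]
            for sub, cb in zip(split(query, len(codebooks)), codebooks)]

def approx_dist(code, tables):
    """Approximate squared distance using one table look-up per subspace."""
    return sum(tables[j][c] for j, c in enumerate(code))

# Toy dataset (assumption): 200 random 8-dimensional vectors.
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]

codebooks = train_codebooks(data, m=4, k=16)
codes = [encode(v, codebooks) for v in data]  # 4 indices x 4 bits = 16 bits per vector

query = [rng.gauss(0, 1) for _ in range(8)]
tables = distance_tables(query, codebooks)
# Rank the whole dataset using only the compact codes and the tables.
ranked = sorted(range(len(data)), key=lambda i: approx_dist(codes[i], tables))
```

In this sketch each stored vector shrinks from eight floats to four 4-bit indices, and ranking the dataset for a query needs only four table look-ups and additions per vector after the tables are built. The approximate distance equals the exact squared distance between the query and the quantized (reconstructed) vector, so accuracy depends on how well the codebooks fit the data.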

Publisher:

Tampere University of Technology

Contributors:

Signal Processing ; Faculty of Business and Technology Management

Year of Publication:

2017

Document Type:

Doctoral dissertation (article-based) ; doctoralThesis ; [Doctoral and postdoctoral thesis]

Language:

Subjects:

Computer and information sciences

DDC: