mmLSH: A Practical and Efficient Technique for Processing Approximate Nearest Neighbor Queries on Multimedia Data
Many large multimedia applications require efficient processing of nearest neighbor queries. Multimedia data are often represented as a collection of important high-dimensional feature vectors. Locality Sensitive Hashing (LSH) is a popular approximate technique for finding nearest neighbors in high-dimensional spaces. To find the top-k most similar multimedia objects, existing LSH techniques require users to find the top-k similar feature vectors for each of the feature vectors that represent the query object. This leads to wasted and redundant work for two main reasons: 1) not all feature vectors may contribute equally to finding the top-k similar multimedia objects, and 2) feature vectors are treated independently during query processing. Moreover, existing techniques provide no theoretical guarantee on the returned multimedia results. In this work, we propose mmLSH, a practical and efficient LSH-based indexing approach for finding the top-k approximate nearest neighbors of multimedia data. In mmLSH, we present novel strategies to find the nearest neighbor objects for a given multimedia query object, and we provide theoretical guarantees on the returned multimedia results. We also present a buffer-conscious strategy to speed up query processing. Experimental evaluation on several real multimedia datasets shows significant gains in running time and accuracy compared with state-of-the-art LSH techniques.
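To make the baseline criticized above concrete, here is a minimal sketch of the per-vector approach: every feature vector of the query object runs its own LSH top-k search, and the hits are then tallied per object. This is not mmLSH. The hash family (p-stable Gaussian projections, h(v) = floor((a·v + b) / w)), the unweighted vote aggregation, and all names (`make_hash`, `NaiveMultimediaLSH`, `num_tables`, `hashes_per_table`, `w`) are illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def make_hash(dim, w, rng):
    """One p-stable LSH function h(v) = floor((a.v + b) / w) (assumed family)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    return lambda v: math.floor((sum(x * y for x, y in zip(a, v)) + b) / w)


class NaiveMultimediaLSH:
    """Hypothetical baseline: index individual vectors, query each one independently."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4, w=4.0, seed=0):
        rng = random.Random(seed)
        # Each table concatenates several hashes into one bucket key;
        # probing several tables widens the candidate set.
        self.tables = []
        for _ in range(num_tables):
            funcs = [make_hash(dim, w, rng) for _ in range(hashes_per_table)]
            self.tables.append((funcs, defaultdict(list)))
        self.vectors = []  # position -> (object_id, feature vector)

    @staticmethod
    def _key(funcs, v):
        return tuple(h(v) for h in funcs)

    def add_object(self, obj_id, feature_vectors):
        for v in feature_vectors:
            idx = len(self.vectors)
            self.vectors.append((obj_id, v))
            for funcs, buckets in self.tables:
                buckets[self._key(funcs, v)].append(idx)

    def _knn_vectors(self, q, k):
        # Gather colliding vectors from all tables, then rank them by exact distance.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), []))
        ranked = sorted(candidates, key=lambda i: math.dist(q, self.vectors[i][1]))
        return ranked[:k]

    def query(self, query_vectors, k):
        # One independent top-k search per query vector, every hit weighted equally.
        votes = Counter()
        for q in query_vectors:
            for i in self._knn_vectors(q, k):
                votes[self.vectors[i][0]] += 1
        return [obj for obj, _ in votes.most_common(k)]
```

For example, after indexing a few small objects with `add_object`, calling `query(vectors_of_A, k=1)` on the vectors of object "A" returns `["A"]`. Note that the query loop performs one full search per query vector and counts every hit with the same weight. These are the two sources of redundant work that the abstract attributes to existing techniques.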