ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks

11/22/2017
by Qiang Qiu, et al.

Hash codes are efficient data representations for coping with ever-growing amounts of data. In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNNs) into shallow random forests, with near-optimal information-theoretic code aggregation among trees. We start with a simple hashing scheme, where random trees in a forest act as hashing functions by setting '1' for the visited tree leaf and '0' for the rest. We show that traditional random forests fail to generate hashes that preserve the underlying similarity between the trees, rendering the random forest approach to hashing challenging. To address this, we propose to first randomly divide the classes arriving at each tree split node into two groups, obtaining a significantly simplified two-class classification problem that can be handled by a lightweight CNN weak learner. Such a random class grouping scheme enables code uniqueness by forcing each class to share its code with different classes in different trees. A non-conventional low-rank loss is further adopted for the CNN weak learners to encourage code consistency by minimizing intra-class variations and maximizing inter-class distance for the two random class groups. Finally, we introduce an information-theoretic approach for aggregating the codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods for image retrieval tasks on large-scale public datasets, and performs at the level of other state-of-the-art image classification techniques while using a more compact, efficient, and scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of lightweight CNNs, instead of simply going deeper.
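
The simple starting scheme and the random class grouping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names are hypothetical, the split is assumed to be a balanced random partition, and plain concatenation stands in for the information-theoretic aggregation, which the abstract does not specify. The CNN weak learners and the low-rank loss are omitted.

```python
import random


def leaf_indicator_code(leaf_index, num_leaves):
    """Per-tree code: 1 for the visited leaf, 0 for every other leaf."""
    if not 0 <= leaf_index < num_leaves:
        raise ValueError("leaf_index out of range")
    return [1 if i == leaf_index else 0 for i in range(num_leaves)]


def random_class_split(classes, rng):
    """Randomly partition the classes arriving at a split node into two groups.

    Assumption: a balanced split; the abstract only says the grouping is random.
    """
    shuffled = list(classes)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def concatenate_tree_codes(leaf_indices, num_leaves):
    """Naive aggregation (an assumption): concatenate the per-tree codes."""
    code = []
    for leaf_index in leaf_indices:
        code.extend(leaf_indicator_code(leaf_index, num_leaves))
    return code


def hamming_distance(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


if __name__ == "__main__":
    rng = random.Random(0)
    left, right = random_class_split(range(10), rng)
    print("left group:", sorted(left), "right group:", sorted(right))

    # Two samples that reach the same leaf in two of three 4-leaf trees.
    sample_a = concatenate_tree_codes([0, 3, 1], num_leaves=4)
    sample_b = concatenate_tree_codes([0, 3, 2], num_leaves=4)
    print("Hamming distance:", hamming_distance(sample_a, sample_b))
```

Because each tree draws its own random grouping, a class shares a branch with different classes in different trees, which is the property the abstract credits for code uniqueness.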


