BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature Selection in Sublinear Memory

10/26/2020
by Amirali Aghazadeh, et al.

We consider feature selection for machine learning applications in which the dimensionality of the data is so large that it exceeds the working memory of the (local) computing machine. Unfortunately, current large-scale sketching algorithms show a poor memory-accuracy trade-off due to the irreversible collision and accumulation of stochastic gradient noise in the sketched domain. Here, we develop a second-order ultra-high dimensional feature selection algorithm, called BEAR, which avoids the extra collisions by storing the second-order gradients of the celebrated Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm in Count Sketch, a sublinear-memory data structure from the streaming literature. Experiments on real-world data sets demonstrate that BEAR requires up to three orders of magnitude less memory than first-order sketching algorithms to achieve the same classification accuracy. Theoretical analysis proves that BEAR converges at rate O(1/t) in t iterations of the sketched algorithm. Our algorithm reveals an unexplored advantage of second-order optimization for memory-constrained sketching of models trained on ultra-high dimensional data sets.
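For readers unfamiliar with Count Sketch, the following is a minimal Python sketch of the data structure alone. It is not BEAR and not the authors' implementation: the class name `CountSketch`, its parameters (`depth`, `width`, `seed`), and the salted use of Python's built-in `hash` are all assumptions made for illustration. It shows how a vector indexed by an arbitrarily large feature id can be stored in a fixed `depth x width` table, with a median-of-rows estimate on lookup.

```python
import random
from statistics import median


class CountSketch:
    """Illustrative Count Sketch: fixed-size table for a very high-dimensional vector."""

    def __init__(self, depth=5, width=1024, seed=0):
        rng = random.Random(seed)
        self.depth = depth
        self.width = width
        # depth x width table of accumulated signed values
        self.table = [[0.0] * width for _ in range(depth)]
        # One (bucket salt, sign salt) pair per row; an assumption standing in
        # for the pairwise-independent hash families used in the literature.
        self.salts = [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(depth)]

    def _bucket_and_sign(self, row, index):
        bucket_salt, sign_salt = self.salts[row]
        bucket = hash((bucket_salt, index)) % self.width
        sign = 1.0 if hash((sign_salt, index)) & 1 else -1.0
        return bucket, sign

    def update(self, index, delta):
        """Add `delta` to coordinate `index` of the sketched vector."""
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, index)
            self.table[row][bucket] += sign * delta

    def query(self, index):
        """Estimate coordinate `index` as the median of the per-row estimates."""
        estimates = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, index)
            estimates.append(sign * self.table[row][bucket])
        return median(estimates)


sketch = CountSketch(depth=5, width=1024, seed=0)
sketch.update(10**12, 2.5)   # feature ids can far exceed the table size
sketch.update(10**12, -1.0)  # updates to the same id accumulate
estimate = sketch.query(10**12)
```

Memory is `depth * width` numbers regardless of the feature dimension. When many features are inserted, distinct ids can land in the same bucket, so `query` returns an estimate rather than an exact value; this is the collision effect the abstract refers to.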


research
09/27/2014

Large-scale Online Feature Selection for Ultra-high Dimensional Sparse Data

Feature selection with large-scale high-dimensional data is important ye...
research
06/12/2018

MISSION: Ultra Large-Scale Feature Selection using Count-Sketches

Feature selection is an important challenge in machine learning. It play...
research
06/06/2017

Embedding Feature Selection for Large-scale Hierarchical Classification

Large-scale Hierarchical Classification (HC) involves datasets consistin...
research
09/23/2016

Efficient Feature Selection With Large and High-dimensional Data

Driven by the advances in technology, large and high-dimensional data ha...
research
03/26/2023

FAStEN: an efficient adaptive method for feature selection and estimation in high-dimensional functional regressions

Functional regression analysis is an established tool for many contempor...
research
10/08/2013

Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets

The amount of collected data in many scientific fields is increasing, al...
