Distributed Maximization of "Submodular plus Diversity" Functions for Multi-label Feature Selection on Huge Datasets

03/20/2019
by Mehrdad Ghadiri, et al.

Many problems in machine learning and data mining amount to selecting a non-redundant, high-"quality" set of objects. Recommender systems, feature selection, and data summarization are among the applications. In this paper, we cast this task as an optimization problem that maximizes the sum of a sum-sum diversity function and a non-negative monotone submodular function. The diversity function addresses redundancy, and the submodular function controls predictive quality. We consider the problem in big data settings (that is, distributed and streaming settings), where the data cannot be stored on a single machine or the processing time is too high for a single machine. We show that a greedy algorithm achieves a constant-factor approximation of the optimal solution in these settings. Moreover, we formulate the multi-label feature selection problem as such an optimization problem. This formulation, combined with our algorithm, leads to the first distributed multi-label feature selection method. We compare this method with centralized multi-label feature selection methods from the literature and show that its performance is comparable to, and in some cases better than, theirs.
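To make the objective concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes the objective has the form f(S) + lam * sum of pairwise distances over S, with a toy coverage function standing in for the monotone submodular term f. The names `coverage`, `sum_sum_diversity`, `greedy`, `distributed_greedy`, the weight `lam`, and the round-robin partitioning are all hypothetical choices made for illustration. The sketch uses plain marginal-gain greedy and a simple "solve locally, then re-run on the merged local solutions" scheme, and no approximation guarantee is claimed for it.

```python
from itertools import combinations


def coverage(selected, covers):
    """Toy monotone submodular term: number of distinct items covered."""
    covered = set()
    for i in selected:
        covered |= covers[i]
    return len(covered)


def sum_sum_diversity(selected, dist):
    """Sum-sum diversity: sum of distances over all unordered pairs."""
    return sum(dist[i][j] for i, j in combinations(sorted(selected), 2))


def objective(selected, covers, dist, lam):
    """Assumed objective: submodular quality plus weighted diversity."""
    return coverage(selected, covers) + lam * sum_sum_diversity(selected, dist)


def greedy(candidates, k, covers, dist, lam):
    """Pick up to k elements, each time adding the largest marginal gain."""
    selected = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        base = objective(selected, covers, dist, lam)
        # sorted() makes tie-breaking deterministic (smallest index wins).
        best = max(
            sorted(remaining),
            key=lambda c: objective(selected + [c], covers, dist, lam) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def distributed_greedy(candidates, k, covers, dist, lam, num_machines):
    """Simulate a two-round scheme: local greedy per part, then greedy on the union."""
    # Round-robin partition stands in for distributing data across machines.
    parts = [candidates[m::num_machines] for m in range(num_machines)]
    local = [greedy(p, k, covers, dist, lam) for p in parts]
    merged = [c for solution in local for c in solution]
    final = greedy(merged, k, covers, dist, lam)
    # Return the best of the local solutions and the merged-round solution.
    return max(local + [final], key=lambda s: objective(s, covers, dist, lam))
```

In this sketch, `covers` maps each candidate to the set of items it covers and `dist` is a symmetric distance matrix indexed by candidate. Two candidates that cover the same items and sit at distance zero from each other add nothing when selected together, which is the redundancy the diversity term is meant to penalize.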
