Low-rank Interaction with Sparse Additive Effects Model for Large Data Frames

12/20/2018
by   Geneviève Robin, et al.
Many applications of machine learning involve the analysis of large data frames, that is, matrices collecting heterogeneous measurements (binary, numerical, counts, etc.) across samples, with missing values. Low-rank models, as studied by Udell et al. [30], are popular in this framework for tasks such as visualization, clustering and missing value imputation. Yet available methods with statistical guarantees and efficient optimization do not allow explicit modeling of main additive effects such as row, column, or covariate effects. In this paper, we introduce a low-rank interaction and sparse additive effects (LORIS) model, which combines matrix regression on a dictionary with a low-rank design to estimate main effects and interactions simultaneously. We provide statistical guarantees in the form of upper bounds on the estimation error of both components. We then introduce a mixed coordinate gradient descent (MCGD) method, which provably converges sub-linearly to an optimal solution and is computationally efficient for large-scale data sets. We show on simulated and survey data that the method has a clear advantage over current practice, which handles additive effects separately in a preprocessing step.
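To make the "sparse additive effects plus low-rank interaction" decomposition concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's method: it assumes a quadratic loss on the observed entries, an l1 penalty on the dictionary coefficients and a nuclear-norm penalty on the low-rank part, and it uses a plain alternating proximal gradient scheme in place of MCGD. All names (`fit_sparse_plus_low_rank`, `lam_alpha`, `lam_L`, the row-indicator dictionary) are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def svd_threshold(M, t):
    """Proximal operator of t * nuclear norm (shrink singular values)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def objective(Y, mask, dictionary, alpha, L, lam_alpha, lam_L):
    """Assumed objective: masked squared error + l1 + nuclear-norm penalties."""
    fitted = np.tensordot(alpha, dictionary, axes=1) + L
    resid = mask * (fitted - np.where(mask, Y, 0.0))
    nuclear = np.linalg.svd(L, compute_uv=False).sum()
    return 0.5 * np.sum(resid ** 2) + lam_alpha * np.abs(alpha).sum() + lam_L * nuclear

def fit_sparse_plus_low_rank(Y, mask, dictionary, lam_alpha, lam_L, n_iter=200):
    """Alternate a proximal gradient step on alpha and one on L.

    Y          : (n, p) data; entries where mask is False are ignored
    mask       : (n, p) boolean, True where Y is observed
    dictionary : (K, n, p) known matrices carrying the additive effects
    """
    K = dictionary.shape[0]
    alpha = np.zeros(K)
    L = np.zeros(Y.shape, dtype=float)
    Y0 = np.where(mask, Y, 0.0)
    # Masked dictionary flattened to (K, n*p); its squared spectral norm
    # bounds the curvature of the loss in alpha.
    G = (dictionary * mask).reshape(K, -1)
    step = 1.0 / max(np.linalg.norm(G, 2) ** 2, 1e-12)
    for _ in range(n_iter):
        # Sparse block: gradient step on alpha, then soft-thresholding.
        resid = mask * (np.tensordot(alpha, dictionary, axes=1) + L - Y0)
        alpha = soft_threshold(alpha - step * (G @ resid.ravel()), step * lam_alpha)
        # Low-rank block: gradient step on L (unit step), then SVD shrinkage.
        resid = mask * (np.tensordot(alpha, dictionary, axes=1) + L - Y0)
        L = svd_threshold(L - resid, lam_L)
    return alpha, L

# Synthetic usage: row-indicator dictionary, two rows with a real main effect.
rng = np.random.default_rng(0)
n, p = 20, 10
row_dictionary = np.stack([np.outer(np.eye(n)[i], np.ones(p)) for i in range(n)])
Y_demo = rng.normal(size=(n, 1)) @ rng.normal(size=(1, p))  # rank-1 interaction
Y_demo[[2, 7]] += 4.0                                       # sparse row effects
mask_demo = rng.random((n, p)) < 0.8                        # about 20% missing
alpha_hat, L_hat = fit_sparse_plus_low_rank(Y_demo, mask_demo, row_dictionary, 1.0, 1.0)
```

Estimating both parts jointly, rather than centering rows or columns first and then fitting a low-rank model, is the contrast the abstract draws with current practice. Note that in this toy setup row effects are themselves rank-one matrices, so the split between `alpha_hat` and `L_hat` depends on the two penalty weights.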

research
06/26/2018

Main effects and interactions in mixed and incomplete data frames

A mixed data frame (MDF) is a table collecting categorical, numerical an...
research
09/06/2017

The low-rank hurdle model

A composite loss framework is proposed for low-rank modeling of data con...
research
02/20/2018

Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach

We study the problem of recovery of matrices that are simultaneously low...
research
03/09/2023

Fitting Low-rank Models on Egocentrically Sampled Partial Networks

The statistical modeling of random networks has been widely used to unco...
research
12/29/2018

Imputation and low-rank estimation with Missing Non At Random data

Missing values challenge data analysis because many supervised and unsu-...
research
02/06/2023

Network Autoregression for Incomplete Matrix-Valued Time Series

We study the dynamics of matrix-valued time series with observed network...
research
05/18/2017

Generalized linear models with low rank effects for network data

Networks are a useful representation for data on connections between uni...
