Scalable logistic regression with crossed random effects

by Swarnadip Ghosh, et al.

The cost of both generalized least squares (GLS) and Gibbs sampling in a crossed random effects model can easily grow faster than N^(3/2) for N observations. Ghosh et al. (2020) develop a backfitting algorithm that reduces the cost to O(N). Here we extend that method to a generalized linear mixed model for logistic regression. We use backfitting within an iteratively reweighted penalized least squares algorithm. The specific approach is a version of penalized quasi-likelihood due to Schall (1991). A straightforward version of Schall's algorithm would also cost more than N^(3/2) because it requires the trace of the inverse of a large matrix. We approximate that quantity at cost O(N) and prove that this substitution makes an asymptotically negligible difference. Our backfitting algorithm also collapses the fixed effect with one random effect at a time, in a way that is analogous to the collapsed Gibbs sampler of Papaspiliopoulos et al. (2020). We use a symmetric operator that facilitates efficient covariance computation. We illustrate our method on a real dataset from Stitch Fix. After properly accounting for crossed random effects, we show that a naive logistic regression could underestimate sampling variances by several hundredfold.
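To make the idea of backfitting inside an iteratively reweighted penalized least squares loop concrete, the following is a minimal sketch, not the authors' algorithm. It assumes an intercept-only model with one row effect and one column effect, treats the ridge penalties `lam_a` and `lam_b` as fixed instead of estimating variance components as in Schall (1991), and uses plain backfitting instead of the collapsed, symmetric version described above. The function name `fit_crossed_logistic` and all parameter names are hypothetical. Each inner sweep touches every observation a constant number of times, so it costs O(N).

```python
import numpy as np

def fit_crossed_logistic(y, row, col, lam_a=1.0, lam_b=1.0,
                         n_outer=25, n_inner=10):
    """Illustrative fit of logit P(y=1) = mu + a[row] + b[col].

    Assumptions (not from the paper): fixed ridge penalties lam_a and
    lam_b, fixed iteration counts, and plain (uncollapsed) backfitting.
    """
    n_rows, n_cols = row.max() + 1, col.max() + 1
    mu, a, b = 0.0, np.zeros(n_rows), np.zeros(n_cols)
    for _ in range(n_outer):
        # Reweighting step: working weights w and working response z.
        eta = mu + a[row] + b[col]
        p = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(p * (1.0 - p), 1e-8, None)
        z = eta + (y - p) / w
        # Backfitting on the weighted penalized least squares problem.
        for _ in range(n_inner):
            mu = np.sum(w * (z - a[row] - b[col])) / np.sum(w)
            r = z - mu - b[col]
            a = (np.bincount(row, weights=w * r, minlength=n_rows)
                 / (np.bincount(row, weights=w, minlength=n_rows) + lam_a))
            r = z - mu - a[row]
            b = (np.bincount(col, weights=w * r, minlength=n_cols)
                 / (np.bincount(col, weights=w, minlength=n_cols) + lam_b))
    return mu, a, b
```

Here `y` is a 0/1 array, and `row` and `col` are non-negative integer arrays of the same length giving the level of each factor for each observation. The grouped sums via `np.bincount` are what keep each sweep linear in N; no N-by-N or levels-by-levels matrix is ever formed or inverted.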




Backfitting for large scale crossed random effects regressions

Regression models with crossed random effect error models can be very ex...

Scalable solution to crossed random effects model with random slopes

The crossed random-effects model is widely used in applied statistics, f...

Efficient Methods for Online Multiclass Logistic Regression

Multiclass logistic regression is a fundamental task in machine learning...

PLS Generalized Linear Regression and Kernel Multilogit Algorithm (KMA) for Microarray Data Classification

We implement extensions of the partial least squares generalized linear ...

Geometric ergodicity of Polya-Gamma Gibbs sampler for Bayesian logistic regression with a flat prior

Logistic regression model is the most popular model for analyzing binary...

Unifying Width-Reduced Methods for Quasi-Self-Concordant Optimization

We provide several algorithms for constrained optimization of a large cl...

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

In this paper we propose a blockwise descent algorithm for group-penaliz...
