Choosing the number of factors in factor analysis with incomplete data via a hierarchical Bayesian information criterion

by Jianhua Zhao, et al.

The Bayesian information criterion (BIC), defined as the observed data log likelihood minus a penalty term based on the sample size N, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, the penalty term, which is based on the 'complete' sample size N, is the same whether the data are complete or incomplete. For incomplete data, there are often only N_i<N observations for variable i, which means that using the 'complete' sample size N implausibly ignores the amount of missing information inherent in incomplete data. Given this observation, a novel criterion called hierarchical BIC (HBIC) for factor analysis with incomplete data is proposed. The novelty is that it uses only the actual amounts of observed information, namely the N_i's, in the penalty term. Theoretically, it is shown that HBIC is a large sample approximation of the variational Bayesian (VB) lower bound, and that BIC is a further approximation of HBIC, which means that HBIC shares the theoretical consistency of BIC. Experiments on synthetic and real data sets are conducted to assess the finite sample performance of HBIC, BIC, and related criteria at various missing rates. The results show that HBIC and BIC perform similarly when the missing rate is small, but HBIC is more accurate when the missing rate is not small.
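The difference between the two penalty terms can be illustrated with a minimal Python sketch. It is not the paper's exact formula: the abstract does not state how parameters are counted, so the per-variable parameter counts `n_params_per_var`, the factor 1/2, and the helper names below are assumptions made for illustration only. The sketch also assumes that missing entries are coded as NaN and that every variable has at least one observed value.

```python
import numpy as np

def observed_counts(X):
    """Return N_i, the number of non-missing entries in each column of X."""
    return np.sum(~np.isnan(X), axis=0)

def bic_style_penalty(n_params_per_var, n_samples):
    """BIC-style penalty: every parameter is charged log(N), with N the full sample size."""
    return 0.5 * np.sum(n_params_per_var) * np.log(n_samples)

def hierarchical_style_penalty(n_params_per_var, n_obs_per_var):
    """Hierarchical-style penalty (assumed form): parameters tied to
    variable i are charged log(N_i) instead of log(N)."""
    d = np.asarray(n_params_per_var, dtype=float)
    n = np.asarray(n_obs_per_var, dtype=float)
    return 0.5 * np.sum(d * np.log(n))

# Toy data: 4 samples, 2 variables, one missing entry per variable.
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0],
              [5.0, 6.0]])
N_i = observed_counts(X)   # observed entries per variable
d_i = [3, 3]               # hypothetical parameter count per variable

pen_bic = bic_style_penalty(d_i, X.shape[0])
pen_hier = hierarchical_style_penalty(d_i, N_i)
# A criterion would then be: observed-data log likelihood minus the penalty.
```

Under these assumptions the two penalties coincide when nothing is missing (all N_i equal N), and the hierarchical-style penalty becomes smaller as the missing rate grows, which matches the behaviour described in the abstract.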



Determining the Number of Factors in High-dimensional Generalised Latent Factor Models

As a generalisation of the classical linear factor model, generalised la...

A hierarchical Bayesian model to find brain-behaviour associations in incomplete data sets

Canonical Correlation Analysis (CCA) and its regularised versions have b...

Data Consistency Approach to Model Validation

In scientific inference problems, the underlying statistical modeling as...

The Effective Sample Size in Bayesian Information Criterion for Level-Specific Fixed and Random Effects Selection in a Two-Level Nested Model

Popular statistical software provides Bayesian information criterion (BI...

Predictive Criteria for Prior Selection Using Shrinkage in Linear Models

Choosing a shrinkage method can be done by selecting a penalty from a li...

Forest Learning from Data and its Universal Coding

This paper considers structure learning from data with n samples of p va...

What is really needed to justify ignoring the response mechanism for modelling purposes?

With incomplete data, the standard argument for when the response mechan...
