Overcoming the inconsistences of the variance inflation factor: a redefined VIF and a test to detect statistical troubling multicollinearity

by Román Salmerón, et al.

Multicollinearity is relevant to many fields in which linear regression models are applied, and its presence may affect the analysis of ordinary least squares (OLS) estimators from both the numerical and the statistical points of view. In particular, multicollinearity can lead to incoherence between the statistical significance of the individual independent variables and the global significance of the model. The variance inflation factor (VIF) is traditionally applied to diagnose the possible existence of multicollinearity, but a troubling degree of multicollinearity detected by the VIF does not always correspond to negative effects on the statistical analysis. The reason for this lack of specificity is that other factors, such as the size of the sample and the variance of the random disturbance, can mean that high values of the VIF are not accompanied by problematic variance in the OLS estimators (see O'Brien 2007). This paper presents a new variance inflation factor (TVIF) that considers all these additional factors. Thresholds for this new measure and for the index provided by Stewart (1987) are also provided. These thresholds are reinterpreted and presented as a new statistical test to diagnose the existence of statistically troubling multicollinearity. The contributions of this paper are illustrated with two real data examples previously applied in the scientific literature.
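For orientation, the traditional VIF that the abstract refers to can be sketched in a few lines. This is only an illustration of the classical measure, computed as the diagonal of the inverse correlation matrix of the regressors (equivalent to 1 / (1 - R_j^2), where R_j^2 comes from regressing the j-th regressor on the others). It is not the TVIF proposed in the paper, whose definition is not given in the abstract. The function names and the toy data are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [x - mean for x in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])  # centered, unit length
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled]
            for ci in scaled]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("perfect collinearity: the VIF is undefined")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def vif(columns):
    """Classical VIF of each regressor (intercept excluded)."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Toy data (made up): x2 is nearly a copy of x1, x3 is unrelated noise.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
x3 = [2.0, -1.0, 0.5, 3.0, -2.0, 1.0]
print(vif([x1, x2, x3]))
```

Note that this quantity depends only on the regressors. Neither the sample size nor the variance of the random disturbance enters the computation, which are the factors the abstract says the traditional VIF leaves out.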




