Model Selection as a Multiple Testing Procedure: Improving Akaike's Information Criterion

03/06/2018
by Adrien Saumard, et al.

By interpreting the model selection problem as a multiple hypothesis testing task, we propose a modification of Akaike's Information Criterion that avoids overfitting, even when the sample size is small. We call this correction an over-penalization procedure. As a proof of concept, we show nonasymptotic optimality of our procedure for histogram selection in density estimation, by establishing sharp oracle inequalities for the Kullback-Leibler divergence. A strong feature of our theoretical results is that they include the estimation of unbounded log-densities. To do so, we prove several analytical and probabilistic lemmas that are of independent interest. In an experimental study, we also demonstrate state-of-the-art performance of our over-penalization procedure for bin size selection.
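
As an illustration of the kind of selection rule the abstract describes, here is a minimal Python sketch of AIC-based bin count selection for histogram density estimation, with an optional over-penalization switch. The abstract does not give the paper's exact penalty, so the inflation factor (1 + sqrt(D/n)) used below is a hypothetical placeholder standing in for the authors' correction; the plain AIC branch uses the bin count D as the penalty, up to the usual D-1 free-parameter convention.

```python
import numpy as np

def histogram_log_likelihood(data, n_bins, support=(0.0, 1.0)):
    """Log-likelihood of the maximum-likelihood histogram density
    with n_bins equal-width bins on the given support."""
    n = len(data)
    counts, _ = np.histogram(data, bins=n_bins, range=support)
    width = (support[1] - support[0]) / n_bins
    # Histogram density estimate: p_hat = counts / (n * width) on each bin.
    # Empty bins contribute nothing to the log-likelihood sum.
    nonzero = counts > 0
    return np.sum(counts[nonzero] * np.log(counts[nonzero] / (n * width)))

def select_n_bins(data, candidate_bins, overpenalize=True, support=(0.0, 1.0)):
    """Pick the bin count maximizing a penalized log-likelihood.

    overpenalize=False gives plain AIC (penalty = D, the number of bins).
    The over-penalized variant inflates the penalty by the placeholder
    factor (1 + sqrt(D / n)); the paper's actual correction may differ.
    """
    n = len(data)
    best, best_crit = None, -np.inf
    for D in candidate_bins:
        ll = histogram_log_likelihood(data, D, support)
        pen = D * (1.0 + np.sqrt(D / n)) if overpenalize else float(D)
        crit = ll - pen
        if crit > best_crit:
            best, best_crit = D, crit
    return best

# Small-sample example: on n = 50 points, plain AIC tends to select
# more bins (overfit) than the over-penalized criterion.
rng = np.random.default_rng(0)
x = rng.beta(2, 5, size=50)
print("AIC choice:           ", select_n_bins(x, range(1, 26), overpenalize=False))
print("Over-penalized choice:", select_n_bins(x, range(1, 26)))
```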
