Addressing patient heterogeneity in disease predictive model development
This paper addresses patient heterogeneity in prediction problems arising in biomedical applications. We propose a systematic hypothesis-testing approach to determine whether patient subgroup structure exists and, if it does, the number of subgroups in the patient population. A mixture of generalized linear models is used to model the relationship between the disease outcome and patient characteristics and clinical factors, including targeted biomarker profiles. We construct a test statistic based on the expectation-maximization (EM) algorithm and derive its asymptotic distribution under the null hypothesis. An important computational advantage of the test is that the parameter estimates required under the complex alternative hypothesis can be obtained through a small number of EM iterations, rather than by fully optimizing the objective function. We demonstrate the finite-sample performance of the proposed test, in terms of type I error rate and power, through extensive simulation studies. The applicability of the proposed method is illustrated through an application to a multi-center prostate cancer study.
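To make the "small number of EM iterations" idea concrete, the following is a minimal sketch and not the authors' procedure: it runs three EM steps on a two-component mixture of Gaussian linear regressions (a GLM with identity link, chosen here only because its M-step has a closed form). The simulated data, the starting values, and all function names (`weighted_line_fit`, `em_few_steps`, and so on) are assumptions made for illustration. The paper's test statistic and its null distribution are not reproduced.

```python
import math
import random


def weighted_line_fit(x, y, w):
    """Weighted least squares for y ~ a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def normal_pdf(resid, sigma2):
    """Density of a N(0, sigma2) variable evaluated at resid."""
    return math.exp(-0.5 * resid * resid / sigma2) / math.sqrt(2.0 * math.pi * sigma2)


def log_likelihood(x, y, pi, coefs, sigma2):
    """Observed-data log-likelihood of the mixture of regressions."""
    total = 0.0
    for xi, yi in zip(x, y):
        mix = sum(p * normal_pdf(yi - a - b * xi, sigma2)
                  for p, (a, b) in zip(pi, coefs))
        total += math.log(mix)
    return total


def em_few_steps(x, y, pi, coefs, sigma2, n_iter=3):
    """Run a fixed, small number of EM iterations from given starting values."""
    n = len(x)
    trace = [log_likelihood(x, y, pi, coefs, sigma2)]
    for _ in range(n_iter):
        # E-step: posterior probability that each patient belongs to each subgroup.
        resp = []
        for xi, yi in zip(x, y):
            dens = [p * normal_pdf(yi - a - b * xi, sigma2)
                    for p, (a, b) in zip(pi, coefs)]
            tot = sum(dens)
            resp.append([d / tot for d in dens])
        # M-step: mixing proportions, per-subgroup regression, shared variance.
        k_range = range(len(pi))
        pi = [sum(r[k] for r in resp) / n for k in k_range]
        coefs = [weighted_line_fit(x, y, [r[k] for r in resp]) for k in k_range]
        sigma2 = sum(r[k] * (yi - a - b * xi) ** 2
                     for xi, yi, r in zip(x, y, resp)
                     for k, (a, b) in enumerate(coefs)) / n
        trace.append(log_likelihood(x, y, pi, coefs, sigma2))
    return pi, coefs, sigma2, trace


# Simulated data (an assumption): two latent subgroups with opposite covariate effects.
random.seed(0)
x, y = [], []
for _ in range(300):
    xi = random.uniform(-1.0, 1.0)
    if random.random() < 0.5:
        yi = 1.0 + 2.0 * xi + random.gauss(0.0, 0.5)
    else:
        yi = -1.0 - 2.0 * xi + random.gauss(0.0, 0.5)
    x.append(xi)
    y.append(yi)

pi_hat, coefs_hat, sigma2_hat, trace = em_few_steps(
    x, y, pi=[0.5, 0.5], coefs=[(0.5, 1.0), (-0.5, -1.0)], sigma2=1.0, n_iter=3
)
print("mixing proportions:", pi_hat)
print("log-likelihood after each step:", trace)
```

Because each step here is an exact EM update, the log-likelihood values in `trace` should not decrease from one iteration to the next. For non-Gaussian outcomes such as a binary disease indicator, the closed-form M-step would have to be replaced by a weighted GLM fit.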