Bayesian Hierarchical Bernoulli-Weibull Mixture Model for Extremely Rare Events
Estimating the duration of user behavior is a central concern for most internet companies. Survival analysis is a promising method for analyzing the expected time until an event, but it usually assumes that all subjects share the same survival function and that the event will eventually occur for every subject. Such assumptions are inappropriate when users behave differently or when the event never occurs for some users, for example, the time to conversion on a web service for light users who have no intention of using the service actively. In particular, when the proportion of inactive users is high, these assumptions can lead to undesirable results. To address these challenges, this paper proposes a mixture model that treats active and inactive individuals separately by means of a latent variable. First, we define this problem setting, show the limitations of conventional survival analysis in addressing it, and demonstrate how naturally our Bernoulli-Weibull model accommodates it. We then extend the proposed model to a Bayesian hierarchical model that incorporates subject-specific parameters, which offers substantial improvements over conventional, non-hierarchical models in terms of WAIC and WBIC. Second, we conduct an experiment and an extensive analysis using real-world data from the Japanese job search website CareerTrek, offered by BizReach, Inc. The analysis raises several research questions, such as how the activation rate and the conversion rate differ between user categories and how the instantaneous rate of event occurrence changes over time, and we give quantitative answers and interpretations for each. Furthermore, the model is inferred in a Bayesian manner, which enables us to represent uncertainty with credible intervals for the parameters and predictive quantities.
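The mixture structure described above can be illustrated with a minimal sketch of a Bernoulli-Weibull log-likelihood for right-censored data. This is not the paper's implementation: the parameterization (activation probability `q`, Weibull `shape` and `scale`), the function names, and the treatment of unobserved events as right-censored are assumptions made for illustration, and the hierarchical priors are omitted. An observed event can only come from an active user, whereas a censored record comes either from an inactive user or from an active user whose event has not yet occurred.

```python
import math


def weibull_log_pdf(t, shape, scale):
    """Log density of a Weibull(shape, scale) distribution at t > 0."""
    z = t / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape


def weibull_survival(t, shape, scale):
    """Survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous event rate h(t) = f(t) / S(t) for active users."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def mixture_log_likelihood(times, observed, q, shape, scale):
    """Log-likelihood of a Bernoulli-Weibull mixture (illustrative form).

    times    : positive durations (event time, or censoring time)
    observed : True if the event occurred, False if right-censored
    q        : assumed probability that a user is active, 0 < q <= 1
    """
    total = 0.0
    for t, event_seen in zip(times, observed):
        if event_seen:
            # The event occurred, so the user must be active.
            total += math.log(q) + weibull_log_pdf(t, shape, scale)
        else:
            # Either inactive (never converts) or active but not yet converted.
            total += math.log((1.0 - q) + q * weibull_survival(t, shape, scale))
    return total
```

With `q = 1` the expression reduces to the conventional Weibull survival likelihood, which makes explicit the assumption that the event eventually occurs for every subject. In this sketch a `shape` below 1 corresponds to a hazard that decreases over time, and a `shape` above 1 to a hazard that increases.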