Robust Counterfactual Inferences using Feature Learning and their Applications
In a wide variety of applications, including personalization, we want to measure the difference in outcome due to an intervention and thus have to deal with counterfactual inference. The feedback from a customer in any of these situations is only 'bandit feedback' - that is, a partial feedback based on whether we chose to intervene or not. Typically randomized experiments are carried out to understand whether an intervention is overall better than no intervention. Here we present a feature learning algorithm to learn from a randomized experiment where the intervention in consideration is most effective and where it is least effective rather than only focusing on the overall impact, thus adding a context to our learning mechanism and extract more information. From the randomized experiment, we learn the feature representations which divide the population into subpopulations where we observe statistically significant difference in average customer feedback between those who were subjected to the intervention and those who were not, with a level of significance l, where l is a configurable parameter in our model. We use this information to derive the value of the intervention in consideration for each instance in the population. With experiments, we show that using this additional learning, in future interventions, the context for each instance could be leveraged to decide whether to intervene or not.
READ FULL TEXT