PlanAlyzer: Assessing Threats to the Validity of Online Experiments

by Emma Tosch, et al.

Online experiments are ubiquitous. As the scale of experiments has grown, so has the complexity of their design and implementation. In response, firms have developed software frameworks for designing and deploying online experiments. Ensuring that experiments in these frameworks are correctly designed and that their results are trustworthy (referred to as *internal validity*) can be difficult. Currently, verifying internal validity requires manual inspection by someone with substantial expertise in experimental design. We present the first approach for statically checking the internal validity of online experiments. Our checks are based on well-known problems that arise in experimental design and causal inference. Our analyses target PlanOut, a widely deployed, open-source experimentation framework that uses a domain-specific language to specify and run complex experiments. We have built a tool, PlanAlyzer, that checks PlanOut programs for a variety of threats to internal validity, including failures of randomization, treatment assignment, and causal sufficiency. PlanAlyzer uses its analyses to automatically generate *contrasts*, a key type of information required to perform valid statistical analyses over experimental results. We demonstrate PlanAlyzer's utility on a corpus of PlanOut scripts deployed in production at Facebook, and we evaluate its ability to identify threats to validity on a mutated subset of this corpus. PlanAlyzer has both precision and recall of 92% on the mutated corpus, and 82% of the contrasts it automatically generates match hand-specified data.




