Survey Experiments as Supervised Learning
Survey experiments have become an indispensable tool for causal inference in political science because they elicit otherwise invisible factors that drive critical political outcomes, including perceptions, beliefs, and preferences. Crucially, surveys not only allow researchers to gather data but also empower them to shape the process that generates it. However, multiple challenges complicate the design of survey experiments. First, randomization allows researchers to isolate treatment effects as a whole, but it can be difficult to identify which feature of the instrument caused the observed effect. Second, survey experiments quickly become intractable as the number of dimensions increases. Finally, researchers must also wrestle with practical considerations that restrict their design space, including the overall budget and desired statistical power, as well as the risk of participant disengagement or attrition. Traditionally, researchers formulate a hypothesis and design an experiment to test it. Implicitly, this process is parametric: it makes ex-ante assumptions about the functional form of the data-generating process and its parameters. If these assumptions do not reflect the true data-generating process, this approach risks introducing bias into both data collection and analysis. To address these issues, I propose a two-stage experimental paradigm that decouples data gathering from inference. In the first stage, “sketching,” a non-parametric active learning method approximates the internal data-generating process of participants. In the second stage, “modeling,” this approximation is used for model development and hypothesis testing. Compared to existing approaches, this paradigm does not require a model to be correctly specified ex-ante, is more efficient (it yields more informative data at a lower cost in observations), and provides a richer dataset for model development ex-post.
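To make the "sketching" stage concrete, the loop below is a minimal, hypothetical sketch of non-parametric active learning over a pool of vignette profiles. The toy response function, the k-nearest-neighbor estimator, the uncertainty-sampling rule, and all names are illustrative assumptions, not the procedure proposed in the paper.

```python
# Hypothetical sketch of the "sketching" stage: actively query the
# vignette profiles where a non-parametric estimate of the participant's
# response function is most uncertain, instead of fixing a design ex-ante.
import random

random.seed(0)

def respond(profile):
    """Toy stand-in for a participant's hidden data-generating process."""
    cost, benefit = profile
    return 1 if benefit - 0.7 * cost > 0 else 0

# Candidate pool: all combinations of two vignette attributes (0..4 each).
pool = [(c, b) for c in range(5) for b in range(5)]
labeled = {}  # profile -> observed response

def knn_predict(profile, k=3):
    """Non-parametric estimate: mean response of the k nearest labeled profiles."""
    dist = lambda p: abs(p[0] - profile[0]) + abs(p[1] - profile[1])
    nearest = sorted(labeled, key=dist)[:k]
    return sum(labeled[p] for p in nearest) / len(nearest)

# Seed with a few random queries, then spend the rest of the budget
# on uncertainty sampling: query where the prediction is closest to 0.5.
for profile in random.sample(pool, 3):
    labeled[profile] = respond(profile)

budget = 12
while len(labeled) < budget:
    unlabeled = [p for p in pool if p not in labeled]
    target = min(unlabeled, key=lambda p: abs(knn_predict(p) - 0.5))
    labeled[target] = respond(target)

print(f"queried {len(labeled)} of {len(pool)} profiles")
```

The queried profiles concentrate near the (unknown) decision boundary, which is what makes the resulting dataset more informative per observation than a uniform design under this toy setup.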