Polls are ubiquitous in US elections, as well as in countries around the world, and to many voters they may seem more noise than information. However, polls serve important functions beyond predicting likely winners; for example, they establish support rankings during an election, which can have important consequences. In the United States, presidential candidates are invited to speak at nationally broadcast primary debates based on their performance in various polls. Given the importance of these debates in informing voters and in influencing the trajectory of campaigns, the accuracy of polls is paramount. Currently, the rankings for US presidential primary debates are computed using only estimates of each candidate’s underlying share of support. As a result, there may be considerable uncertainty concerning a candidate’s true rank.
Practical examples like this motivate the deep statistical and mathematical analysis in this important new paper. As in the above example, data on choices, including polls of political attitudes, commonly feature limited sample sizes and/or categories whose true share of support is small. For reasons explained in detail within the paper, these features pose challenges to inference methods justified using large-sample arguments. In contrast, this paper considers the problem of constructing confidence sets for the rank of each category that are valid in finite samples, even when some categories are chosen with probability close to zero.
Very broadly, the authors consider two types of confidence sets (or ranges of values that contain the true value of a given parameter with a specified probability) for the ranks of categories. The first, a marginal confidence set, provides a way of accounting for uncertainty when answering questions pertaining to the rank of a particular category. The second, a simultaneous confidence set, provides a way of accounting for uncertainty when answering questions pertaining to the ranks of all categories. As a further contribution, the authors develop bootstrap methods to construct such confidence sets.
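To make the idea of a confidence set for a rank concrete, the following is a minimal sketch of a naive percentile bootstrap for the rank of one category. It is not the authors' procedure, and it relies on exactly the kind of large-sample reasoning that the paper says can struggle with small samples and small shares. The function names (`ranks_from_counts`, `bootstrap_rank_set`), the category labels, and the poll counts are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def ranks_from_counts(counts):
    """Rank 1 = largest count; tied categories share the best (smallest) rank."""
    return {
        c: 1 + sum(other > n for other in counts.values())
        for c, n in counts.items()
    }


def bootstrap_rank_set(counts, category, n_boot=2000, alpha=0.05, seed=0):
    """Naive percentile-bootstrap range of ranks for one category.

    Illustrative only: this is a generic resampling sketch, not the
    finite-sample or bootstrap method developed in the paper.
    """
    rng = random.Random(seed)
    labels = list(counts)
    weights = [counts[c] for c in labels]
    n = sum(weights)
    draws = []
    for _ in range(n_boot):
        # Resample n respondents with replacement from the observed shares.
        resample = Counter(rng.choices(labels, weights=weights, k=n))
        boot_counts = {c: resample.get(c, 0) for c in labels}
        draws.append(ranks_from_counts(boot_counts)[category])
    draws.sort()
    lo = draws[int((alpha / 2) * n_boot)]
    hi = draws[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Made-up poll of 400 respondents across four hypothetical categories.
poll = {"A": 180, "B": 150, "C": 40, "D": 30}
print(ranks_from_counts(poll))
print(bootstrap_rank_set(poll, "C"))
```

The first call reports the ranks implied by the raw counts alone, which is how a ranking based only on estimated shares would be read. The second call returns a range of ranks for category C that reflects sampling variation, in the spirit of a marginal confidence set; a simultaneous set would instead have to cover the ranks of all categories at once.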
What does this mean in practice? The authors apply their inference procedures to re-examine the ranking of political parties in Australia using data from the 2019 Australian Election Survey. They find that the finite-sample (marginal and simultaneous) confidence sets are remarkably informative across the entire ranking of political parties, even in Australian territories with few survey respondents and/or with parties that are chosen by only a small share of the survey respondents.
To illustrate this point, the authors show that at conventional significance levels, the finite-sample marginal confidence set for the rank of the Green Party contains only rank 4. In contrast, the bootstrap-based marginal confidence sets contain the ranks 3 to 7, thus exhibiting considerably more uncertainty about the true rank of the Green Party.
While details of the authors’ work will certainly engage statistically and mathematically inclined researchers, general readers should also take note of this work. Better polling techniques matter.