ECPR

Inferring individual electoral behaviour from aggregate data using machine learning approaches

Elections
Methods
Quantitative
Electoral Behaviour
Voting Behaviour
Big Data
Jose M Pavia
University of Valencia

Abstract

Surveys and polls are powerful tools that allow parties to learn the opinions, attitudes, and behaviour of electorates. They are essential for designing campaign strategies that attract the support of as many voters as possible and for deciding where to position the candidate on relevant issues. For historical elections and at the local level, however, they are scarce or simply unavailable. Other methodologies must therefore be employed to learn how voters behaved or to discern the preferences of different subgroups of the electorate. Ecological inference methodologies can shed light on this problem by combining official statistics and election results to discover how different subgroups of the population (grouped according to some variable, such as social class, age, race, religion, gender or electoral behaviour) vote. Within the ecological inference literature, this problem is usually stated as a two-way contingency table in which the goal is to infer the unknown inner-cell values from the known margins. Estimating the inner cells of a set of RxC tables when only the row and column sums are known is one of the most complex problems in the social sciences. Recent years have seen an explosion of Bayesian methods for solving these problems, practically all of them based on a hierarchical multinomial-Dirichlet Bayesian model (Rosen et al., 2001). The use of this methodology (Olivia et al., 2020), however, requires highly trained analysts and usually entails high computational costs (Romero and Pavía, 2021; Pavía and Romero, 2023). A new methodology based on mathematical programming has recently appeared that significantly simplifies the resolution of these problems (Pavía and Romero, 2022a), to the point of making it almost mechanical. Many of the new algorithms are available in the lphom package (Pavía and Romero, 2022b) for the R statistical software.
The aim of this presentation is to show the potential of this new approach, illustrating its use through various examples and examining in more depth the precision improvements brought by the new extensions we are developing, which are based on machine learning algorithms: bagging, boosting and reinforcement learning.
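
To make the contingency-table formulation concrete, the following minimal Python sketch computes the deterministic bounds that the known margins alone impose on each inner cell of a single RxC table. It is an illustration of why the problem is under-determined, not the mathematical programming approach of the lphom package nor the Bayesian model cited above. The function name `cell_bounds` and the precinct figures are hypothetical and chosen only for this example.

```python
def cell_bounds(row_totals, col_totals):
    """Return (lower, upper) bounds for every inner cell of an R x C table.

    Only the margins are used: each cell lies between
    max(0, row + col - total) and min(row, col).
    """
    total = sum(row_totals)
    if total != sum(col_totals):
        raise ValueError("row and column totals must add up to the same number")
    lower = [[max(0, r + c - total) for c in col_totals] for r in row_totals]
    upper = [[min(r, c) for c in col_totals] for r in row_totals]
    return lower, upper


# Hypothetical precinct (assumed data): rows are two groups of voters,
# columns are the votes received by two parties.
row_totals = [600, 400]
col_totals = [700, 300]

lower, upper = cell_bounds(row_totals, col_totals)
for i, (lo_row, up_row) in enumerate(zip(lower, upper)):
    for j, (lo, up) in enumerate(zip(lo_row, up_row)):
        print(f"group {i}, party {j}: between {lo} and {up} voters")
```

In this toy table the first group cast between 300 and 600 of its votes for the first party, so every value in that range is consistent with the margins. Ecological inference methods differ in the additional assumptions or information they use to single out one estimate within such bounds.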