ECPR

The Meaning of 'Public' and 'Opinion' when using Big Data to Study Public Opinion

Comparative Politics
Political Methodology
Social Media
Heinz Brandenburg
University of Strathclyde
Robert Johns
University of Essex
Maarja Lühiste
Newcastle University
Maria Laura Sudulich
University of Essex
Marcel Van Egmond
University of Amsterdam

Abstract

Recent years have seen a proliferation of big data research that aims to track changes in public opinion or to predict election outcomes. Yet it remains unclear what exactly is being measured and what inferences can be drawn. This paper addresses that question by reviewing how, and how far, the existing literature has dealt with two issues: the representativeness of Twitter communities and the validity of opinion measures derived from sentiment analysis. While some studies aim no further than to gauge the dynamics of Twitter debates, many others seek to generalise to wider public opinion trends. But a huge sample does not by itself make predictions precise or reliable. The inference problem is complex because several factors skew samples drawn from social media debates: how representative Twitter users are of the general public, how asymmetric social media use is, and how participation routines vary. Representativeness is not the only issue. We also assess attempts to validate opinion measures resulting from sentiment analysis against external, alternative measures. To what extent can sentiment analysis translate individual tweets into meaningful measures of expressed opinion? And how sensitive are methods of aggregating opinion from sentiment analysis to variation in tone and amount of text within and across individuals? Following a comprehensive review of existing research, the paper aims to derive a clearer understanding of how collective Twitter opinion relates to public opinion, and to suggest ways of designing sampling and coding procedures, as well as validation exercises, that address measurement bias and error. We argue that without such strategies, the full potential of big data as a tool for gauging public opinion remains unexploited.
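
The sensitivity of aggregate opinion measures to the amount of text per individual can be illustrated with a minimal sketch. This is not the paper's method. The user names, the sentiment scores in [-1, 1], and the two aggregation functions are hypothetical, chosen only to show how one prolific user can dominate a tweet-level average.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (user, sentiment score) pairs; all values are invented
# for illustration and do not come from any real dataset.
tweets = [("user_a", 1.0)] * 8 + [("user_b", -1.0), ("user_c", -1.0)]


def tweet_level_mean(rows):
    """Average sentiment over tweets: prolific users count more."""
    return mean(score for _, score in rows)


def user_level_mean(rows):
    """Average each user's tweets first, then average across users."""
    by_user = defaultdict(list)
    for user, score in rows:
        by_user[user].append(score)
    return mean(mean(scores) for scores in by_user.values())


print(round(tweet_level_mean(tweets), 3))  # positive: driven by user_a's 8 tweets
print(round(user_level_mean(tweets), 3))   # negative: 2 of 3 users are negative
```

Under these made-up numbers the two aggregation rules disagree even on the sign of "collective opinion", which is the kind of measurement sensitivity the abstract asks validation exercises to address.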