Over the past decades, corruption has attracted attention in the highest policy circles. It is considered a widespread problem in developing and developed countries alike, and one against which democratization cannot be regarded as a safe remedy. By now, a substantial body of literature has demonstrated the negative effects of corruption on economic growth and social progress. Even as scholars make increasing efforts to disentangle the causes and consequences of corruption, as well as the mechanisms behind its various forms, the abuse of public power and resources for private gain remains widespread, including in established and developed democracies. This has sparked an ongoing debate about the theoretical underpinnings of the concept of corruption and, importantly, about whether a universal understanding of corruption actually exists among citizens across the world.
Due to its shadowy nature, corruption is a notoriously difficult concept to measure, and scholars have mostly relied on opinion data from experts or citizens to assess levels of corruption across countries and regions. From such data, we know that most people tend to disapprove of corruption and are fully aware of the negative impact it exerts on socio-economic development. Yet individuals still engage in corrupt practices or support corrupt leaders and regimes. The survey approach has its advantages: it can be used to target different forms of corruption, and it enables systematic studies of corruption perceptions across populations. However, although surveys most often promise anonymity to respondents, the survey setting is itself an act of observation. Survey questions that aim to capture existing forms of corruption may thus induce large social desirability biases, especially in less democratic societies. The ways in which the concept of corruption is used and understood outside the realm of surveys remain to be explored more thoroughly and systematically.
Leveraging recent advances in distributional semantics, a field within natural language processing, this paper attempts to fill this gap in the corruption literature by mapping the meaning and usage of the word corruption around the world. Using a distributional semantic lexicon that covers a large number of geo-coded languages, we explore how the word corruption is used in online editorial and social media data across a substantial number of countries. This novel study design allows us to identify more specifically what Internet users talk about when they talk about corruption. Controlling for a variety of institutional, cultural and linguistic factors, our preliminary findings further suggest that regime type determines the corruption discourse in online media. While users in general tend to associate corruption with notions that largely resonate with a universal understanding of the concept, the discourse in less democratic regimes is framed so as to consolidate the regime rather than to acknowledge the problem of corruption as such.