The Meaning of Democracy: Using a Distributional Semantic Model for Collecting Co-occurrence Information from Online Data across Languages
International survey research on democracy has made significant efforts to map popular support for democracy across the world, whether democracy is understood as an ideal, a set of political procedures, or a set of political outcomes. Yet how are we to know what democracy means to the people answering surveys, and thus be able to identify what they are expressing support for? While some scholars emphasize the procedural and institutional aspects that need to be present in a democracy, most theoretical definitions of democracy also include references to the ideals and values associated with democracy. The literature on public support for democracy has revealed significant cross-country differences in people’s attitudes towards democracy. The variance is partly due to differences between high “diffuse” support for the principles of democracy, which can be found in Western, consolidated democracies, and “specific” support for the performance of democracies, which is more prevalent in new democracies (see Easton 1975; Norris 1999; Linde & Ekman 2003; Dahlberg & Holmberg 2012).
Cross-cultural survey research rests upon the assumption that if survey features are kept constant to the maximum extent, data will remain comparable across languages, cultures and countries (Diamond 2010). Yet translating concepts across languages, cultures and political contexts is complicated by linguistic, cultural, normative or institutional discrepancies. Recognizing that language, culture and other social and political aspects affect survey results has been equated with “giving up on comparative research”, and consequently, the most commonly used “solution” to equivalence problems has been for researchers to simply ignore the issue of comparability across languages, cultures and countries (King et al. 2004; Hoffmeyer-Zlotnik & Harkness 2005).
This paper contributes to the debate by using a distributional semantic model, a statistical technique for collecting co-occurrence information from large text data (Turney & Pantel 2010). Distributional semantic models represent terms as vectors in a high-dimensional context space, in which the relative similarity between vectors indicates similarity of usage, which is often equated with semantic similarity. The method is motivated by a structuralist meaning theory known as the distributional hypothesis, which states that words with similar meanings tend to occur in similar contexts, and that the contexts shape and define the meanings of the words (Sahlgren 2006). According to the hypothesis, if we observe two words that consistently occur within the same contexts, we are justified in assuming that they mean similar things. Compared to other methodological approaches aimed at identifying and measuring cross-cultural discrepancies, this approach has the advantage of enabling us to analyze how concepts are used in their “natural habitat”.
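To make the idea of co-occurrence vectors concrete, the following is a minimal sketch and not the model used in this paper. It assumes a toy five-sentence corpus, a symmetric context window of two words, raw co-occurrence counts, and cosine similarity as the vector comparison. The function names `cooccurrence_vectors` and `cosine`, the corpus, and the window size are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, invented purely for illustration.
corpus = [
    "democracy requires free elections",
    "freedom requires free elections",
    "democracy protects human rights",
    "freedom protects human rights",
    "the weather brings heavy rain",
]

vectors = cooccurrence_vectors(corpus, window=2)

# Words that share contexts end up with similar vectors.
sim_related = cosine(vectors["democracy"], vectors["freedom"])
sim_unrelated = cosine(vectors["democracy"], vectors["weather"])
print(sim_related, sim_unrelated)
```

In this toy corpus "democracy" and "freedom" appear in the same contexts, so their vectors are far more similar to each other than either is to "weather". Models applied to real data typically reweight the raw counts and reduce the dimensionality of the context space; the sketch omits both steps.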
Deriving our data from social media allows us to explore the varieties of understandings of democracy that exist among populations across different societies. We have used an online distributional semantic model, the Living Lexicon provided by Gavagai, that continuously learns from online data (both social and news media). The data we use are based on a fairly comprehensive sample of the blogosphere, i.e., blogs, forums and news in 30 languages.