ECPR

Contribution (5): Assessing Strategies for Topic Modeling of Multilingual Text Collections in Communication Research

Conflict
Ethnic Conflict
Methods
Quantitative
Social Media
Communication
Comparative Perspective
Big Data
Daniel Maier
Freie Universität Berlin
Christian Baden
Hebrew University of Jerusalem

Abstract

Debates about political conflicts often involve groups with different cultural and linguistic backgrounds. The analysis of debates in conflict areas such as the West Bank thus requires a methodology that can handle multiple languages. The goal of this paper is to evaluate two methods for topic modeling of multilingual document collections: (1) machine translation (MT), and (2) a dictionary approach (DA) that codes words into concepts prior to topic modeling. We empirically assess the costs and benefits of these approaches, using qualitative validation and quantitative comparison, and highlight the potential and weaknesses of each method. For our case study, we use a data set of tweets in three languages (English, Hebrew, Arabic) focusing on the ongoing local conflicts between Israeli authorities, settlers and Bedouins in the West Bank. Comparing the two methods, we find a large share of equivalent topics in both models, which unambiguously outline the debate over the conflict and violent events. Beyond these commonalities, the DA model delivers a slightly more nuanced picture of the conflict-related topics, while the MT model obscures some nuances and instead offers substantively peripheral topics (such as weather and religious talk). Our study is a first step towards instrument validation, indicating that both methods yield valid, exploitable results.
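
As a rough illustration of the dictionary approach described above, the following minimal Python sketch maps words from different languages onto shared concept codes before any topic model is fitted. It is not the authors' implementation: the dictionary entries, the concept labels, and the function names `code_document` and `concept_counts` are all hypothetical and chosen only for illustration, and the abstract does not specify how the actual dictionary was built or applied.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (an assumption, not the authors' dictionary):
# surface forms in English, Hebrew and Arabic mapped to one shared concept code.
CONCEPT_DICTIONARY = {
    "demolition": "DEMOLITION",
    "demolitions": "DEMOLITION",
    "הריסה": "DEMOLITION",
    "هدم": "DEMOLITION",
    "settler": "SETTLER",
    "settlers": "SETTLER",
    "מתנחלים": "SETTLER",
    "مستوطنين": "SETTLER",
}

# \w matches letters of any script for Python 3 str patterns.
TOKEN_PATTERN = re.compile(r"\w+")


def code_document(text):
    """Replace each known word with its concept code; drop unknown words."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [CONCEPT_DICTIONARY[t] for t in tokens if t in CONCEPT_DICTIONARY]


def concept_counts(documents):
    """Build one language-independent bag of concepts per document."""
    return [Counter(code_document(doc)) for doc in documents]
```

The resulting concept counts are language-independent, so documents in all three languages could be passed to a single topic model. Dropping words that are missing from the dictionary is a simplifying assumption of this sketch.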