ECPR

ClaimR - Comparison of Classifier Performance for the identification of representative claims in large text corpora

Political Methodology
Representation
Quantitative
Darius Ribbe
University Greifswald

Abstract

The study of representative claims in political speeches plays an important role in understanding political representation. However, the identification of these claims has largely been confined to case studies or small-n analyses, primarily because processing vast text corpora is difficult and time and funding are limited. In response, this research note introduces a workflow and an R package designed to facilitate the identification of representative claims in large text corpora. I evaluate and compare the performance of several classifiers on this task: Support Vector Machines (SVM), Naive Bayes, GPT-3, BERT, and XLNet. The research note first outlines the workflow implemented for efficient claim identification, providing transparency and replicability for future studies. Subsequently, I introduce the R package to increase accessibility for researchers engaged in large-scale analyses of political discourse. To assess the classifiers, I conduct a comparative analysis of their precision, recall, and overall accuracy in identifying claims to represent women. I compare the classifiers on speech data from the European Commission and German subnational governments, in English, in German, and in automated translations. The findings shed light on the strengths and limitations of each classifier and on its applicability for identifying representative claims. The results thereby contribute to methodological advancements in representation research, extend the toolkit available for the exploration of representative claims, and provide a practical guide for scholars seeking to employ the package in their own large-scale text analyses.
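
To make the evaluation criteria concrete, the following minimal Python sketch shows how precision, recall, and accuracy could be computed for a binary claim / no-claim task. It is an illustration only and not the ClaimR package or its API: the function name `binary_metrics`, the classifier labels, and the toy annotations are all hypothetical, and the numbers are not results from the study.

```python
from typing import Dict, Sequence


def binary_metrics(gold: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and accuracy for the positive class (1 = representative claim)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if len(gold) == 0:
        raise ValueError("at least one labelled item is required")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # claims correctly found
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # non-claims flagged as claims
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # claims that were missed
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)  # non-claims correctly rejected

    return {
        # Share of flagged sentences that really are claims (0.0 if nothing was flagged).
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true claims that were found (0.0 if there are no true claims).
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of all sentences labelled correctly.
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data: hand-coded labels and the output of two made-up classifiers.
gold_labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "classifier_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "classifier_b": [1, 1, 1, 1, 0, 1, 1, 0],
}

results = {name: binary_metrics(gold_labels, pred) for name, pred in predictions.items()}

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

In this toy data both classifiers reach the same accuracy while differing in precision and recall, which is one reason to report all three measures rather than accuracy alone when claims are comparatively rare in a corpus.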