Classifying the Many with Little: A Three-Step Approach Using LLMs and NLI for Multi-Label Classification

Parliaments
Political Methodology
Methods
Quantitative
Political Ideology
Nelson Santos
University of Namur



Abstract

Transformer-based models have significantly impacted empirical research in the social sciences. Analyses that were once restricted to small samples, or that required substantial financial resources, can now be conducted on vast textual corpora by leveraging these models. Nonetheless, two key limitations persist. First, the most performant models are not open-source and entail escalating financial costs, even when applied to medium-sized samples. Second, achieving high performance with smaller, open-source models typically demands extensive manually annotated data, particularly in complex tasks such as multi-label classification involving several dozen categories. To address these challenges, this paper introduces a novel three-step approach. First, we manually code a small subset of the data, ensuring high-quality annotations at a manageable cost. Second, we leverage a state-of-the-art large language model (LLM) to code an additional sample; this process is adaptable and can be evaluated against the gold-standard benchmark established in the first step. Finally, we use the resulting annotated data to fine-tune a smaller, open-source transformer model within a natural language inference (NLI) framework. This strategy substantially reduces the amount of training data required while preserving strong performance, particularly when compared to conventional multi-label classification methods. We demonstrate the approach by identifying group-based appeals in parliamentary speeches in both English and German. Our results show that the method achieves comparable performance with substantially less annotated data and financial investment, offering a scalable and cost-effective solution for computational research in multilingual contexts.
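
The final step of the approach, recasting multi-label classification as natural language inference, can be illustrated with a short sketch: each candidate label is turned into a hypothesis sentence, and an NLI model scores entailment for every speech-hypothesis pair, so several labels can apply to the same speech. The sketch below is a hedged illustration and not the authors' implementation; the checkpoint facebook/bart-large-mnli, the hypothesis template, and the group labels are assumptions chosen for demonstration, using the Hugging Face zero-shot-classification pipeline.

```python
# Illustrative sketch (not the paper's code): multi-label classification via NLI.
# Each candidate label becomes a hypothesis ("This speech appeals to <group>."),
# and an NLI model scores entailment for every (speech, hypothesis) pair.
# The model name, template, and labels are assumptions chosen for illustration.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",  # any NLI-style checkpoint could be used here
)

speech = "We must protect the pensions that our retirees have worked for all their lives."
group_labels = ["retirees", "workers", "farmers", "young people", "business owners"]

result = classifier(
    speech,
    candidate_labels=group_labels,
    hypothesis_template="This speech appeals to {}.",
    multi_label=True,  # labels are scored independently, so several can apply at once
)

# Print one entailment-based score per candidate group label.
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```

In the paper's setting, the same NLI framing is used with a smaller, fine-tuned open-source model rather than an off-the-shelf zero-shot checkpoint; the sketch only shows how the hypothesis-per-label formulation yields independent scores for a multi-label decision.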