ECPR

AI-Driven Text Analysis in the Political Economy of Sustainability: Hybrid Retrieval-Augmented Generation and LLM Multi-Agent Approach

Environmental Policy
Green Politics
Political Economy
Investment
Methods
Quantitative
Big Data
Bastián González-Bustamante
Leiden University
Natascha van der Zwan
Leiden University

Abstract

Political scientists have long been concerned with the institutional characteristics of national political economies. Early scholarship investigated the role of national institutions in economic development, while later political scientists studied how different institutional complementarities produced different economic and social outcomes. For instance, stock market-based political economies with smaller welfare states produce more short-term-oriented finance than bank-based political economies with larger welfare states. A recent incarnation of this scholarly debate focuses on the institutional variegation that produces more or less sustainable finance in different national political economies. Unfortunately, these scholarly endeavours are considerably complicated by substantial data gaps: comparative data on sustainable finance across countries or sectors are simply unavailable. The current paper aims to develop a novel methodology that will allow us to extract, summarise and classify such data through AI-driven text analysis. Taking advantage of recent rapid advances in AI capabilities that have transformed the text-as-data approach and Natural Language Processing (NLP), this study offers a proof of concept on a sample drawn from the 1,000 most prominent pension funds and insurance companies in OECD countries. These entities are significant because they represent a substantial portion of the global financial market, so their sustainability practices can significantly affect the sustainability of the wider economy. We leverage several state-of-the-art Large Language Models (LLMs), such as OpenAI's Generative Pre-trained Transformers (GPTs), Anthropic's Claude and open-source models (e.g., Llama 3.1 and 3.2, Gemma 2 and Mistral's models, among others), to perform a dual task of summarisation and classification.
We propose a hybrid approach that involves three steps. 1) We pre-process the documents with state-of-the-art AI models for layout and table-structure recognition (i.e., DocLayNet and TableFormer), convert the information into lightweight markup and split each document into smaller but relevant chunks for two purposes: (i) machine translation into English, since around 15% of the reports are in another language; and (ii) indexing in a vector database. 2) We use a Retrieval-Augmented Generation (RAG) pipeline to summarise each chunk in a context-aware manner. We favour Claude models at this step because they can handle larger contexts, which facilitates processing long documents. 3) We perform dynamic classification through multi-agent orchestration, assigning recent GPTs and open-source models to different roles, such as analysing each chunk, classifying it and revising the classification. We use a categorisation of narratives related to sustainable investment topics such as sustainable development, green finance and climate-related risks. This process offers insights into multi-agent tasks and demonstrates how AI-driven text analysis can enhance not only traditional NLP research but also zero- or few-shot classification tasks using LLMs. These findings are directly relevant and applicable to political science, political economy and related fields, making such methods more accessible and valuable for researchers and practitioners.
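The chunking, indexing and context-aware summarisation steps can be illustrated with a minimal Python sketch. It assumes the layout-recognition stage has already produced Markdown-like text, and it swaps in a bag-of-words cosine similarity and an in-memory list for the embedding model and the vector database. The names `split_into_chunks`, `embed`, `InMemoryVectorIndex` and `build_summary_prompt`, the sample report and the prompt wording are all hypothetical illustrations, not the authors' code; the resulting prompt would still have to be sent to an LLM such as Claude.

```python
import math
import re
from collections import Counter


def split_into_chunks(markdown_text: str, max_words: int = 200) -> list[str]:
    """Group blank-line-separated blocks into chunks (hypothetical helper).

    A new chunk starts at every '#' heading or when adding the next block would
    exceed `max_words`; a single oversized block is kept whole.
    """
    chunks: list[str] = []
    current: list[str] = []
    for block in re.split(r"\n\s*\n", markdown_text.strip()):
        words_so_far = sum(len(b.split()) for b in current)
        starts_section = block.lstrip().startswith("#")
        if current and (starts_section or words_so_far + len(block.split()) > max_words):
            chunks.append("\n\n".join(current))
            current = []
        current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a neural embedding model: lowercase term counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorIndex:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._entries.append((chunk, embed(chunk)))

    def query(self, text: str, k: int = 2) -> list[str]:
        # Return the k stored chunks most similar to `text` by cosine similarity.
        q = embed(text)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_summary_prompt(chunk: str, index: InMemoryVectorIndex, k: int = 2) -> str:
    # Retrieve up to k *other* chunks so the LLM can summarise `chunk` in context.
    context = [c for c in index.query(chunk, k + 1) if c != chunk][:k]
    return (
        "Summarise the CHUNK in English, using CONTEXT only to resolve references.\n"
        "CONTEXT:\n" + "\n---\n".join(context) + "\nCHUNK:\n" + chunk
    )


# Hypothetical sample report, already converted to lightweight markup.
report = (
    "# Climate strategy\n\nWe integrate climate-related risks into our investment process.\n\n"
    "# Green bonds\n\nOur green finance allocation rose, focusing on renewable energy.\n\n"
    "# Governance\n\nThe board oversees sustainable development goals."
)
chunks = split_into_chunks(report)
index = InMemoryVectorIndex()
for chunk in chunks:
    index.add(chunk)
print(index.query("climate-related risks", k=1)[0])
```

In a production pipeline `embed` would call an embedding model and the index would live in a dedicated vector store, but the retrieval logic is the same: rank stored chunks by similarity and pass the top matches to the summarising model as context.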
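The analyse, classify and revise division of labour in the multi-agent classification step can be sketched as a small orchestration loop. In this sketch, plain Python functions driven by keyword cues play the three roles that the paper assigns to GPTs and open-source LLMs. The cue lists, the role functions and `orchestrate` are hypothetical illustrations, not the authors' codebook or implementation; only the three category names come from the text above.

```python
from typing import Callable

# Hypothetical cue lists standing in for LLM judgement; not the authors' codebook.
BROAD_CUES = {
    "sustainable development": ("sustainab",),
    "green finance": ("green",),
    "climate-related risks": ("climate",),
}
STRICT_CUES = {
    "sustainable development": ("sustainable development", "sdg"),
    "green finance": ("green bond", "green finance", "renewable"),
    "climate-related risks": ("climate-related risk", "climate risk", "transition risk", "physical risk"),
}


def analyst(chunk: str) -> list[str]:
    """Role 1: keep the (lowercased) sentences that mention any broad cue."""
    sentences = [s.strip().lower() for s in chunk.split(".") if s.strip()]
    return [s for s in sentences if any(c in s for cues in BROAD_CUES.values() for c in cues)]


def classifier(notes: list[str], rejected: set[str]) -> str:
    """Role 2: propose the first non-rejected category whose broad cue appears."""
    for category, cues in BROAD_CUES.items():
        if category not in rejected and any(c in n for n in notes for c in cues):
            return category
    return "none"


def reviewer(notes: list[str], label: str) -> bool:
    """Role 3: accept a label only if one of its stricter cues appears."""
    return label == "none" or any(c in n for n in notes for c in STRICT_CUES[label])


def orchestrate(
    chunk: str,
    analyse: Callable[[str], list[str]] = analyst,
    classify: Callable[[list[str], set[str]], str] = classifier,
    review: Callable[[list[str], str], bool] = reviewer,
    max_rounds: int = 3,
) -> tuple[str, list[tuple[str, bool]]]:
    # Analyse once, then loop classify -> review, feeding rejected labels back.
    notes = analyse(chunk)
    rejected: set[str] = set()
    history: list[tuple[str, bool]] = []
    for _ in range(max_rounds):
        label = classify(notes, rejected)
        accepted = review(notes, label)
        history.append((label, accepted))
        if accepted:
            return label, history
        rejected.add(label)
    return "unresolved", history


label, history = orchestrate("We assess climate-related risks in our green building portfolio.")
print(label, history)
```

Because each role is passed in as a callable, any stub can be swapped for a function that prompts an LLM without changing the loop. In the example chunk, the reviewer rejects the superficial "green" match (a green building is not green finance) and the classifier revises its label to climate-related risks on the second round.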