ECPR

Disentangling the past: Automatic network reconstruction from unstructured textual sources using large language models

Elites
Gender
Latin America
Political Economy
Political Methodology
Social Capital
Political Sociology
Big Data
Felipe Perilla Reyes
University of Zurich

Abstract

This paper introduces a Python package that uses large language models (LLMs) to extract individual attributes, relationships, and events from unstructured printed sources with diverse layouts and languages. Traditional information extraction methods, including question-answering systems (QAS), often suffer from accuracy problems and transfer poorly across contexts, which substantially limits their usefulness for both contemporary and historical research. The package employs a Retrieval-Augmented Generation (RAG) approach that works with both cloud-based LLMs (e.g., through ChatGPT’s API) and on-device LLMs (e.g., Llama 2 or Mistral 7B). This makes it more versatile and considerably reduces the risk of factually incorrect outputs, known as “hallucinations.” The package addresses major obstacles in social network analysis by enabling fine-grained, reliable, and scalable data collection, and it supports the full set of information extraction tasks needed to reconstruct static and dynamic networks. Empirical evaluations yield F1-scores between 0.82 and 1.00 for relation extraction and between 0.17 and 1.00 for event extraction, depending on how similar an automatically extracted text must be to the hand-labeled text for the two to count as a match. I demonstrate the package’s effectiveness through a case study that reconstructs elite kinship networks from digitized genealogies and biographies, a task representative of broader applications in the social sciences. Ultimately, the package aims to facilitate the production of reproducible, transparent, and cumulative knowledge, particularly by enabling detailed and systematic reconstructions of historical processes and societal structures. Such reconstructions are essential for macro-historical studies of state capacity, political selection and election, institutional change, and inequality in a broad sense.
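The dependence of the F1-scores on a similarity threshold can be illustrated with a small evaluation sketch. The abstract does not say which similarity measure or matching procedure the package uses. This example therefore assumes Python’s standard-library `difflib.SequenceMatcher` ratio and greedy one-to-one matching, and the function names `similarity` and `thresholded_f1` are hypothetical. It is an illustration of threshold-dependent scoring, not the package’s API.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    Assumption: difflib's ratio stands in for whatever measure the package uses.
    """
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def thresholded_f1(gold: list[str], predicted: list[str], threshold: float) -> dict[str, float]:
    """Precision, recall and F1 where a predicted item counts as correct if its
    best still-unmatched hand-labeled item reaches `threshold` similarity."""
    unmatched_gold = list(gold)
    true_positives = 0
    for pred in predicted:
        # Greedily pick the most similar hand-labeled item not yet matched.
        best_idx, best_score = None, -1.0
        for i, g in enumerate(unmatched_gold):
            score = similarity(pred, g)
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is not None and best_score >= threshold:
            true_positives += 1
            unmatched_gold.pop(best_idx)  # each gold item can be matched once
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical hand-labeled and extracted events differing by one accent.
    gold = ["married Ana María Herrera"]
    predicted = ["married Ana Maria Herrera"]
    print(thresholded_f1(gold, predicted, threshold=1.0)["f1"])  # 0.0 (exact match required)
    print(thresholded_f1(gold, predicted, threshold=0.9)["f1"])  # 1.0 (near match accepted)
```

The same extraction can score anywhere from 0 to 1 depending on the threshold. This is one plausible reason why a range of scores, rather than a single figure, is the natural way to report results for free-text fields such as event descriptions.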