ECPR

Automating Integrity: The Promise and Limits of LLMs for Anti-Corruption Knowledge and Policy

Methods
Corruption
Technology
Policy-Making
Giovanna Rodriguez-Garcia
Autonomous University of Bucaramanga

Abstract

Large language models (LLMs) are reshaping anti-corruption work, not only through operational analytics but increasingly as tools for producing and communicating knowledge—from rapid evidence syntheses and coding support to drafting policy memos and measurement frameworks. Yet beyond their technical promise, questions remain about whether LLMs can reliably generate anti-corruption knowledge and policy implications without displacing the contextual judgment and normative responsibility that human experts provide. This paper empirically evaluates the accuracy, usefulness, and limits of LLMs as tools for creating anti-corruption knowledge and policy-relevant insights, benchmarked against human-expert reference outputs, across qualitative and quantitative anti-corruption research. We construct a corpus of 100 seminal studies spanning criminology, sociology, public policy, economics, and political science. Under controlled conditions, we task four widely used LLM systems (ChatGPT, Gemini, Copilot, and DeepSeek) with reproducing core research and analyst tasks that underpin anti-corruption knowledge production: in qualitative studies, identifying mechanisms, coding actors, and synthesizing conclusions; in quantitative studies, reproducing model specifications from the text, checking whether reported results follow from those specifications, and drafting results sections and policy implications consistent with the evidence presented. Outputs are evaluated along three dimensions: epistemic accuracy and reliability (including error types such as omissions and unsupported claims), policy applicability, and normative responsibility. Our findings highlight two scenarios with direct implications for governance.
In a convergence scenario, LLMs closely track expert benchmarks on descriptive, well-scaffolded tasks (e.g., structured extraction, summarization, and low-ambiguity synthesis), supporting scalable knowledge translation and the development of new approaches to measuring corruption and anti-corruption. In a divergence scenario, LLMs systematically struggle when tasks require contextual interpretation, cross-case comparability judgments, or value-laden trade-offs, underscoring the risks of overreliance on automated systems for policy advice. Moving beyond speculation, this study demonstrates the conditions under which LLMs can responsibly augment anti-corruption research workflows and policy drafting, and where they may entrench bias, hallucinate causal mechanisms, or overlook power dynamics. It offers guidance for international organizations and civil society on when to use LLMs as evidence-to-policy support tools, and when human expert oversight is non-negotiable.