ECPR

Automating Integrity: The Promise and Limits of LLMs for Anti-Corruption Knowledge and Policy

Methods
Corruption
Technology
Policy-Making
Giovanna Rodriguez-Garcia
Autonomous University of Bucaramanga

Abstract

Large language models (LLMs) are reshaping anti-corruption work, not only through operational analytics but increasingly as tools for producing and communicating knowledge—from rapid evidence syntheses and coding support to drafting policy memos and measurement frameworks. Yet beyond their technical promise, questions remain about whether LLMs can reliably generate anti-corruption knowledge and policy implications without displacing the contextual judgment and normative responsibility that human experts provide. This paper empirically evaluates the accuracy, usefulness, and limits of LLMs as tools for creating anti-corruption knowledge and policy-relevant insights, benchmarked against human-expert reference outputs, across qualitative and quantitative anti-corruption research. We construct a corpus of 100 seminal studies spanning criminology, sociology, public policy, economics, and political science. Under controlled conditions, we task four widely used LLM systems (ChatGPT, Gemini, Copilot, and DeepSeek) with reproducing core research and analyst tasks that underpin anti-corruption knowledge production: in qualitative studies, identifying mechanisms, coding actors, and synthesizing conclusions; in quantitative studies, reproducing model specifications from the text, checking whether reported results follow from those specifications, and drafting results sections and policy implications consistent with the evidence presented. Outputs are evaluated along three dimensions: epistemic accuracy and reliability (including error types such as omissions and unsupported claims), policy applicability, and normative responsibility. Our findings highlight two scenarios with direct implications for governance.
In a convergence scenario, LLMs closely track expert benchmarks on descriptive, well-scaffolded tasks (e.g., structured extraction, summarization, and low-ambiguity synthesis), supporting scalable knowledge translation and the development of new approaches to measuring corruption and anti-corruption. In a divergence scenario, LLMs systematically struggle when tasks require contextual interpretation, cross-case comparability judgments, or value-laden trade-offs, underscoring the risks of overreliance on automated systems for policy advice. Moving beyond speculation, this study demonstrates the conditions under which LLMs can responsibly augment anti-corruption research workflows and policy drafting, and where they may entrench bias, hallucinate causal mechanisms, or overlook power dynamics. It offers guidance for international organizations and civil society on when to use LLMs as evidence-to-policy support tools, and when human expert oversight is non-negotiable.