ECPR

Rating Fragility: Project Success and Contextual Bias in Fragile States

Development
Governance
Empirical
Bernhard Reinsberg
University of Glasgow
Timon Forster
University of St. Gallen
Thomas Wencker
DEval

Abstract

Governments and international development organizations spend vast sums evaluating the effectiveness of their policies. Yet an emerging literature on ‘governance by numbers’ questions whether evaluations, especially the numerical scores frequently used for accountability, are unbiased. To date, scholars have rarely examined this question directly. Our study introduces a new approach: using large language models (LLMs) to infer project success from evaluation texts. Because evaluation reports emphasize institutional learning rather than strict accountability, we argue this method is less susceptible to bias than numerical scores. We hypothesize that project ratings are subject to contextual bias: raters may inflate success ratings in fragile states, recognizing the inherent difficulties of working in such environments and leaning toward leniency. Using a novel dataset of 1,824 evaluation sub-sections from 359 projects of the KfW Development Bank, we employ LLMs and multivariate analysis to test this hypothesis. Our findings reveal how project ratings in fragile states are biased relative to those in more stable contexts. The results have significant implications for interpreting project ratings in development cooperation and highlight the fundamental tension between accountability and learning in evaluation practices.