Natural language processing of international organization evaluation reports: Introducing a novel performance metric

Public Administration
International
Quantitative
Vytas Jankauskas
Zeppelin University Friedrichshafen
Steffen Eckhard
Zeppelin University Friedrichshafen

Abstract

Is it possible to reduce the complexity of information contained in evaluation reports of international organizations (IOs) to a single performance metric? We present a novel approach that employs deep learning-based text classification. At its core is the classification of sentences as negative or positive assessments of the evaluated IO activity; purely descriptive information is classified as neutral. This enables us to calculate the share of sentences containing positive versus negative assessments in an evaluation report: our performance metric. First, we use a deep learning-based contextualized language model to classify sentences. We fine-tune the pre-trained model (BERT) on 10,296 hand-coded sentences from 180 evaluation reports of nine international organizations, using 90% of these sentences for fine-tuning and 10% to evaluate model performance, and find that the model classifies with an accuracy of 89%. Second, we apply the model to evaluation reports by the World Bank Independent Evaluation Group (IEG), which were not part of our training data. These reports contain a single performance rating provided by evaluators, ranging from unsatisfactory to highly satisfactory. We find that our novel performance metric correlates strongly with the human-provided ratings of IEG evaluation reports. Last, we classify close to one million sentences in 1,028 evaluation reports of nine international organizations between 2012 and 2020 to calculate performance scores. We present descriptive statistics and argue that the novel data set offers a solid conceptual and empirical basis for studying IO performance as well as the content of IO evaluations, allowing comparisons both within IOs (over time, across projects or countries) and between IOs. We hope this research brings new momentum to employing the results of evaluation reports, both from a methodological perspective and in terms of theoretical and practical insights.
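To make the classify-then-aggregate pipeline described in the abstract concrete, the sketch below shows one minimal way to implement it in Python. The model checkpoint path and the positive/negative/neutral label names are placeholders rather than the authors' released artifacts, and the score is computed here as the share of positive sentences among all evaluative (positive plus negative) sentences, which is one plausible reading of the metric; the paper's exact formula may differ.

from nltk.tokenize import sent_tokenize  # requires nltk.download("punkt")
from transformers import pipeline

# Hypothetical fine-tuned BERT checkpoint for three-class sentence
# classification (positive / negative / neutral) on evaluation-report text.
classifier = pipeline("text-classification", model="path/to/finetuned-bert")


def performance_score(report_text: str) -> float:
    """Share of positive sentences among all evaluative sentences.

    Illustrative formulation only: positive / (positive + negative),
    ignoring neutral (descriptive) sentences.
    """
    sentences = sent_tokenize(report_text)
    labels = [classifier(s, truncation=True)[0]["label"] for s in sentences]
    pos = labels.count("positive")
    neg = labels.count("negative")
    return pos / (pos + neg) if (pos + neg) else float("nan")

Applied to every report in a corpus, such a function yields one score per report, which can then be compared across projects, countries, years, or organizations, as the abstract proposes.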