ECPR

Hidden barriers to open competition: Using text mining to uncover corrupt restrictions to competition in Public Procurement

Europe (Central and Eastern)
Quantitative
Corruption
Big Data
Mihaly Fazekas
Central European University
Eszter Katona
Eötvös Loránd University

Abstract

Public procurement represents 15% of Europe’s GDP and about one third of total government spending. Allegations of corruption and of politicians favouring connected companies are rife in both the Western and Eastern parts of the continent. Yet we know very little about how corruption is conducted and what drives it. The explosion of structured, announcement-level data based on hundreds of thousands of official government records has created new opportunities to employ text mining methods to study corruption and limited competition. The goal of this research paper is therefore to predict limitations of open competition that are likely linked to corruption, using detailed, tender-level textual information. Such an innovation is expected to play a crucial role alongside established indicators of corrupt features of procurement tenders, which characterize procedures and outcomes such as not publishing bidding opportunities, limiting the pool of potential contenders, or directly awarding a government contract without any open competition. Our text-as-data approach builds on and goes beyond such indicators by including textual information describing the purchased goods and services as well as the conditions for tenderers, such as the prior experience required from eligible bidders. Qualitative case studies have shown that tailoring tendering terms to a favoured bidder, thereby eliminating competitors from a tender, often happens through conditions and requirements buried in lengthy tender documentation. Our approach builds on these qualitative insights and applies state-of-the-art machine learning methods to a large-scale dataset. We analyze official government data, publicly available online, on more than 200,000 Hungarian public procurement contracts dating from 2011 to 2020. We used text mining to extract, pre-process and analyze the textual information.
First, we replicated past research predicting single bidding, that is, a single bid submitted on an otherwise competitive tender; this elimination of competition likely indicates corrupt intent. Then we trained Logistic Regression and Random Forest models using word n-grams to predict the same outcome, using grid search to find the optimal hyperparameter settings. In addition to the textual data, we included control variables (year, division based on product code, bid price, location, and buyer type) in the models. Our preliminary findings indicate that the models using textual information outperform the replicated baseline models in predicting single bidding. To explore which texts matter most for measuring corruption risks, we trained separate models on three parts of the text: the tender requirements, the award criteria, and the product description. We found that these parts differ in predictive performance. Contrary to expectations, award criteria are less useful for predicting single bidding, while the product description has the highest predictive performance. To test the external validity of our measurement, we correlate our risk predictions with prices and known cases of favouritism in Hungarian procurement.
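
To make the modelling step concrete, the following is a minimal, dependency-free sketch of predicting single bidding from word n-grams with a logistic regression. It is not the authors' pipeline: the toy tender texts, the labels, the function names and the hyperparameter values are all hypothetical, and the control variables, the Random Forest model and the grid search described above are omitted. In practice a library such as scikit-learn would likely be used instead of this hand-written trainer.

```python
import math

# Hypothetical toy tenders: (text, 1 if only a single bid was received, else 0).
TENDERS = [
    ("supplier must hold exclusive regional licence issued before 2010", 1),
    ("bidder must hold exclusive licence and ten identical prior contracts", 1),
    ("standard office paper delivery open to all suppliers", 0),
    ("road maintenance services open to all qualified bidders", 0),
]


def word_ngrams(text, n_max=2):
    """Return all word 1..n_max-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def featurize(text, vocab, n_max=2):
    """Binary bag-of-n-grams vector over a fixed vocabulary."""
    present = set(word_ngrams(text, n_max))
    return [1.0 if gram in present else 0.0 for gram in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=300, l2=0.01):
    """Batch gradient descent for L2-regularized logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(text, vocab, w, b):
    """Predicted probability that a tender with this text receives a single bid."""
    x = featurize(text, vocab)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Build the n-gram vocabulary from the toy corpus and fit the model.
vocab = sorted({gram for text, _ in TENDERS for gram in word_ngrams(text)})
X = [featurize(text, vocab) for text, _ in TENDERS]
y = [label for _, label in TENDERS]
w, b = train_logreg(X, y)

p_restrictive = predict_proba("must hold exclusive licence", vocab, w, b)
p_open = predict_proba("open to all suppliers", vocab, w, b)
print(f"restrictive wording: {p_restrictive:.2f}, open wording: {p_open:.2f}")
```

On this toy corpus the restrictive wording receives a higher predicted single-bidding risk than the open wording, which illustrates how n-gram features can carry signal about tailored conditions. With four invented examples the numbers themselves mean nothing. A realistic setup would evaluate on held-out tenders and tune settings such as the n-gram range and regularization strength by cross-validated grid search.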