ECPR

Tackling Human Labeling Errors in Machine Learning for AI Services in the Public Sector: A Case Study on Korean Regulation Classification

Public Administration
Regulation
Methods
Mixed Methods
Ha Hwang
Korea Institute of Public Administration
Seung-Hun Hong
Korea Institute of Public Administration

Abstract

As machine learning and AI services expand, data quality is becoming more important, particularly the quality of the labeled data used for training. In the public and administrative sectors, much of this data is labeled by humans, which results in labeling errors. Human labeling errors can lead to inaccurate predictions and faulty decisions, which can have serious consequences. In this article, we emphasize the importance of addressing human labeling errors when developing AI services in the public sector and present a case study on Korean regulatory classification. In Korea, regulations are classified in order to register them. However, regulations are not accurately classified and registered because clear guidelines are lacking and civil servants have limited expertise. More fundamentally, the conceptual definition of regulation is complex and ambiguous, which leads to disagreement even among regulatory experts. To tackle this issue, we built an AI language model for regulatory classification based on Korean legal data. We then revised the regulatory classification labels through active learning involving regulatory experts and the AI language model. Additionally, through this mutual learning process between humans and AI, we created conceptual definitions of regulation and guidelines for regulatory classification. Our case study highlights the importance of addressing human labeling errors in machine learning for AI services in the public sector and the potential benefits of using active learning to improve labeling quality. We conclude by discussing the broader implications of our findings for developing AI services in the public sector and the need for continued research in this area.
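
The abstract does not specify how items were selected for expert review, so the following is only a minimal sketch of one common approach, not the authors' procedure. It assumes a classifier that outputs class probabilities and swaps in a simple disagreement heuristic: items whose human label the model most strongly disputes are sent to experts first. The function name `flag_for_review`, the label names, and the example probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model outputs: one probability dictionary per legal provision.
example_probs: List[Dict[str, float]] = [
    {"regulation": 0.95, "non_regulation": 0.05},
    {"regulation": 0.40, "non_regulation": 0.60},
    {"regulation": 0.10, "non_regulation": 0.90},
    {"regulation": 0.55, "non_regulation": 0.45},
]
# Hypothetical labels originally assigned by human annotators.
example_labels: List[str] = [
    "non_regulation",
    "non_regulation",
    "regulation",
    "non_regulation",
]


def flag_for_review(
    probs: List[Dict[str, float]],
    human_labels: List[str],
    budget: int,
) -> List[Tuple[int, str, str, float]]:
    """Return up to `budget` items whose human label the model disputes most.

    Each result is (item index, human label, model label, disagreement gap).
    """
    candidates = []
    for idx, (p, human) in enumerate(zip(probs, human_labels)):
        predicted = max(p, key=p.get)
        if predicted != human:
            # Gap between the model's confidence in its own label and in
            # the human label; a larger gap means a stronger dispute.
            gap = p[predicted] - p.get(human, 0.0)
            candidates.append((idx, human, predicted, gap))
    # Strongest disputes first, so a limited expert budget goes to them.
    candidates.sort(key=lambda c: c[3], reverse=True)
    return candidates[:budget]


if __name__ == "__main__":
    for idx, human, predicted, gap in flag_for_review(example_probs, example_labels, 2):
        print(f"item {idx}: human={human}, model={predicted}, gap={gap:.2f}")
```

In a loop of this kind, experts would confirm or correct the flagged labels, the model would be retrained on the revised data, and the selection would be repeated. Cases where experts themselves disagree could feed back into the classification guidelines.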