Investigating Censorship Mechanisms in Chinese Large Language Models: An ABM-Enhanced Examination of Digital Authoritarianism

Keywords: China, Internet, Social Media, Communication, Technology, Empirical
Xin Zhou
Friedrich-Schiller-Universität Jena

Abstract

Recent advances in generative AI have brought to the fore questions about how large language models (LLMs) might be governed or manipulated by authoritarian regimes to suppress dissent. This paper focuses on Chinese-developed LLMs, including Wenxin Yiyan (Ernie Bot), Tongyi Qianwen, DeepSeek, and Qwen, and examines the extent to which these systems incorporate complex censorship mechanisms. By integrating agent-based modeling (ABM) with fine-tuned AI techniques, our ongoing study seeks to offer empirical insight into how targeted content filtering may operate beyond simple keyword blocking.

Our methodology employs a multi-stage process. First, we compile and preprocess social media posts from self-identified dissidents on international platforms, capturing a wide range of political sentiments, tones, and levels of sensitivity. Using these data, we fine-tune a transformer-based model to generate synthetic prompts that systematically probe potential red lines in Chinese LLMs. We then implement an ABM framework that orchestrates large-scale experiments, simulating diverse user interactions with each target LLM. Agents dynamically adjust their query strategies based on real-time responses, ensuring that the testing protocol reflects shifting content sensitivities, linguistic variation, and temporal patterns of engagement.

For data analysis, we employ advanced natural language processing (NLP) methods to identify and categorize censorship triggers. We use BERTopic, which is particularly suited to tracking evolving censorship patterns because of its dynamic topic modeling capabilities and efficient handling of short texts, to monitor suppression patterns across domains and timeframes. This approach is complemented by transformer-based sentiment analysis, which captures emotional undertones in both prompts and system outputs, and by sequence alignment algorithms designed to detect subtle reformulations that may indicate self-censorship.

Early pilot tests suggest the presence of nuanced, adaptive filtering protocols that vary with contextual understanding rather than rigid keyword lists, though comprehensive analysis is still ongoing. While our findings are preliminary, we anticipate systematic variation in how different Chinese LLMs implement censorship: distinct patterns ranging from outright query rejection to subtle topic deflection and contextual reframing, varying by model ownership and regulatory context. Initial testing suggests that state-affiliated models may employ more sophisticated control mechanisms, including the ability to recognize and suppress politically sensitive content even when it is expressed through metaphor or indirect reference. Moreover, our ongoing investigation is poised to shed light on the degree to which commercial AI applications can be repurposed, or preemptively tailored, to reinforce digital authoritarian practices.

By sharing our preliminary observations and research design, this paper contributes to broader debates on AI governance and political communication. Our methodology provides a replicable framework for detecting and dissecting algorithmic censorship in emerging GenAI platforms. Ultimately, we aim to highlight how such censorship mechanisms may be exportable to other contexts, raising urgent ethical and policy considerations for the global AI community.
We plan to refine and expand our analysis in forthcoming work, thereby offering a deeper understanding of how these systems operate at scale and how they might shape or constrain public discourse under authoritarian regimes.
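To make the research design concrete, the sketches that follow illustrate each stage in Python. First, the prompt-generation step: a minimal causal-LM fine-tuning sketch using the Hugging Face Trainer, assuming the collected posts sit in a local JSONL file with a `text` field. The base model, file name, and hyperparameters are illustrative placeholders, not the study's actual configuration.

```python
# Minimal fine-tuning sketch for the synthetic-prompt generator.
# Assumptions: posts stored one per record in dissident_posts.jsonl under
# a "text" key; base model and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "Qwen/Qwen2-0.5B"  # placeholder base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batch padding
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

dataset = load_dataset("json", data_files="dissident_posts.jsonl")["train"]

def tokenize(batch):
    # Truncate long posts; the collator below builds the LM labels.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="prompt_generator",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```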
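The ABM layer can be pictured as a population of probing agents that log each exchange and shift from direct to indirect phrasings once refusals appear. The toy sketch below rests on our own assumptions: `query_model` stands in for an API call to a target LLM, and the refusal heuristic is a crude marker list rather than the study's classifier.

```python
# Toy agent-based probing loop. query_model() and REFUSAL_MARKERS are
# stand-ins: the real study would call each target LLM's API and use a
# trained refusal/deflection classifier instead.
import random

REFUSAL_MARKERS = ("cannot answer", "change the topic")  # illustrative

def query_model(prompt: str) -> str:
    """Dummy stand-in for an API call to a target Chinese LLM."""
    return "Sorry, I cannot answer that. Let's change the topic."

class ProbeAgent:
    def __init__(self, direct_prompts, indirect_prompts):
        self.direct = list(direct_prompts)      # explicit phrasings
        self.indirect = list(indirect_prompts)  # metaphors, paraphrases
        self.log = []                           # (prompt, reply, censored)

    def _looks_censored(self, reply: str) -> bool:
        return any(m in reply.lower() for m in REFUSAL_MARKERS)

    def step(self):
        # Spend direct probes first; after a refusal, shift the remaining
        # budget to indirect phrasings (the adaptive strategy adjustment).
        pool = self.direct or self.indirect
        if not pool:
            return
        prompt = pool.pop(random.randrange(len(pool)))
        reply = query_model(prompt)
        censored = self._looks_censored(reply)
        self.log.append((prompt, reply, censored))
        if censored and self.direct:
            self.indirect.extend(self.direct)
            self.direct.clear()

agent = ProbeAgent(["direct probe A", "direct probe B"],
                   ["indirect probe C"])
for _ in range(3):
    agent.step()
print(sum(c for _, _, c in agent.log), "of", len(agent.log), "flagged")
```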
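For the topic-modeling pass, BERTopic's `topics_over_time` is the relevant entry point for tracking how suppressed themes shift across collection windows. A minimal sketch, assuming the experiment logs have been flattened into parallel lists of short texts and timestamps; the loader functions are hypothetical placeholders.

```python
# Dynamic topic modeling over logged exchanges with BERTopic.
# Assumptions: load_logged_texts() / load_logged_times() are hypothetical
# helpers returning parallel lists of short texts and datetimes.
from bertopic import BERTopic

docs = load_logged_texts()        # censored prompts / deflecting replies
timestamps = load_logged_times()  # parallel list of datetimes

topic_model = BERTopic(min_topic_size=10)
topics, _ = topic_model.fit_transform(docs)

# Bin documents by timestamp to compare topic prevalence across windows.
over_time = topic_model.topics_over_time(docs, timestamps, nr_bins=20)
print(topic_model.get_topic_info().head())
```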
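The sentiment pass can be sketched with a stock Hugging Face pipeline; the multilingual checkpoint named below is our assumption for illustration, not the paper's stated choice.

```python
# Transformer-based sentiment scoring of prompts and replies.
# The model checkpoint is an illustrative choice (multilingual, 1-5 stars).
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)

exchange = [
    "Why was this protest not reported?",         # probe (illustrative)
    "Let's talk about something more positive.",  # deflecting reply
]
for text, score in zip(exchange, sentiment(exchange)):
    print(f"{score['label']:7s} {score['score']:.2f}  {text}")
```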
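Finally, the self-censorship check: aligning the replies a model gives to direct versus indirect phrasings of the same question and flagging low-similarity pairs as candidate reformulations. A minimal sketch using the standard-library `SequenceMatcher`; the threshold and example strings are illustrative.

```python
# Flag candidate reformulations by aligning replies to paired probes that
# ask the same question directly and indirectly. Values are illustrative.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

reply_direct = "That topic is sensitive; let us discuss economic growth."
reply_indirect = "In 1989, large demonstrations took place in Beijing."

score = similarity(reply_direct, reply_indirect)
if score < 0.5:  # illustrative threshold
    print(f"divergent replies (similarity={score:.2f}): possible censorship")
```

A low alignment score between the two replies suggests the filtering is conditioned on how the topic is phrased rather than on fixed keywords, which is exactly the contextual behavior the study sets out to measure.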