Algorithmic Deceptions: An LLM-Assisted Analysis of Russian Propaganda Curation on Search Engines
Methods
Quantitative
Social Media
Communication
Mixed Methods
Big Data
Abstract
The influence of Russian propaganda on global public discourse has been a major concern for democratic societies, particularly since the 2016 US Election. The intensity of the Kremlin's digital deception campaigns has grown following Russia's full-scale invasion of Ukraine (Litvinenko, 2022). Despite a plethora of literature on Russian digital propaganda, we still lack a comprehensive account of how online platforms manage problematic content produced by the Kremlin and how their algorithms shape information environments in which users are exposed to propaganda-related topics. To remedy this limitation, our paper uses a novel large language model (LLM)-assisted methodology to process a large multilingual dataset on search engines' performance in the context of Russian propaganda. Through this analysis, we investigate how search engines' algorithmic information curation - understood as a process of "organizing, selecting and presenting subsets of a corpus of information" to users (Rader & Gray, 2015) - counters and sometimes facilitates the spread of various dimensions of Russian propaganda, ranging from anti-system claims about failing Western democracies to pro-Kremlin advocacy of conservative values.
The data for the study are acquired via a series of agent-based comparative algorithm audits of the world's most used search engines: Google, Bing, Yandex, and DuckDuckGo. Earlier research (e.g. Toepfl et al., 2023; Kuznetsova et al., 2024) has already examined how search engines interact with Russian propaganda, albeit mostly for a small sample of propaganda-related topics and within the Russian domestic information environment. By contrast, we use Google Compute Cloud to simulate user browsing activity in six countries (USA, India, Qatar, Brazil, Poland, Germany), which are targeted to varying degrees by Russian propaganda campaigns, and we use a structured corpus of propaganda statements, derived from dimensionality analysis and translated into nine languages (English, Spanish, Portuguese, German, Hindi, Polish, Arabic, Russian, and Ukrainian). Altogether, our dataset consists of search outputs for 144 queries, and we use it to investigate which sources different search engines return in response to these queries, whether these sources support or debunk common propaganda narratives, and how the outputs vary depending on the location and the language of the query.
Our collected dataset contains more than 30,000 unique search results coming from 7,200 unique domains. Because of the dataset's size, it is challenging to process using traditional qualitative approaches; instead, we conduct LLM-based labeling of sources as well as of the content associated with these sources. Our preliminary observations highlight the profound impact of query language on search engines' outputs, which has major implications for algorithmic information curation and the selection of outputs resulting from it. In particular, we observe a concerning number of Russian state-controlled sources retrieved by search engines, a number that varies across individual search engines. This finding is troubling because search algorithms may facilitate the spread of Russian propaganda, and may do so in a differentiated manner that complicates detection. It points to the importance of a comprehensive understanding of the information environments surrounding critical political topics and of the role algorithmic systems play in shaping these environments.