The curse of dimensionality of comparative data collection across platforms and time: Challenges and approaches
Political Methodology
Internet
Methods
Quantitative
Social Media
Communication
Comparative Perspective
Technology
Abstract
Comparative data collection across social networking platforms such as Facebook or Twitter, messenger and microblogging sites such as Telegram, or image and discussion boards such as 4chan or Reddit is challenging. Each venue comes with its own platform architecture, features, and utilities for specific actor groups (Bossetta, 2019; Evans et al., 2017), as well as its own governance structures and access regimes, all of which fundamentally shape the possibilities and limitations of data collection. Although decisions about how to handle these multifaceted platform characteristics can affect empirical analyses, they are rarely discussed at length (Mahl, von Nordheim, et al., 2022). If we acknowledge that digital communication is inherently cross-platform, spreading across platforms and communication venues online, the question becomes how equivalent data collection across several platforms and venues is possible, and whether there are viable approaches for dealing with the variety of platform differences. Against this background, our study uses conspiracy theories as an exemplary case for discussing challenges and approaches for data collection in multi-dimensional comparative studies on political contention. We ask: What methodological and practical challenges do different platform architectures, governance and access regimes, and use cultures pose for data collection across platforms and time? What approaches could facilitate equivalent data collection across platforms while also taking temporal dynamics and cultural embeddings into account?
To address these questions, we compare a set of platforms and digital communication venues, namely discussion platforms, networked social media, publishing-oriented platforms and media, and hybrid platforms, with respect to the architectures and features relevant for data collection, their governance and access regimes, and the aspects that stand out for particular use cultures in the context of conspiracy theories. We highlight differences in the structure of units of analysis (e.g., self-contained or interconnected), their attributability to authors, and their accessibility, which varies with access and archive options. We then address the challenges researchers face when collecting data in a cross-platform and cross-time comparative design. To do so, we juxtapose data collection possibilities from these sites, organized by the distinction between actor- and content-based strategies. We discuss the potentials and limitations of these approaches, considering platform differences, temporal dynamics at the platform, individual-user, and contextual levels, and several layers of equivalence. The discussion yields insights for designing data collection strategies in multi-dimensional comparative studies that extend beyond our example of conspiracy-related content to a broader range of digital political communication.
This paper addresses the topic of Panel 10, “Advancing Research Designs and Methods in Political Communication”.