Validity as a Challenge to Mainstream Computational Methods

Matti Nelimarkka
University of Helsinki

In the past few years, social scientists have applied computational tools to explore data and have moved towards using data science as a method. Several guidelines describe how to apply computational methods to traditional social science topics. Among others, Grimmer & Stewart (2013) and Schwartz & Ungar (2015) have reviewed computational methods for text classification. When applied successfully, these methods scale up even qualitative data analysis, albeit in the rather rudimentary form of classification (as opposed to, e.g., discourse analysis), to volumes of text not previously addressed by social scientists, even to ‘big data.’

However, I argue that we ought to be careful when applying computational methods and when reviewing their uses. I will present an extended discussion of the validity of computational methods based on two articles (and research projects) in which I have been involved. In the first, we presented an approach for integrating ethnography with computational data analysis (Laaksonen et al., 2017). We argue that doing so not only addresses challenges of data collection (especially in online venues) but also allows qualitative research to inform data analysis.

The second case examines the challenges of human interpretation in unsupervised learning. I use the recently popular approach of topic models (Blei 2012) to illustrate these challenges. First, researchers interpreted the output rather differently compared with Chang et al. (2009). Second, as I have shown (Nelimarkka, unpublished), people are also extremely poor at choosing the number of topics, and they rarely agree with one another. Thus, applying unsupervised learning might lead to rather noisy results.

I argue that, to support social scientists working with this topic, we must develop clear guidelines for using computational, big data methods. While the best practice today is to use human validation (Grimmer and Stewart 2013), it is unclear how this validation is conducted. I suggest that the solution may lie in a mixed-methods approach or method triangulation, including traditional grounded theory (Muller et al. 2016), as these are currently better established in the field.