Monday 29 July – Friday 2 August
09:00–10:30 & 11:00–12:30
This course introduces you to quantitative text analysis.
It starts with an overview of key concepts and basic workflow in manual and computational text analysis.
We then discuss how to develop a good coding scheme, conduct manual content analysis, and preprocess textual data for computational analysis.
Next, we apply two of the most popular bag-of-words models for document scaling.
Finally, you will learn about supervised and unsupervised models for document classification.
2 credits (pass/fail grade) Attend 90% of course hours and participate fully in in-class activities. Carry out the necessary reading and/or other work prior to, and after, class.
3 credits (to be graded) As above, plus complete daily assignments based on the methods illustrated during the seminars.
4 credits (to be graded) As above, plus collect, preprocess, and scale a corpus other than the one offered in class.
Lisa Lechner is an assistant professor of methods and methodology in political science at the University of Innsbruck.
In her research, Lisa studies international treaties such as trade agreements, bilateral tax treaties, and environmental agreements, as well as national and international jurisdictions, using inferential network analysis and quantitative text analysis.
Kohei Watanabe is an assistant professor at the Department of Political Science / Center for Digital Science at the University of Innsbruck.
He holds an MA from CEU, and studied for his PhD at the London School of Economics and Political Science.
Kohei develops quanteda, the R package for quantitative text analysis, which he uses to research international and political communication.
Quantitative text analysis offers powerful tools to study textual data, such as newspapers, speeches, laws, and treaties, produced in everyday political activity.
Through lectures and seminars, you will learn theoretical and practical aspects of quantitative text analysis.
By the end of this course, you will be able to conduct quantitative text analysis independently.
Day 1
We start with a lecture that gives an overview of quantitative text analysis, covering key concepts and the basic workflow for manual and computational analyses. This will be followed by a lab seminar on how to use the R package quanteda, of which Kohei is a developer.
Day 2
We discuss reliability and validity in manual and computational analysis of texts. Although reliability is the more pronounced concern in manual analysis and validity in computational analysis, both approaches must achieve both. In the seminar we cover dictionary making and sentiment analysis, which offer good examples to help us understand reliability and validity.
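To illustrate the idea of dictionary-based sentiment analysis, here is a minimal sketch in plain Python. The course labs use quanteda in R, so this is not the course's own code. The word lists in `SENTIMENT_DICTIONARY` and the helper names are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical toy dictionary; published sentiment dictionaries are far larger.
SENTIMENT_DICTIONARY = {
    "positive": {"good", "great", "support", "success"},
    "negative": {"bad", "crisis", "fail", "oppose"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_score(text, dictionary=SENTIMENT_DICTIONARY):
    """Return (positive matches - negative matches) / total tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    positive = sum(token in dictionary["positive"] for token in tokens)
    negative = sum(token in dictionary["negative"] for token in tokens)
    return (positive - negative) / len(tokens)


print(sentiment_score("A bad crisis, but good news"))
```

Because the score depends entirely on which words are in the dictionary, the validity of the measure rests on how well the word lists capture the concept in the corpus at hand.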
Day 3
You will learn how to segment, clean and simplify texts in preparation for statistical analysis. Beginners in quantitative text analysis often find this preprocessing difficult because it requires a series of decisions, but we will explain the principles behind those decisions to make it easier for you. In the seminar, we implement the preprocessing using quanteda’s various functions.
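The preprocessing steps described above (segmenting, cleaning, simplifying) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the quanteda workflow used in the seminar: the `STOPWORDS` list is a hypothetical toy list, and the tokenizer keeps only unaccented alphabetic words.

```python
import re
from collections import Counter

# Hypothetical toy stopword list; real lists contain hundreds of words.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase, tokenize into alphabetic words, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stopwords]


def document_feature_matrix(documents):
    """Return (sorted feature list, rows of per-document feature counts)."""
    counts = [Counter(preprocess(doc)) for doc in documents]
    features = sorted(set().union(*counts))
    matrix = [[count[feature] for feature in features] for count in counts]
    return features, matrix


features, matrix = document_feature_matrix(
    ["The treaty is signed.", "A treaty and a law"]
)
print(features)
print(matrix)
```

Each step (lowercasing, the token pattern, the stopword list) is one of the decisions mentioned above, and changing any of them changes the resulting matrix.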
Day 4
We will discuss the algorithms and applications of two famous models for document scaling in political science, Wordscores and Wordfish. However, as we will demonstrate, the results of these bag-of-words models change depending on how texts are preprocessed. We explore these models’ sensitivity to feature selection in the seminar.
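As a rough sketch of the logic behind Wordscores, the plain-Python example below follows the basic scoring step described by Laver, Benoit, and Garry (2003): each word receives the average of the reference positions, weighted by how strongly the word is associated with each reference text, and a new text is scored by the frequency-weighted mean of its word scores. It omits the rescaling and uncertainty estimates of the full method, the function names are hypothetical, and it is not the quanteda implementation used in the seminar.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in the text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_scores(reference_texts, reference_positions):
    """Score each word from tokenized reference texts with known positions."""
    freqs = [relative_frequencies(tokens) for tokens in reference_texts]
    vocabulary = set().union(*freqs)
    scores = {}
    for word in vocabulary:
        total = sum(freq.get(word, 0.0) for freq in freqs)
        # Probability of each reference text given the word, times its position.
        scores[word] = sum(
            freq.get(word, 0.0) / total * position
            for freq, position in zip(freqs, reference_positions)
        )
    return scores


def score_text(tokens, scores):
    """Frequency-weighted mean word score; unscored words are ignored."""
    scored = [token for token in tokens if token in scores]
    if not scored:
        return None
    freqs = relative_frequencies(scored)
    return sum(freq * scores[word] for word, freq in freqs.items())


references = [["tax", "tax", "cut"], ["tax", "welfare", "welfare"]]
scores = word_scores(references, [-1.0, 1.0])
print(score_text(["welfare", "cut", "unknown"], scores))
```

Since only words present in the reference texts are scored, removing or merging features during preprocessing directly changes which words contribute to a document's position.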
Day 5
In the lecture, you will learn about different types of supervised and unsupervised models for document classification (naïve Bayes and topic models, respectively). In the seminar, you will apply those methods yourself using quanteda, to understand the entire workflow of quantitative text analysis.
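For the supervised case, a multinomial naïve Bayes classifier with add-one (Laplace) smoothing can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the seminar: the function names and the toy training data are hypothetical, and words unseen in training are simply skipped at prediction time.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(documents, labels):
        word_counts[label].update(tokens)
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocabulary


def predict(tokens, model):
    """Return the class with the highest log posterior for the tokens."""
    class_counts, word_counts, vocabulary = model
    n_documents = sum(class_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label, n_label in class_counts.items():
        total_words = sum(word_counts[label].values())
        log_prob = math.log(n_label / n_documents)  # class prior
        for token in tokens:
            if token in vocabulary:  # skip words unseen in training
                smoothed = word_counts[label][token] + 1
                log_prob += math.log(smoothed / (total_words + len(vocabulary)))
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


# Hypothetical toy training data.
model = train_naive_bayes(
    [["tax", "cut"], ["welfare", "spending"]], ["right", "left"]
)
print(predict(["tax"], model))
```

The classifier learns from documents that were labeled by hand, which is why the manual coding covered earlier in the course matters for supervised methods as well.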
You should have experience in statistical analysis in R.
Prior knowledge of programming or quantitative text analysis is not required.
Day | Topic | Details |
---|---|---|
1 | Introduction | Lecture |
2 | Manual content analysis and sentiment analysis | Lecture, Lab |
3 | Text preprocessing and similarity measures | Lecture, Lab |
4 | Document scaling techniques | Lecture, Lab |
5 | Document classification techniques | Lecture, Lab |
Day | Readings |
---|---|
1 | Grimmer and Stewart (2013); Liddy (2001); Lowe and Benoit (2013); Welbers, Van Atteveldt, and Benoit (2017) |
2 | Krippendorff (1989); Krippendorff (2013) (suggested); Pennebaker and Francis (1996); Young and Soroka (2012) |
3 | Huang (2008); Jansa, Hansen, and Gray (2019) |
4 | Denny and Spirling (2018); Laver, Benoit, and Garry (2003); Schonhardt-Bailey (2005); Slapin and Proksch (2008); Spirling (2012) |
5 | Blei (2012); Burscher, Vliegenthart, and De Vreese (2015); Müller and Rauh (2018) |
Note | For the precise literature references, see the reference list below. |
R (3.4 or later) and RStudio
Please bring your own laptop that meets the minimum system requirements for the quanteda package.
Introduction to R
Advanced Quantitative Text Analysis