ECPR


Introduction to Quantitative Text Analysis

Course Dates and Times

Monday 29 July – Friday 2 August

09:00–10:30 & 11:00–12:30

Lisa Lechner

Lisa.Lechner@uibk.ac.at

University of Innsbruck

Kohei Watanabe

watanabe.kohei@gmail.com

University of Innsbruck

This course introduces you to quantitative text analysis.

It starts with an overview of key concepts and the basic workflow in manual and computational text analysis.

We then discuss how to develop a good coding scheme, conduct manual content analysis, and preprocess textual data for computational analysis.

Next, we apply two of the most popular bag-of-words models for document scaling. 

Finally, you will learn about supervised and unsupervised models for document classification.

ECTS Credits for this course

2 credits (pass/fail grade): Attend 90% of course hours and participate fully in in-class activities. Carry out the necessary reading and/or other work prior to, and after, class.
3 credits (to be graded): As above, plus complete daily assignments based on the methods illustrated during the seminars.
4 credits (to be graded): As above, plus collect, preprocess, and scale a corpus other than the one offered in class.


Instructor Bio

Lisa Lechner is an assistant professor of methods and methodology in political science at the University of Innsbruck.

In her research, Lisa studies international treaties such as trade agreements, bilateral tax treaties, and environmental agreements, as well as national and international jurisdictions, using inferential network analysis and quantitative text analysis.

Kohei Watanabe is an assistant professor at the Department of Political Science / Center for Digital Science at the University of Innsbruck.

He holds an MA from CEU, and studied for his PhD at the London School of Economics and Political Science.

Kohei develops quanteda, an R package for quantitative text analysis, and uses it to research international and political communication.

Twitter @koheiw7

Quantitative text analysis offers powerful tools to study textual data, such as newspapers, speeches, laws, and treaties, produced in everyday political activity. 

Through lectures and seminars, you will learn theoretical and practical aspects of quantitative text analysis.

By the end of this course, you will be able to conduct quantitative text analysis independently.


Day 1
We start with a lecture that gives an overview of quantitative text analysis: key concepts and the basic workflow for manual and computational analyses. This will be followed by a lab seminar on how to use the R package quanteda, of which Kohei is a developer.

Day 2
We discuss reliability and validity in manual and computational analysis of texts. Reliability concerns are more pronounced in manual analysis and validity concerns in computational analysis, but both approaches must achieve both. In the seminar we cover dictionary making and sentiment analysis, which offer good examples to help us understand reliability and validity.
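To give a feel for what dictionary-based analysis involves, here is a minimal sketch in plain Python. It is not the course material: the labs use quanteda in R, and the word lists below are a hypothetical toy dictionary, far smaller than a validated resource such as the Lexicoder Sentiment Dictionary. The function names are likewise made up for illustration.

```python
import re

# Hypothetical toy dictionary (an assumption for illustration only).
TOY_DICTIONARY = {
    "positive": {"good", "great", "support", "success"},
    "negative": {"bad", "crisis", "fail", "oppose"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_counts(text, dictionary):
    """Count how many tokens fall under each dictionary category."""
    counts = {category: 0 for category in dictionary}
    for token in tokenize(text):
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    return counts


def net_sentiment(text, dictionary=TOY_DICTIONARY):
    """(positive - negative) / number of tokens; 0.0 for an empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = dictionary_counts(text, dictionary)
    return (counts["positive"] - counts["negative"]) / len(tokens)


speech = "We support this great reform despite the crisis."
print(dictionary_counts(speech, TOY_DICTIONARY))
print(net_sentiment(speech))
```

A dictionary like this is perfectly reliable, because the same text always gets the same score. Whether the score measures what you intend is a validity question, and it has to be checked against human judgement.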
 

Day 3
You will learn how to segment, clean and simplify texts in preparation for statistical analysis. Beginners in quantitative text analysis often find this preprocessing difficult because it requires a series of decisions, but we will explain the principles behind those decisions to make it easier for you. In the seminar, we implement the preprocessing using quanteda’s various functions.
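The sequence of preprocessing decisions can be sketched roughly as follows. This is an illustrative plain-Python version, not quanteda code. The stopword list, function names, and example sentences are all assumptions made for the sketch.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "of", "and", "to", "is", "in"}


def tokenize(text):
    """Segment: lowercase and keep alphabetic tokens, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Clean: drop very common function words."""
    return [t for t in tokens if t not in stopwords]


def ngrams(tokens, n=2):
    """Join each run of n adjacent tokens with an underscore."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_feature_matrix(docs, min_docfreq=1):
    """Return (features, rows) with one row of counts per document.

    Features that occur in fewer than min_docfreq documents are dropped.
    """
    counts = [Counter(remove_stopwords(tokenize(d))) for d in docs]
    docfreq = Counter(f for c in counts for f in c)
    features = sorted(f for f, k in docfreq.items() if k >= min_docfreq)
    rows = [[c[f] for f in features] for c in counts]
    return features, rows


docs = [
    "The treaty reduces tariffs.",
    "The treaty protects the environment and reduces emissions.",
]
features, rows = document_feature_matrix(docs)
print(features)
print(rows)
print(ngrams(remove_stopwords(tokenize(docs[0]))))
```

Each step is a decision. For example, generating n-grams after stopword removal joins words that were not adjacent in the original text, and raising `min_docfreq` to 2 would leave only the features shared by both documents.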

Day 4
We will discuss the algorithms and applications of two well-known models for document scaling in political science, Wordscores and Wordfish. However, as we will demonstrate, the results of these bag-of-words models change depending on how texts are preprocessed. In the seminar, we explore the models’ sensitivity to feature selection.
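As a rough illustration of the scaling idea, the sketch below implements the core Wordscores calculation in plain Python. Each word is scored as the average of the reference positions, weighted by the probability of each reference text given the word. A new text is then scored as the mean score of its scored words. The reference texts, positions, and function names are hypothetical. The rescaling of raw text scores and the uncertainty estimates are omitted. The labs use quanteda in R, not this code.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Share of each word among all tokens of one text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_scores(reference_tokens, reference_positions):
    """Score each word as the P(reference | word)-weighted mean position."""
    freqs = {r: relative_frequencies(t) for r, t in reference_tokens.items()}
    vocabulary = set().union(*(f.keys() for f in freqs.values()))
    scores = {}
    for w in vocabulary:
        total = sum(f.get(w, 0.0) for f in freqs.values())
        scores[w] = sum(
            (freqs[r].get(w, 0.0) / total) * reference_positions[r]
            for r in freqs
        )
    return scores


def score_text(tokens, scores):
    """Mean word score over tokens that have a score; None if none do."""
    scored = [scores[t] for t in tokens if t in scores]
    if not scored:
        return None
    return sum(scored) / len(scored)


def drop(tokens, removed):
    """Feature selection: remove the given words from a token list."""
    return [t for t in tokens if t not in removed]


# Hypothetical reference texts with assumed positions (-1 left, +1 right).
reference_tokens = {
    "left": ["welfare", "welfare", "tax", "state"],
    "right": ["market", "market", "tax", "freedom"],
}
reference_positions = {"left": -1.0, "right": 1.0}
virgin = ["welfare", "tax", "market", "market", "union"]

scores = word_scores(reference_tokens, reference_positions)
full_estimate = score_text(virgin, scores)

# Same texts, but with the feature "tax" removed during preprocessing.
removed = {"tax"}
trimmed_refs = {r: drop(t, removed) for r, t in reference_tokens.items()}
trimmed_scores = word_scores(trimmed_refs, reference_positions)
trimmed_estimate = score_text(drop(virgin, removed), trimmed_scores)

print(full_estimate, trimmed_estimate)
```

In this toy case the estimated position moves from 0.25 to about 0.33 when a single shared word is removed. Sensitivity of this kind is what the seminar explores.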

Day 5
In the lecture, you will learn about supervised and unsupervised models for document classification (for example, naïve Bayes and topic models, respectively). In the seminar, you will apply those methods yourself using quanteda, to understand the entire workflow of quantitative text analysis.
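For the supervised side, a multinomial naïve Bayes classifier with add-one smoothing can be sketched in a few lines of plain Python. The training documents, labels, and function names below are made up for illustration. In the lab you would use quanteda in R on a real corpus.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs, smoothing=1.0):
    """Fit class priors and smoothed word likelihoods.

    labelled_docs is a list of (tokens, label) pairs.
    """
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    n_docs = sum(class_docs.values())
    model = {"log_prior": {}, "log_likelihood": {}}
    for label in class_docs:
        model["log_prior"][label] = math.log(class_docs[label] / n_docs)
        total = sum(word_counts[label].values()) + smoothing * len(vocabulary)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + smoothing) / total)
            for w in vocabulary
        }
    return model


def predict(model, tokens):
    """Return the label with the highest posterior log-probability."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        loglik = model["log_likelihood"][label]
        # Words never seen in training are ignored.
        scores[label] = log_prior + sum(loglik[t] for t in tokens if t in loglik)
    return max(scores, key=scores.get)


# Hypothetical hand-labelled training documents (already tokenised).
training = [
    (["tariff", "trade", "export"], "trade"),
    (["trade", "agreement", "tariff"], "trade"),
    (["emission", "climate", "carbon"], "environment"),
    (["climate", "treaty", "emission"], "environment"),
]
model = train_naive_bayes(training)
print(predict(model, ["tariff", "export", "policy"]))
print(predict(model, ["carbon", "climate"]))
```

Unlike this classifier, a topic model needs no labelled documents. It is not sketched here.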

You should have experience in statistical analysis in R.

Prior knowledge of programming or quantitative text analysis is not required.

Day Topic Details
1 Introduction

Lecture
Overview of quantitative text analysis (key concepts, basic workflow for manual and computational analyses) 
Lab
Introduction to quanteda

2 Manual content analysis and sentiment analysis

Lecture
Reliability, validity, codebooks, and dictionary-based analysis

Lab
Dictionary making and analysis (Lexicoder Sentiment Dictionary) 

3 Text preprocessing and similarity measures

Lecture
Tokenisation and feature selection, collocation analysis, n-grams, and stopwords; computing text similarities 

Lab
Regular expressions, collocation analysis, n-gram generation (quanteda)

4 Document scaling techniques

Lecture
Wordscores, Wordfish, and correspondence analysis, and their applications

Lab
Wordscores and Wordfish, and robustness checks (preText)

5 Document classification techniques

Lecture
Introduction to supervised and unsupervised methods in text analysis 

Lab
Naïve Bayes and topic models

Day Readings
1

Grimmer and Stewart (2013)

Liddy (2001)

Lowe and Benoit (2013)

Welbers, Van Atteveldt, and Benoit (2017)

2

Krippendorff (1989)

Krippendorff (2013) (suggested)

Pennebaker and Francis (1996)

Young and Soroka (2012)

3

Huang (2008)

Jansa, Hansen, and Gray (2019)

4

Denny and Spirling (2018)

Laver, Benoit, and Garry (2003)

Schonhardt-Bailey (2005)

Slapin and Proksch (2008)

Spirling (2012)

5

Blei (2012)

Burscher, Vliegenthart, and De Vreese (2015)

Müller and Rauh (2018)

Note

For the precise literature references, see reference list below.

Software Requirements

R (3.4 or later) and RStudio

Hardware Requirements

Please bring your own laptop that meets the minimum system requirements for the quanteda package.

Recommended Courses to Cover Before this One

Introduction to R

Recommended Courses to Cover After this One

Advanced Quantitative Text Analysis