Introduction to Quantitative Text Analysis

Lisa Lechner
Lisa.Lechner@uibk.ac.at

University of Innsbruck

Lisa Lechner is Assistant Professor of Methods and Methodology in Political Science at the University of Innsbruck.

In her research, Lisa studies international treaties such as trade agreements, bilateral tax treaties, and environmental agreements, as well as national and international jurisdictions, using inferential network analysis and quantitative text analysis.

Kohei Watanabe
watanabe.kohei@gmail.com

University of Innsbruck

Kohei Watanabe is an assistant professor at the Department of Political Science / Center for Digital Science at the University of Innsbruck.

He holds an MA from CEU, and studied for his PhD at the London School of Economics and Political Science.

Kohei develops quanteda, an R package for quantitative text analysis, and uses it to research international and political communication.

Twitter @koheiw7

Course Dates and Times

Monday 29 July – Friday 2 August

09:00–10:30 & 11:00–12:30

Prerequisite Knowledge

You should have experience in statistical analysis in R.

Prior knowledge of programming or quantitative text analysis is not required.


Short Outline

This course introduces you to quantitative text analysis.

It starts with an overview of key concepts and the basic workflow in manual and computational text analysis.

We then discuss how to develop a good coding scheme, conduct manual content analysis, and preprocess textual data for computational analysis.

Next, we apply two of the most popular bag-of-words models for document scaling. 

Finally, you will learn about supervised and unsupervised models for document classification.

ECTS Credits for this course

2 credits (pass/fail grade) Attend 90% of course hours and participate fully in in-class activities. Carry out the necessary reading and/or other work prior to, and after, class. 
3 credits (to be graded) As above, plus complete daily assignments based on the methods illustrated during the seminars. 
4 credits (to be graded) As above, plus collect, preprocess, and scale a corpus other than the one offered in class.


Long Course Outline

Quantitative text analysis offers powerful tools to study textual data, such as newspapers, speeches, laws, and treaties, produced in everyday political activity. 

Through lectures and seminars, you will learn theoretical and practical aspects of quantitative text analysis.

By the end of this course, you will be able to conduct quantitative text analysis independently.


Day 1
We start with a lecture that gives an overview of quantitative text analysis: key concepts and the basic workflow for manual and computational analyses. This is followed by a lab seminar on how to use the R package quanteda, of which Kohei is a developer.

Day 2
We discuss reliability and validity in manual and computational analysis of texts. Reliability is the more pressing concern in manual analysis and validity in computational analysis, but both approaches must achieve both. In the seminar we cover dictionary making and sentiment analysis, which offer good examples to help us understand reliability and validity.

Day 3
You will learn how to segment, clean, and simplify texts in preparation for statistical analysis. Beginners in quantitative text analysis often find preprocessing difficult because it requires a series of decisions, but we will explain the principles behind those decisions to make it easier for you. In the seminar, we implement the preprocessing using quanteda’s various functions.
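As a rough illustration of the kind of decisions involved, the sketch below tokenises two short texts, removes stopwords, optionally builds n-grams, and counts features per document. It is written in plain Python rather than with quanteda (which the lab uses in R), and the stopword list, function names, and example sentences are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "we"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n=2):
    """Join each run of n consecutive tokens with an underscore."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_feature_counts(docs, n=1):
    """Return one Counter of feature frequencies per document."""
    rows = {}
    for name, text in docs.items():
        toks = remove_stopwords(tokenize(text))
        feats = toks if n == 1 else ngrams(toks, n)
        rows[name] = Counter(feats)
    return rows


# Hypothetical example documents.
docs = {
    "doc1": "The trade agreement reduces tariffs.",
    "doc2": "We signed a trade agreement and a tax treaty.",
}
unigrams = document_feature_counts(docs)
bigrams = document_feature_counts(docs, n=2)
```

Because stopwords are removed before the n-grams are built here, bigrams can join words that were not adjacent in the original text (for example `signed_trade` in `doc2`). Reversing the order of those two steps would give a different feature set, which is one example of why preprocessing is a series of decisions rather than a fixed recipe.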

Day 4
We will discuss the algorithms and applications of two well-known models for document scaling in political science, Wordscores and Wordfish. As we will demonstrate, the results of these bag-of-words models change depending on how the texts are preprocessed. We explore these models’ sensitivity to feature selection in the seminar.
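To make the idea concrete, here is a minimal Python sketch of the basic Wordscores logic as commonly described: each word is scored as the average of the reference texts' positions, weighted by how strongly the word is associated with each reference text, and a new text is scored as the frequency-weighted mean of its words' scores. It omits the rescaling step and uncertainty estimates, does not reproduce quanteda's implementation, and the reference texts, positions, and function names are invented for illustration.

```python
from collections import Counter


def relative_freq(tokens, vocab=None):
    """Relative frequency of each token, optionally restricted to vocab."""
    counts = Counter(t for t in tokens if vocab is None or t in vocab)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_scores(ref_tokens, ref_positions):
    """Score each word as the position-weighted average over reference texts."""
    freqs = {r: relative_freq(toks) for r, toks in ref_tokens.items()}
    vocab = set().union(*(f.keys() for f in freqs.values()))
    scores = {}
    for w in vocab:
        total = sum(f.get(w, 0.0) for f in freqs.values())
        # f.get(w) / total is the probability of reading text r given word w.
        scores[w] = sum(
            f.get(w, 0.0) / total * ref_positions[r] for r, f in freqs.items()
        )
    return scores


def score_text(tokens, scores):
    """Raw text score: mean word score weighted by relative frequency."""
    freqs = relative_freq(tokens, vocab=scores)
    if not freqs:
        raise ValueError("text shares no words with the reference texts")
    return sum(f * scores[w] for w, f in freqs.items())


# Hypothetical reference texts with assumed positions (-1 = left, +1 = right).
refs = {
    "left": ["tax", "tax", "welfare", "state"],
    "right": ["tax", "market", "market", "freedom"],
}
positions = {"left": -1.0, "right": 1.0}
new_text = ["market", "tax", "welfare", "unknown"]

score_all = score_text(new_text, word_scores(refs, positions))

# Same analysis after dropping one feature ("tax") during preprocessing.
drop = {"tax"}
refs_filtered = {r: [t for t in toks if t not in drop] for r, toks in refs.items()}
new_filtered = [t for t in new_text if t not in drop]
score_filtered = score_text(new_filtered, word_scores(refs_filtered, positions))
```

In this toy example the new text is scored slightly left of centre when every word is kept, and exactly at the centre once `tax` is removed, which illustrates how a single feature-selection decision can move a scaling estimate.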

Day 5
In the lecture, you will learn different types of supervised and unsupervised models (naïve Bayes, topic models) for document classification. In the seminar, you will apply those methods yourself using quanteda, to understand the entire workflow of quantitative text analysis.

Day, Topic, and Details

Day 1: Introduction
Lecture: Overview of quantitative text analysis (key concepts, basic workflow for manual and computational analyses)
Lab: Introduction to quanteda

Day 2: Manual content analysis and sentiment analysis
Lecture: Reliability, validity, codebooks, and dictionary-based analysis
Lab: Dictionary making and analysis (Lexicoder Sentiment Dictionary)

Day 3: Text preprocessing and similarity measures
Lecture: Tokenisation and feature selection, collocation analysis, n-grams, and stopwords; computing text similarities
Lab: Regular expressions, collocation analysis, n-gram generation (quanteda)

Day 4: Document scaling techniques
Lecture: Wordscores, Wordfish, and correspondence analysis, and their applications
Lab: Wordscores and Wordfish and their robustness checks (preText)

Day 5: Document classification techniques
Lecture: Introduction to supervised and unsupervised methods in text analysis
Lab: Naïve Bayes and topic models

Day and Readings

Day 1
Grimmer and Stewart (2013)
Liddy (2001)
Lowe and Benoit (2013)
Welbers, Van Atteveldt, and Benoit (2017)

Day 2
Krippendorff (1989)
Krippendorff (2013) (suggested)
Pennebaker and Francis (1996)
Young and Soroka (2012)

Day 3
Huang (2008)
Jansa, Hansen, and Gray (2019)

Day 4
Denny and Spirling (2018)
Laver, Benoit, and Garry (2003)
Schonhardt-Bailey (2005)
Slapin and Proksch (2008)
Spirling (2012)

Day 5
Blei (2012)
Burscher, Vliegenthart, and De Vreese (2015)
Müller and Rauh (2018)

Note

For the full literature references, see the reference list below.

Software Requirements

R (3.4 or later) and RStudio

Hardware Requirements

Please bring your own laptop that meets the minimum system requirements for the quanteda package.

Recommended Courses to Cover Before this One

Introduction to R

Recommended Courses to Cover After this One

Advanced Quantitative Text Analysis


Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. to take into account new developments in the field, participant demands, or group size). Registered participants will be informed when changes are made.

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, please contact us before registering.