ECPR

Advanced Quantitative Text Analysis

Course Dates and Times

Monday 5 – Friday 9 August

09:00–10:30 / 11:00–12:30

Lisa Lechner

Lisa.Lechner@uibk.ac.at

University of Innsbruck

Kohei Watanabe

watanabe.kohei@gmail.com

University of Innsbruck

This advanced course covers topics ranging from machine learning algorithms such as Random Forest, through topic models, seeded topic models, Latent Semantic Scaling (LSS) and Newsmap, to word embeddings.

You will also learn how to automatically derive extra information from syntactic structures in the texts.

The course ends with an interactive discussion of participants’ research projects, and you will develop your own text analysis tools in R.

ECTS Credits for this course

2 credits (pass/fail grade): Attend 90% of course hours and participate fully in in-class activities. Carry out the necessary reading and/or other work before and after class.

3 credits (to be graded): As above, plus complete daily assignments based on the methods illustrated during the seminars.

4 credits (to be graded): As above, plus write a text analysis function for R.


Instructor Bio

Lisa Lechner is Assistant Professor of Methods and Methodology in Political Science at the University of Innsbruck.

In her research, Lisa studies international treaties such as trade agreements, bilateral tax treaties, and environmental agreements, as well as national and international jurisdictions, using inferential network analysis and quantitative text analysis.

Kohei Watanabe is an assistant professor at the Department of Political Science / Center for Digital Science at the University of Innsbruck.

He holds an MA from CEU and studied for his PhD at the London School of Economics and Political Science.

Kohei develops quanteda, an R package for quantitative text analysis, and uses text analysis to research international and political communication.

Twitter @koheiw7

This course revisits topics from Introduction to Quantitative Text Analysis but goes much deeper into the theoretical and technological foundations of quantitative text analysis, so that you can develop complex analytic pipelines for your research projects.

Day 1
The lecture will advance your knowledge of supervised and unsupervised methods, focusing on their strengths and weaknesses. We will cover Wordscores, naïve Bayes classifiers, Random Forest, latent Dirichlet allocation (LDA) and the Structural Topic Model (STM). Wordscores and the naïve Bayes classifier are simple supervised algorithms for document scaling and document classification, respectively. Random Forest can be used for both purposes, but its algorithm is more sophisticated. LDA and STM are unsupervised algorithms for topic classification; STM can also take document-level variables into account. In the seminar, we will learn how to apply these models.
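
To make supervised document scaling concrete, here is a minimal sketch of the basic Wordscores scoring rule in Python (the course itself works in R). Each word receives the average of the reference-text scores, weighted by how much more often it occurs in each reference text; a new text is then scored as the mean score of the words it shares with the reference texts. The function names and toy texts are hypothetical, and the sketch omits the rescaling and uncertainty estimates that full implementations provide.

```python
from collections import Counter


def relative_freqs(tokens):
    """Relative frequency of each word in a tokenised text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_scores(ref_texts, ref_scores):
    """Score each word in the reference texts.

    A word's score is the mean of the reference scores, weighted by
    P(r | w): the share of the word's relative frequency found in text r.
    """
    freqs = [relative_freqs(t) for t in ref_texts]
    vocab = set().union(*freqs)
    scores = {}
    for w in vocab:
        f = [fr.get(w, 0.0) for fr in freqs]
        total = sum(f)
        scores[w] = sum(fi / total * a for fi, a in zip(f, ref_scores))
    return scores


def score_text(tokens, scores):
    """Score a new ('virgin') text as the mean score of its scored word tokens."""
    scored = [scores[t] for t in tokens if t in scores]
    if not scored:
        raise ValueError("the text shares no words with the reference texts")
    return sum(scored) / len(scored)


# Toy reference texts placed at -1 and +1; purely illustrative.
left = "tax cut market market".split()
right = "welfare spending state market".split()
scores = word_scores([left, right], [-1.0, 1.0])

print(round(scores["market"], 3))                           # -0.333
print(round(score_text(["market", "welfare"], scores), 3))  # 0.333
```

Because "market" appears in both reference texts but more often in the left one, its score falls between the two reference positions, closer to -1.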

Day 2
We will explain seeded LDA, LSS and Newsmap, semi-supervised models that offer a compromise between the strengths and weaknesses of supervised and unsupervised methods. These models use exemplary words (seed words) as supervision to perform document scaling or classification tasks. Semi-supervised models can serve similar purposes to both supervised and unsupervised models, but training them demands special attention to the semantics of the seed words. In the seminar, we will learn how to use these models.
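
The role of seed words can be illustrated with a simplified sketch in the spirit of LSS, written in Python. It assumes word vectors are already available (here they are hard-coded toy values, whereas LSS learns them from the corpus) and scores every word by its average cosine similarity to the seed words, signed by each seed's polarity; a document's score is then the mean polarity of its words. All names and numbers are illustrative assumptions, not the LSS implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def polarity_scores(vectors, seeds):
    """Mean cosine similarity of each word to the seed words, signed by seed polarity."""
    return {
        word: sum(cosine(vec, vectors[s]) * sign for s, sign in seeds.items()) / len(seeds)
        for word, vec in vectors.items()
    }


def document_score(tokens, polarity):
    """Mean polarity of the words in a document that have a score."""
    known = [polarity[t] for t in tokens if t in polarity]
    return sum(known) / len(known) if known else 0.0


# Hypothetical 2-dimensional word vectors; a real model learns these from the corpus.
vectors = {
    "good": [1.0, 0.1],
    "bad": [-1.0, 0.1],
    "great": [0.9, 0.2],
    "awful": [-0.8, 0.3],
    "table": [0.0, 1.0],
}
seeds = {"good": 1, "bad": -1}  # seed words and their polarity

polarity = polarity_scores(vectors, seeds)
print(document_score(["great", "table"], polarity) > 0)  # True
print(document_score(["awful", "table"], polarity) < 0)  # True
```

Note that "great" and "awful" are not seed words: they receive scores only because their vectors lie close to one seed or the other, which is why the choice and semantics of seed words matter so much.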

Day 3
We will discuss word embeddings, a technique that helps us accurately estimate the semantic proximity of words in a large corpus. Although there are still few applications of this technique in political science research, the recently developed Word2vec and GloVe models have drawn the attention of many quantitative text analysts to it. In the seminar, we will explore its potential.
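
The distributional idea behind word embeddings (words that occur in similar contexts tend to have similar meanings) can be sketched with simple co-occurrence counts and cosine similarity. This is a count-based illustration in Python, not Word2vec or GloVe, which learn dense, low-dimensional vectors; the toy sentences and the window size are assumptions.

```python
import math
from collections import defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count how often each word occurs within `window` tokens of every other word."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(c * v.get(k, 0) for k, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy corpus (illustrative only): "minister" and "president" share contexts.
sentences = [s.split() for s in [
    "the minister cut taxes",
    "the president cut taxes",
    "the minister raised spending",
    "the president raised spending",
    "farmers grow wheat",
]]
vecs = cooccurrence_vectors(sentences)
print(round(cosine(vecs["minister"], vecs["president"]), 3))  # 1.0
print(cosine(vecs["minister"], vecs["wheat"]))                # 0.0
```

On a real corpus, raw counts are dominated by frequent function words, which is one reason weighted or learned embeddings tend to estimate semantic proximity more accurately.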

Day 4
In the lecture, you will learn how to derive extra information from syntactic structures in texts and how to use that information to perform fine-tuned analysis. In the seminar, we will apply a syntactic parser, which recognises the parts of speech of words and the dependencies between them, to improve text pre-processing. We will also apply geographical parsing (geoparser.io), which combines a syntactic parser with a geographic database, to identify places mentioned in texts.
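
As a sketch of how parser output can refine pre-processing, the Python example below assumes the tokens have already been tagged by a parser such as spaCy; the tuples are hypothetical, hand-written output using Universal POS tags and spaCy-style dependency labels. It keeps only the lemmas of content words and extracts subject-verb pairs from the dependency structure.

```python
# Hypothetical parser output: (text, lemma, POS tag, dependency label, index of head token).
# Tags follow Universal POS; labels follow the style of spaCy's English models.
PARSED = [
    ("The", "the", "DET", "det", 1),
    ("government", "government", "NOUN", "nsubj", 2),
    ("cut", "cut", "VERB", "ROOT", 2),
    ("taxes", "tax", "NOUN", "dobj", 2),
    ("in", "in", "ADP", "prep", 2),
    ("Austria", "Austria", "PROPN", "pobj", 4),
    (".", ".", "PUNCT", "punct", 2),
]

# Parts of speech treated as content words (an assumption; adjust per project).
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ"}


def content_lemmas(tokens, keep=CONTENT_POS):
    """Return lower-cased lemmas of tokens whose POS tag is in `keep`."""
    return [lemma.lower() for _, lemma, pos, _, _ in tokens if pos in keep]


def subject_verb_pairs(tokens):
    """Return (subject lemma, head lemma) for every nominal subject."""
    return [
        (lemma, tokens[head][1])
        for _, lemma, _, dep, head in tokens
        if dep == "nsubj"
    ]


print(content_lemmas(PARSED))      # ['government', 'cut', 'tax', 'austria']
print(subject_verb_pairs(PARSED))  # [('government', 'cut')]
```

Compared with a stopword list, filtering by part of speech removes function words and punctuation while keeping lemmatised content words, and the dependency labels add information (who did what) that a bag of words discards.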

Day 5
You should come to the lecture with concrete research ideas involving quantitative text analysis. Some of you will be asked to present your ideas to initiate a class-wide discussion on how to choose analytic methods in actual research projects. In the seminar, you will learn how to develop your own text analysis tools by combining NLP functions in R.

You should have experience in quantitative text analysis in R, including textual data management and preprocessing.

Basic knowledge of programming (object types, control flow, loops, etc.) is desirable.

Topics by day

Day 1: Supervised and unsupervised models
Lecture: Supervised and unsupervised machine learning models and their applications
Lab: Supervised models (randomForest, caret, etc.) and unsupervised models (topicmodels, stm, etc.)

Day 2: Semi-supervised models
Lecture: Semi-supervised machine learning models and their applications
Lab: Seeded LDA, LSS and Newsmap

Day 3: Word embeddings
Lecture: Word embeddings for estimating semantic proximities of words
Lab: LSI, Word2vec (text2vec)

Day 4: Syntactic parsing
Lecture: Part-of-speech tagging, geographical information extraction
Lab: spaCy (spacyr), geoparser.io (geoparser)

Day 5: Research strategies and programming
Lecture: How to combine analytic tools to answer research questions
Lab: Programming text analysis functions in R

Readings by day

Day 1: Benoit, Laver, and Mikhaylov (2009); Chang et al. (2009)

Day 2: Lu et al. (2011); Watanabe (2017); Watanabe (2018)

Day 3: Turney and Pantel (2010); Spirling and Rodriguez (2019)

Day 4: van Atteveldt et al. (2017); Buscaldi (2011)

Software Requirements

R (3.4 or later) and RStudio 

Hardware Requirements

Please bring your own laptop that meets the minimum system requirements for the quanteda package.

Literature

Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162. 

Hastie, T. J., Tibshirani, R. J., & Friedman, J. H. (2013). The elements of statistical learning: Data mining, inference, and prediction. New York, NY: Springer.

Jurka, T. P., Collingwood, L., Boydstun, A. E., Grossman, E., & Van Atteveldt, W. (2013). RTextTools: A Supervised Learning Package for Text Classification. The R Journal, 5, 6–12.

Jurafsky, D., & Martin, J. H. (2009). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Upper Saddle River, NJ: Pearson Prentice Hall.

Lucas, C., Nielsen, R. A., Roberts, M. E., Stewart, B. M., Storer, A., & Tingley, D. (2015). Computer-Assisted Text Analysis for Comparative Politics. Political Analysis, 23(2), 254–277.

Manning, C. D., & Schütze, H. (2001). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

Wilkerson, J., & Casas, A. (2017). Large-Scale Computerized Text Analysis in Political Science: Opportunities and Challenges. Annual Review of Political Science, 20(1), 529–544. https://doi.org/10.1146/annurev-polisci-052615-025542

Recommended Courses to Cover Before this One

Introduction to Quantitative Text Analysis