ECPR Summer School Budapest, 26 July - 09 August 2019


Advanced Quantitative Text Analysis

Member rate £492.50
Non-Member rate £985.00

Save £45: loyalty discount applied automatically*
Save 5% on each additional course booked

*If you attended our Methods School in July/August 2023 or February 2024.

Course Dates and Times

Monday 5 – Friday 9 August

09:00–10:30 / 11:00–12:30

Kohei Watanabe

watanabe.kohei@gmail.com

Institution: University of Innsbruck

Lisa Lechner

Lisa.Lechner@uibk.ac.at

Institution: University of Innsbruck

This advanced course on quantitative text analysis covers topics ranging from machine learning algorithms such as Random Forest, through topic models, seeded topic models, Latent Semantic Scaling (LSS) and Newsmap, to word embeddings.

You will also learn how to automatically derive extra information from syntactic structures in the texts.

The course ends with an interactive discussion of participants’ research projects and of the text analysis tools you develop in R.

ECTS Credits for this course

2 credits (pass/fail grade): attend 90% of course hours, participate fully in in-class activities, and carry out the necessary reading and/or other work before and after class.

3 credits (graded): as above, plus complete daily assignments based on the methods illustrated during the seminars.

4 credits (graded): as above, plus write a text analysis function in R.

Instructor Bio

Lisa Lechner is Assistant Professor of Methods and Methodology in Political Science at the University of Innsbruck.

In her research, Lisa studies international treaties such as trade agreements, bilateral tax treaties and environmental agreements, as well as national and international jurisdictions, using inferential network analysis and quantitative text analysis.

Kohei Watanabe is an assistant professor at the Department of Political Science / Center for Digital Science at the University of Innsbruck.

He holds an MA from Central European University (CEU) and studied for his PhD at the London School of Economics and Political Science.

Kohei develops quanteda, an R package for quantitative text analysis, and uses text analysis to research international and political communication.

@koheiw7

This course revisits topics from Introduction to Quantitative Text Analysis but goes much deeper into the theoretical and technical foundations of quantitative text analysis, so that you can develop complex analytic pipelines in your research projects.

Day 1
The lecture will advance your knowledge of supervised and unsupervised methods, focusing on their strengths and weaknesses. We will cover Wordscores, naïve Bayes classifiers, Random Forest, latent Dirichlet allocation (LDA) and the Structural Topic Model (STM). Wordscores and naïve Bayes are simple supervised algorithms for document scaling and document classification, respectively. Random Forest can be used for both purposes but relies on a more sophisticated algorithm, an ensemble of decision trees. LDA and STM are unsupervised topic models; STM can additionally take document-level variables into account. In the seminar, we will learn how to apply these models.
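To make document scaling concrete, the Python sketch below (the course itself works in R) follows the Wordscores logic as it is usually described: reference texts with assumed known positions give each word a score, the average of the reference positions weighted by how strongly the word points to each reference text, and a new "virgin" text is scored by the frequency-weighted mean of the scores of its words. The toy texts and the positions +1 and -1 are invented, tokenisation is plain whitespace splitting, and the rescaling and standard errors of full implementations are omitted.

```python
from collections import Counter


def relative_freqs(text):
    """Relative frequency of each word in a whitespace-tokenised text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_scores(ref_texts, ref_positions):
    """Score every word that occurs in the reference texts."""
    freqs = [relative_freqs(t) for t in ref_texts]
    vocab = set().union(*freqs)
    scores = {}
    for w in vocab:
        f = [fr.get(w, 0.0) for fr in freqs]
        total = sum(f)
        # Share of the word's relative frequency coming from each reference text
        p = [x / total for x in f]
        scores[w] = sum(pr * a for pr, a in zip(p, ref_positions))
    return scores


def score_text(text, scores):
    """Frequency-weighted mean word score of a virgin text (scored words only)."""
    counts = Counter(w for w in text.lower().split() if w in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the text shares no words with the reference texts")
    return sum(c / total * scores[w] for w, c in counts.items())


# Invented reference texts with assumed positions +1 and -1.
refs = ["tax cuts market growth", "welfare spending public services"]
ws = word_scores(refs, [1.0, -1.0])
print(score_text("market growth and public spending", ws))
```

Words that never occur in a reference text (here "and") are simply ignored, which is one reason the choice of reference texts matters so much in practice.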

Day 2
We will explain seeded LDA, LSS and Newsmap, models that offer a compromise between the strengths and weaknesses of supervised and unsupervised methods. These semi-supervised models rely on exemplary words (seed words) as supervision to perform document scaling or classification tasks. They can be used for the same purposes as supervised and unsupervised models, but training them demands special attention to the semantics of the seed words. In the seminar, we will learn how to use these models.
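How seed words can act as supervision is easiest to see in a small example. The Python sketch below is a simplified illustration in the spirit of LSS, not the LSS package itself: it assumes word vectors are already available (the three-dimensional vectors are invented), scores each word by the average of its cosine similarity to each seed word multiplied by that seed's polarity (+1 or -1), and scores a document by the mean polarity of its known words.

```python
import math

# Hypothetical toy word vectors; in practice they would come from a corpus.
VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "bad": [-0.8, 0.2, 0.1],
    "growth": [0.7, 0.3, 0.1],
    "crisis": [-0.6, 0.4, 0.2],
    "report": [0.0, 0.1, 0.9],
}

# Seed words with assumed polarity: +1 positive, -1 negative.
SEEDS = {"good": 1, "bad": -1}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def polarity(word):
    """Average of cosine(seed, word) * seed polarity over all seed words."""
    v = VECTORS[word]
    return sum(cosine(VECTORS[s], v) * p for s, p in SEEDS.items()) / len(SEEDS)


def score_document(tokens):
    """Mean polarity of the tokens that have a vector; other tokens are ignored."""
    known = [t for t in tokens if t in VECTORS]
    if not known:
        raise ValueError("no token in the document has a word vector")
    return sum(polarity(t) for t in known) / len(known)


print(score_document("strong growth this year".split()))
print(score_document("a deep crisis".split()))
```

Because every score is derived from proximity to the seeds, a poorly chosen seed word shifts the polarity of all words near it, which is why the semantics of seed words need special attention.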

Day 3
We discuss word embeddings, a technique that helps us estimate the semantic proximity of words in a large corpus. Although the technique has been applied in few political science studies so far, the recently developed Word2vec and GloVe models have drawn the attention of many quantitative text analysts to it. In the seminar, we explore its potential.
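Word2vec and GloVe learn dense vectors, but the distributional idea behind them, that words occurring in similar contexts have similar meanings (Harris, 1954), can be illustrated with plain co-occurrence counts. The Python sketch below is a count-based stand-in, not Word2vec or GloVe: it builds context vectors from a symmetric window over an invented toy corpus and compares words by cosine similarity.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count how often each word appears within `window` tokens of every other word."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented toy corpus.
corpus = [
    "the parliament passed the budget",
    "the government passed the budget",
    "the parliament debated the reform",
    "the government debated the reform",
    "farmers harvested the wheat",
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["parliament"], vecs["government"]))
print(cosine(vecs["parliament"], vecs["wheat"]))
```

"parliament" and "government" share all their contexts and come out as near-identical, but "wheat" still looks fairly similar because the frequent word "the" dominates raw counts. Weighting schemes and learned embeddings such as Word2vec and GloVe are ways of getting beyond this limitation.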

Day 4
In the lecture, you will learn how to derive extra information from syntactic structures in texts and how to use that information to perform fine-tuned analysis. In the seminar, we apply a syntactic parser, which recognises the parts of speech of words and the dependencies between them, to improve text pre-processing. We also apply geographical parsing (geoparser.io), which combines a syntactic parser with a geographic database, to identify places mentioned in texts.
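One way part-of-speech information can improve pre-processing is by joining multi-word proper names into single tokens and dropping function words before counting. The Python sketch below assumes parser output in the form of (token, tag) pairs using Universal POS tags such as PROPN and NOUN. The tagged sentence is written by hand rather than produced by spaCy, and the joining rule is a simple illustration, not the behaviour of spacyr or any other specific package.

```python
def compound_proper_nouns(tagged, sep="_"):
    """Join runs of consecutive PROPN tokens into one token, e.g. European_Union."""
    out, run = [], []
    for token, tag in tagged:
        if tag == "PROPN":
            run.append(token)
            continue
        if run:
            out.append((sep.join(run), "PROPN"))
            run = []
        out.append((token, tag))
    if run:  # flush a proper-noun run at the end of the sentence
        out.append((sep.join(run), "PROPN"))
    return out


def content_words(tagged, keep=("NOUN", "PROPN", "VERB", "ADJ")):
    """Keep only tokens whose tag marks them as content words."""
    return [token for token, tag in tagged if tag in keep]


# Hand-tagged example sentence (hypothetical parser output).
sentence = [
    ("The", "DET"), ("European", "PROPN"), ("Union", "PROPN"),
    ("signed", "VERB"), ("a", "DET"), ("trade", "NOUN"),
    ("agreement", "NOUN"), ("with", "ADP"), ("New", "PROPN"), ("Zealand", "PROPN"),
]
print(content_words(compound_proper_nouns(sentence)))
# ['European_Union', 'signed', 'trade', 'agreement', 'New_Zealand']
```

Without the compounding step, "New" and "Zealand" would be counted as two unrelated features, and "New" would be confused with the ordinary adjective.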

Day 5
You should come to the lecture with concrete research ideas involving quantitative text analysis. Some of you will be asked to present your ideas to initiate a class-wide discussion on how to choose analytic methods in actual research projects. In the seminar, you will learn how to develop your own text analysis tools by combining NLP functions in R.

Prerequisite Knowledge

You should have experience of quantitative text analysis in R, including textual data management and pre-processing.

Basic knowledge of programming (object types, control flow, loops, etc.) is desirable.

Day	Topic	Lecture	Lab
1	Supervised and unsupervised models	Supervised and unsupervised machine learning models and their applications	Supervised models (randomForest, caret, etc.) and unsupervised models (topicmodels, stm, etc.)
2	Semi-supervised models	Semi-supervised machine learning models and their applications	Seeded LDA, LSS and Newsmap
3	Word embeddings	Word embeddings for estimating the semantic proximity of words	LSI, Word2vec (text2vec)
4	Syntactic parsing	Part-of-speech tagging, geographical information extraction	spaCy (spacyr), geoparser.io (geoparser)
5	Research strategies and programming	How to combine analytic tools to answer research questions	Programming text analysis functions in R

Day	Readings
1	Benoit, Laver, and Mikhaylov (2009); Chang et al. (2009)
2	Lu et al. (2011); Watanabe (2017); Watanabe (2018)
3	Turney and Pantel (2010); Spirling and Rodriguez (2019)
4	van Atteveldt et al. (2017); Buscaldi (2011)

Software Requirements

R (3.4 or later) and RStudio

Hardware Requirements

Please bring your own laptop that meets the minimum system requirements for the quanteda package.

Literature

Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162.

Hastie, T. J., Tibshirani, R. J., & Friedman, J. H. (2013). The elements of statistical learning: Data mining, inference, and prediction. New York, NY: Springer.

Jurafsky, D., & Martin, J. H. (2009). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Upper Saddle River, NJ: Pearson Prentice Hall.

Jurka, T. P., Collingwood, L., Boydstun, A. E., Grossman, E., & van Atteveldt, W. (2013). RTextTools: A supervised learning package for text classification. The R Journal, 5, 6–12.

Lucas, C., Nielsen, R. A., Roberts, M. E., Stewart, B. M., Storer, A., & Tingley, D. (2015). Computer-assisted text analysis for comparative politics. Political Analysis, 23(2), 254–277.

Manning, C. D., & Schütze, H. (2001). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

Wilkerson, J., & Casas, A. (2017). Large-scale computerized text analysis in political science: Opportunities and challenges. Annual Review of Political Science, 20(1), 529–544. https://doi.org/10.1146/annurev-polisci-052615-025542

Recommended Courses to Cover Before this One

Introduction to Quantitative Text Analysis
