Introduction to Quantitative Text Analysis

Member rate £492.50
Non-Member rate £985.00

Save £45 Loyalty discount applied automatically*
Save 5% on each additional course booked

*If you attended our Methods School in July/August 2023 or February 2024.

Course Dates and Times

Monday 30 July - Friday 3 August

09:00-10:30 / 11:00-12:30

Iñaki Sagarzazu

Texas Tech University

This applied course introduces quantitative text analysis methods that allow you to systematically extract information from texts. It starts with more traditional approaches such as manual hand-coding, then quickly moves to recent advances in social science methodology that treat words as data. We begin with important concepts in content analysis such as content validity and intercoder reliability. We then take a closer look at manual hand-coding approaches before turning to computer-assisted, dictionary-based text analysis techniques. This is followed by a discussion of Sentiment Analysis and the scaling technique Wordscores, two cutting-edge content analysis approaches that allow you to automatically extract information from social science texts. The course combines theoretical sessions with practical exercises so that participants can immediately apply the techniques presented.

Tasks for ECTS Credits

  • Participants attending the course: 2 credits (pass/fail grade). The workload for the calculation of ECTS credits is based on the assumption that students attend classes and carry out the necessary reading and/or other work before and after classes.
  • Participants attending the course and completing one task (see below): 3 credits (to be graded)
  • Participants attending the course and completing two tasks (see below): 4 credits (to be graded)

To receive 2 ECTS, you will have done the readings and taken part actively in the course.

For an additional credit you will need to complete the daily assignments (easy to moderate demand), due in by the following day's class.

For a total of two additional credits, you will also need to complete a take-home exam. The deadline for returning the exam will be set during the class.

Instructor Bio

Iñaki Sagarzazu is an Assistant Professor in Political Science at Texas Tech University. Prior to joining Texas Tech he was a Lecturer in Comparative Politics at the University of Glasgow and a postdoctoral researcher at Nuffield College, Oxford. He earned his PhD at the University of Houston.

Iñaki's research focuses on comparative politics, with particular emphasis on statistical content analysis with applications to political communication and institutions.

He has taught courses on Text Analysis at the IPSA Summer Schools in São Paulo and Singapore, and at the ECPR Winter School.


This applied course introduces quantitative text analysis methods that allow you to systematically extract information from texts. It starts with more traditional approaches such as manual hand-coding, then quickly moves to recent advances in social science methodology that treat words as data.

We begin with important concepts in content analysis such as content validity and intercoder reliability. We then take a closer look at manual hand-coding approaches, such as those employed in the well-known Comparative Manifesto Project, which rely on human coders to code the content of texts according to a predefined category scheme.

Next, we move to computer-assisted, dictionary-based text analysis techniques. Dictionary-based content analysis uses computers to code the content of documents by relying on previously devised codebooks, which assign individual words to specific thematic categories. We then deal with refinements of the dictionary approach such as Sentiment Analysis and Wordscores. While the former allows for the study of attitudes or emotions in texts, the latter allows the researcher to automatically extract policy positions from political texts such as election manifestos or speeches.

This is an applied course for beginners and intermediate users of content analysis. It gives participants an overview of the theoretical foundations of quantitative text analysis, but it is mainly practical, so that participants learn how to use these methods in their own research. The course therefore combines theoretical sessions with practical exercises so that participants can immediately apply the techniques presented. More advanced techniques of unsupervised scaling and topic coding are reserved for an advanced version of this class, for which this class is a prerequisite.
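To make the dictionary approach described above concrete, here is a minimal sketch in Python (the course itself works in R). It counts how many words in a document fall into each category of a codebook, then derives a simple net sentiment score from a positive/negative word list. The codebook, the sentiment lexicon, the example sentence, and the function names are all hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
import re
from collections import Counter

# Hypothetical codebook: assigns individual words to thematic categories.
TOPIC_CODEBOOK = {
    "tax": "economy",
    "budget": "economy",
    "jobs": "economy",
    "hospital": "health",
    "nurses": "health",
}

# Hypothetical sentiment lexicon: the same idea, with two categories.
SENTIMENT_LEXICON = {
    "good": "positive",
    "strong": "positive",
    "bad": "negative",
    "weak": "negative",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def code_document(text, codebook):
    """Count how many tokens of the text fall into each codebook category."""
    counts = Counter()
    for token in tokenize(text):
        category = codebook.get(token)
        if category is not None:
            counts[category] += 1
    return counts


def net_sentiment(counts):
    """Return (positive - negative) / (positive + negative), or 0.0 if no hits."""
    pos, neg = counts["positive"], counts["negative"]
    total = pos + neg
    return (pos - neg) / total if total else 0.0


document = "The budget cuts tax and creates jobs, but hospital funding is bad."
topic_counts = code_document(document, TOPIC_CODEBOOK)
sentiment_score = net_sentiment(code_document(document, SENTIMENT_LEXICON))
print(dict(topic_counts), sentiment_score)
```

This sketch matches exact word forms only, so "taxes" would not be coded as "economy". In practice, texts are usually stemmed or the codebook lists word variants, and counts are normalised by document length before documents are compared.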

The following skills are helpful though not necessarily required to be able to follow the course:

  • Familiarity with the R statistical software package
  • Basic knowledge of statistical analysis
  • Familiarity with a text editor and with the handling of text files

Day  Topic                            Details
1    Introduction / Hand-Coding       Two 90min lectures
2    Working with Text as Data        90min lecture, 90min lab
3    Dictionary Coding                90min lab, 90min lecture
4    Sentiment Analysis               90min lecture, 90min lab
5    Supervised Scaling: Wordscores   90min lecture, 90min lab

Day  Readings
1    Krippendorff 2004, Ch. 5, 6, 11, 13; Klingemann et al. 2006, Ch. 1, 8, Appendices
2    Klüver 2009; Porter 1980
3    Laver and Garry 2000; Langer and Sagarzazu 2016
4    Hu and Liu 2004; Thomas et al. 2006; Hart and Childers 2005
5    Laver et al. 2003; Klemmensen et al. 2007; Benoit and Laver 2003


For the precise literature references, see reference list below.

Software Requirements



Hardware Requirements



References

Alexa, Melina and Cornelia Züll. 2000. “Text Analysis Software: Commonalities, Differences and Limitations: The Results of a Review.” Quality and Quantity 34(3):299–321.

Benoit, Kenneth and Michael Laver. 2003. “Estimating Irish party policy positions using computer wordscoring.” Irish Political Studies 18(1):97–107.

Feinerer, Ingo. 2008. “An introduction to text mining in R.” R News 8(2):19–22.

Feinerer, Ingo. 2011. TM Package Reference Manual. Version 0.5-6, URL (consulted November 2011):

Feinerer, Ingo, Kurt Hornik and David Meyer. 2008. “Text Mining Infrastructure in R.” Journal of Statistical Software 25(5):1–54.

Hart, Roderick P. and Jay P. Childers. 2005. “The Evolution of Candidate Bush.” American Behavioral Scientist 49(2):180–197.

Hu, M. and B. Liu. 2004. “Mining and summarizing customer reviews.” In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168–177.

Klemmensen, Robert, Sara Binzer Hobolt and Martin Ejnar Hansen. 2007. “Estimating policy positions using political texts: An evaluation of the Wordscores approach.” Electoral Studies 26(4):746–755.

Klingemann, Hans-Dieter, Andrea Volkens, Judith Bara, Ian Budge and Michael McDonald. 2006. Mapping Policy Preferences II: Estimates for Parties, Electors, and Governments in Eastern Europe, European Union and OECD 1990-2003. Oxford: Oxford University Press.

Klüver, Heike. 2009. “Measuring interest group influence using quantitative text analysis.” European Union Politics 10(4):535–549.

Krippendorff, Klaus. 2004. Content Analysis: An Introduction to Its Methodology. 2 ed. Thousand Oaks: Sage.

Langer, Ana Ines and Iñaki Sagarzazu. 2016. “Are all policy decisions equal? Explaining the variation in media coverage of the UK budget.” Policy Studies Journal.

Laver, Michael and John Garry. 2000. “Estimating Policy Positions from Political Texts.” American Journal of Political Science 44(3):619–634.

Laver, Michael, Kenneth Benoit and John Garry. 2003. “Extracting policy positions from political texts using words as data.” American Political Science Review 97(2):311–331.

Lowe, Will. 2003. Software for Content Analysis: A Review. Technical Report for the Identity Project: Weatherhead Center for International Affairs, Harvard University.

Lowe, Will, Ken Benoit, Slava Mikhaylov and Michael Laver. 2011. “Scaling policy positions from coded units of political texts.” Legislative Studies Quarterly 36(1):123–155.

Mikhaylov, Slava, Michael Laver and Kenneth Benoit. 2010. Coder Reliability and Misclassification in Comparative Manifesto Project Codings. Paper presented at the 66th National Conference of the Midwest Political Science Association: Chicago, 3-6 April 2008.

Neuendorf, Kimberly A. 2002. The Content Analysis Guidebook. Thousand Oaks: Sage.

Thomas, M., B. Pang and L. Lee. 2006. “Get out the vote: Determining support or opposition from congressional floor-debate transcripts.” In EMNLP, pp. 327–335.

Veen, Tim. 2011. “Positions and salience in European Union politics: Estimation and validation of a new dataset.” European Union Politics 12(2):267–288.

Recommended Courses to Cover Before this One

Summer School

Introduction to R

Winter School

Automated Web Data Collection with R

Recommended Courses to Cover After this One

Summer School

Introduction to Python

Winter School

Automated Web Data Collection with R