ECPR

Quantitative Text Analysis

Iñaki Sagarzazu
inaki.sagarzazu@ttu.edu

Texas Tech University

Iñaki Sagarzazu is an Assistant Professor in Political Science at Texas Tech University. Prior to joining Texas Tech, he was a Lecturer in Comparative Politics at the University of Glasgow and a postdoctoral researcher at Nuffield College, Oxford. He earned his PhD at the University of Houston.

Iñaki's research is in comparative politics, with a particular emphasis on statistical content analysis and its applications to political communication and institutions.

He has taught courses on Text Analysis at the IPSA Summer Schools in São Paulo and Singapore, and at the ECPR Winter School.

  @YVPolis


Course Dates and Times

Monday 29 February to Friday 4 March 2016
Classes generally run either 09:00-12:30 or 14:00-17:30.
15 hours over 5 days

Prerequisite Knowledge

The following skills are helpful, though not strictly required, for following the course:
• Familiarity with the R statistical software package
• Basic knowledge of the STATA statistical software package
• Basic knowledge of statistical analysis
• Familiarity with a text editor and with handling text files


Short Outline

This applied course will provide you with an overview of quantitative text analysis methods that allow you to systematically extract information from political texts. The course starts with more traditional approaches such as hand-coding but quickly moves on to recent advances in political methodology that treat words as data. We will begin with important concepts in content analysis, such as content validity and intercoder reliability, and then take a closer look at hand-coding approaches before turning to computer-assisted, dictionary-based text analysis. This will be followed by a discussion of Wordscores and Wordfish, two cutting-edge approaches that automatically extract policy positions from political texts. Finally, we will cover automated document classification techniques, which sort texts into different thematic areas. The course combines theoretical sessions with practical exercises so that participants can immediately apply the techniques presented.


Long Course Outline

This applied course will provide you with an overview of quantitative text analysis methods that allow you to systematically extract information from political texts. The course starts with more traditional approaches such as hand-coding but quickly moves on to recent advances in political methodology that treat words as data. We will begin with important concepts in content analysis, such as content validity and intercoder reliability. We will then take a closer look at manual coding approaches, such as those employed in the well-known Comparative Manifesto Project, which rely on human coders to code the content of texts according to a predefined category scheme. Afterwards, we will move to automated text analysis, starting with computer-assisted, dictionary-based text analysis. Dictionary-based content analysis uses computers to code the content of documents with a human-devised codebook that assigns individual words to specific thematic categories. Next, we will turn to fully computerized text analysis techniques, beginning with Wordscores and Wordfish, two cutting-edge techniques that automatically extract policy positions from political texts such as election manifestos or speeches. Finally, we will cover automated document classification, which sorts texts into thematic areas; using such techniques, researchers can, for instance, automatically classify thousands of press releases or laws into different policy areas. Aimed at beginners and intermediate users of content analysis, the course provides an overview of the theoretical foundations of quantitative text analysis but is mainly practical and applied, so that participants learn how to use these methods in their own research. It therefore combines theoretical sessions with practical exercises so that participants can immediately apply the techniques presented.
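To make the idea of dictionary-based coding concrete, here is a minimal sketch in Python. It is illustrative only: the course itself works with tools such as R and Yoshikoder, and the codebook (`CODEBOOK`), the helper names `tokenize` and `code_document`, and the example sentence are all hypothetical. The sketch assumes a simple ASCII-only tokenizer and counts, for each category, how many words of a document appear in that category's word list, along with each category's share of all words.

```python
import re

# Hypothetical toy codebook, for illustration only: each thematic category
# maps to the set of words a human coder has assigned to it.
CODEBOOK = {
    "economy": {"tax", "taxes", "jobs", "inflation", "budget"},
    "environment": {"climate", "emissions", "pollution", "green"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Simplifying assumption: only ASCII letters count as word characters,
    so accented letters act as separators.
    """
    return re.findall(r"[a-z]+", text.lower())


def code_document(text, codebook=CODEBOOK):
    """Apply a word-to-category codebook to one document.

    Returns raw counts per category and each category's share of all
    tokens, so that documents of different lengths can be compared.
    """
    tokens = tokenize(text)
    counts = {category: 0 for category in codebook}
    for token in tokens:
        # A word listed under several categories counts toward each of them.
        for category, words in codebook.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    shares = {c: (n / total if total else 0.0) for c, n in counts.items()}
    return counts, shares


if __name__ == "__main__":
    manifesto = "We will cut taxes, create jobs and fight climate change."
    counts, shares = code_document(manifesto)
    print(counts)  # {'economy': 2, 'environment': 1}
    print(shares)
```

Real dictionaries are far larger, are validated against the texts they code, and usually also handle word stems and multi-word phrases, which this sketch ignores.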

Day | Topic | Details
1 | Introduction / Hand-Coding | Two 90 min lectures
2 | Dictionary Coding / Dictionary Coding Exercise | 90 min lecture, 90 min lab
3 | Wordscores / Words as data exercise | 90 min lab, 90 min lecture
4 | Wordfish / Ideal point measurement exercise | 90 min lecture, 90 min lab
5 | Document classification / Classifying documents exercise | 90 min lecture, 90 min lab

Day | Readings
1 | Krippendorff 2004, Ch. 5, 6, 11, 13; Klingemann et al. 2006, Ch. 1, 8, Appendices
2 | Neuendorf 2002, Ch. 6; Laver/Garry 2000; practical homework assignment
3 | Laver/Benoit/Garry 2003; Slapin/Proksch 2009; practical homework assignment
4 | Slapin/Proksch 2008; Klüver 2009; practical homework assignment
5 | Grimmer 2010

Software Requirements

R, STATA, Yoshikoder, JFreq

Hardware Requirements

No specific requirements

Literature

Alexa, Melina and Cornelia Züll. 2000. “Text Analysis Software: Commonalities, Differences and Limitations: The Results of a Review.” Quality and Quantity 34(3):299–321.
Benoit, Kenneth and Michael Laver. 2003. “Estimating Irish party policy positions using computer wordscoring.” Irish Political Studies 18(1):97–107.
Feinerer, Ingo. 2008. “An introduction to text mining in R.” R News 8(2):19–22.
Feinerer, Ingo. 2011. TM Package Reference Manual. Version 0.5-6, URL (consulted November 2011): http://tm.r-forge.r-project.org/.
Feinerer, Ingo, Kurt Hornik and David Meyer. 2008. “Text Mining Infrastructure in R.” Journal of Statistical Software 25(5):1–54.
Grimmer, Justin. 2010. “A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases.” Political Analysis 18(1):1–35.
Hart, Roderick P. and Jay P. Childers. 2005. “The Evolution of Candidate Bush.” American Behavioral Scientist 49(2):180–197.
Klemmensen, Robert, Sara Binzer Hobolt and Martin Ejnar Hansen. 2007. “Estimating policy positions using political texts: An evaluation of the Wordscores approach.” Electoral Studies 26(4):746–755.
Klingemann, Hans-Dieter, Andrea Volkens, Judith Bara, Ian Budge and Michael McDonald. 2006. Mapping Policy Preferences II: Estimates for Parties, Electors, and Governments in Eastern Europe, European Union and OECD 1990-2003. Oxford: Oxford University Press.
Klüver, Heike. 2009. “Measuring interest group influence using quantitative text analysis.” European Union Politics 10(4):535–549.
Krippendorff, Klaus. 2004. Content Analysis: An Introduction to Its Methodology. 2nd ed. Thousand Oaks: Sage.
Laver, Michael and John Garry. 2000. “Estimating Policy Positions from Political Texts.” American Journal of Political Science 44(3):619–634.
Laver, Michael, Kenneth Benoit and John Garry. 2003. “Extracting policy positions from political texts using words as data.” American Political Science Review 97(2):311–331.
Lowe, Will. 2003. Software for Content Analysis: A Review. Technical Report for the Identity Project: Weatherhead Center for International Affairs, Harvard University.
Lowe, Will, Ken Benoit, Slava Mikhaylov and Michael Laver. 2011. “Scaling policy positions from coded units of political texts.” Legislative Studies Quarterly 36(1):123–155.
Mikhaylov, Slava, Michael Laver and Kenneth Benoit. 2008. Coder Reliability and Misclassification in Comparative Manifesto Project Codings. Paper presented at the 66th National Conference of the Midwest Political Science Association: Chicago, 3-6 April 2008.
Neuendorf, Kimberly A. 2002. The Content Analysis Guidebook. Thousand Oaks: Sage.
Proksch, Sven-Oliver and Jonathan B. Slapin. 2009a. “How to avoid pitfalls in statistical analysis of political texts: The case of Germany.” German Politics 18(3):323–344.
Proksch, Sven-Oliver and Jonathan B. Slapin. 2009b. WORDFISH Manual. Version 1.3, URL (consulted Sept. 2009): http://www.wordfish.org.
Proksch, Sven-Oliver and Jonathan B. Slapin. 2010. “Position Taking in European Parliament Speeches.” British Journal of Political Science 40(3):587–611.
Quinn, Kevin M., Burt Monroe, Michael Colaresi, Michael Crespin and Drago Radev. 2010. “How to analyze political attention with minimal assumptions and costs.” American Journal of Political Science 54(1):209–228.
Slapin, Jonathan and Sven-Oliver Proksch. 2008. “A Scaling Model for Estimating Time Series Policy Positions from Texts.” American Journal of Political Science 52(3):705–722.
Veen, Tim. 2011. “Positions and salience in European Union politics: Estimation and validation of a new dataset.” European Union Politics 12(2):267–288.

Recommended Courses to Cover Before this One

Introduction to R
Webscraping with R

Recommended Courses to Cover After this One

Introduction to Python
Webscraping with R


Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. to take into account new developments in the field, participant demands, group size, etc.). Registered participants will be informed of any change at the time it is made.

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, please contact us before registering.