Member rate £492.50
Non-Member rate £985.00
Save £45 Loyalty discount applied automatically*
Save 5% on each additional course booked
*If you attended our Methods School in July/August 2023 or February 2024.
Monday 29 February to Friday 4 March 2016
Generally classes are either 09:00-12:30 or 14:00-17:30
15 hours over 5 days
This applied course will provide you with an overview of quantitative text analysis methods that allow you to systematically extract information from political texts. The course starts with more traditional approaches such as manual hand-coding, but quickly moves to recent advances in political methodology that treat words as data. It begins with important concepts in content analysis such as content validity and intercoder reliability. We will then take a closer look at manual hand-coding approaches before turning to computer-assisted, dictionary-based text analysis techniques. This will be followed by a discussion of Wordscores and Wordfish, two cutting-edge content analysis approaches that allow you to automatically extract policy positions from political texts. Finally, we will cover automated document classification techniques, which sort texts into different thematic areas. The course will combine theoretical sessions with practical exercises to allow participants to immediately apply the presented techniques.
Iñaki Sagarzazu is an Assistant Professor in Political Science at Texas Tech University. Prior to joining Texas Tech he was a Lecturer in Comparative Politics at the University of Glasgow and a postdoctoral researcher at Nuffield College, Oxford. He earned his PhD at the University of Houston.
Iñaki's research focuses on comparative politics, with a special focus on statistical content analysis with applications to political communication and institutions.
He has taught courses on Text Analysis at the IPSA Summer Schools in São Paulo and Singapore, and at the ECPR Winter School.
This applied course will provide you with an overview of quantitative text analysis methods that allow you to systematically extract information from political texts. The course starts with more traditional approaches such as manual hand-coding, but quickly moves to recent advances in political methodology that treat words as data. It begins with important concepts in content analysis such as content validity and intercoder reliability. We will then take a closer look at manual coding approaches, as employed for instance in the well-known Comparative Manifesto Project, which rely on human coders to code the content of texts according to a predefined category scheme.
Afterwards, we will move to automated text analysis techniques, starting with computer-assisted dictionary-based text analysis. Dictionary-based content analysis uses computers to code the content of documents by relying on a human-devised codebook that assigns individual words to specific thematic categories. Next, we will deal with fully computerized text analysis techniques. We will first cover Wordscores and Wordfish, two cutting-edge techniques that allow you to automatically extract policy positions from political texts such as election manifestos or speeches. Finally, we will cover automated document classification approaches, which sort texts into different thematic areas. For instance, using such techniques, researchers can automatically classify thousands of texts such as press releases or laws into different policy areas.
This is an applied course for beginners and intermediate users of content analysis. It gives participants an overview of the theoretical foundations of quantitative text analysis, but it is mainly practical, so that participants learn how to use these methods in their own research.
The course will therefore combine theoretical sessions with practical exercises to allow participants to immediately apply the presented techniques.
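As an illustration of the dictionary-based approach described above, the following is a minimal sketch in Python rather than course material. The codebook, its two categories, and the function names `tokenize`, `code_document` and `category_shares` are all hypothetical and chosen only for this example; tools such as Yoshikoder apply the same idea with much larger, validated dictionaries.

```python
import re

# Hypothetical codebook: categories and words are illustrative assumptions,
# not a validated dictionary.
CODEBOOK = {
    "economy": {"tax", "taxes", "budget", "jobs", "inflation"},
    "environment": {"climate", "emissions", "pollution", "renewable"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def code_document(text, codebook=CODEBOOK):
    """Count how many tokens of the text fall into each codebook category."""
    counts = {category: 0 for category in codebook}
    for token in tokenize(text):
        for category, words in codebook.items():
            if token in words:
                counts[category] += 1
    return counts


def category_shares(text, codebook=CODEBOOK):
    """Express category counts as shares of all tokens in the text."""
    total = len(tokenize(text))
    counts = code_document(text, codebook)
    return {c: (n / total if total else 0.0) for c, n in counts.items()}


sample = "We will cut taxes, protect jobs and reduce emissions."
print(code_document(sample))  # {'economy': 2, 'environment': 1}
```

Reporting shares rather than raw counts makes documents of different lengths comparable, which matters when manifestos and short press releases are coded with the same dictionary.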
The following skills are helpful, though not strictly required, to follow the course:
• Familiarity with the R statistical software package
• Basic knowledge of the STATA statistical software package
• Basic knowledge of statistical analysis
• Familiarity with a text editor and with the handling of text files
Day | Topic | Details |
---|---|---|
1 | Introduction / Hand-Coding | Two 90 min lectures |
2 | Dictionary Coding / Dictionary Coding Exercise | 90 min lecture, 90 min lab |
3 | Wordscores / Words as data exercise | 90 min lab, 90 min lecture |
4 | Wordfish / Ideal point measurement exercise | 90 min lecture, 90 min lab |
5 | Document classification / Classifying documents exercise | 90 min lecture, 90 min lab |
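To give a flavour of the Wordscores session on day 3, here is a minimal Python sketch of the core scoring step from Laver, Benoit and Garry (2003): words are scored from reference texts with known positions, and a new ("virgin") text is scored as the frequency-weighted mean of its word scores. The function names, the toy texts and the positions of -1 and +1 are assumptions made for this example. The sketch omits the rescaling of virgin-text scores and the uncertainty estimates discussed in the original article.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Return each word's share of all tokens in one text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_scores(reference_texts, reference_positions):
    """Score every word that occurs in at least one reference text.

    reference_texts: dict mapping a text name to its list of tokens.
    reference_positions: dict mapping the same names to known positions.
    """
    freqs = {name: relative_frequencies(toks)
             for name, toks in reference_texts.items()}
    vocabulary = set().union(*(f.keys() for f in freqs.values()))
    scores = {}
    for word in vocabulary:
        total = sum(f.get(word, 0.0) for f in freqs.values())
        # Probability of reading each reference text given the word,
        # weighted by that text's known position.
        scores[word] = sum(
            (f.get(word, 0.0) / total) * reference_positions[name]
            for name, f in freqs.items()
        )
    return scores


def score_text(tokens, scores):
    """Score a virgin text as the mean score of its scored words."""
    scored = [t for t in tokens if t in scores]
    if not scored:
        raise ValueError("text shares no words with the reference texts")
    freqs = relative_frequencies(scored)
    return sum(share * scores[word] for word, share in freqs.items())


# Toy reference texts with assumed positions (-1 = left, +1 = right).
references = {
    "left": ["welfare", "welfare", "tax"],
    "right": ["tax", "tax", "market"],
}
positions = {"left": -1.0, "right": 1.0}

scores = word_scores(references, positions)
virgin = ["welfare", "tax", "market", "growth"]  # "growth" is unscored
print(round(score_text(virgin, scores), 3))
```

Words that never occur in a reference text, such as "growth" here, carry no information about position and are simply ignored when the virgin text is scored.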
Day | Readings |
---|---|
1 | Krippendorff 2004, Ch. 5, 6, 11, 13; Klingemann et al. 2006, Ch. 1, 8, Appendices |
2 | Neuendorf 2002, Ch. 6; Laver & Garry 2000; Practical homework assignment |
3 | Laver/Benoit/Garry 2003; Slapin/Proksch 2009; Practical homework assignment |
4 | Slapin/Proksch 2008; Klüver 2009; Practical homework assignment |
5 | Grimmer 2010 |
R, STATA, Yoshikoder, JFreq
No specific requirements
Alexa, Melina and Cornelia Züll. 2000. “Text Analysis Software: Commonalities, Differences and Limitations: The Results of a Review.” Quality and Quantity 34(3):299–321.
Benoit, Kenneth and Michael Laver. 2003. “Estimating Irish party policy positions using computer wordscoring.” Irish Political Studies 18(1):97–107.
Feinerer, Ingo. 2008. “An introduction to text mining in R.” R News 8(2):19–22.
Feinerer, Ingo. 2011. TM Package Reference Manual. Version 0.5-6, URL (consulted November 2011): http://tm.r-forge.r-project.org/.
Feinerer, Ingo, Kurt Hornik and David Meyer. 2008. “Text Mining Infrastructure in R.” Journal of Statistical Software 25(5):1–54.
Grimmer, Justin. 2010. “A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases.” Political Analysis 18(1):1–35.
Hart, Roderick P. and Jay P. Childers. 2005. “The Evolution of Candidate Bush.” American Behavioral Scientist 49(2):180–197.
Klemmensen, Robert, Sara Binzer Hobolt and Martin Ejnar Hansen. 2007. “Estimating policy positions using political texts: An evaluation of the Wordscores approach.” Electoral Studies 26(4):746–755.
Klingemann, Hans-Dieter, Andrea Volkens, Judith Bara, Ian Budge and Michael McDonald. 2006. Mapping Policy Preferences II: Estimates for Parties, Electors, and Governments in Eastern Europe, European Union and OECD 1990–2003. Oxford: Oxford University Press.
Klüver, Heike. 2009. “Measuring interest group influence using quantitative text analysis.” European Union Politics 10(4):535–549.
Krippendorff, Klaus. 2004. Content Analysis: An Introduction to Its Methodology. 2nd ed. Thousand Oaks: Sage.
Laver, Michael and John Garry. 2000. “Estimating Policy Positions from Political Texts.” American Journal of Political Science 44(3):619–634.
Laver, Michael, Kenneth Benoit and John Garry. 2003. “Extracting policy positions from political texts using words as data.” American Political Science Review 97(2):311–331.
Lowe, Will. 2003. Software for Content Analysis: A Review. Technical Report for the Identity Project: Weatherhead Center for International Affairs, Harvard University.
Lowe, Will, Ken Benoit, Slava Mikhaylov and Michael Laver. 2011. “Scaling policy positions from coded units of political texts.” Legislative Studies Quarterly 36(1):123–155.
Mikhaylov, Slava, Michael Laver and Kenneth Benoit. 2010. Coder Reliability and Misclassification in Comparative Manifesto Project Codings. Paper presented at the 66th National Conference of the Midwest Political Science Association: Chicago, 3-6 April 2008.
Neuendorf, Kimberly A. 2002. The Content Analysis Guidebook. Thousand Oaks: Sage.
Proksch, Sven-Oliver and Jonathan B. Slapin. 2009a. “How to avoid pitfalls in statistical analysis of political texts: The case of Germany.” German Politics 18(3):323–344.
Proksch, Sven-Oliver and Jonathan B. Slapin. 2009b. WORDFISH Manual. Version 1.3, URL (consulted Sept. 2009): http://www.wordfish.org.
Proksch, Sven-Oliver and Jonathan B. Slapin. 2010. “Position Taking in European Parliament Speeches.” British Journal of Political Science 40(3):587–611.
Quinn, Kevin M., Burt Monroe, Michael Colaresi, Michael Crespin and Drago Radev. 2010. “How to analyze political attention with minimal assumptions and costs.” American Journal of Political Science 54(1):209–228.
Slapin, Jonathan and Sven-Oliver Proksch. 2008. “A Scaling Model for Estimating Time Series Policy Positions from Texts.” American Journal of Political Science 52(3):705–722.
Veen, Tim. 2011. “Positions and salience in European Union politics: Estimation and validation of a new dataset.” European Union Politics 12(2):267–288.
Introduction to R, Webscraping with R
Introduction to Python, Webscraping with R