ECPR

Introduction to Quantitative Text Analysis

Iñaki Sagarzazu
inaki.sagarzazu@ttu.edu

Texas Tech University

Iñaki Sagarzazu is an Assistant Professor in Political Science at Texas Tech University. Prior to joining Texas Tech he was a Lecturer in Comparative Politics at the University of Glasgow and a postdoctoral researcher at Nuffield College, Oxford. He earned his PhD at the University of Houston.

Iñaki's research is in comparative politics, with a special focus on statistical content analysis and its applications to political communication and institutions.

He has taught courses on Text Analysis at the IPSA Summer Schools in São Paulo and Singapore, and at the ECPR Winter School.

  @YVPolis


Course Dates and Times

Monday 6 to Friday 10 March 2017
Generally classes are either 09:00-12:30 or 14:00-17:30
15 hours over 5 days

Prerequisite Knowledge

The following skills are helpful though not necessarily required to be able to follow the course:

  • Familiarity with the R statistical software package
  • Basic knowledge of statistical analysis
  • Familiarity with a text editor and with the handling of text files


Short Outline

This applied course introduces quantitative text analysis methods that allow you to systematically extract information from texts. It begins with important concepts in content analysis, such as content validity and intercoder reliability, and with traditional approaches such as manual hand-coding, then moves quickly to recent advances in social science methodology that treat words as data. After a closer look at manual hand-coding, we turn to computer-assisted, dictionary-based text analysis techniques. This is followed by a discussion of Sentiment Analysis and the scaling technique Wordscores, two cutting-edge content analysis approaches that allow you to automatically extract information from social science texts. The course combines theoretical sessions with practical exercises so that participants can immediately apply the techniques presented.

Long Course Outline

This applied course introduces quantitative text analysis methods that allow you to systematically extract information from texts. It starts with more traditional approaches such as manual hand-coding, but quickly moves to recent advances in social science methodology that treat words as data.

The course begins with important concepts in content analysis, such as content validity and intercoder reliability. We then take a closer look at manual hand-coding approaches, as employed for instance in the well-known Comparative Manifesto Project, which rely on human coders to code the content of texts according to a predefined category scheme.

Afterwards, we move to computer-assisted, dictionary-based text analysis techniques. Dictionary-based content analysis employs computers to code the content of documents by relying on previously devised codebooks, which assign individual words to specific thematic categories. Next, we deal with refinements of the dictionary approach such as Sentiment Analysis and Wordscores. While the former allows for the study of attitudes or emotions in texts, the latter allows the researcher to automatically extract policy positions from political texts such as election manifestos or speeches.

This is an applied course for beginners and intermediate users of content analysis. It gives participants an overview of the theoretical foundations of quantitative text analysis, but it is mainly practical, so that participants learn how to use these methods in their own research. The course therefore combines theoretical sessions with practical exercises so that participants can immediately apply the techniques presented. More advanced techniques, such as unsupervised scaling and topic coding, are reserved for an advanced version of this class, for which this class is a prerequisite.
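The dictionary-based coding described above can be illustrated with a minimal sketch. It is not the course's own material: the course labs use R, and Python is used here only for illustration. The two-category dictionary, the example sentence, and the function names `tokenize` and `code_document` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (an assumption for illustration only):
# each thematic category maps to a set of lowercase words.
DICTIONARY = {
    "economy": {"tax", "taxes", "budget", "jobs", "growth"},
    "welfare": {"health", "pensions", "schools", "education"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def code_document(text, dictionary):
    """Count dictionary hits per category and their share of all tokens."""
    tokens = tokenize(text)
    hits = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                hits[category] += 1
    total = len(tokens)
    counts = {category: hits[category] for category in dictionary}
    shares = {
        category: (counts[category] / total if total else 0.0)
        for category in dictionary
    }
    return counts, shares


# Hypothetical example sentence, not taken from any real manifesto.
EXAMPLE = "We will cut taxes and protect jobs, while investing in health and schools."
counts, shares = code_document(EXAMPLE, DICTIONARY)
print(counts)
print(shares)
```

Reporting category shares rather than raw counts makes documents of different lengths comparable. A real dictionary would be far larger and would need validation against hand-coded texts.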

Day        Topic                            Details
Monday     Introduction / Hand-Coding       Two 90-minute lectures
Tuesday    Working with Text as Data        90-minute lecture, 90-minute lab
Wednesday  Dictionary Coding                90-minute lecture, 90-minute lab
Thursday   Sentiment Analysis               90-minute lecture, 90-minute lab
Friday     Supervised Scaling: Wordscores   90-minute lecture, 90-minute lab

Day        Readings
Monday     Krippendorff 2004, Ch. 5, 6, 11, 13; Klingemann et al. 2006, Ch. 1, 8, Appendices
Tuesday    Klüver 2009; Porter 1980
Wednesday  Laver and Garry 2000; Langer and Sagarzazu 2016
Thursday   Hu and Liu 2004; Thomas et al. 2006; Hart and Childers 2005
Friday     Laver et al. 2003; Klemmensen et al. 2007; Benoit and Laver 2003

For the precise literature references, see reference list below.

Software Requirements

R

Hardware Requirements

No specific requirements

Literature

Alexa, Melina and Cornelia Züll. 2000. “Text Analysis Software: Commonalities, Differences and Limitations: The Results of a Review.” Quality and Quantity 34(3):299–321.

Benoit, Kenneth and Michael Laver. 2003. “Estimating Irish party policy positions using computer wordscoring.” Irish Political Studies 18(1):97–107.

Feinerer, Ingo. 2008. “An introduction to text mining in R.” R News 8(2):19–22.

Feinerer, Ingo. 2011. TM Package Reference Manual. Version 0.5-6, URL (consulted November 2011): http://tm.r-forge.r-project.org/.

Feinerer, Ingo, Kurt Hornik and David Meyer. 2008. “Text Mining Infrastructure in R.” Journal of Statistical Software 25(5):1–54.

Hart, Roderick P. and Jay P. Childers. 2005. “The Evolution of Candidate Bush.” American Behavioral Scientist 49(2):180–197.

Hu, M. and B. Liu. 2004. “Mining and summarizing customer reviews.” In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168–177.

Klemmensen, Robert, Sara Binzer Hobolt and Martin Ejnar Hansen. 2007. “Estimating policy positions using political texts: An evaluation of the Wordscores approach.” Electoral Studies 26(4):746–755.

Klingemann, Hans-Dieter, Andrea Volkens, Judith Bara, Ian Budge and Michael McDonald. 2006. Mapping Policy Preferences II: Estimates for Parties, Electors, and Governments in Eastern Europe, European Union and OECD 1990-2003. Oxford: Oxford University Press.

Klüver, Heike. 2009. “Measuring interest group influence using quantitative text analysis.” European Union Politics 10(4):535–549.

Krippendorff, Klaus. 2004. Content Analysis: An Introduction to Its Methodology. 2nd ed. Thousand Oaks: Sage.

Langer, Ana Ines and Iñaki Sagarzazu. 2016. “Are all policy decisions equal? Explaining the variation in media coverage of the UK budget.” Policy Studies Journal.

Laver, Michael and John Garry. 2000. “Estimating Policy Positions from Political Texts.” American Journal of Political Science 44(3):619–634.

Laver, Michael, Kenneth Benoit and John Garry. 2003. “Extracting policy positions from political texts using words as data.” American Political Science Review 97(2):311–331.

Lowe, Will. 2003. Software for Content Analysis: A Review. Technical Report for the Identity Project: Weatherhead Center for International Affairs, Harvard University.

Lowe, Will, Ken Benoit, Slava Mikhaylov and Michael Laver. 2011. “Scaling policy positions from coded units of political texts.” Legislative Studies Quarterly 36(1):123–155.

Mikhaylov, Slava, Michael Laver and Kenneth Benoit. 2010. Coder Reliability and Misclassification in Comparative Manifesto Project Codings. Paper presented at the 66th National Conference of the Midwest Political Science Association: Chicago, 3-6 April 2008.

Neuendorf, Kimberly A. 2002. The Content Analysis Guidebook. Thousand Oaks: Sage.

Thomas, M., B. Pang and L. Lee. 2006. “Get out the vote: Determining support or opposition from congressional floor-debate transcripts.” In EMNLP, pp. 327–335.

Veen, Tim. 2011. “Positions and salience in European Union politics: Estimation and validation of a new dataset.” European Union Politics 12(2):267–288.

Recommended Courses to Cover Before this One

Introduction to R
Webscraping with R

Recommended Courses to Cover After this One

Introduction to Python
Webscraping with R


Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed in due time.

Note from the Academic Conveners

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.