ECPR

SD106 - Intro to GLM: Binary, Ordered and Multinomial Logistic, and Count Regression Models

Instructor Details

Federico Vegetti

Institution:
Università degli Studi di Milano

Instructor Bio

Federico Vegetti is a postdoctoral research fellow in Political Science at CEU. He gained his PhD in Political Science from the University of Mannheim in 2013.

His research interests include political psychology and behaviour, comparative politics, political economy, and quantitative research methods.

  @fedeunderstress


Course Dates and Times

Monday 8 to Friday 12 August 2016
Generally classes are either 09:00-12:30 or 14:00-17:30
15 hours over 5 days

Prerequisite Knowledge

a) Participants should have taken the course “Multiple Regression Analysis: Estimation and Diagnostics” in the first week of the summer school or have obtained equivalent prior knowledge through other means (the course may have a different title this year, so please check).

b) The course relies heavily on the software R: students will be given examples of some rather abstract concepts like “maximum likelihood” or “data generating process” by means of statistical simulation. While this can be a very helpful tool, it requires that students have a basic understanding of the R language. For students unfamiliar with R, a preparatory course will be offered prior to the first week. Otherwise, online resources are plentiful. I recommend the tutorial “Try R”, available online for free (http://tryr.codeschool.com/), and/or the “Foundations” section of the online tutorial by Hadley Wickham (http://adv-r.had.co.nz/). A good introductory book is “R in a Nutshell – A Desktop Quick Reference” by Joseph Adler (O’Reilly, 2010).

c) Students are expected to understand the logic of inferential statistics. Students familiar with R but in need of a refresher in basic statistics are encouraged to take part in the preparatory course on statistics.

d) The course will use some matrix algebra notation, so students should have some familiarity with the logic of matrix algebra.

Short Outline

The aim of this course is to offer a detailed but accessible introduction to generalized linear modeling (GLM). Political scientists are often confronted with outcome variables that are not continuous, such as survey respondents' choices among two or more options, ordinal survey items, or event counts. GLM is a common technique for regression analysis in these cases. The course aims to make students comfortable applying GLM techniques to a variety of outcome variables. It discusses the logic of GLM and maximum likelihood estimation, with applications to binary, ordinal, categorical, and count data. Particular emphasis will be put on interpreting estimates, obtaining quantities of interest, and visualizing the results in a compelling way.

Long Course Outline

Political scientists are often interested in studying outcome variables that are not continuous in nature. For instance, scholars may be interested in studying discrete choices among two or more options (e.g. voting or abstaining, choosing party X instead of party Y or Z, etc.) or the number of times a particular event is repeated (e.g. in how many wars a country was involved over a given period of time). In these cases, using OLS regression may produce biased or even meaningless results.
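
The short sketch below makes the last point concrete. It fits OLS to a binary outcome (a linear probability model) and inspects the fitted values. The course itself works in R. This example uses Python (standard library only) purely for illustration, and the data are hypothetical toy values rather than course material. Because a fitted straight line is unbounded, some "predicted probabilities" fall below 0 or above 1. That is one way OLS results can become meaningless for such outcomes.

```python
def ols_fit(x, y):
    """Closed-form OLS with a single predictor: returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical toy data: a binary outcome (e.g. 1 = voted, 0 = abstained)
x = list(range(11))
y = [0] * 6 + [1] * 5

a, b = ols_fit(x, y)
fitted = [a + b * xi for xi in x]
# Smallest and largest fitted "probabilities"
print(round(min(fitted), 3), round(max(fitted), 3))  # -0.227 1.136
```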

This course is meant to provide an introduction to a common technique for handling some frequent types of non-continuous dependent variables, namely generalized linear modeling (GLM). By reflecting on the type of data generating process behind the observed outcomes, the course aims to make students comfortable with concepts such as the linear predictor, the link function, and maximum likelihood. Moreover, by relying on statistical simulation as well as real-world data, the course will provide students with tools to generate and report quantities of interest obtained from GLM regressions.
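
The following sketch illustrates how simulation can connect these concepts. It assumes a simple logit data generating process with one predictor: a linear predictor η = β0 + β1·x, a logit link, and Bernoulli outcomes. The code simulates data with known parameters and then recovers them by maximum likelihood using Newton-Raphson. This is a hypothetical Python example (standard library only), not the course's own R code. The parameter values, sample size, and function names are illustrative assumptions.

```python
import math
import random

def inv_logit(eta):
    """Inverse logit: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def simulate(n, b0, b1, seed=42):
    """Simulate binary outcomes from a logit data generating process."""
    rng = random.Random(seed)
    x = [rng.uniform(-2, 2) for _ in range(n)]
    # Each y_i is a Bernoulli draw with success probability inv_logit(b0 + b1 * x_i)
    y = [1 if rng.random() < inv_logit(b0 + b1 * xi) else 0 for xi in x]
    return x, y

def log_likelihood(beta, x, y):
    """Bernoulli log-likelihood of (b0, b1) given the data."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = inv_logit(beta[0] + beta[1] * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Maximum likelihood estimates of (b0, b1) via Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score vector (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0  # information matrix (minus the Hessian)
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: beta_new = beta + I^{-1} * score (2x2 inverse written out)
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (-i01 * g0 + i00 * g1) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

x, y = simulate(5000, b0=-0.5, b1=1.0)
b0_hat, b1_hat = fit_logit(x, y)
print(round(b0_hat, 2), round(b1_hat, 2))  # should be close to -0.5 and 1.0
```

With a few thousand simulated observations the estimates should land close to the true values. Re-running with a smaller n is a quick way to see sampling variability at work.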

The course starts in medias res, discussing the simplest and most common case where GLM is needed, namely binary response variables. Students will be encouraged to discuss what problems may arise when applying linear regression to dichotomous variables, which assumptions are violated, and why it matters. With this example in mind, the course proceeds with a general introduction to the logic of GLM and to the maximum likelihood method for estimating models in the GLM framework. Then, the course focuses on issues arising from the interpretation of the coefficients. In this part, we will discuss strategies to obtain quantities of interest (e.g. predicted probabilities) and visualize them in a compelling way. This will include the interpretation of interaction terms and the graphical presentation of interaction effects. In the last two days, the course covers three less common but nevertheless important types of outcome variables: ordinal, categorical, and count outcomes. On the fourth day, ordinal and multinomial logit models are discussed as generalizations of the framework introduced for binary outcomes. On the fifth day, the focus moves to Poisson and negative binomial regression models.
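
The sketch below shows how a few common quantities of interest are computed from estimated coefficients. It covers a predicted probability and first difference for a binary logit, category probabilities for an ordered logit, and an expected count and incidence rate ratio for a Poisson model. All coefficient values are hypothetical. The code is Python for illustration rather than the R used in the labs. The ordered logit part assumes the parameterization P(Y ≤ j) = logit⁻¹(τ_j − η); texts and packages differ in this sign convention.

```python
import math

def inv_logit(eta):
    """Inverse logit: maps a linear predictor onto the (0, 1) probability scale."""
    return 1.0 / (1.0 + math.exp(-eta))

# --- Binary logit: predicted probabilities and a first difference ---
# Hypothetical estimates for the intercept and the slope of a predictor x
b0, b1 = -1.0, 0.5
p_low = inv_logit(b0 + b1 * 0)   # predicted probability at x = 0
p_high = inv_logit(b0 + b1 * 2)  # predicted probability at x = 2
first_diff = p_high - p_low      # change in probability as x moves from 0 to 2
print(round(p_low, 3), round(p_high, 3), round(first_diff, 3))  # 0.269 0.5 0.231

# --- Ordered logit: probabilities of three ordered categories ---
# Hypothetical cut points tau_1 < tau_2 and linear predictor eta = x'beta,
# assuming the parameterization P(Y <= j) = inv_logit(tau_j - eta)
taus = [-1.0, 1.0]
eta = 0.5
cum = [inv_logit(t - eta) for t in taus] + [1.0]  # cumulative probabilities
probs = [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]
print([round(p, 3) for p in probs])  # [0.182, 0.44, 0.378]

# --- Poisson regression (log link): expected count and incidence rate ratio ---
# Hypothetical estimates on the log scale
g0, g1 = 0.2, 0.3
expected_count = math.exp(g0 + g1 * 1)  # E[Y | x = 1]
irr = math.exp(g1)                      # multiplicative change in E[Y] per unit of x
print(round(expected_count, 3), round(irr, 3))  # 1.649 1.35
```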

The lab sessions will be based on the open-source statistical software R (www.r-project.org). Because several functions will require additional packages, students are encouraged to bring their own laptops, so that downloading and installing these components proceeds smoothly. The lecturer will provide all the datasets necessary for the lab exercises.

The course assumes a basic understanding of descriptive statistics and probability theory (e.g. types of variables, basic statistics, common distributions) and some proficiency with linear regression analysis (i.e. how to interpret an OLS output). Moreover, the course will assume some familiarity with R. Note that this is not an introductory course to R. Although the lecturer will be open to explaining and discussing the code used for the exercises, it is assumed that students can navigate the R environment with a certain degree of confidence.

Software and Hardware Requirements

Participants need to bring their own laptop with software installed.

Literature

Books:

Fox, J., 2008. Applied Regression Analysis and Generalized Linear Models, Sage. (Ch. 14, 15)

Eliason, S.R., 1993. Maximum Likelihood Estimation: Logic and Practice. Sage. (Ch. 1, 2)

Enders, C.K., 2010. Applied Missing Data Analysis. Guilford Press. (Ch. 3 – “An Introduction to Maximum Likelihood Estimation” offers a clear and intuitive discussion of ML)

King, G., 1998. Unifying Political Methodology. University of Michigan Press. (Ch. 1, 2 for a conceptual discussion of the inferential logic and the likelihood. Ch. 3, 4, 5 are optional, but recommended)

Long, J.S., 1997. Regression Models for Categorical and Limited Dependent Variables. Sage. (Ch. 5, 6, 8)

Books about R:

Adler, J., 2010. R in a Nutshell – A Desktop Quick Reference, O'Reilly. (A general introduction to R, we will take some examples from the book in the lab sessions)

Articles:

Benoit, K., 1996. Democracies Really Are More Pacific (in general). Journal of Conflict Resolution.

Berry, W.D., DeMeritt, J.H.R., and Esarey, J., 2010. Testing for Interaction in Binary Logit and Probit Models: Is a Product Term Essential? American Journal of Political Science.

Berry, W.D., Golder, M., and Milton, D., 2012. Improving Tests of Theories Positing Interaction. Journal of Politics.

Brambor, T., Clark, W. R., Golder, M., 2006. Understanding Interaction Models: Improving Empirical Analyses, Political Analysis.

Braumoeller, B.F., 2004. Hypothesis testing and multiplicative interaction terms. International Organization.

Further readings:

Fitzmaurice, G.M., Laird, N.M., Ware, J.H., 2004. Applied Longitudinal Analysis. Wiley. (Ch. 10 – “Review of Generalized Linear Models” is yet another reading explaining the logic of GLMs, like the Fox chapter. You don't have to read it all – definitely skip the SAS part – but it might be useful to hear the same concepts repeated in a different context)

The following other ECPR Methods School courses could be useful in combination with this one in a ‘training track’.
Recommended Courses Before

Introduction to R

Introduction to Multivariate Linear Regression

Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc.). Registered participants will be informed in due time.

Note from the Academic Convenors

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.

