Monday 29 July to Friday 2 August
09:00–10:30 and 11:00–12:30
The aim of this course is to offer a detailed but accessible introduction to generalised linear modelling (GLM).
Political scientists are often confronted with outcome variables that are not continuous, such as survey respondents' choices among two or more options, ordinal survey items, or event counts. GLM is a common technique for regression analysis in these cases.
This course aims to make you comfortable applying GLM techniques to a variety of outcome variables. It discusses the logic of GLM with applications to binary, ordinal, categorical, and count data, with particular emphasis on obtaining quantities of interest and interpreting estimates.
2 credits (pass/fail grade): attend 90% of classes and carry out the necessary reading and/or other work before and after class.
3 credits (graded): as above, plus complete one task.
4 credits (graded): as above, plus complete two tasks.
Daily assignments are graded each day, without feedback, on the following scale: 0 = did not submit, 1 = insufficient, 2 = sufficient, 3 = excellent.
The instructor will set the deadline for the take-home paper, which will be no later than three weeks after the end of the course.
Julia Koltai is an assistant professor at the Faculty of Social Sciences, Eötvös Loránd University. She is also a research fellow at the Centre for Social Sciences, Hungarian Academy of Sciences. She gained her PhD in sociology in 2013.
Julia has led several domestic research programs and has taken part in international research projects and groups, including EU FP6-funded programs.
Her main scientific focus is on statistics and social research methodology, so her research has ranged widely, from minority research through political participation to social justice and integration.
In recent years, Julia's interest has turned to computational social science, especially network analysis and big data processing.
Social scientists are often interested in outcome variables that are not continuous. For instance, scholars may study discrete choices among two or more options (e.g. voting or abstaining, choosing party X instead of party Y or Z) or the number of times a particular event occurs (e.g. how many wars a country was involved in over a given period). In these cases, OLS regression may produce biased or even meaningless results, such as predicted probabilities below 0 or above 1.
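A minimal illustration of the "meaningless results" problem, using a hypothetical toy dataset: a linear probability model (OLS on a 0/1 outcome) produces fitted "probabilities" below 0 and above 1. The course labs use R; this sketch is written in plain Python only so that it runs without any packages.

```python
# Hypothetical toy data: a binary outcome y (e.g. 1 = voted, 0 = abstained)
# and a single continuous predictor x.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def ols_fit(x, y):
    """Return (intercept, slope) of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


b0, b1 = ols_fit(x, y)
# Fitted values of the linear probability model, read as P(y = 1 | x).
fitted = [b0 + b1 * xi for xi in x]
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"smallest fitted 'probability' = {min(fitted):.3f}")  # below 0
print(f"largest fitted 'probability'  = {max(fitted):.3f}")  # above 1
```

The straight line has no reason to respect the [0, 1] bounds of a probability, which is one of the motivations for the link functions used in GLM.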
This course introduces generalised linear modelling (GLM), a widely used technique for handling common types of non-continuous dependent variables. Working with real-world data and paying close attention to the type of outcome observed, you will learn to report, explain and interpret quantities of interest from GLM regressions.
Each day is divided into two parts: a lecture on the day's topic, followed by a lab session in which we apply the theory to real-world datasets. GLM modelling techniques are explained and practised through exercises using freely available social science data, or you can use your own data for the analyses. Daily assignments let you apply and transfer GLM methods to your own research interests.
The course starts in medias res with the simplest and most common case in which GLM is needed: binary response variables. We will discuss the problems that may arise when linear regression is applied to dichotomous variables, which assumptions are violated, and why this matters. With this example in mind, the course proceeds to a general introduction to the logic of GLM and to Maximum Likelihood (ML) estimation of models in the GLM framework.
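Maximum Likelihood estimation of a binary logit can be sketched in a few lines. The example below is a minimal sketch, not the course's lab code: it uses plain Python, a hypothetical toy dataset, and Newton-Raphson updates on the Bernoulli log-likelihood. In R the same model would normally be fitted with `glm(y ~ x, family = binomial)`, which uses iteratively reweighted least squares (equivalent to Newton-Raphson for the canonical logit link).

```python
import math

# Hypothetical toy data with overlapping outcomes (with perfectly separated
# data the ML estimate of a logit model does not exist).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]


def inv_logit(eta):
    """Logistic CDF: maps the linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_likelihood(b0, b1):
    """Bernoulli log-likelihood of the logit model at (b0, b1)."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = inv_logit(b0 + b1 * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll


def fit_logit(max_iter=50, tol=1e-10):
    """Maximise the log-likelihood with Newton-Raphson steps, starting at zero."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (gradient) and information matrix (negative Hessian).
        g0 = g1 = 0.0
        i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Newton step: solve the 2x2 system I * delta = g.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


b0, b1 = fit_logit()
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, log-likelihood = {log_likelihood(b0, b1):.3f}")
```

At the maximum the score equations are (numerically) zero, and every fitted probability lies strictly between 0 and 1.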
The course focuses on issues arising from the interpretation of coefficients. In this part, we will discuss strategies for obtaining quantities of interest (e.g. predicted probabilities). The course also covers three other types of outcome variables: ordinal, categorical, and count. Ordinal and multinomial logit models will be discussed as generalisations of the framework introduced for binary outcomes. For count variables, I will introduce Poisson and negative binomial regression models.
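The quantities of interest mentioned above are transformations of the linear predictor through the model's inverse link. The sketch below, in plain Python with hypothetical coefficient values (not estimates from any course dataset), computes a predicted probability from a binary logit, category probabilities from an ordered and a multinomial logit, and an expected count from a Poisson model. The ordered logit assumes the common parameterisation P(y ≤ j) = F(τ_j − xβ); some software uses the opposite sign convention.

```python
import math


def logistic(z):
    """Inverse logit link: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_prob(b0, b1, x):
    """Binary logit: predicted P(y = 1 | x)."""
    return logistic(b0 + b1 * x)


def ordered_logit_probs(cutpoints, b, x):
    """Ordered logit with P(y <= j | x) = logistic(tau_j - b * x).

    Category probabilities are differences of adjacent cumulative
    probabilities; the cutpoints must be sorted in increasing order.
    """
    cum = [logistic(tau - b * x) for tau in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def multinomial_logit_probs(coefs, x):
    """Multinomial logit: the first category is the baseline (linear predictor 0);
    coefs holds one (intercept, slope) pair per non-baseline category."""
    etas = [0.0] + [a + b * x for a, b in coefs]
    denom = sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas]


def poisson_mean(b0, b1, x):
    """Poisson regression with a log link: expected count E(y | x)."""
    return math.exp(b0 + b1 * x)


# Hypothetical coefficients, chosen only for illustration.
p = logit_prob(-1.0, 0.5, 2.0)                         # logistic(0) = 0.5
ordered = ordered_logit_probs([-1.0, 1.0], 0.5, 0.0)   # three ordered categories
nominal = multinomial_logit_probs([(0.5, 0.2), (-0.5, 0.4)], 1.0)
mu = poisson_mean(0.0, math.log(2.0), 3.0)             # exp(3 * log 2) = 8
print(p, [round(q, 3) for q in ordered], [round(q, 3) for q in nominal], round(mu, 6))
```

In each case the raw coefficient is not itself the quantity of interest; it has to be passed through the inverse link at chosen values of x.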
Lab sessions will be based on the open-source statistical software R, so I will assume you can work in the R environment with some confidence.
Because several functions require additional packages, I recommend you bring your own laptop so that downloading and installing these components proceeds smoothly. I will provide all the datasets needed for the lab exercises.
The course also assumes a basic understanding of descriptive statistics and probability theory (e.g. levels of measurement, basic summary statistics, common distributions) and an understanding of OLS regression analysis.
You should have a deep understanding of Ordinary Least Squares (OLS) regression or have taken the Week 1 course Multiple Regression Analysis: Estimation, Diagnostics, and Modelling or have obtained equivalent prior knowledge through other means.
You should have some familiarity with using the software R to manage data.
If you are unfamiliar with R, I recommend you take Akos Mate's preparatory course R Basics prior to the first week. Otherwise, online resources are plentiful, including:
DataCamp Introduction to R Course
A website with tutorials and examples
A good introductory book is R in a Nutshell – A Desktop Quick Reference by Joseph Adler (O’Reilly, 2010)
Each course includes pre-course assignments, such as readings and pre-recorded videos, as well as daily live lectures totalling at least two hours. The instructor will conduct live Q&A sessions and offer designated office hours for one-to-one consultations.
Please check your course format before registering.
Live classes will be held daily for two hours on a video meeting platform, allowing you to interact with both the instructor and other participants in real time. To avoid online fatigue, the course employs a pedagogy that includes small-group work, short and focused tasks, and troubleshooting exercises that use a variety of online applications to facilitate collaboration and engagement with the course content.
In-person courses will consist of daily three-hour classroom sessions, featuring a range of interactive in-class activities including short lectures, peer feedback, group exercises, and presentations.
This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc.). Registered participants will be informed of any changes when they are made.
By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, please contact us before registering.
Day | Topic | Details |
---|---|---|
1 | Modelling binary response variables: what to do? | Lecture: linear probability models, their problems and alternatives; introduction to binary logistic regression. Lab: specifying and interpreting binary logistic regression. |
2 | The general logic of GLM and Maximum Likelihood; interactions | Lecture: the GLM framework (distributions, link functions and Maximum Likelihood estimation); problems and solutions when including interactions in logit and probit models; marginal effects. Lab: scripts for GLM, different link functions, inside ML estimation; interpreting marginal effects. |
3 | Modelling categorical variables | Lecture: multinomial logistic regression. Lab: specifying and interpreting multinomial logit models. |
4 | Modelling ordinal variables | Lecture: ordered logistic regression. Lab: specifying and interpreting ordered logit models. |
5 | Modelling count variables | Lecture: Poisson and negative binomial regression. Lab: specifying and interpreting Poisson and negative binomial models. |
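Day 2 covers link functions and marginal effects. Because the inverse link is non-linear, the effect of x on the predicted probability depends on where it is evaluated: dP/dx = f(β0 + β1x)·β1, where f is the logistic density for logit and the standard normal density for probit. The minimal Python sketch below assumes hypothetical coefficients and sample values; it computes observation-specific marginal effects and their average (the average marginal effect, AME). It is an illustration of the formula, not the R tooling used in the labs.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link (the logistic CDF)."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit_marginal_effect(b0, b1, x):
    """dP/dx in a logit model: logistic density p * (1 - p) times b1."""
    p = inv_logit(b0 + b1 * x)
    return p * (1.0 - p) * b1


def probit_marginal_effect(b0, b1, x):
    """dP/dx in a probit model: standard normal density at the linear predictor times b1."""
    eta = b0 + b1 * x
    return math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi) * b1


def average_marginal_effect(me, b0, b1, xs):
    """Average of the observation-specific marginal effects over the sample xs."""
    return sum(me(b0, b1, xi) for xi in xs) / len(xs)


# Hypothetical coefficients and sample values of x, for illustration only.
# Logit and probit coefficients live on different scales, so the probit values
# are set to roughly the logit values divided by 1.6 (a common rule of thumb).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
logit_b0, logit_b1 = -1.0, 0.5
probit_b0, probit_b1 = -0.625, 0.3125

for xi in xs:
    # The effect of x is not constant: it peaks where P(y = 1) = 0.5 (here x = 2).
    print(f"x = {xi}: logit marginal effect = {logit_marginal_effect(logit_b0, logit_b1, xi):.4f}")

print("logit AME: ", round(average_marginal_effect(logit_marginal_effect, logit_b0, logit_b1, xs), 4))
print("probit AME:", round(average_marginal_effect(probit_marginal_effect, probit_b0, probit_b1, xs), 4))
```

Reporting marginal effects rather than raw coefficients puts logit and probit results on the common scale of probabilities.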
Day | Readings |
---|---|
1 | Fox, J. (2008). Applied regression analysis and generalized linear models (2nd ed.). Thousand Oaks, CA, US: Sage Publications, Inc. Chapter 14: Logit and Probit Models for Categorical Response Variables; Chapter 15: Generalized Linear Models. Field, A. P., Miles, J., & Field, Z. (2012). Discovering statistics using R. London: Sage. Chapter 8: Logistic Regression. |
2 | Enders, C. K. (2010). Applied missing data analysis. New York, NY, US: Guilford Press. Chapter 3: An Introduction to Maximum Likelihood Estimation. Eliason, S. R. (1993). Maximum Likelihood Estimation: Logic and Practice. Thousand Oaks, CA: Sage Publications, Inc. Chapter 1: Introduction: The Logic of Maximum Likelihood; Chapter 2: A General Modeling Framework Using Maximum Likelihood Methods. Allison, P. D. (1999). Comparing logit and probit coefficients across groups. Sociological Methods & Research, 28(2), 186–208. Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26(1), 67–82. Marginal effects. In Lewis-Beck, M. S., Bryman, A., & Futing Liao, T. (Eds.) (2004). The SAGE encyclopedia of social science research methods. Thousand Oaks, CA: Sage Publications, Inc. Fernihough, A. (2011). Simple logit and probit marginal effects in R. Working Papers 201122, School of Economics, University College Dublin. |
3 | Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications, Inc. Chapter 6: Nominal Outcomes: Multinomial Logit and Related Models. |
4 | Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications, Inc. Chapter 5: Ordinal Outcomes: Ordered Logit and Ordered Probit Analysis. |
5 | Fox, J. (2008). Applied regression analysis and generalized linear models (2nd ed.). Thousand Oaks, CA, US: Sage Publications, Inc. Chapter 15: Generalized Linear Models. Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications, Inc. Chapter 8: Count Outcomes: Regression Models for Counts. |
R version 3.5.2 or higher with RStudio Desktop version 1.1.463 or higher.
Please bring your own laptop (Windows PCs and Macs both work with RStudio) with the above-mentioned software versions and packages already installed.
You will need user privileges to install R and R packages. If your access is limited (for example, because you are using a work laptop) or you are unsure, consult an IT professional at your institution in the first instance.
Books
Fox, J., 2008. Applied Regression Analysis and Generalized Linear Models. Sage. (Ch. 14, 15)
Eliason, S.R., 1993. Maximum Likelihood Estimation: Logic and Practice. Sage. (Ch. 1, 2)
Enders, C.K., 2010. Applied Missing Data Analysis. Guilford Press. (Ch. 3, An Introduction to Maximum Likelihood Estimation, offers a clear and intuitive discussion of ML)
King, G., 1998. Unifying Political Methodology. University of Michigan Press. (Ch. 1 and 2 for a conceptual discussion of the inferential logic and the likelihood; Ch. 3, 4 and 5 are optional but recommended)
Long, J.S., 1997. Regression Models for Categorical and Limited Dependent Variables. Sage. (Ch. 5, 6, 8)
Books about R
Adler, J., 2010. R in a Nutshell: A Desktop Quick Reference. O'Reilly.
A general introduction to R; we will use some examples from the book in the lab sessions.
Articles
Benoit, K., 1996. Democracies Really Are More Pacific (in general). Journal of Conflict Resolution.
Berry, W.D., DeMeritt, J.H.R., and Esarey, J., 2010. Testing for Interaction in Binary Logit and Probit Models: Is a Product Term Essential? American Journal of Political Science.
Berry, W.D., Golder, M., and Milton, D., 2012. Improving Tests of Theories Positing Interaction. Journal of Politics.
Brambor, T., Clark, W.R., and Golder, M., 2006. Understanding Interaction Models: Improving Empirical Analyses. Political Analysis.
Braumoeller, B.F., 2004. Hypothesis testing and multiplicative interaction terms. International Organization.
Further reading
Fitzmaurice, G.M., Laird, N.M., and Ware, J.H., 2004. Applied Longitudinal Analysis. Wiley. (Ch. 10, Review of Generalized Linear Models, is yet another explanation of the logic of GLMs, like the Fox chapter. You don't have to read it all, and should definitely skip the SAS part, but it may be useful to hear the same concepts repeated in a different context)
Summer School
R Basics
Refresher in Regression (before you take a more advanced stats class)
Introduction to Inferential Statistics: What you need to know before you take regression
Multiple Regression Analysis: Estimation, Diagnostics, and Modelling