ECPR

Introduction to Logistic Regression and General Linear Models: Binary, Ordered, Multinomial and Count Outcomes

Course Dates and Times

Monday 29 July to Friday 2 August

09:00–10:30 and 11:00–12:30

Julia Koltai

koltai.juli@gmail.com

Eötvös Loránd University

The aim of this course is to offer a detailed but accessible introduction to generalised linear modelling (GLM).

Political scientists are often confronted with outcome variables that are not continuous, such as survey respondents' choices among two or more options, ordinal survey items, or event counts. GLM is a common framework for regression analysis in these cases.

This course aims to make you comfortable applying GLM techniques to a variety of outcome variables. It discusses the logic of GLM with applications to binary, ordinal, categorical, and count data. Particular emphasis is placed on obtaining quantities of interest and interpreting estimates.

ECTS Credits

2 credits (pass/fail grade): attend 90% of classes and carry out the necessary reading and/or other work before and after class.

3 credits (graded): as above, plus complete one task.

4 credits (graded): as above, plus complete two tasks.

Daily assignments will be graded each day, without feedback, on the following scale: 0 = Did not submit, 1 = Insufficient, 2 = Sufficient, 3 = Excellent.

The instructor will set a deadline for the take-home paper; this deadline will be no later than three weeks after the end of the course.

Instructor Bio

Julia Koltai is an assistant professor at the Faculty of Social Sciences, Eötvös Loránd University. She is also a research fellow at the Centre for Social Sciences, Hungarian Academy of Sciences. She gained her PhD in sociology in 2013.

Julia has led several national research programs and has taken part in international research projects and groups, including EU FP6-funded programs.

Her main scientific focus is on statistics and social research methodology, which has allowed her research to range widely, from minority research through political participation to social justice and integration.

In recent years, Julia's interest has turned to computational social science, especially network analysis and big data processing.

  @koltaijuli

Scientists are often interested in studying outcome variables that are not continuous. For instance, scholars may want to study discrete choices among two or more options (e.g. voting or abstaining, or choosing party X instead of party Y or Z) or the number of times a particular event occurs (e.g. the number of wars a country was involved in over a given period). In these cases, OLS regression may produce biased or even meaningless results, such as predicted probabilities below 0 or above 1.
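
The problem with OLS can be made concrete with a linear probability model. The following minimal sketch is written in Python purely for illustration (the course labs use R). It fits a straight line by ordinary least squares to a small, made-up binary outcome (1 = voted, 0 = abstained) and lists the fitted values that fall outside the [0, 1] range a probability must respect. The data and the names `ols_fit`, `x` and `y` are hypothetical.

```python
# Hypothetical toy data (made up for illustration): x is a predictor such as
# an interest-in-politics score; y = 1 if the respondent voted, 0 if they abstained.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_fit(x, y):
    """Closed-form OLS for a single predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = ols_fit(x, y)
fitted = [intercept + slope * xi for xi in x]

# A linear probability model reads fitted values as probabilities,
# yet some of them lie below 0 or above 1, even within the observed data.
out_of_bounds = [(xi, round(f, 3)) for xi, f in zip(x, fitted) if not 0 <= f <= 1]
print(out_of_bounds)  # [(0, -0.127), (9, 1.127)]
```

A logit model avoids this by passing the linear predictor through the logistic function, so every fitted probability lies strictly between 0 and 1.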

This course introduces generalised linear modelling (GLM), a widely used framework for analysing common types of non-continuous dependent variables. By reflecting on the type of observed outcome and working with real-world data, the course will enable you to report, explain and interpret quantities of interest from GLM regressions.

Each day is divided into two parts: a lecture on the day's topic, followed by a lab session in which we apply the theory to real-world datasets. GLM modelling techniques are explained and then applied in exercises using freely available social science data; alternatively, you can use your own data. Daily assignments let you apply and transfer GLM methodology to your own research interests.

The course starts in medias res with the simplest and most common case in which GLM is needed: binary response variables. We will discuss the problems that may arise when linear regression is applied to dichotomous variables, which assumptions are violated, and why this matters. With this example in mind, the course then introduces the general logic of GLM and the Maximum Likelihood method for estimating models in the GLM framework.
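
To illustrate Maximum Likelihood in the binary case, the sketch below estimates a logit model with one predictor by Newton-Raphson; for the logit link this coincides with the iteratively reweighted least squares (IRLS) algorithm that R's `glm()` uses by default. It is plain Python for illustration only: the data, the fixed iteration count and the names `logit_mle`, `log_likelihood` and `inv_logit` are hypothetical choices, not course material. At the maximum, the score equations (first derivatives of the log-likelihood) equal zero.

```python
import math

# Hypothetical toy data (1 = voted, 0 = abstained); not a course dataset.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def inv_logit(eta):
    """Logistic function: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_likelihood(b0, b1, x, y):
    """Bernoulli log-likelihood of a logit model with intercept b0 and slope b1."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = inv_logit(b0 + b1 * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll


def logit_mle(x, y, iters=25):
    """Maximise the log-likelihood by Newton-Raphson, starting from (0, 0)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]] = X'WX.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: beta <- beta + (X'WX)^{-1} g, using the explicit 2x2 inverse.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0_hat, b1_hat = logit_mle(x, y)
print(round(b0_hat, 3), round(b1_hat, 3))
# The estimates should fit the data better than the "no information" start (0, 0).
print(log_likelihood(b0_hat, b1_hat, x, y) > log_likelihood(0.0, 0.0, x, y))
```

In practice you would rely on `glm(..., family = binomial)` in R; writing the loop out only shows what "estimation by Maximum Likelihood" does under the hood.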

The course then focuses on issues arising from the interpretation of coefficients. Here we will discuss strategies for obtaining quantities of interest (e.g. predicted probabilities and marginal effects). The course also covers three other types of outcome variables: ordinal, categorical, and counts. Ordinal and multinomial logit models will be discussed as generalisations of the framework introduced for binary outcomes. For count variables, I will introduce Poisson and negative binomial regression models.
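
Because logit coefficients are on the log-odds scale, quantities of interest are usually computed from them rather than read off directly. A minimal Python sketch, assuming hypothetical coefficients `B0 = -2.0` and `B1 = 0.5` for a single predictor (made-up values, not estimates from the course data): it computes a predicted probability, a first difference between two covariate values, and the marginal effect dP/dx = B1 · p · (1 − p), which is largest where p = 0.5. The function names are illustrative only.

```python
import math

# Hypothetical logit coefficients (illustrative values, not course estimates).
B0, B1 = -2.0, 0.5


def inv_logit(eta):
    """Logistic function: converts log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_prob(x):
    """Predicted probability P(y = 1 | x) from the logit model."""
    return inv_logit(B0 + B1 * x)


def first_difference(x_low, x_high):
    """Change in predicted probability when x moves from x_low to x_high."""
    return predicted_prob(x_high) - predicted_prob(x_low)


def marginal_effect(x):
    """Instantaneous marginal effect dP/dx = B1 * p * (1 - p) at a given x."""
    p = predicted_prob(x)
    return B1 * p * (1.0 - p)


def average_marginal_effect(xs):
    """Average of the marginal effects evaluated at each observed x."""
    return sum(marginal_effect(x) for x in xs) / len(xs)


print(predicted_prob(4))                 # 0.5, because B0 + B1 * 4 = 0
print(marginal_effect(4))                # 0.125 = 0.5 * 0.5 * 0.5
print(round(first_difference(2, 6), 3))  # 0.462
print(round(average_marginal_effect([0, 2, 4, 6, 8]), 3))
```

Note that the same coefficient B1 implies different changes in probability depending on where x is evaluated, which is why the course stresses reporting quantities of interest rather than raw coefficients.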

Lab sessions will use the open-source statistical software R, so I will assume you can navigate the R environment with some confidence.

Because several functions will require the use of additional packages, I recommend you bring your own laptop, so the download and installation of additional components will proceed smoothly. I will provide all the datasets necessary for the lab exercises.

The course also assumes a basic understanding of descriptive statistics and probability theory (e.g. levels of measurement, basic statistics, common distributions) and of OLS regression analysis.

You should have a deep understanding of Ordinary Least Squares (OLS) regression or have taken the Week 1 course Multiple Regression Analysis: Estimation, Diagnostics, and Modelling or have obtained equivalent prior knowledge through other means.

You should have some familiarity with using the software R to manage data.

If you are unfamiliar with R, I recommend you take Akos Mate's preparatory course R Basics prior to the first week. Otherwise, online resources are plentiful, including:

DataCamp Introduction to R Course

PluralSight Courses in R

A website with tutorials and examples

A good introductory book is R in a Nutshell – A Desktop Quick Reference by Joseph Adler (O’Reilly, 2010)

Daily Topics and Details

Day 1: Modelling binary response variables: what to do?

Lecture: Linear probability models, their problems and alternatives. Introduction to binary logistic regression.

Lab: Specifying and interpreting binary logistic regression.

Day 2: The general logic of GLM and Maximum Likelihood. Interactions.

Lecture: The GLM framework: distributions, link functions and Maximum Likelihood estimation. Problems and solutions when including interactions in logit and probit models. Marginal effects.

Lab: Scripts for GLM, different link functions, a look inside ML estimation. Interpreting marginal effects.

Day 3: Modelling categorical variables

Lecture: Multinomial logistic regression.

Lab: Specifying and interpreting multinomial logit models.

Day 4: Modelling ordinal variables

Lecture: Ordered logistic regression.

Lab: Specifying and interpreting ordered logit models.

Day 5: Modelling count variables

Lecture: Poisson and negative binomial regression.

Lab: Specifying and interpreting Poisson and negative binomial models.
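
The models for days 3 to 5 differ mainly in how a linear predictor is turned into probabilities over outcome categories or counts. The sketch below is plain Python for illustration only; all function names and input values are hypothetical. It computes category probabilities for a multinomial logit with the first category as baseline, for an ordered logit using the parameterisation P(Y ≤ j) = logistic(τ_j − η) (the one used, for example, by `MASS::polr` in R), and the probability mass functions of the Poisson and of the NB2 negative binomial distribution, whose variance μ + αμ² allows for overdispersion.

```python
import math


def multinomial_probs(etas):
    """Multinomial logit: etas are the linear predictors of the non-baseline
    categories; the baseline category's linear predictor is fixed at 0."""
    expd = [1.0] + [math.exp(e) for e in etas]
    total = sum(expd)
    return [e / total for e in expd]


def ordered_logit_probs(eta, cutpoints):
    """Ordered logit with increasing cutpoints tau_1 < ... < tau_{J-1}:
    P(Y <= j) = logistic(tau_j - eta); category probabilities are differences."""
    cumulative = [1.0 / (1.0 + math.exp(-(tau - eta))) for tau in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def poisson_pmf(k, mu):
    """Poisson P(Y = k) with mean mu, computed on the log scale for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def negbin_pmf(k, mu, alpha):
    """NB2 negative binomial P(Y = k): mean mu, variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


# Hypothetical inputs, for illustration only.
print([round(p, 3) for p in multinomial_probs([0.0, 0.0])])  # three equal shares
print([round(p, 3) for p in ordered_logit_probs(0.5, [-1.0, 1.0])])
mu = math.exp(0.2 + 0.3 * 3)  # log link: expected count at x = 3, made-up coefficients
print(round(poisson_pmf(0, mu), 3), round(negbin_pmf(0, mu, 0.5), 3))
```

With the same mean, the negative binomial puts more probability on zero (and on large counts) than the Poisson, which is one practical symptom of overdispersion in count data.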

Daily Readings

Day 1

Fox, J. (2008). Applied regression analysis and generalized linear models (2nd ed.). Thousand Oaks, CA: Sage Publications, Inc. Chapter 14: Logit and Probit Models for Categorical Response Variables. Chapter 15: Generalized Linear Models.

Field, A. P., Miles, J., & Field, Z. (2012). Discovering statistics using R. London: Sage. Chapter 8: Logistic Regression.

Day 2

Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford Press. Chapter 3: An Introduction to Maximum Likelihood Estimation.

Eliason, S. R. (1993). Maximum likelihood estimation: Logic and practice. Thousand Oaks, CA: Sage Publications, Inc. Chapter 1: Introduction: The Logic of Maximum Likelihood. Chapter 2: A General Modeling Framework Using Maximum Likelihood Methods.

Allison, P. D. (1999). Comparing logit and probit coefficients across groups. Sociological Methods & Research, 28(2), 186–208.

Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26(1), 67–82.

Marginal effects. (2004). In M. S. Lewis-Beck, A. Bryman, & T. Futing Liao (Eds.), The SAGE encyclopedia of social science research methods. Thousand Oaks, CA: Sage Publications, Inc.

Fernihough, A. (2011). Simple logit and probit marginal effects in R (Working Paper No. 201122). School of Economics, University College Dublin.

Day 3

Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage Publications, Inc. Chapter 6: Nominal Outcomes: Multinomial Logit and Related Models.

Day 4

Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage Publications, Inc. Chapter 5: Ordinal Outcomes: Ordered Logit and Ordered Probit Analysis.

Day 5

Fox, J. (2008). Applied regression analysis and generalized linear models (2nd ed.). Thousand Oaks, CA: Sage Publications, Inc. Chapter 15: Generalized Linear Models.

Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage Publications, Inc. Chapter 8: Count Outcomes: Regression Models for Counts.

Software Requirements

R version 3.5.2 or higher with RStudio Desktop version 1.1.463 or higher.

Hardware Requirements

Please bring your own laptop (Windows PCs and Macs both work with RStudio) with the above-mentioned software versions and packages already installed.

You will need user privileges to install R and R packages. If your access is limited (for example, because you are using a work laptop) or you are unsure, consult an IT professional at your institution first.

Literature

Books

Fox, J., 2008. Applied Regression Analysis and Generalized Linear Models. Sage. (Ch. 14, 15)

Eliason, S.R., 1993. Maximum Likelihood Estimation: Logic and Practice. Sage. (Ch. 1, 2)

Enders, C.K., 2010. Applied Missing Data Analysis. Guilford Press.
(Ch. 3 – An Introduction to Maximum Likelihood Estimation – offers a clear and intuitive discussion of ML)

King, G., 1998. Unifying Political Methodology. University of Michigan Press.
(Ch. 1 and 2 for a conceptual discussion of inferential logic and the likelihood; Ch. 3, 4 and 5 are optional but recommended)

Long, J.S., 1997. Regression Models for Categorical and Limited Dependent Variables. Sage. (Ch. 5, 6, 8)

Books about R

Adler, J., 2010. R in a Nutshell: A Desktop Quick Reference. O'Reilly.
A general introduction to R; we will take some examples from the book in the lab sessions.

Articles

Benoit, K., 1996. Democracies Really Are More Pacific (in general). Journal of Conflict Resolution.

Berry, W.D., DeMeritt, J.H.R., and Esarey, J., 2010. Testing for Interaction in Binary Logit and Probit Models: Is a Product Term Essential? American Journal of Political Science.

Berry, W.D., Golder, M., and Milton, D., 2012. Improving Tests of Theories Positing Interaction. Journal of Politics.

Brambor, T., Clark, W.R., and Golder, M., 2006. Understanding Interaction Models: Improving Empirical Analyses. Political Analysis.

Braumoeller, B.F., 2004. Hypothesis testing and multiplicative interaction terms. International Organization.

Further reading

Fitzmaurice, G.M., Laird, N.M., and Ware, J.H., 2004. Applied Longitudinal Analysis. Wiley. (Ch. 10 – Review of Generalized Linear Models – is yet another explanation of the logic of GLMs, like the Fox chapter. You don't have to read it all (definitely skip the SAS part), but it might be useful to hear the same concepts repeated in a different context.)

Recommended Courses to Cover Before This One

Summer School

R Basics
Refresher in Regression (before you take a more advanced stats class)
Introduction to Inferential Statistics: What you need to know before you take regression

Multiple Regression Analysis: Estimation, Diagnostics, and Modelling