ECPR


Logistic Regression and General Linear Models

Michał Kotnarowski
kotnar@isppan.waw.pl

Polish Academy of Sciences

Michał Kotnarowski is an Assistant Professor at the Institute of Political Studies of the Polish Academy of Sciences. He specialises in voting behaviour, comparative politics and political methodology.

He has contributed a number of articles to journals including Party Politics, Communist and Post-Communist Studies, Acta Politica, and the International Journal of Sociology.


Course Dates and Times

Monday 25 February – Friday 1 March, 14:00 – 17:30 (finishing slightly earlier on Friday)
15 hours over five days

Prerequisite Knowledge

This is an advanced course. To get the most out of it, you should:

  1. Understand the logic of inferential statistics
    You should be familiar with hypothesis testing and with concepts such as confidence intervals and significance levels. If you are not, take course WB108 Introduction to Statistics for Political and Social Scientists, or Introduction to Inferential Statistics: What you need to know before you take regression at the 2019 ECPR Summer School.
  2. Be familiar with the rudiments of the linear regression model estimated using the Ordinary Least Squares method
    You should be familiar with the logic of linear regression analysis, the assumptions of the linear regression model, and regression with dummy variables. If you are not, take course WB107 Linear Regression with R/Stata: Estimation, Interpretation and Presentation, or Multiple Regression Analysis: Estimation, Diagnostics, and Modelling at the 2019 ECPR Summer School.
  3. Have at least a basic understanding of the R language
    You should be able to import a dataset saved in SPSS format into R and run a linear regression model. You should also be able to carry out basic data manipulations in R, such as selecting observations, selecting variables, and computing new variables from existing ones. If you are new to R, take course WA106A Introduction to R.


Short Outline

This course is an introduction to Generalised Linear Models (GLMs). You will learn how to run a regression model when the dependent variable is not a continuous numerical one.

In the social sciences, researchers often want to model respondents’ choices between two or more categories, answers measured on an ordinal scale, or event counts. The typical solution is to use GLMs.

This course will cover a broad family of GLMs, including binary, multinomial, ordered, and conditional logistic regression models, as well as models designed for count data (Poisson regression and negative binomial models).

You will learn practical skills related to running GLMs, including proper interpretation of the regression outcome and presentation of model results in the form of graphs and tables. We will also discuss limitations of GLMs.

Tasks for ECTS Credits

2 credits (pass/fail grade) Attend at least 90% of course hours, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.

3 credits (to be graded) As above, plus complete one task (tbc).

4 credits (to be graded) As above, plus complete two tasks (tbc).


Long Course Outline

Researchers working in broadly defined social sciences often have to deal with analyses in which the dependent variable is not a continuous variable defined on the interval scale. These are situations in which the dependent variable is either:

  1. a binary variable, when respondents select one out of two options (e.g., whether they voted in the last election)
  2. a nominal variable, when respondents select one out of three or more options (e.g., which party they voted for in the last election)
  3. an ordinal variable (e.g., when a respondent chooses an answer on the Likert scale) or
  4. a variable counting the number of occurrences of a phenomenon (e.g., how many times a respondent participated in protest actions).

For these types of dependent variable, Ordinary Least Squares (OLS) regression models are not appropriate. Generalised Linear Models (GLMs), which are estimated in a different way from linear regression models, should be used instead.

Interpreting GLMs is considerably more complex than interpreting OLS models. Although GLMs are often used in the social sciences, their use and correct interpretation still cause researchers difficulties.

This course is an introduction to GLMs, and you will gain practical skills related to their use. I will also introduce theoretical aspects of GLMs so that you can understand and interpret them properly, and I will do so in a way that is understandable to those without even a rudimentary knowledge of matrix algebra or calculus.

Day 1
The regression model with a binary dependent variable. We start by discovering why it is inappropriate to use OLS models in such cases: in particular, which OLS model assumptions are not met, and what the negative consequences of using OLS models for this type of data might be. Next, I will show you how to generalise a linear model so that it can be applied to a limited dependent variable. You will learn about the linear predictor and the link function. We close with a presentation of Maximum Likelihood Estimation as a technique for estimating logistic regression model parameters.
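
As a sketch of the notation involved (the symbols are assumptions chosen for illustration, not taken from the course materials), a binary logit model connects the linear predictor \eta_i to the probability \pi_i of the outcome through the logit link, and Maximum Likelihood Estimation chooses the coefficients that maximise the log-likelihood \ell(\beta):

```latex
\eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}
\qquad
\log\frac{\pi_i}{1-\pi_i} = \eta_i
\quad\Longleftrightarrow\quad
\pi_i = \Pr(y_i = 1 \mid x_i) = \frac{1}{1 + e^{-\eta_i}}

\ell(\beta) = \sum_{i=1}^{n} \bigl[\, y_i \log \pi_i + (1 - y_i) \log(1 - \pi_i) \,\bigr]
```

Because \pi_i is bounded between 0 and 1 for any value of \eta_i, the model cannot produce the out-of-range predictions that an OLS fit to a binary outcome can.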

Days 2 & 3
You will develop practical skills related to the interpretation of the binary logistic regression model, i.e. the interpretation of regression coefficients and odds ratios. I will present measures of goodness of fit of the models, and various versions of pseudo-R-squared measures. I will demonstrate the extension of additive logistic regression models by introducing interactions between independent variables. You will learn how to correctly interpret a binary logistic regression model that incorporates interaction terms, and how to report the results using predicted probabilities, in particular through statistical graphics.
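
A minimal sketch of how a logit coefficient translates into an odds ratio and into predicted probabilities may help here. The course itself works in R; this illustration uses Python's standard library only, and the coefficient values, the predictor (age in decades) and the helper name `inv_logit` are hypothetical, not results from any dataset.

```python
import math

def inv_logit(eta):
    """Map a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients for a turnout model (illustrative values only):
# log-odds(voted) = b0 + b1 * age_in_decades
b0, b1 = -1.0, 0.4

# Odds ratio: multiplicative change in the odds of voting
# for a one-unit (here: ten-year) increase in the predictor.
odds_ratio = math.exp(b1)

# Predicted probabilities at two values of the predictor.
p_30 = inv_logit(b0 + b1 * 3)  # respondent aged 30
p_60 = inv_logit(b0 + b1 * 6)  # respondent aged 60

print(f"odds ratio: {odds_ratio:.3f}")
print(f"P(voted | age 30) = {p_30:.3f}")
print(f"P(voted | age 60) = {p_60:.3f}")
```

The odds ratio is constant across the range of the predictor, whereas the change in predicted probability depends on where on the curve it is evaluated, which is one reason predicted probabilities are often easier to communicate.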

Day 4
Models with a nominal dependent variable, i.e. multinomial logistic regression.

Day 5
Techniques for an ordinal dependent variable, i.e. ordinal logistic regression, and regression models for counts, i.e. Poisson regression and negative binomial models.
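
For the count models, the standard textbook forms can be written as follows. The notation is an assumption for illustration: \mu_i is the expected count, \eta_i the linear predictor, and \alpha the dispersion parameter in the common NB2 parameterisation of the negative binomial model.

```latex
\text{Poisson:}\quad
\log \mu_i = \eta_i, \qquad
\Pr(y_i = k) = \frac{\mu_i^{k} e^{-\mu_i}}{k!}, \qquad
\operatorname{Var}(y_i) = \mu_i

\text{Negative binomial (NB2):}\quad
\log \mu_i = \eta_i, \qquad
\operatorname{Var}(y_i) = \mu_i + \alpha \mu_i^{2}, \quad \alpha > 0
```

Under this parameterisation the negative binomial variance approaches the Poisson variance as \alpha approaches 0, which is why it is commonly used when counts are overdispersed.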

I will illustrate the application of each method using analyses based on real-world data, presenting GLMs with their constraints and limitations.

By the end of the course, you will be able to:

  • run GLMs on your own datasets
  • interpret GLM regression coefficients and odds ratios
  • assess the goodness of fit of the models
  • estimate the uncertainty of predicted effects using simulations
  • present the results using statistical graphics techniques.
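
The simulation-based approach to uncertainty mentioned in the list above can be sketched roughly as follows: draw many coefficient vectors from a normal distribution centred on the estimates, compute the predicted probability for each draw, and read an interval off the simulated values. This is a sketch under stated assumptions rather than the instructor's procedure. It uses Python's standard library only (the course works in R), and the estimates, the variance-covariance matrix and the predictor value are hypothetical.

```python
import math
import random

def inv_logit(eta):
    """Map a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical estimates (illustrative only): intercept, slope, and
# their variance-covariance matrix [[v00, v01], [v01, v11]].
b0_hat, b1_hat = -1.0, 0.4
v00, v01, v11 = 0.04, -0.01, 0.01

# Cholesky factor of the 2x2 covariance matrix, computed by hand,
# so that correlated draws can be built from independent normals.
l00 = math.sqrt(v00)
l10 = v01 / l00
l11 = math.sqrt(v11 - l10 ** 2)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
x = 3.0                  # predictor value at which to predict
draws = []
for _ in range(5000):
    z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
    b0 = b0_hat + l00 * z0
    b1 = b1_hat + l10 * z0 + l11 * z1
    draws.append(inv_logit(b0 + b1 * x))

# Percentile interval from the simulated predicted probabilities.
draws.sort()
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]
point = inv_logit(b0_hat + b1_hat * x)
print(f"predicted probability {point:.3f}, "
      f"95% interval [{lower:.3f}, {upper:.3f}]")
```

Drawing the coefficients jointly, rather than one at a time, keeps the covariance between intercept and slope in the simulated predictions.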

You will be given assignments. Students who want ECTS credits must complete practical exercises related to the techniques introduced on a given day. You can use your own data for these (strongly recommended) or data provided by the instructor.

The lab sessions and assignments use the open-source statistical software R, which enables efficient implementation and advanced interpretation of GLMs. R’s graphical capabilities allow effective and relatively simple presentation of GLM results.

Day Topic Details
Day 1: Introduction to Generalised Linear Models

90-minute workshop with elements of lecture: Linear model vs. generalised linear model, linear predictor, link function, Maximum Likelihood Estimation.

90-minute lab session: Running first binary regression models.

Day 2: Binary Logistic Regression

90-minute workshop with elements of lecture; 90-minute lab session: Interpretation of parameters of binary logistic regression models, goodness of fit measures, interaction terms within a binary logistic regression model, predicted probabilities, measures of uncertainty of predicted effects.

Day 3: Binary Logistic Regression (continued)

90-minute workshop with elements of lecture; 90-minute lab session: Developing skills in the interpretation of binary logistic regression models. Presentation of logistic regression models using tools of statistical graphics.

Day 4: Models for Nominal Outcomes

90-minute workshop with elements of lecture; 90-minute lab session: Multinomial logistic regression model. Interpretation of parameters of the model, goodness of fit measures, interaction terms, predicted probabilities, measures of uncertainty of predicted effects.

Day 5: Models for Ordinal Outcomes and Count Data

90-minute workshop with elements of lecture; 90-minute lab session: Ordinal logistic regression model. Poisson regression and negative binomial model. Interpretation of parameters of the models, goodness of fit measures, interaction terms, predicted probabilities/predicted values, measures of uncertainty of predicted effects.

Day Readings

Brambor, T., Clark, W. R., & Golder, M. (2006). Understanding Interaction Models: Improving Empirical Analyses. Political Analysis, 14(1), 63–82.

Fox, J. (2003). Effect Displays in R for Generalised Linear Models. Journal of Statistical Software, 8(15). https://doi.org/10.18637/jss.v008.i15

Fox, J. (2008). Applied Regression Analysis and Generalized Linear Models (2nd ed.). Sage Publications, Inc.

Fox, J., & Hong, J. (2009). Effect Displays in R for Multinomial and Proportional-Odds Logit Models: Extensions to the effects Package. Journal of Statistical Software, 32(1), 1–24. https://doi.org/10.18637/jss.v032.i01

Fox, J., & Weisberg, H. S. (2011). An R Companion to Applied Regression (Second Edition). Sage Publications, Inc.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables (1st ed.). Sage Publications, Inc.

Day 1

Long (1997) Ch. 3; Fox (2008) Ch. 14.1, Ch. 15.1

Day 2

Long (1997) Ch. 4; Fox & Weisberg (2011) Ch. 5.1–5.3; Brambor et al. (2006)

Day 3

Fox (2003)

Day 4

Long (1997) Ch. 6; Fox (2008) Ch. 14.2; Fox & Hong (2009)

Day 5

Long (1997) Ch. 5 and 8; Fox (2008) Ch. 15.2; Fox & Weisberg (2011) Ch. 5.5

Software Requirements

Download the newest version of R

Download the newest version of RStudio

Hardware Requirements

Please bring a laptop not more than four years old.

Literature

Agresti, A. (2007). An introduction to categorical data analysis (2nd ed). Hoboken, NJ: Wiley-Interscience.

Agresti, A. (2013). Categorical data analysis (Third edition). Hoboken, NJ: Wiley.

Brambor, T., Clark, W. R., & Golder, M. (2006). Understanding Interaction Models: Improving Empirical Analyses. Political Analysis, 14(1), 63–82.

Cameron, A. C., & Trivedi, P. K. (2013). Regression analysis of count data (Second edition). Cambridge ; New York, NY: Cambridge University Press.

Fox, J. (2003). Effect Displays in R for Generalised Linear Models. Journal of Statistical Software, 8(15). https://doi.org/10.18637/jss.v008.i15

Fox, J. (2008). Applied Regression Analysis and Generalized Linear Models (2nd ed.). Sage Publications, Inc.

Fox, J., & Hong, J. (2009). Effect Displays in R for Multinomial and Proportional-Odds Logit Models: Extensions to the effects Package. Journal of Statistical Software, 32(1), 1–24. https://doi.org/10.18637/jss.v032.i01

Fox, J., & Weisberg, H. S. (2011). An R Companion to Applied Regression (Second Edition). Sage Publications, Inc.

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (Third edition). Hoboken, New Jersey: Wiley.

King, G., & Zeng, L. (2001). Logistic Regression in Rare Events Data. Political Analysis, 9(2), 137–163.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables (1st ed.). Sage Publications, Inc.

Recommended Courses to Cover Before this One

Summer School

  • R Basics
  • Introduction to Inferential Statistics: What you need to know before you take regression
  • Multiple Regression Analysis: Estimation, Diagnostics, and Modelling
  • Multivariate Statistical Analysis and Comparative Crossnational Surveys Data

Winter School

  • Missing Data
  • Introduction to R (entry level or for participants with some prior knowledge in command-line programming)
  • Linear Regression with R/Stata: Estimation, Interpretation and Presentation
  • Introduction to Statistics for Political and Social Scientists

Recommended Courses to Cover After this One

Summer School

  • Applied Multilevel Regression Modelling
  • Causal Inference in the Social Sciences
  • Introduction to Structural Equation Modelling
  • Time Series Analysis
  • Panel Data Analysis
  • Multilevel Structural Equation Modelling
  • Advanced Structural Equation Modelling

Winter School

  • Time Series Analysis
  • Methods of Modern Causal Analysis Based on Observational Data
  • Multilevel Regression Modelling
  • Structural Equation Modeling (SEM) with R


Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed in due time.

Note from the Academic Conveners

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.