ECPR

Logistic Regression and General Linear Models

Michał Kotnarowski
kotnar@isppan.waw.pl

Polish Academy of Sciences

Michał Kotnarowski is an Assistant Professor at the Institute of Political Studies of the Polish Academy of Sciences. He specialises in voting behaviour, comparative politics and political methodology.

He has contributed articles to journals including Party Politics, Communist and Post-Communist Studies, Acta Politica, and the International Journal of Sociology.


Course Dates and Times

Monday 5 to Friday 9 March 2018
14:00-17:30
15 hours over 5 days

Prerequisite Knowledge

This is an advanced course. To get the most out of it, participants should meet the following requirements:

1. Participants are expected to understand the logic of inferential statistics. In particular, they should be familiar with hypothesis testing and with such concepts as confidence intervals and significance levels. Participants not familiar with these topics should have taken the introductory course ‘Introduction to Statistics for Political and Social Scientists’ offered at the ECPR Winter School or ‘Introduction to Inferential Statistics: What you need to know before you take regression’ offered at the ECPR Summer School, or have obtained equivalent knowledge by other means.

2. Participants should be familiar with the rudiments of the linear regression model estimated by the Ordinary Least Squares method. In particular, they should be familiar with the logic of linear regression analysis, the assumptions of the linear regression model, and regression with dummy variables. Participants not familiar with these topics should have taken the course ‘Linear Regression with R/Stata: Estimation, Interpretation and Presentation’ offered at the ECPR Winter School or ‘Multiple Regression Analysis: Estimation, Diagnostics, and Modelling’ offered at the ECPR Summer School, or have obtained equivalent knowledge by other means.

3. The course relies on the R software. Participants should have at least a basic understanding of the R language. In particular, they should be able to import a dataset in SPSS format into R and run a linear regression model. They should also be able to carry out basic data manipulations in R, such as selecting observations, selecting variables, and computing new variables from existing ones. Participants new to R should have taken the short preparatory course ‘Introduction to R’ offered at the ECPR Winter School, or have obtained equivalent knowledge by other means.


Short Outline

The course deals with the problem of how to run a regression model when the dependent variable is not continuous. In the social sciences, researchers often want to model respondents’ choices between two or more categories, answers measured on an ordinal scale, or event counts. The typical solution to this problem is to use General Linear Models (GLM). The course is an introduction to these models. It will cover a broad family of GL models, including models based on logistic regression (binary, multinomial, ordered, and conditional logistic regression) as well as models designed for count data (Poisson regression and the negative binomial model). The course will deliver practical skills in running GLM, including proper interpretation of the regression output and presentation of model results in the form of graphs and tables. The limitations of GL models will also be discussed.

Tasks for ECTS Credits

  • Participants attending the course: 2 credits (pass/fail grade). The workload for the calculation of ECTS credits is based on the assumption that students attend classes and carry out the necessary reading and/or other work before and after classes.
  • Participants attending the course and completing one task (see below): 3 credits (to be graded)
  • Participants attending the course, and completing two tasks (see below): 4 credits (to be graded)

Long Course Outline

Researchers working in the broadly defined social sciences often have to deal with analyses in which the dependent variable is not a continuous variable measured on an interval scale. These are situations in which the dependent variable is (1) a binary variable, where respondents select one of two options (e.g., the answer to the question of whether a respondent voted in the last election); (2) a nominal variable, where respondents select one of three or more options (e.g., the answer to the question of which party the respondent voted for in the last election); (3) an ordinal variable (e.g., when a respondent chooses one of the answers on a Likert scale); or (4) a variable counting the number of occurrences of a phenomenon (e.g., the answer to the question of how many times a respondent has participated in protest actions). For these types of dependent variable, Ordinary Least Squares (OLS) regression models are not appropriate. The most common approach is to use General Linear Models (GLM). These models are estimated differently from linear regression models, and their interpretation is considerably more complex than that of OLS models. Although GL models are used very often in the social sciences, their use and correct interpretation still cause difficulties for researchers.
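To make the problem with OLS concrete, the following minimal sketch fits a straight line to a binary outcome and checks the fitted values. It is written in plain Python purely for illustration (the course itself uses R); the data and the variable names `interest` and `voted` are invented for this example and are not course material.

```python
# Sketch: a linear probability model fitted by OLS to a binary outcome.
# All data below are hypothetical.

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical respondents: political interest (0-10) and turnout (1 = voted).
interest = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
voted = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

intercept, slope = ols_fit(interest, voted)
fitted = [intercept + slope * xi for xi in interest]

# Fitted values are meant to be probabilities, yet some leave [0, 1].
out_of_range = [round(p, 3) for p in fitted if p < 0 or p > 1]
print("fitted values:", [round(p, 3) for p in fitted])
print("outside [0, 1]:", out_of_range)
```

With these invented data, some fitted "probabilities" fall below 0 or above 1, which is one of the reasons a model with a bounded response function is preferred for binary outcomes.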

The aim of this course is to provide an introduction to General Linear Models. Participants will gain practical skills in the use of GLM. However, the course will not be detached from statistical theory. The theoretical aspects of GL modeling will be presented to the extent necessary to understand and interpret GL models properly. They will be introduced in an approachable way that is understandable for participants without a background in matrix algebra or calculus.
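As a rough orientation, the theoretical core of these models can be written in a few lines. The notation below (η for the linear predictor, g for the link function, μ for the expected value of the outcome) is a common textbook convention and an assumption of this sketch, not necessarily the notation used in class.

```latex
\begin{align*}
\eta_i &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}
  && \text{(linear predictor)} \\
g(\mu_i) &= \eta_i, \qquad \mu_i = \mathrm{E}(y_i \mid x_i)
  && \text{(link function)} \\
\log\frac{\mu_i}{1-\mu_i} &= \eta_i
  \;\Longleftrightarrow\; \mu_i = \frac{1}{1+e^{-\eta_i}}
  && \text{(logit link, binary outcome)} \\
\log \mu_i &= \eta_i
  \;\Longleftrightarrow\; \mu_i = e^{\eta_i}
  && \text{(log link, Poisson counts)}
\end{align*}
```

The right-hand side is the same linear combination as in OLS; what changes is the function connecting it to the expected outcome, which keeps predicted probabilities between 0 and 1 and predicted counts positive.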

The course is organised as follows.

On the first day, we will deal with regression models with a binary dependent variable. We will start by examining why OLS models are inappropriate in such cases: which OLS assumptions are violated, and what the negative consequences of using OLS for this type of data might be. Next, we will show how the linear model can be generalized so that it can be applied to limited dependent variables. Participants will learn such concepts as the linear predictor and the link function. The first day will end with a presentation of Maximum Likelihood Estimation as a technique for estimating the parameters of a logistic regression model.

During the next two days, participants will develop practical skills in interpreting the binary logistic regression model, i.e. interpreting regression coefficients and odds ratios. Measures of goodness of fit and various pseudo-R-squared measures will also be presented. We will then extend additive logistic regression models by introducing interactions between independent variables, and participants will learn how to interpret correctly a binary logistic regression model that includes interaction terms. Participants will also learn how to report results using predicted probabilities, in particular by means of statistical graphics.

The fourth day will be devoted to models with a nominal dependent variable, i.e. multinomial logistic regression and conditional logistic regression. These techniques will be compared in terms of their analytical capabilities and possible applications.

On the fifth day, participants will learn techniques designed for an ordinal dependent variable, i.e. ordinal logistic regression, and regression models for counts, namely Poisson regression and negative binomial models.
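For the binary case, the estimation and interpretation steps mentioned above (Maximum Likelihood Estimation, odds ratios, predicted probabilities) can be sketched in a few lines. This is a bare-bones Newton-Raphson routine for one predictor, written in plain Python for illustration only; in the course these steps would be done with R's model-fitting functions. The data and all names are hypothetical.

```python
import math

def logistic(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, iterations=25):
    """Newton-Raphson MLE for Pr(y = 1) = logistic(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix: negative Hessian of the log-likelihood.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Hypothetical respondents: political interest (0-10) and turnout (1 = voted).
interest = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
voted = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

b0, b1 = fit_logit(interest, voted)
odds_ratio = math.exp(b1)          # multiplicative change in odds per unit of x
p_low = logistic(b0 + b1 * 3)      # predicted probability at interest = 3
p_high = logistic(b0 + b1 * 7)     # predicted probability at interest = 7

print("coefficients (log odds):", round(b0, 3), round(b1, 3))
print("odds ratio:", round(odds_ratio, 3))
print("Pr(vote) at 3 and 7:", round(p_low, 3), round(p_high, 3))
```

The coefficient `b1` is on the log-odds scale, `odds_ratio` is its exponent, and the two predicted probabilities show the same effect on the probability scale, where it is usually easier to communicate.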

The application of each method will be illustrated with analyses of real-world data. General Linear Models will be presented together with their constraints and limitations.

For each of the techniques presented during the course, participants will acquire a similar set of practical skills. By the end of the course, they will be able to run GLM models on their own datasets, interpret GLM regression coefficients and odds ratios, assess the goodness of fit of the models, estimate the uncertainty of predicted effects using simulations, and present the results using statistical graphics.
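The idea of estimating the uncertainty of predicted effects by simulation can be illustrated as follows: draw many plausible coefficient vectors from their approximate sampling distribution, compute the predicted probability for each draw, and summarise the spread. The sketch below is plain Python for illustration only; the coefficient estimates, the covariance matrix, and all names are hypothetical, and the multivariate normal approximation is an assumption of this sketch.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logit estimates (intercept, slope) and their covariance matrix.
beta_hat = (-2.0, 0.5)
cov = ((0.25, -0.04),
       (-0.04, 0.01))

def draw_coefficients(rng):
    """Draw one (b0, b1) pair from a bivariate normal via a 2x2 Cholesky factor."""
    l00 = math.sqrt(cov[0][0])
    l10 = cov[1][0] / l00
    l11 = math.sqrt(cov[1][1] - l10 ** 2)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return beta_hat[0] + l00 * z0, beta_hat[1] + l10 * z0 + l11 * z1

def simulate_probability(x, draws=5000, seed=2018):
    """Return the point prediction and a simulated 95% interval at value x."""
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        b0, b1 = draw_coefficients(rng)
        sims.append(logistic(b0 + b1 * x))
    sims.sort()
    lower = sims[int(0.025 * draws)]
    upper = sims[int(0.975 * draws) - 1]
    point = logistic(beta_hat[0] + beta_hat[1] * x)
    return point, lower, upper

point, lower, upper = simulate_probability(x=6)
print("predicted probability:", round(point, 3))
print("simulated 95% interval:", round(lower, 3), "to", round(upper, 3))
```

Because the interval is computed on the probability scale, it is generally asymmetric around the point prediction, which a simple "estimate plus or minus two standard errors" on that scale would miss.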

Each day, the course will consist of two parts. The first 90 minutes will be a workshop with lecture elements. The second part, lasting another 90 minutes, will be a lab session in which participants carry out practical exercises using the techniques introduced in the first part. Participants are advised to bring their own laptops and to use them during both parts of the course.

During the course, participants will be given assignments. Participants who need ECTS credits will be expected to complete an assignment after each class. Each assignment will consist of a set of practical exercises related to the techniques introduced on a given day. For assignments, participants may use their own data (which is strongly recommended) or the data provided by the instructor.

The practical part of the course (the lab sessions and assignments) will be based on the statistical software R. This open-source software enables efficient implementation of GLM models. R also supports advanced interpretation of GLM models, and its graphical capabilities allow for an effective and relatively simple presentation of GLM results using statistical graphics.

 

| Day | Topic | Details |
|---|---|---|
| 1 | Review; log odds and odds ratios | Rationale for logistic regression models; differences between OLS and logistic regression models; running binary logistic regression models in Stata; interpreting the effects of explanatory variables as the effects on the log odds and on odds ratios; useful Stata commands for understanding log odds and odds ratios; presenting and interpreting odds ratios in presentations and papers (1.5 hrs lecture, 1.5 hrs lab) |
| 2 | Predicted probabilities (1) | Advantages and disadvantages of odds ratio interpretation of logistic regression models; advantages of using predicted probabilities; basic Stata commands for predicted probabilities (1 hr lecture, 2 hrs lab) |
| 3 | Predicted probabilities (2) | Graphs of effects (coefficients and predicted probabilities); advantages of using simulations to assess effect uncertainty; running simulations using Stata (with Clarify and without); calculating predicted probabilities: observed case versus average value approaches (1 hr lecture, 2 hrs lab) |
| 4 | Interaction effects | Evaluating and presenting the results of interaction effects, including quadratic effects; differences in interpreting interaction effects compared to OLS models (1 hr lecture, 2 hrs lab) |
| 5 | Diagnostics and model fit | Simple diagnostic techniques for binary logistic regression models; appropriate measures of model fit (1.5 hrs lecture, 1.5 hrs lab) |

| Day | Readings |
|---|---|
| 1 | Long (1997), ch. 1-3; Orme and Combs-Orme (2009), ch. 1-2; Long and Freese (2006), section 4.7 |
| 2 | Orme and Combs-Orme (2009), ch. 2; Long and Freese (2006), sections 3.6, 4.6 and 4.7 |
| 3 | Mood (2010); Hanmer and Ozan Kalkan (2012); King et al. (2000) |
| 4 | Long and Freese (2006), sections 9.2 to 9.4; Brambor et al. (2006); Berry et al. (2010); Berry et al. (2012); Tsai and Gill (2013) |
| 5 | Long and Freese (2006), sections 4.4 and 4.5; Menard (2001), chs. 2 and 4; Esarey and Pierce (2012) |

Software Requirements

The newest version of R, accessible at: https://cran.r-project.org

The newest version of RStudio, accessible at: https://www.rstudio.com

Hardware Requirements

A laptop or PC no more than 3-4 years old.

Literature

Agresti, A. (2007). An introduction to categorical data analysis (2nd ed.). Hoboken, NJ: Wiley-Interscience.

Agresti, A. (2013). Categorical data analysis (3rd ed.). Hoboken, NJ: Wiley.

Brambor, T., Clark, W. R., & Golder, M. (2006). Understanding Interaction Models: Improving Empirical Analyses. Political Analysis, 14(1), 63–82.

Cameron, A. C., & Trivedi, P. K. (2013). Regression analysis of count data (2nd ed.). Cambridge; New York, NY: Cambridge University Press.

Fox, J. (2003). Effect Displays in R for Generalised Linear Models. Journal of Statistical Software, 8(15). https://doi.org/10.18637/jss.v008.i15

Fox, J. (2008). Applied Regression Analysis and Generalized Linear Models (2nd ed.). Sage Publications, Inc.

Fox, J., & Hong, J. (2009). Effect Displays in R for Multinomial and Proportional-Odds Logit Models: Extensions to the effects Package. Journal of Statistical Software, 32(1), 1–24. https://doi.org/10.18637/jss.v032.i01

Fox, J., & Weisberg, H. S. (2011). An R Companion to Applied Regression (2nd ed.). Sage Publications, Inc.

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). Hoboken, NJ: Wiley.

King, G., & Zeng, L. (2001). Logistic Regression in Rare Events Data. Political Analysis, 9(2), 137–163.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables (1st ed.). Sage Publications, Inc.

Recommended Courses to Cover Before this One

Summer School

  • R Basics
  • Introduction to Inferential Statistics: What you need to know before you take regression
  • Multiple Regression Analysis: Estimation, Diagnostics, and Modelling
  • Multivariate Statistical Analysis and Comparative Crossnational Surveys Data

Winter School

  • Missing Data
  • Introduction to R (entry level or for participants with some prior knowledge in command-line programming)
  • Linear Regression with R/Stata: Estimation, Interpretation and Presentation
  • Introduction to Statistics for Political and Social Scientists

Recommended Courses to Cover After this One

Summer School

  • Applied Multilevel Regression Modelling
  • Causal Inference in the Social Sciences
  • Introduction to Structural Equation Modelling
  • Time Series Analysis
  • Panel Data Analysis
  • Multilevel Structural Equation Modelling
  • Advanced Structural Equation Modelling

Winter School

  • Time Series Analysis
  • Methods of Modern Causal Analysis Based on Observational Data
  • Multilevel Regression Modelling
  • Structural Equation Modeling (SEM) with R


Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed at the time of change.

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, please contact us before registering.