ECPR


Applied Multilevel Regression Modelling

Course Dates and Times

Monday 30 July - Friday 3 August and Monday 6 August - Friday 10 August

09:00-10:30 / 11:00-12:30

Zoltán Fazekas

zf.egb@cbs.dk

Copenhagen Business School

This course familiarizes participants with the principles of multilevel modeling and its implementation in R. The course manual is Gelman and Hill (2007), and the main R package used is lme4. Broadly speaking, we will build and estimate models that become more complex step by step, discussing each decision in this process. Although this is a methods course, every lecture will focus on how the statistical model is linked to possible theories or particular hypotheses, with added attention to potential limitations and correct interpretation. After laying the basic foundations, in the second week we switch gears and work with more complex models (e.g. cross-classified models, longitudinal analysis). By the end of the course, participants are expected to be able to argue clearly why they use a multilevel model in their own research papers, which specification suits the research question and the data (including relatively complex questions and data structures), how the models are specified, what the results mean, and how they can be integrated with previous research. Each lecture discusses both theoretical principles and practical implementation, whereas the lab sessions are dedicated solely to practical implementation.

Tasks for ECTS Credits

  • Participants attending the course: 4 credits (pass/fail grade). The workload for the calculation of ECTS credits is based on the assumption that students attend classes and carry out the necessary reading and/or other work prior to, and after, classes.
  • Participants attending the course and completing one task (see below): 6 credits (to be graded)
  • Participants attending the course and completing two tasks (see below): 8 credits (to be graded)

An additional 2 ECTS credits can be earned by completing two assignments (an exercise and a reading assignment). These assignments are graded and are carried out during the two weeks of the Summer School.

Finally, a further 2 ECTS credits can be earned by submitting a final paper after the Summer School; this paper usually builds on the presentations given on the last day. The credit options will be discussed in more detail on the first day of class, and participants are expected to communicate their desired number of credits by Day 3.


Instructor Bio

Zoltán Fazekas is a Postdoctoral Researcher at the Department of Political Science, University of Oslo.

He earned his PhD in political science at the Department of Methods in the Social Sciences, University of Vienna, where he was an Early Stage Researcher in the Marie Curie Initial Training Network in Electoral Democracy, ELECDEM.

Zoltán holds a BA in Economics, an MA in European Affairs and an MA in Political Science. His fields of interest are: comparative electoral behaviour, political psychology, and quantitative methods.

  @fazol

Multilevel models have become increasingly popular in recent years. However, the move from educational research, with students nested in different schools (or teachers), to questions such as how the volatility of a party system might influence individual political decisions is not trivial. These models allow researchers to test various comparative hypotheses that previously were only tentative explanations in research carried out on separate country samples. Furthermore, the abundance of cross-national survey data (such as EES, ESS, CSES) increasingly invites researchers to use these models. Nevertheless, multilevel linear models have specific assumptions, and their use is guided both by theoretical reasoning and by data properties. Once a general understanding of multilevel models is acquired, they become an extremely valuable and versatile set of tools for complex questions. In this course we will focus on three core aspects of applied multilevel modeling: 1) properties and specification of multilevel models, 2) linking method with theories of heterogeneity, and 3) implementation in R.

The course runs over two weeks. The first week is dedicated to the general principles of multilevel models and their implementation, including varying-intercept and varying-slope multilevel linear models, with additional focus on uncertainty, prediction, and limitations. The second week is dedicated to more advanced topics that often appear in applied research. Each day has a lecture component and a lab component. In addition to the assigned readings, the first week ends with an overview homework assignment, which we review together at the start of the second week. We will work with multiple datasets (mostly survey data) throughout the course, step by step specifying more complicated models or extending our models to accurately reflect the formulated theory. In parallel, participants are expected to apply each step of the covered analyses to their own data or projects.

The lab sessions (taught in R) accompany the lectures, and in them we go through examples of multilevel models in applied research. We will extract, display and discuss quantities of interest, link them directly to the concepts covered in the lectures, and discuss how they should be reported in an academic paper and how they can be easily formatted and exported from R. The lab sessions can be described as “supervised individual/group work”.

On day one, after a brief discussion of the course logistics, we review linear regression and assumption violations. We then introduce nested data structures and the challenges they raise for pooled regression models. We review the alternatives, such as comparing separate regressions run for each group and pooled regression with cluster-corrected standard errors, along with their limitations, and start discussing properties of the variables that will be included in the multilevel models. Here, the sources of variation (within and between groups) are of particular interest.
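The within/between distinction can be made concrete with the classic sum-of-squares identity: total variation equals between-group plus within-group variation. The course does this in R; the following is only a minimal, language-neutral Python sketch, where the function name `decompose_variation` and the toy data are hypothetical.

```python
from collections import defaultdict


def decompose_variation(groups, values):
    """Split the total sum of squares of `values` into between-group and
    within-group components (total = between + within)."""
    by_group = defaultdict(list)
    for g, y in zip(groups, values):
        by_group[g].append(y)

    grand_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for ys in by_group.values():
        group_mean = sum(ys) / len(ys)
        # Variation of the group mean around the grand mean, once per member.
        between += len(ys) * (group_mean - grand_mean) ** 2
        # Variation of each member around its own group mean.
        within += sum((y - group_mean) ** 2 for y in ys)
    total = sum((y - grand_mean) ** 2 for y in values)
    return between, within, total


# Hypothetical toy data: four respondents in two countries.
between, within, total = decompose_variation(["A", "A", "B", "B"], [1, 3, 5, 7])
print(between, within, total)  # 16.0 4.0 20.0
print(between / total)         # 0.8
```

Here 80% of the variation lies between the two groups, which is exactly the kind of structure a pooled regression ignores.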

Day two offers the methodological and statistical transition from pooled regression to multilevel modeling with varying intercepts and varying slopes. We focus on the principles and assumptions of multilevel models, discussing their great benefits but also their possible limitations (from both a theoretical and a statistical perspective). We also spend considerable time clarifying notation and the meaning of key terms (e.g. fixed and random effects, correlation between random effects).
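The move from no pooling to partial pooling can be illustrated with the approximation discussed in Gelman and Hill (2007, Chapter 12) for a varying-intercept model without predictors: the group estimate is a precision-weighted average of the group's own mean and the overall mean. This is a minimal Python sketch (the course itself uses R and lme4); it assumes the variances `sigma_y` and `sigma_alpha` are known, whereas lme4 estimates them, and the function name is hypothetical.

```python
def partial_pooling_estimate(group_mean, n_j, grand_mean, sigma_y, sigma_alpha):
    """Precision-weighted average of a group's own mean (no pooling) and the
    overall mean (complete pooling), treating the variances as known."""
    w_group = n_j / sigma_y ** 2    # precision of the group's sample mean
    w_all = 1 / sigma_alpha ** 2    # precision of the group-level distribution
    return (w_group * group_mean + w_all * grand_mean) / (w_group + w_all)


# Hypothetical values: a small group is pulled halfway to the overall mean...
print(partial_pooling_estimate(10.0, n_j=4, grand_mean=0.0, sigma_y=2.0, sigma_alpha=1.0))  # 5.0
# ...while a large group stays close to its own mean.
print(partial_pooling_estimate(10.0, n_j=400, grand_mean=0.0, sigma_y=2.0, sigma_alpha=1.0))
```

Small groups borrow strength from the rest of the data; large groups are left almost unchanged.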

On day three we extend our models to include multiple predictors, discussing both data preparation (such as centering) and interpretation, with a strong focus on cross-level interactions. We build on the examples used in the previous days and dedicate both lecture and lab time to interpretation exercises, moving from software output to the reporting of results and their substantive implications for the research questions considered.
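Centering choices (see Enders and Tofighi, 2007, in the Day 3 readings) change what the coefficients mean. As a rough illustration, the sketch below computes grand-mean-centered and group-mean-centered versions of a predictor, plus the group means that are often added as a level-2 predictor. It is written in Python for illustration only (the course uses R), and `center_predictor` and the data are hypothetical.

```python
from collections import defaultdict


def center_predictor(groups, x):
    """Return grand-mean-centered x, group-mean-centered x, and each
    observation's group mean (a candidate level-2 predictor)."""
    grand_mean = sum(x) / len(x)
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, x):
        sums[g] += v
        counts[g] += 1
    group_means = [sums[g] / counts[g] for g in groups]
    grand_centered = [v - grand_mean for v in x]
    within_centered = [v - m for v, m in zip(x, group_means)]
    return grand_centered, within_centered, group_means


gc, wc, gm = center_predictor(["A", "A", "B", "B"], [1, 3, 5, 7])
print(gc)  # [-3.0, -1.0, 1.0, 3.0]
print(wc)  # [-1.0, 1.0, -1.0, 1.0]
print(gm)  # [2.0, 2.0, 6.0, 6.0]
```

Group-mean centering removes all between-group variation from the predictor, so its coefficient reflects within-group relationships only.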

Day four discusses comparative evaluation of model fit and model building, sources of uncertainty around the estimates, and hypothesis testing. Additionally, we review principles of prediction and the substantive communication of empirical results based on multilevel models, rounding off the interpretation and communication tasks introduced earlier in the course.

The last day of the first week is dedicated to generalized linear models, with a focus on logistic regression in a multilevel setting. After a short overview of link functions and the different quantities of interest (such as predicted probabilities of a particular outcome category), we focus on multilevel models for dichotomous variables (and potentially counts), including second-level predictors and cross-level interactions.
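In a varying-intercept logistic model, a predicted probability is obtained by passing the linear predictor (fixed intercept, group-specific deviation, slope times predictor) through the inverse logit. In lme4 the pieces would come from `fixef()` and `ranef()`; the following is only a Python sketch of the arithmetic, with hypothetical function names and coefficient values.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(fixed_intercept, group_effect, slope, x):
    """Pr(y = 1) for one observation in group j of a varying-intercept
    logistic model: inv_logit(alpha + u_j + beta * x)."""
    return inv_logit(fixed_intercept + group_effect + slope * x)


# Hypothetical coefficients: the same x yields different probabilities
# in groups with different random intercepts.
print(predicted_probability(-1.0, 1.0, 0.5, 0.0))   # 0.5
print(predicted_probability(-1.0, -0.5, 0.5, 0.0))
```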

On day six we kick off with a review of the weekend homework assignment and carry out a detailed reproduction exercise that allows participants to apply all the knowledge and skills gained in Week 1, with specific focus on presenting and streamlining the data/analysis section of an academic paper that uses a multilevel model.

On day seven we turn to a rather specific, but still intuitive, application of multilevel models: modeling longitudinal data. In many cases researchers are interested in change (continuous or discontinuous) over time, and the multilevel framework offers extensive possibilities to model this accurately, for example by easily incorporating time-varying predictors. We extend the basic longitudinal models to handle unbalanced data (variably spaced measurements) as well as data with varying numbers of measurement occasions, as these problems often characterize real survey data. We return to modeling longitudinal data on day nine, covering different approaches to handling time, with specific focus on various forms of age-cohort analysis (and on the typical case of cross-national surveys with a few, but multiple, waves), since some of these approaches build on a cross-classified structure.
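Unbalanced and variably spaced panels are easiest to handle in long ("person-period") format, where each row is one measurement occasion. The sketch below, a hypothetical Python helper rather than the course's R code, converts wide records into such rows, drops missed waves (so people contribute different numbers of rows), and records time as years since the first wave.

```python
def to_person_period(wide, waves):
    """Convert wide panel records into long (person-period) rows.

    `wide` maps a person id to one outcome per wave, with None for a missed
    wave. Missing occasions are dropped, and time is measured from the first
    wave using the actual (possibly unequally spaced) wave values."""
    rows = []
    for pid, outcomes in wide.items():
        for wave, y in zip(waves, outcomes):
            if y is not None:
                rows.append((pid, wave - waves[0], y))
    return rows


# Hypothetical panel with unequally spaced waves (2010, 2011, 2014).
panel = {"p1": [3, 4, 6], "p2": [2, None, 5]}
for row in to_person_period(panel, [2010, 2011, 2014]):
    print(row)
```

A multilevel growth model then treats occasions as level 1 and persons as level 2, so neither the unequal spacing nor the missing occasion requires dropping anyone.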

On day eight we analyze data with a more complicated nesting structure. Such situations, familiar from education research where pupils attend a particular elementary school and then a particular high school, arise in many other research areas: observations can be nested in both countries and years, specific survey responses can be nested in both individuals and survey modes, and so on. We will discuss cross-classified and multiple membership models in order to account accurately for these structures and to evaluate hypotheses linked to multiple grouping units.
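Before choosing between a nested and a cross-classified specification, it helps to check whether one grouping factor actually sits inside another. The following Python sketch (hypothetical helper and data; the course itself works in R) returns True when every level of the inner factor belongs to exactly one level of the outer factor. In lme4, a crossed structure such as countries and years is typically written with two separate terms, e.g. `(1 | country) + (1 | year)`.

```python
from collections import defaultdict


def is_nested(inner, outer):
    """True if every level of `inner` occurs within exactly one level of
    `outer` (nested); False if some inner level appears under several
    outer levels (crossed)."""
    parents = defaultdict(set)
    for i, o in zip(inner, outer):
        parents[i].add(o)
    return all(len(p) == 1 for p in parents.values())


# Hypothetical survey rows: regions sit inside countries...
country = ["DE", "DE", "FR", "FR"]
region = ["DE-BY", "DE-BE", "FR-IDF", "FR-ARA"]
year = [2010, 2014, 2010, 2014]
print(is_nested(region, country))  # True
# ...but years cut across countries.
print(is_nested(year, country))    # False
```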

 

The last day introduces future directions, especially deep interactions and multilevel regression and poststratification (MRP), an extremely useful approach for deriving sub-group level estimates (for example, for geographic and demographic sub-categories) from data that are available only at a higher level (such as a nationally representative survey). However, the bulk of this day is dedicated to an overall practical application task covering the models discussed in Week 2, focusing on the most frequently encountered issues and on specific elements related to interpretation.
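The poststratification step itself is a population-weighted average: cell-level estimates (in MRP, predictions from the multilevel model for each demographic-geographic cell) are weighted by how many people fall into each cell, for example according to census counts. A minimal Python sketch with hypothetical cells and numbers:

```python
def poststratify(cell_estimates, cell_counts):
    """Population-weighted average of cell-level estimates.

    `cell_estimates` maps each cell to a model-based estimate;
    `cell_counts` maps each cell to its population size."""
    total = sum(cell_counts[c] for c in cell_estimates)
    return sum(cell_estimates[c] * cell_counts[c] for c in cell_estimates) / total


# Hypothetical subgroup with two cells of very different size.
estimates = {"young": 0.2, "old": 0.6}
counts = {"young": 300, "old": 100}
print(poststratify(estimates, counts))  # 0.3
```

Restricting the cells to one region or demographic group yields the corresponding sub-group estimate, which is what makes the approach useful for small subgroups.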

Note on readings: some days have extensive preparatory readings. Likewise, reviewing regression before the course starts is extremely useful, so please plan some extra time BEFORE the course to familiarize yourself with the readings and the reading load.

  • Software: This is a course focused on application. The course examples and labs will be carried out in R. Students are expected to use R comfortably. This implies that they should be able to 1) load data from different sources and in different formats, 2) transform (recode) variables as necessary, 3) fit linear regression models, and 4) extract quantities of interest for summary and plotting from regression output objects.
  • Prior methodological training: We will do a brief review of linear regression, but this is no substitute for in-depth knowledge of and experience with linear regression models (OLS and maximum likelihood) and a thorough understanding of regression assumptions. Three books that can help in reviewing these concepts are:
  1. Achen, C. H. (1982). Interpreting and using regression (Vol. 29). Sage Publications, Incorporated.
  2. Lewis-Beck, M. (1980). Applied regression: An introduction (Vol. 22). Sage Publications, Incorporated.
  3. Eliason, S. R. (1993). Maximum likelihood estimation: Logic and practice (Vol. 96). Sage Publications, Incorporated.

Note: at the start of the course we assume that all of the above prerequisites are satisfied. To cover all topics with a sufficient number of examples, we cannot allocate additional time to regression review or unrelated R issues.

Day Topic Details
1 Introduction: from complete or no pooling to partial pooling

Equal split of 3 hours: 90 min lecture, 90 min lab

2 THE model: Varying intercept and slope

Equal split of 3 hours

3 Predictors and cross-level interactions

Equal split of 3 hours

4 Uncertainty, prediction, and power

Equal split of 3 hours

5 Multilevel generalized linear models

Equal split of 3 hours

6 Review of Week 1 and reproduction task

30 min lecture; 150 min lab

7 Longitudinal models (1)

Equal split of 3 hours

8 Models for cross-classified data

Equal split of 3 hours

9 Longitudinal models (2)

Equal split of 3 hours

10 Future directions and application task

45-60 min lecture; 120-135 min lab

Day Readings

PLEASE NOTE: participants are expected to do the readings BEFORE the scheduled meeting

1

For review: Gelman and Hill (2007), Chapters 2, 3, and 4

Gelman and Hill (2007), Chapters 1 and 11

Recommended:

King, G., & Roberts, M. E. (2015). How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It. Political Analysis, 23, 159-179.

Aronow, P. (2016). A Note on "How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It".

2

Gelman and Hill (2007), Chapters 12 and 13

Steenbergen, M. R., & Jones, B. S. (2002). Modeling multilevel data structures. American Journal of Political Science, 46(1), 218-237.

3

Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12(2), 121-138.

Brambor, T., Clark, W. R., & Golder, M. (2006). Understanding interaction models: Improving empirical analyses. Political Analysis, 14(1), 63-82.

Recommended:

Pittau, M. G., Zelli, R., & Gelman, A. (2010). Economic disparities and life satisfaction in European regions. Social Indicators Research, 96(2), 339-361.

Berry, W. D., Golder, M., & Milton, D. (2012). Improving tests of theories positing interaction. The Journal of Politics, 74(3), 653-671.

4

Gelman and Hill (2007), Chapters 20, 21, and 24.

Recommended:

Stegmueller, D. (2013). How many countries for multilevel modeling? A comparison of frequentist and Bayesian approaches. American Journal of Political Science, 57(3), 748-761.

5

Gelman and Hill (2007), Chapters 5 and 14.

Recommended:

Gelman and Hill (2007), Chapter 15.

Bates, D. M. (2010). lme4: Mixed-effects modeling with R. http://lme4.r-forge.r-project.org/book. Chapter 6.

6

No readings assigned (read ahead for Day 7).

7

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press, USA. Chapters 3, 4, 5, and 6.

Recommended:

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press, USA. Chapters 1 and 7.

Bates, D. M. (2010). lme4: Mixed-effects modeling with R. http://lme4.r-forge.r-project.org/book. Chapter 3.

8

Fielding, A., & Goldstein, H. (2006). Cross-classified and multiple membership structures in multilevel models: an introduction and review. DfES.

Browne, W. J., Goldstein, H., & Rasbash, J. (2001). Multiple membership multiple classification (MMMC) models. Statistical Modelling, 1(2), 103-124.

Bates, D. M. (2010). lme4: Mixed-effects modeling with R. http://lme4.r-forge.r-project.org/book. Chapter 2.

9

Steele, F. (2008). Multilevel models for longitudinal data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 171(1), 5-19.

Yang, Y., & Land, K. C. (2008). Age–Period–Cohort Analysis of Repeated Cross-Section Surveys: Fixed or Random Effects? Sociological Methods & Research, 36(3), 297-326.

Smets, K., & Neundorf, A. (2014). The hierarchies of age-period-cohort research: Political context and the development of generational turnout patterns. Electoral Studies, 33, 41-51.

Recommended:

Bell, A., & Jones, K. (2014). Another “futile quest”? A simulation study of Yang and Land’s Hierarchical Age-Period-Cohort model. Demographic Research, 30, 333–360.

Bell, A., & Jones, K. (2015). Should age-period-cohort analysts accept innovation without scrutiny? A response to Reither, Masters, Yang, Powers, Zheng and Land. Social Science & Medicine, 128, 331–333.

[the rejoinder] Reither, E. N., Land, K. C., Jeon, S. Y., Powers, D. A., Masters, R. K., Zheng, H., … Claire Yang, Y. (2015). Clarifying hierarchical age–period–cohort models: A rejoinder to Bell and Jones. Social Science & Medicine, 145, 125–128.

10

Ghitza, Y., & Gelman, A. (2013). Deep interactions with MRP: Election turnout and voting patterns among small electoral subgroups. American Journal of Political Science, 57(3), 762-776.

Leemann, L., & Wasserfallen, F. (2017). Extending the use and prediction precision of subnational public opinion estimation. American Journal of Political Science, 61(4), 1003-1022.

Recommended:

Lax, J., & Phillips, J. (2009). How Should We Estimate Public Opinion in the States? American Journal of Political Science, 53(1), 107-121.

Wang, W., Rothschild, D., Goel, S., & Gelman, A. (2015). Forecasting elections with non-representative polls. International Journal of Forecasting, 31(3), 980-991.

Software Requirements

The latest version of R, with the ability to install packages as needed during the course. Having RStudio installed is a plus.

Hardware Requirements

No special requirements, but students are expected to bring their own laptops to lecture and lab sessions.

Literature

In addition to mandatory readings, these pieces can be very useful both for alternative perspectives, applications, and further learning:

  1. Gelman, A. (2009). Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (Expanded Edition). Princeton University Press.
  2. Achen, C. H. (1982). Interpreting and using regression (Vol. 29). Sage Publications, Incorporated.
  3. Lewis-Beck, M. (1980). Applied regression: An introduction (Vol. 22). Sage Publications, Incorporated.
  4. Eliason, S. R. (1993). Maximum likelihood estimation: Logic and practice (Vol. 96). Sage Publications, Incorporated.
  5. Luke, D. A. (2004). Multilevel modeling (Vol. 143). Sage Publications, Incorporated.
  6. Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Sage Publications Limited.
  7. Hox, J. J. (2010). Multilevel analysis: Techniques and applications. Taylor & Francis.
  8. Raudenbush, S.W. and Bryk, A.S. (2002). Hierarchical Linear Models (Second Edition). Thousand Oaks: Sage Publications.
  9. Jackman, S. (2009). Bayesian analysis for the social sciences (Vol. 846). Wiley.
  10. Kruschke, J. (2010). Doing Bayesian data analysis: A tutorial introduction with R and BUGS. Academic Press.
  11. Gill, J. (2007). Bayesian Methods: A Social and Behavioral Sciences Approach. Chapman & Hall/CRC.

Recommended Courses to Cover Before this One

Summer School

Introduction to Generalized Linear Modeling

Advanced Topics in Applied Regression

Winter School

Linear Regression with R/Stata: Estimation, Interpretation and Presentation

Interpreting Binary Logistic Regression Models

Recommended Courses to Cover After this One

Winter School

Time Series Analysis

Introduction to Bayesian Inference