Applied Multilevel Regression Modelling

Zoltán Fazekas

Copenhagen Business School

Zoltán Fazekas is a Postdoctoral Researcher at the Department of Political Science, University of Oslo.

He earned his PhD in political science at the Department of Methods in the Social Sciences, University of Vienna, where he was an Early Stage Researcher in the Marie Curie Initial Training Network in Electoral Democracy, ELECDEM.

Zoltán holds a BA in Economics, an MA in European Affairs and an MA in Political Science. His fields of interest are: comparative electoral behaviour, political psychology, and quantitative methods.


Course Dates and Times

Monday 1 to Friday 5 August and Monday 8 to Friday 12 August 2016

Generally classes are either 09:00-12:30 or 14:00-17:30

30 hours over 10 days

Prerequisite Knowledge

Students should be able to use R comfortably (or transition quickly from Stata/SPSS, which is easy) and have solid prior knowledge of linear regression, including a solid understanding of its assumptions. We will briefly review linear regression, but that does not substitute for in-depth knowledge of and experience with linear regression models (OLS and maximum likelihood). Three sample books that can help in reviewing these concepts are:


  1. Achen, C. H. (1982). Interpreting and using regression (Vol. 29). Sage Publications, Incorporated.
  2. Lewis-Beck, M. (1980). Applied regression: An introduction (Vol. 22). Sage Publications, Incorporated.
  3. Eliason, S. R. (1993). Maximum likelihood estimation: Logic and practice (Vol. 96). Sage Publications, Incorporated.


For R, students can consult many freely available resources, but a good book to accompany a systematic review is: Adler, J. (2010). R in a Nutshell: A Desktop Quick Reference. O'Reilly Media

Short Outline

The course aims to familiarize students with the principles of multilevel modelling and their implementation in R. The course manual is Gelman and Hill (2007), and the two main R packages used will be lme4 and nlme. Broadly speaking, we will build and estimate models that become more complex step by step, discussing each decision in the process. Although this is a methods course, every lecture will focus on how the statistical model is linked to possible theories or particular hypotheses, with added focus on potential limitations and correct interpretation. After laying the foundations in the first week, we switch gears in the second week and work with more complex models (e.g. deep interactions, cross-classified models, longitudinal analysis). By the end of the course, participants are expected to be able to argue clearly why they use a multilevel model in their own research papers, which specification suits the research question and the data (including relatively complex questions and data structures), how the models are specified, what the results mean, and how they can be integrated with previous research. Each lecture will discuss both theoretical principles and practical implementation, whereas the lab sessions are dedicated solely to practical implementation.

Long Course Outline

The use of multilevel models has become a trend in recent years. However, the switch from educational sciences, with students nested in different schools (or teachers), to questions such as how the volatility of a party system might influence individual political decisions is not trivial. These models allow researchers to test various comparative hypotheses that previously were only tentative explanations in research carried out on separate country samples. Furthermore, the abundance of cross-national survey data (such as EES, ESS, CSES) increasingly invites researchers to use these models. Nevertheless, multilevel linear models have specific assumptions, and their use is guided by both theoretical reasoning and data properties. Once a general understanding of multilevel models is acquired, they become an extremely valuable and versatile set of tools for complex questions. In this course we will focus on three core aspects of applied multilevel modelling: 1) properties and specification of multilevel models, 2) linking method with theories of heterogeneity, and 3) implementation in R.


The course runs over two weeks. The first week is dedicated to the general principles of multilevel models and their implementation, including varying-intercept, varying-slope multilevel linear models, with additional focus on uncertainty, prediction, and limitations. The second week is dedicated to more advanced topics that usually appear in applied research. Each day has a lecture component and a lab component. Alongside the assigned readings, the end of the first week will feature an overview homework, which we will review together at the start of the second week. We will work with multiple datasets (overwhelmingly survey data) throughout the course, specifying more complicated models step by step or extending our models to accurately reflect the formulated theory.


The lab sessions (taught in R) accompany the course, and in them we will go through examples of multilevel models in applied research. We will extract, display, and discuss quantities of interest and link them directly to the concepts covered during the lectures. We will also discuss how these quantities should be reported in an academic paper and how they can be easily formatted and exported from R. The lab sessions can be described as “supervised individual/group work”.


After a brief discussion of the course logistics we will review principles of inference, linear regression and assumption violations on day one. We introduce examples of comparative research questions and hypotheses, and what sort of data and method requirements have to be met.


Day two is dedicated to nested data structures and the challenges these raise for pooled regression models. We review the alternatives, such as comparing regressions run separately for each group and pooled regression with cluster-corrected standard errors, along with their limitations. We then start discussing the properties of the variables that will be included in the multilevel models. Here, the sources of variation (within and between groups) are of specific interest.
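The contrast between the pooled estimate, the separate per-group estimates, and the multilevel compromise can be sketched with simple arithmetic. The course itself works in R; the Python sketch below only illustrates the logic, using the precision-weighted approximation for a group mean given in Gelman and Hill (2007, Chapter 12). The data, the group labels, and the two variance values are hypothetical and are treated as known here, whereas a fitted multilevel model would estimate them.

```python
from statistics import mean

def partial_pooling_mean(group_values, grand_mean, sigma_y2, sigma_alpha2):
    """Precision-weighted compromise between a group's mean and the grand mean."""
    n_j = len(group_values)
    group_mean = mean(group_values)
    w_group = n_j / sigma_y2       # precision of the group's own mean
    w_grand = 1.0 / sigma_alpha2   # precision of the between-group distribution
    return (w_group * group_mean + w_grand * grand_mean) / (w_group + w_grand)

# Hypothetical outcome values for respondents nested in three countries.
groups = {
    "A": [4.0, 5.0, 6.0, 5.0, 5.0, 4.0, 6.0, 5.0],
    "B": [8.0, 9.0],
    "C": [2.0, 3.0, 2.0, 3.0],
}

# Complete pooling: ignore the grouping and use one mean for everyone.
all_values = [v for values in groups.values() for v in values]
complete_pooling = mean(all_values)

# No pooling: estimate each group's mean separately.
no_pooling = {g: mean(values) for g, values in groups.items()}

# Partial pooling: assumed (not estimated) within- and between-group variances.
SIGMA_Y2 = 1.0
SIGMA_ALPHA2 = 4.0
partial = {
    g: partial_pooling_mean(values, complete_pooling, SIGMA_Y2, SIGMA_ALPHA2)
    for g, values in groups.items()
}

for g in groups:
    print(g, len(groups[g]), round(no_pooling[g], 3), round(partial[g], 3))
```

Under these assumptions, the group with only two observations is pulled proportionally further toward the grand mean than the group with eight, which is the behavior that distinguishes partial pooling from either extreme.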


Day three offers the methodological and statistical transition from pooled regression to multilevel modelling with varying intercepts and varying slopes. We focus on the principles and assumptions of multilevel models, discussing their great benefits but also their possible limitations (from both a theoretical and a statistical perspective). We will also spend quite some time clarifying notation and the meaning of key terms (e.g. fixed and random effects, correlation between random effects).
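As a point of reference for the notation discussed on day three, a varying-intercept, varying-slope model with one individual-level predictor can be written as follows. This is a sketch in the style of Gelman and Hill (2007), not necessarily the exact notation used in the lectures; the symbols are assumptions for illustration, with $i$ indexing individuals and $j[i]$ the group to which individual $i$ belongs.

```latex
y_i = \alpha_{j[i]} + \beta_{j[i]} x_i + \epsilon_i,
\qquad \epsilon_i \sim \mathrm{N}(0, \sigma_y^2)

\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix}
\sim \mathrm{N}\left(
\begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix},
\begin{pmatrix}
\sigma_\alpha^2 & \rho \sigma_\alpha \sigma_\beta \\
\rho \sigma_\alpha \sigma_\beta & \sigma_\beta^2
\end{pmatrix}
\right)
```

In the common "fixed and random effects" vocabulary, $\mu_\alpha$ and $\mu_\beta$ are the fixed effects, the group-specific deviations $\alpha_j - \mu_\alpha$ and $\beta_j - \mu_\beta$ are the random effects, and $\rho$ is the correlation between them.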


On day four we will extend our models to include multiple predictors, discussing both data preparation (as in centering) and interpretation, with strong focus on cross-level interactions and comparative model fit evaluation.
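The two centering choices that usually come up in data preparation can be sketched as follows. The course labs use R; this Python sketch only shows the arithmetic, and the predictor values and group labels are hypothetical.

```python
from statistics import mean

def grand_mean_center(values):
    """Subtract the overall mean from every observation."""
    grand = mean(values)
    return [v - grand for v in values]

def group_mean_center(values, group_ids):
    """Subtract each observation's own group mean; also return the group means."""
    by_group = {}
    for v, g in zip(values, group_ids):
        by_group.setdefault(g, []).append(v)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    centered = [v - group_means[g] for v, g in zip(values, group_ids)]
    return centered, group_means

# Hypothetical individual-level predictor for respondents in two countries.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
country = ["A", "A", "A", "B", "B", "B"]

x_grand = grand_mean_center(x)
x_within, country_means = group_mean_center(x, country)

print(x_grand)        # deviations from the overall mean, 6.5
print(x_within)       # deviations from each country's own mean
print(country_means)  # between-country component, usable as a group-level predictor
```

Grand-mean centering only shifts the scale of the predictor, while group-mean centering removes all between-group variation from it. This is why the group means are often added back as a group-level predictor when between-group differences are of substantive interest.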


The last day of the first week is dedicated to uncertainty, prediction, and power. The first two are important both for contextualizing our inferences and for better reporting of our results. Within the framework of multilevel models, the presence of random effects raises several considerations about how we calculate uncertainty around the estimates and how we present our results using new data for model-based predictions.


We kick off day six with a review of the weekend homework assignment and then switch to generalized linear models. After a short overview of link functions and the different quantities that we are interested in (such as predicted probabilities of a particular outcome category), we will focus on multilevel models for dichotomous variables and counts, including second-level predictors and cross-level interactions.
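As a small illustration of how a link function connects model coefficients to predicted probabilities, the sketch below applies the inverse logit to the linear predictor of a varying-intercept logistic model. The country intercepts and the slope are hypothetical numbers chosen for illustration, not estimates from any dataset, and Python is used only to show the arithmetic that the R labs would perform with fitted models.

```python
import math

def inv_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical estimates: one intercept per country and a common slope
# for an individual-level predictor x.
country_intercepts = {"A": -1.0, "B": 0.0, "C": 1.0}
slope = 0.5

def predicted_probability(country, x):
    """Predicted probability of the outcome for a respondent in `country`."""
    eta = country_intercepts[country] + slope * x
    return inv_logit(eta)

for c in country_intercepts:
    print(c, round(predicted_probability(c, x=2.0), 3))
```

Because the inverse logit is nonlinear, the same slope implies different changes in probability depending on the country intercept, which is one reason predicted probabilities are reported rather than raw coefficients.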


On day seven we analyze data where we have a more complicated nesting structure. As in education research where pupils attend a particular elementary school and then a particular high school, we can find these situations in many other research areas. Observations can be nested in both countries and years, or specific survey responses can be nested in individuals and different modes, and so on. We will discuss cross-classified and multiple membership models in order to accurately account for this nesting structure and evaluate hypotheses that are linked to multiple grouping units.
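For orientation, the simplest cross-classified model has two non-nested grouping factors, each contributing its own varying intercept. The notation below is an illustrative assumption (for example, $j$ indexing countries and $k$ indexing years), not necessarily the notation used in the lectures.

```latex
y_i = \mu + \alpha_{j[i]} + \gamma_{k[i]} + \epsilon_i,
\qquad \epsilon_i \sim \mathrm{N}(0, \sigma_y^2)

\alpha_j \sim \mathrm{N}(0, \sigma_\alpha^2),
\qquad
\gamma_k \sim \mathrm{N}(0, \sigma_\gamma^2)
```

Unlike in a strictly hierarchical structure, knowing an observation's country $j$ does not determine its year $k$, so the two sets of group effects enter the model side by side rather than one inside the other.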


We dedicate day eight to deep interactions and poststratification, an extremely useful approach for deriving sub-group level estimates (for example, for geographic and demographic sub-categories) from data that are available only at a higher level (such as a nationally representative survey). In most cases researchers also have access to rich official statistics at the sub-group level (but not to good-quality attitudinal data). Combining the two in a multilevel framework can enhance the quality of sub-group estimates of attitudes and reported behaviors, even with relatively small sample sizes.
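The poststratification step itself is a population-weighted average of cell-level estimates. In the Python sketch below, the cell estimates stand in for predictions from a multilevel model and the cell counts stand in for official statistics; all numbers, region names, and age categories are hypothetical.

```python
def poststratify(cell_estimates, cell_counts, region):
    """Population-weighted average of the cell estimates within one region."""
    cells = [cell for cell in cell_estimates if cell[0] == region]
    total = sum(cell_counts[cell] for cell in cells)
    weighted = sum(cell_estimates[cell] * cell_counts[cell] for cell in cells)
    return weighted / total

# Hypothetical model-based estimates of support for a policy,
# one per (region, age group) cell.
cell_estimates = {
    ("North", "young"): 0.60,
    ("North", "old"): 0.40,
    ("South", "young"): 0.50,
    ("South", "old"): 0.30,
}

# Hypothetical population counts for the same cells, e.g. from a census.
cell_counts = {
    ("North", "young"): 1000,
    ("North", "old"): 3000,
    ("South", "young"): 2000,
    ("South", "old"): 2000,
}

north = poststratify(cell_estimates, cell_counts, "North")
south = poststratify(cell_estimates, cell_counts, "South")
print(round(north, 3), round(south, 3))
```

The weights come from the population, not from the survey sample, so a cell that is small in the sample but large in the population still contributes in proportion to its true size.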


One rather specific, but still intuitive application of multilevel models appears in the case of modelling longitudinal data. In many cases, change (continuous or discontinuous) throughout time is of interest for researchers, and the multilevel framework offers extensive possibilities to accurately model this, easily incorporating time varying predictors for example. We extend the basic longitudinal models to be able to handle unbalanced data (variably spaced data), but also data with varying numbers of measurement occasions, as these problems often characterize real data stemming from surveys. These models will be the topic of day nine.


On the last day, after a summary, we will look into future directions and important extensions for applied research. Most notably, the bulk of quantitative comparative research uses survey data, so response behavior and systematic cross-country variation in it can be a real issue. We will discuss multilevel item-response models that account for these possible problems and review previous examples of how substantive findings might change when these effects are not considered. Finally, we introduce how to transition to Bayesian estimation for hierarchical models.

Day | Topic | Details
Monday 1 | Introduction to multilevel models: theory and data requirements | 3 hours, split equally: 90 min lecture, 90 min lab
Tuesday 2 | From complete or no-pooling to partial pooling | 3 hours, split equally: 90 min lecture, 90 min lab
Wednesday 3 | THE model: Varying intercept and slope | 3 hours, split equally: 90 min lecture, 90 min lab
Thursday 4 | Predictors and cross-level interactions | 3 hours, split equally: 90 min lecture, 90 min lab
Friday 5 | Uncertainty, prediction, and power | 3 hours, split equally: 90 min lecture, 90 min lab
Monday 8 | Multilevel generalized linear models | 3 hours, split equally: 90 min lecture, 90 min lab
Tuesday 9 | Models for cross-classified data | 3 hours, split equally: 90 min lecture, 90 min lab
Wednesday 10 | Deep interactions | 3 hours, split equally: 90 min lecture, 90 min lab
Thursday 11 | Longitudinal models | 3 hours, split equally: 90 min lecture, 90 min lab
Friday 12 | Future directions: transition to Bayesian estimation and multilevel item-response models | 3 hours, split equally: 90 min lecture, 90 min lab

Day Readings
Monday 1

Kass, R. E. (2011). Statistical inference: The big picture. Statistical science: a review journal of the Institute of Mathematical Statistics, 26(1), 1-9.

Rainey, Carlisle. 2014. "Arguing for a Negligible Effect." American Journal of Political Science 58(4): 1083-1091

For review:

Gelman and Hill (2007), Chapters 2, 3, and 4

Tuesday 2

Gelman and Hill (2007), Chapter 1, 11

King, Gary, and Margaret E Roberts. Early View (2014). How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It, Political Analysis: 1-21.

Wednesday 3

Gelman and Hill (2007), Chapter 12, 13

Steenbergen, M. R., & Jones, B. S. (2002). Modeling multilevel data structures. American Journal of Political Science, 46(1), 218-237.

Thursday 4

Enders, C. K., & Tofighi, D. (2007). “Centering predictor variables in cross-sectional multilevel models: A new look at an old issue.” Psychological Methods 12(2): 121-138.

Pittau, M. G., Zelli, R., & Gelman, A. (2010). Economic disparities and life satisfaction in European regions. Social Indicators Research, 96(2), 339-361.

If not already familiar, for review:

Brambor, T., Clark, W. R., & Golder, M. (2006). Understanding interaction models: Improving empirical analyses. Political Analysis, 14(1), 63-82.

Berry, W. D., Golder, M., & Milton, D. (2012). Improving tests of theories positing interaction. The Journal of Politics, 74(03), 653-671.

Friday 5

Gelman and Hill (2007), Chapter 20, 21, and 24.

Stegmueller, D. (2013). How many countries for multilevel modeling? A comparison of frequentist and Bayesian approaches. American Journal of Political Science, 57(3), 748-761.

Monday 8

Gelman and Hill (2007), Chapter 5, 14, and 15.

Bates, D. M. (2010). lme4: Mixed-effects modeling with R. URL http://lme4.r-forge.r-project.org/book. Chapter 6.

Tuesday 9

Fielding, A., & Goldstein, H. (2006). Cross-classified and multiple membership structures in multilevel models: an introduction and review. DfES.

Browne, W. J., Goldstein, H., & Rasbash, J. (2001). Multiple membership multiple classification (MMMC) models. Statistical Modelling, 1(2), 103-124.

Bates, D. M. (2010). lme4: Mixed-effects modeling with R. URL http://lme4.r-forge.r-project.org/book. Chapter 2.

Wednesday 10

Lax, Jeffrey, and Justin Phillips. 2009b. “How Should We Estimate Public Opinion in the States?” American Journal of Political Science 53(1): 107–21.

Ghitza, Y., & Gelman, A. (2013). Deep interactions with MRP: Election turnout and voting patterns among small electoral subgroups. American Journal of Political Science, 57(3), 762-776.

Wang, W., Rothschild, D., Goel, S., & Gelman, A. (2014). Forecasting Elections with Non-Representative Polls. International Journal of Forecasting. Forthcoming.

Thursday 11

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press, USA. Chapters 1, 3, 4, 5, 6, and 7.

Bates, D. M. (2010). lme4: Mixed-effects modeling with R. URL http://lme4.r-forge.r-project.org/book. Chapter 3.


Steele, F. (2008). Multilevel models for longitudinal data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 171(1), 5-19.

Yang, Y., & Land, K. C. (2008). Age–Period–Cohort Analysis of Repeated Cross-Section Surveys: Fixed or Random Effects? Sociological Methods & Research, 36(3), 297-326.

Friday 12

Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8-38.

Jackman, S. (2009). Bayesian analysis for the social sciences (Vol. 846). Wiley, Part I, Chapter 1.

Stegmueller, D. (2011). Apples and oranges? The problem of equivalence in comparative research. Political Analysis, 19(4), 471-487.

Software Requirements

The latest version of R (at least R 3.2), with the ability to install packages on the go if needed.

Hardware Requirements

Participants need to bring their own laptop with R installed.


In addition to the mandatory readings, these pieces can be very useful for alternative perspectives, applications, and further learning:


  1. Gelman, A. (2009). Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (Expanded Edition). Princeton University Press.
  2. Achen, C. H. (1982). Interpreting and using regression (Vol. 29). Sage Publications, Incorporated.
  3. Lewis-Beck, M. (1980). Applied regression: An introduction (Vol. 22). Sage Publications, Incorporated.
  4. Eliason, S. R. (1993). Maximum likelihood estimation: Logic and practice (Vol. 96). Sage Publications, Incorporated.
  5. Luke, D. A. (2004). Multilevel modeling (Vol. 143). Sage Publications, Incorporated.
  6. Snijders, T. A. A., & Bosker, R. J. (2011). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Sage Publications Limited.
  7. Hox, J. J. (2010). Multilevel analysis: Techniques and applications. Taylor & Francis.
  8. Raudenbush, S.W. and Bryk, A.S. (2002). Hierarchical Linear Models (Second Edition). Thousand Oaks: Sage Publications.
  9. Jackman, S. (2009). Bayesian analysis for the social sciences (Vol. 846). Wiley.
  10. Kruschke, J. (2010). Doing Bayesian data analysis: A tutorial introduction with R and BUGS. Academic Press.
  11. Gill, J. (2007). Bayesian Methods: A Social and Behavioral Sciences Approach. Chapman & Hall/CRC.

Recommended Courses to Cover Before this One

Introduction to Generalized Linear Modeling

Interpreting Binary Logistic Regression Models

Multilevel Regression Modeling

Advanced Topics in Applied Regression

Recommended Courses to Cover After this One

Panel Data Analysis: hierarchical structures, heterogeneity and serial dependence

Age-period-cohort analysis

Introduction to Bayesian Inference

Additional Information


This course description may be subject to subsequent adaptations (e.g. to take into account new developments in the field, participant demands, or group size). Registered participants will be informed of any changes when they are made.

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, please contact us before registering.