Monday 29 July – Friday 2 August and Monday 5 – Friday 9 August
09:00–10:30 and 11:00–12:30
This course will teach you the application of simple, and then increasingly complex, multilevel specifications.
The first week sets the foundations. We progress from basic hierarchical linear models (HLMs), with only random intercepts, to more complex specifications that allow us to understand how an effect varies across contexts.
As part of this progression we cover estimation, 2- and 3-level configurations, what sample size considerations apply to HLMs, and how to assess models’ adequacy.
In the second week we explore alterations to the fundamental framework introduced the previous week. We cover dichotomous outcomes, the use of a multilevel specification to assess change over time (growth curve modelling), and the analysis of non-hierarchical data configurations.
If you have no prior knowledge of multilevel models, but want in-depth coverage, this course will serve you well. Sessions will be conducted entirely in R.
ECTS Credits for this course and, below, tasks for additional credits:
4 credits: Attend at least 90% of course hours, and participate fully in in-class activities. Carry out the necessary reading and/or other work prior to and after classes.
6 credits: As above, plus complete two take-home assignments in the first and second weeks, comprising an interpretation task, and a multilevel modelling task:
8 credits: As above, plus submit a final paper:
I will provide more details about the tasks and requirements during the course itself.
Constantin Manuel Bosancianu is a postdoctoral researcher in the Institutions and Political Inequality unit at Wissenschaftszentrum Berlin.
His work focuses on the intersection of political economy and electoral behaviour: how to measure political inequalities between citizens of developed and developing countries, and what the linkages between political and economic inequalities are.
He is interested in statistics, data visualisation, and the history of Leftist parties. Occasionally, he teaches methods workshops on regression, multilevel modelling, or R.
─ Kreft & de Leeuw, 1998, page 1
This course will introduce you to a class of statistical specifications that allows for the rigorous analysis of data that exhibit such hierarchical properties: multilevel models.
Beyond their desirable statistical properties, though, the main selling point of these models is that they allow us to pose, and find evidence to answer, more complex questions about the world. They do so by treating variation at multiple levels of the nesting structure not as a nuisance but as a substantively interesting feature of the data, to be modelled rather than corrected for.
An additional desirable feature of these models is their versatility: wherever data is nested in higher-order groups, it’s a good bet that a multilevel model can be adapted and applied to such data.
The two weeks are devoted, first, to covering the foundations of multilevel modelling and, second, to exploring extensions of this core framework to alternative data configurations.
In the first week we explore common linear specifications: random-intercept and random-slope models. We go over statistical notation for these models, interpretation of coefficients, presenting uncertainty, reporting results from such models, testing and displaying the effect of interactions between group-level and observation-level variables, as well as what sample size considerations should be kept in mind when analysing such data.
In the second week we allocate each day to an extension of the standard hierarchical linear framework. We introduce generalised linear mixed specifications, with an application to a dichotomous response variable. We then show how to use a multilevel specification to analyse change over time, as well as how to produce sub-national estimates of public opinion based solely on national-level data. Finally, I highlight how multilevel specifications can be applied to data structures that are not hierarchical, as well as to data that has the added complication of spatial correlations. Throughout the sessions we make intensive use of the lme4 and nlme packages, along with a variety of functions from connected packages that assist in plotting, model comparison, and data reshaping.
By the end of the course you should be able to easily identify such nested data configurations in your own field of study, e.g. voters nested in electoral districts, companies grouped in regions, parliamentarians nested in committees. You should also be able to properly specify, theoretically as well as with R/lme4 syntax, a multilevel model that fits the data structure you are faced with, and answers the substantive questions you are interested in. Finally, you will be equipped to interpret statistical output from these models, assess any misfit between model and data, and present substantive results to either a lay or a specialist audience.
Day 1
We start by describing problems OLS faces when applied to data that is nested, and how multilevel models (MLMs) overcome these difficulties. In addition to their statistical properties, MLMs also allow us to answer more sophisticated questions about the world. These insights will be complemented by a short practice session in R focusing on how OLS breaks down in certain situations, and how multilevel models are a compromise between two alternative strategies of analysing data in these instances.
Day 2
I introduce notation for multilevel specifications, as well as the simplest type of such models, with only a varying intercept. We then cover interpretation and inference for these models, and what the implications are of allowing for a varying intercept.
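As a point of reference, a varying-intercept model is often written as below. This is only a sketch: the symbols follow a common textbook convention and are assumptions, so the notation used in class may differ. Here $y_{ij}$ is the outcome for observation $i$ in group $j$, and $x_{ij}$ is a hypothetical observation-level predictor.

```latex
\begin{align}
y_{ij}     &= \beta_{0j} + \beta_{1} x_{ij} + \varepsilon_{ij},
           & \varepsilon_{ij} &\sim N(0, \sigma^{2}_{\varepsilon}) \\
\beta_{0j} &= \gamma_{00} + u_{0j},
           & u_{0j} &\sim N(0, \sigma^{2}_{u0}) \\
\rho       &= \frac{\sigma^{2}_{u0}}{\sigma^{2}_{u0} + \sigma^{2}_{\varepsilon}}
\end{align}
```

The last line is the intraclass correlation, the share of the residual variance that lies between groups. In lme4 syntax a model of this form would be specified roughly as `lmer(y ~ x + (1 | group), data = d)`, where `y`, `x`, `group` and `d` are placeholder names.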
Day 3
We gradually introduce more complex specifications, allowing for the effect of a predictor to vary between groups, and trying to understand whether any group-level predictors systematically explain how this effect varies. Cross-level interactions will be presented, along with techniques to graphically present their estimates. Finally, we discuss how to do variable centering and rescaling in the case of nested data.
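A varying-slope model with a cross-level interaction can be sketched as follows. The notation is an assumption, based on a common textbook convention rather than on the course materials: $x_{ij}$ is an observation-level predictor, $z_{j}$ a group-level predictor, and $\Sigma$ the covariance matrix of the group-level errors.

```latex
\begin{align}
y_{ij}     &= \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij},
           \qquad \varepsilon_{ij} \sim N(0, \sigma^{2}_{\varepsilon}) \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} z_{j} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} z_{j} + u_{1j},
           \qquad (u_{0j}, u_{1j})^{\top} \sim N(\mathbf{0}, \Sigma) \\
\intertext{Substituting the group-level equations into the first line gives}
y_{ij}     &= \gamma_{00} + \gamma_{01} z_{j} + \gamma_{10} x_{ij}
              + \gamma_{11} z_{j} x_{ij}
              + u_{0j} + u_{1j} x_{ij} + \varepsilon_{ij}
\end{align}
```

The coefficient $\gamma_{11}$ on the product $z_{j} x_{ij}$ is the cross-level interaction: it describes how the effect of $x$ changes with the group-level predictor $z$. A roughly equivalent lme4 call would be `lmer(y ~ x * z + (1 + x | group), data = d)`, again with placeholder names.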
Day 4
We discuss how to determine the best-fitting model from a series of specifications we might have tested, as well as how to assess the quality of our model. The latter topic brings us to the issue of assumptions for multilevel models, where we cover a few diagnostic tools.
Day 5
This day is reserved for a set of smaller topics, as well as a review of the most important ideas from the previous four days. I show that the insights gained apply relatively seamlessly to data with more than two levels of nesting. I also broach the topic of sample size requirements at all levels of the nesting structure, which frequently plague empirical analyses in political science.
We examine advanced extensions of the standard hierarchical linear specification; these demonstrate MLM’s flexibility in dealing with a variety of data configurations and outcome variables.
Day 6
We begin by covering generalised linear mixed models (GLMMs), with a specific focus on dichotomous dependent variables. By working through a practical example, we cover the interpretation of estimates from these models, the presentation of marginal effects, and sample size considerations.
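For a dichotomous outcome, one common specification is a logit model with a varying intercept, sketched below. The symbols are assumptions based on a standard textbook convention and are not taken from the course materials: $\pi_{ij}$ is the probability that $y_{ij} = 1$, and $x_{ij}$ is a hypothetical observation-level predictor.

```latex
\begin{align}
y_{ij} &\sim \mathrm{Bernoulli}(\pi_{ij}) \\
\log\!\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) &= \beta_{0j} + \beta_{1} x_{ij} \\
\beta_{0j} &= \gamma_{00} + u_{0j},
           \qquad u_{0j} \sim N(0, \sigma^{2}_{u0})
\end{align}
```

In lme4 a model of this kind would be fitted with something like `glmer(y ~ x + (1 | group), family = binomial, data = d)`, where all names are placeholders. The coefficients are on the log-odds scale, which is one reason marginal effects are usually presented on the probability scale.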
Day 7
I introduce multilevel models as a solution to the need to model change over time in a phenomenon, and to explain such change with time-varying and time-invariant predictors. I explain how to see such a setup as a nested data configuration, how to plot and model trajectories of change, and how to choose from a variety of error covariance structure options.
Day 8
We tackle cross-classified and multiple-membership models. These specifications accommodate situations where observations are simultaneously nested in two non-overlapping hierarchies, or where observations can be members of multiple higher-level units at the same time. Although this complicates our setup, we will see that multilevel models are well equipped to handle this situation. We finish the day by covering one way in which these models can help us disentangle age-period-cohort (APC) effects.
Day 9
We continue the discussion on cross-classified models by introducing multilevel regression with post-stratification (MRP). This is a specialised modelling technique that produces estimates of public opinion for sub-national units and population sub-groups from nationally representative surveys. Given the adaptability of this technique to varying types of attitudes (such as vote preferences), as well as to contexts where no census information is available (MRSP), it represents a valuable potential tool for you to master.
Day 10
We wrap up by considering the extension of multilevel models to situations where the data exhibits spatial dependence between observations. From this perspective, the distance between observations can also impact on the phenomenon we’re interested in, above and beyond the variance explained by level-1 characteristics or by group-level predictors. Following a practical example, I showcase the implementation of such models in R, as well as the interpretation of the estimates obtained.
Note on readings: Some of the days require extensive preparatory readings. Before starting the class, I strongly recommend you review regression-related considerations, so please allocate extra time before the course to familiarise yourself with the reading workload.
To guarantee that we progress at a steady and firm pace, you'll need a thorough understanding of OLS linear regression, at a theoretical and practical level.
You should be able to interpret coefficients and model fit, work with residuals, plot marginal effects, interpret and graphically display interactions, and assess and correct regression assumption violations. You should also have basic knowledge of generalised linear models (at least binary logistic regression) and Maximum Likelihood estimation.
The course is conducted entirely in the R statistical environment, so I expect you to have practical experience with R for data management and recoding, making exploratory graphs, running OLS regressions, and plotting quantities of interest based on regression output. This experience should naturally extend to reading data in different formats, such as Excel, SPSS, or Stata, into R.
To brush up on the above, I recommend the relevant chapters from Michael Crawley’s The R Book (2nd edition, Wiley, 2012), or Richard Cotton’s Learning R (O’Reilly Media, 2013).
If your regression knowledge was acquired a long time ago, brush up on the essential concepts and features using John Fox’s Applied Regression Analysis and Generalized Linear Models (3rd edition, Sage Publications, 2016).
Particularly relevant are chapters 5, 6, 7, 11, 12, and 14. A good coverage of Maximum Likelihood estimation is provided in Craig K. Enders’ Applied Missing Data Analysis (Guilford Press, 2010), chapter 3, or in Scott R. Eliason’s Maximum Likelihood Estimation: Logic and Practice (Sage Publications, QASS Series, 1993).
Each course includes pre-course assignments, such as readings and pre-recorded videos, as well as daily live lectures totalling at least three hours. The instructor will conduct live Q&A sessions and offer designated office hours for one-to-one consultations.
Please check your course format before registering.
Live classes will be held daily for three hours on a video meeting platform, allowing you to interact with both the instructor and other participants in real-time. To avoid online fatigue, the course employs a pedagogy that includes small-group work, short and focused tasks, as well as troubleshooting exercises that utilise a variety of online applications to facilitate collaboration and engagement with the course content.
In-person courses will consist of daily three-hour classroom sessions, featuring a range of interactive in-class activities including short lectures, peer feedback, group exercises, and presentations.
This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc.). Registered participants will be informed at the time of change.
By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, please contact us before registering.
Day | Topic | Details |
---|---|---|
1 | Multilevel models: introduction and notation | ~120 min. lecture and ~60 min. lab. |
2 | Random intercepts in MLM | ~90 min. lecture and ~90 min. lab. |
3 | Random slopes and cross-level interactions | ~90 min. lecture and ~90 min. lab. |
4 | Model fit and diagnostics | ~90 min. lecture and ~90 min. lab. |
5 | Extensions of the framework: 3-level models and Week 1 recap | ~60 min. lecture and ~120 min. lab. |
6 | Generalised linear mixed models: dichotomous outcomes | ~90 min. lecture and ~90 min. lab. |
7 | Modelling change over time in MLM framework | ~90 min. lecture and ~90 min. lab. |
8 | Cross-classified and multiple membership models | ~90 min. lecture and ~90 min. lab. |
9 | Multilevel regression with post-stratification | ~90 min. lecture and ~90 min. lab. |
10 | Multilevel spatial modelling | ~90 min. lecture and ~60 min. lab. |
NB: I expect you to do the readings before the scheduled meeting.

The primary textbook assigned for the course is Andrew Gelman and Jennifer Hill’s Data Analysis using Regression and Multilevel/Hierarchical Models (CUP, 2007). Selected chapters have been sourced from other notable multilevel modelling books, such as Tom Snijders and Roel Bosker’s Multilevel Analysis (Sage, 1999), or Stephen Raudenbush and Anthony Bryk’s Hierarchical Linear Models (Sage, 2002). The primary source for the longitudinal analysis module in Week 2 is Judith Singer and John Willett’s Applied Longitudinal Data Analysis (OUP, 2003).

Day | Readings |
---|---|
1 | Kreft, Ita, and Jan De Leeuw. 1998; Gelman, Andrew, and Jennifer Hill. 2007. Optional: Snijders, Tom A. B., and Roel J. Bosker. 1999; Bickel, Robert. 2007; Scott, Marc A., Patrick E. Shrout, and Sharon L. Weinberg |
2 | Gelman, Andrew, and Jennifer Hill. 2007; Enders, Craig K., and Davood Tofighi. 2007. Optional: Gill, Jeff, and Andrew J. Womack. 2013; Snijders, Tom A. B., and Roel J. Bosker. 1999; Raudenbush, Stephen W., and Anthony S. Bryk. 2002; Steenbergen, Marco R., and Bradford S. Jones. 2002 |
3 | Gelman, Andrew, and Jennifer Hill. 2007; McNeish, Daniel M., and Laura M. Stapleton. 2016. Optional: McNeish, Daniel M. 2017; Snijders, Tom A. B., and Roel J. Bosker. 1999; Brambor, T., Clark, W. R., & Golder, M. (2005) |
4 | Steele, Russell. 2013; Raudenbush, Stephen W., and Anthony S. Bryk. 2002. Optional: Snijders, Tom A. B., and Roel J. Bosker. 1999; Snijders, Tom A. B., and Johannes Berkhof. 2008 |
5 | Goldstein, Harvey. 2011; McNeish, Daniel, and Kathryn R. Wentzel. 2017; Brincks, Ahnalee M., Craig K. Enders, Maria M. Llabre, Rebecca J. Bulotsky-Shearer, Guillermo Prado, and Daniel J. Feaster. 2017. Optional: Bickel, Robert. 2007 |
6 | Gelman, Andrew, and Jennifer Hill. 2007; Hox, Joop J. 2010. Optional: Snijders, Tom A. B., and Roel J. Bosker. 1999; Bates, Douglas M. 2010 |
7 | Singer, Judith D., and John B. Willett. 2003. Optional: Singer, Judith D., and John B. Willett. 2003; Hox, Joop J. 2010; Goldstein, Harvey. 2011; Laird, Nan M., and Garrett M. Fitzmaurice. 2013; Núñez-Antón, Vicente, and Dale L. Zimmerman |
8 | Fielding, Antony, and Harvey Goldstein. 2006; Snijders, Tom A. B., and Roel J. Bosker. 1999. Optional: Goldstein, Harvey. 2011; Yang, Yang, and Kenneth C. Land. 2008; Hox, Joop J. 2010 |
9 | Ghitza, Yair, and Andrew Gelman. 2013; Lax, Jeffrey R., and Justin H. Phillips. 2009. Optional: Lax, Jeffrey R., and Justin H. Phillips. 2009; Leemann, Lucas, and Fabio Wasserfallen. 2017 |
9 | Ghitza, Y., & Gelman, A. (2013). Deep interactions with MRP: Election turnout and voting patterns among small electoral subgroups. American Journal of Political Science, 57(3), 762-776. Leemann, L., & Wasserfallen, F. (2017). Extending the use and prediction precision of subnational public opinion estimation. American Journal of Political Science, 61(4), 1003-1022. Recommended: Lax, Jeffrey, and Justin Phillips. 2009b. “How Should We Estimate Public Opinion in the States?” American Journal of Political Science 53(1): 107–21. Wang, W., Rothschild, D., Goel, S., & Gelman, A. (2015). Forecasting elections with non-representative polls. International Journal of Forecasting, 31(3), 980-991. |
10 | Dong, Guanpeng, Jing Ma, Richard Harris, and Gwilym Pryce. 2016; Corrado, Luisa, and Bernard Fingleton. 2011. Optional: Harris, Richard, John Moffat, and Victoria Kravtsova. 2011; Gelfand, Alan E., Sudipto Banerjee, C. F. Sirmans, Yong Tu, and Seow Eng Ong. 2007 |
R 3.5.2 or any newer version
RStudio 1.2.1322 or any newer version
Please bring your own laptop to lecture and lab sessions.
Any computer or laptop bought within the last 3–4 years should be sufficient.
4 GB of RAM and 200–300 MB of free space on the hard drive are enough to run the tasks we will attempt.
The assigned readings are some of the most commonly used textbooks in the field of multilevel modelling.
If you would like to consult additional sources, particularly in terms of how to implement such models in commonly used software packages, consult the literature below:
Winter School
Maximum Likelihood Estimation
Linear Regression with R/Stata: Multiple Regression Analysis
Logistic Regression and Generalised Linear Models
Summer School
Linear Regression with R/Stata: Multiple Regression Analysis
Introduction to Logistic Regression and General Linear Models: Binary, Ordered, Multinomial and Count Outcomes
Winter School
Introduction to Bayesian Inference
Summer School
Introduction to Structural Equation Modelling
Multilevel Structural Equation Modelling
Introduction to Bayesian Inference