Multilevel models have become increasingly popular in recent years; however, the move from educational research, with students nested in schools (or teachers), to questions such as how the volatility of a party system might influence individual political decisions is not trivial. These models allow researchers to test comparative hypotheses that were previously only tentative explanations in research carried out on separate country samples. Furthermore, the abundance of cross-national survey data (such as EES, ESS, CSES) increasingly invites researchers to use these models. Nevertheless, multilevel linear models rest on specific assumptions, and their use should be guided by both theoretical reasoning and data properties. Once a general understanding of multilevel models is acquired, they prove to be an extremely valuable and versatile set of tools for complex questions. In this course we will focus on three core aspects of applied multilevel modelling: 1) properties and specification of multilevel models, 2) linking method with theories of heterogeneity, and 3) implementation in R.
The course runs over two weeks. The first week is dedicated to the general principles of multilevel models and their implementation, including varying-intercept and varying-slope multilevel linear models, with additional focus on uncertainty, prediction, and limitations. The second week is dedicated to more advanced topics that frequently appear in applied research. Each day has a lecture component and a lab component and, alongside the assigned readings, the end of the first week will feature an overview homework. We start the second week by reviewing this homework together. We will work with multiple datasets (overwhelmingly survey data) throughout the course, specifying step by step more complicated models or extending our models to accurately reflect the formulated theory.
The lab sessions (taught in R) accompany the course, and in them we will go through examples of multilevel models in applied research. We will extract, display and discuss quantities of interest and link them directly to the concepts covered during the lectures. We will also discuss how these should be reported in an academic paper and how they can be easily formatted and exported from R. The lab sessions can be described as “supervised individual/group work”.
After a brief discussion of the course logistics, we will review principles of inference, linear regression and assumption violations on day one. We introduce examples of comparative research questions and hypotheses, and discuss what data and method requirements have to be met.
Day two is dedicated to nested data structures and the challenges they raise for pooled regression models. We review the alternatives, such as comparing regressions run separately for each group and pooled regression with cluster-corrected standard errors, along with their limitations. We then start discussing properties of the variables that will be included in the multilevel models. Here, the sources of variation (within-group and between-group) are of specific interest.
Day three offers the methodological and statistical transition from pooled regression to multilevel modelling with varying intercepts and varying slopes. We focus on principles and assumptions of multilevel models, discussing their great benefits but also their possible limitations (from both a theoretical and a statistical perspective). We will also spend quite some time on clarifying notation and the meaning of key terms (e.g. fixed and random effects, correlation between random effects).
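As a rough orientation for the notation, a varying-intercept, varying-slope model with a single individual-level predictor is often written as below. The symbols are an illustrative assumption, not the course's fixed notation: $y_{ij}$ is the outcome for individual $i$ in group $j$, $x_{ij}$ is an individual-level predictor, the $\gamma$ terms are the fixed effects, and the $u$ terms are the random effects.

```latex
\begin{align}
y_{ij} &= \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}, & e_{ij} &\sim N(0, \sigma^2), \\
\beta_{0j} &= \gamma_{00} + u_{0j}, \\
\beta_{1j} &= \gamma_{10} + u_{1j}, \\
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} &\sim
N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix} \right).
\end{align}
```

In this sketch, the correlation between random effects is $\rho = \tau_{01} / \sqrt{\tau_{00}\,\tau_{11}}$.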
On day four we will extend our models to include multiple predictors, discussing both data preparation (such as centering) and interpretation, with a strong focus on cross-level interactions and comparative model fit evaluation.
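A cross-level interaction can be sketched in a single combined equation. The notation is assumed for illustration only: $y_{ij}$ is the outcome for individual $i$ in group $j$, $x_{ij}$ is an individual-level predictor centered on its group mean $\bar{x}_j$, $z_j$ is a group-level predictor, $u_{0j}$ and $u_{1j}$ are the group-level random effects, and $e_{ij}$ is the individual-level residual.

```latex
\begin{align}
y_{ij} = {} & \gamma_{00} + \gamma_{01} z_j + \gamma_{10} (x_{ij} - \bar{x}_j)
            + \gamma_{11} z_j (x_{ij} - \bar{x}_j) \\
            & + u_{0j} + u_{1j} (x_{ij} - \bar{x}_j) + e_{ij}.
\end{align}
```

Here $\gamma_{11}$ is the cross-level interaction: it describes how the within-group slope of $x$ changes with the group-level characteristic $z_j$.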
The last day of the first week is devoted to uncertainty, prediction, and power. The first two are quantities or procedures that are important both for contextualizing our inferences and for better reporting of our results. Within the framework of multilevel models, the presence of random effects raises several considerations about how we calculate uncertainty around the estimates and how we present our results using new data for model-based predictions.
We kick off day six with a review of the weekend homework assignment and then switch to generalized linear models. After a short overview of link functions and the different quantities that we are interested in (such as predicted probabilities of a particular outcome category), we will focus on multilevel models for dichotomous variables and counts, including second-level predictors and cross-level interactions.
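As a minimal sketch of how the link function enters, the same linear predictor can serve both outcome types. The symbols are assumptions made for illustration: $x_{ij}$ is an individual-level predictor, $z_j$ a second-level predictor, $\gamma_{11}$ the cross-level interaction, and $u_{0j}$, $u_{1j}$ the group-level random effects.

```latex
\begin{align}
\eta_{ij} &= \gamma_{00} + \gamma_{01} z_j + \gamma_{10} x_{ij} + \gamma_{11} z_j x_{ij}
             + u_{0j} + u_{1j} x_{ij}, \\
\text{dichotomous outcome:} \quad \Pr(y_{ij} = 1) &= \operatorname{logit}^{-1}(\eta_{ij})
             = \frac{1}{1 + e^{-\eta_{ij}}}, \\
\text{count outcome:} \quad y_{ij} &\sim \mathrm{Poisson}(\lambda_{ij}), \quad \log \lambda_{ij} = \eta_{ij}.
\end{align}
```

Predicted probabilities and expected counts are obtained by applying the inverse link to $\eta_{ij}$, which is why coefficients on the linear-predictor scale cannot be read directly as changes in the outcome.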
On day seven we analyze data with a more complicated nesting structure. As in education research, where pupils attend a particular elementary school and then a particular high school, such situations arise in many other research areas. Observations can be nested in both countries and years, or specific survey responses can be nested in individuals and different modes, and so on. We will discuss cross-classified and multiple membership models in order to accurately account for this nesting structure and evaluate hypotheses that are linked to multiple grouping units.
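The two structures can be sketched with intercept-only models. The notation is an assumption for illustration: in the cross-classified case, observation $i$ belongs to country $j$ and year $k$ at the same time; in the multiple membership case, observation $i$ belongs to a set of groups $J(i)$ with membership weights $w_{ij}$.

```latex
\begin{align}
\text{cross-classified:} \quad y_{i(jk)} &= \gamma_{0} + u_{j} + v_{k} + e_{i(jk)},
  & u_j &\sim N(0, \tau_u), \; v_k \sim N(0, \tau_v), \\
\text{multiple membership:} \quad y_{i} &= \gamma_{0} + \sum_{j \in J(i)} w_{ij} u_{j} + e_{i},
  & \sum_{j \in J(i)} w_{ij} &= 1.
\end{align}
```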
We dedicate day eight to deep interactions and poststratification, an extremely useful approach for deriving sub-group level estimates (for example, for geographic and demographic sub-categories) from data that is available only at a higher level (such as a nationally representative survey). In most cases researchers also have access to rich official statistics at sub-group levels (but not to good-quality attitudinal data); combined in a multilevel framework, these can enhance the quality of sub-group level estimates of attitudes and reported behaviors, even with relatively small sample sizes.
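The poststratification step itself is a population-weighted average of model-based cell estimates. In the sketch below the notation is assumed for illustration: $c$ indexes cells defined by combinations of demographic and geographic categories, $\hat{\theta}_c$ is the multilevel model's estimate for cell $c$, $N_c$ is the population count of that cell taken from official statistics, and $s$ is the sub-group of interest (for example, a region).

```latex
\begin{equation}
\hat{\theta}_{s} = \frac{\sum_{c \in s} N_c \, \hat{\theta}_c}{\sum_{c \in s} N_c}.
\end{equation}
```

Because the weights $N_c$ come from population data rather than from the sample, cells that are sparsely observed in the survey still contribute in proportion to their true size.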
One rather specific, but still intuitive, application of multilevel models is the modelling of longitudinal data. In many cases, change over time (continuous or discontinuous) is of interest to researchers, and the multilevel framework offers extensive possibilities to model it accurately, for example by easily incorporating time-varying predictors. We extend the basic longitudinal models to handle unbalanced data (variably spaced data) as well as data with varying numbers of measurement occasions, as these problems often characterize real survey data. These models will be the topic of day nine.
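A basic linear growth model treats measurement occasions as nested in individuals. The notation below is assumed for illustration: $y_{ti}$ is the outcome for individual $i$ at occasion $t$, and $\mathit{time}_{ti}$ is the actual time of that measurement.

```latex
\begin{align}
y_{ti} &= \pi_{0i} + \pi_{1i} \, \mathit{time}_{ti} + e_{ti}, \qquad t = 1, \dots, T_i, \\
\pi_{0i} &= \gamma_{00} + u_{0i}, \\
\pi_{1i} &= \gamma_{10} + u_{1i}.
\end{align}
```

Because $\mathit{time}_{ti}$ and the number of occasions $T_i$ are allowed to differ across individuals, this form accommodates variably spaced data and varying numbers of measurement occasions.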
On the last day, after a summary, we will look into future directions and important extensions for applied research. Most notably, the bulk of quantitative comparative research uses survey data, and thus response behavior and systematic cross-country variation in it can be a real issue. We will discuss multilevel item-response models to account for these possible problems and review previous examples of how substantive findings might change when these effects are not considered. Finally, we introduce how to transition to Bayesian estimation for hierarchical models.