Once a person becomes comfortable with basic statistics and learns to use regression, a new question often arises: what next? While the possible answers are endless, this course offers one. Building on the assumptions regression models make (which are reviewed extensively in the course), it offers an overview of the multitude of ways those assumptions can be relaxed. In the process, the course trains researchers to think carefully about these assumptions and thereby to become better data analysts and better social scientists at the same time. Relaxing regression assumptions allows us to look at the world from a new angle, to ask novel research questions that do not always follow the familiar logic of one dependent and multiple independent variables. Since many of the assumptions of regression models can be relaxed in a large number of ways, the course introduces a range of statistical techniques that either complement or build on regression analysis. Many of these techniques would deserve a course of their own (one of them, Multilevel Regression Modeling, I teach at the ECPR Winter School in Methods). Despite the number of topics covered, the course not only allows students to master the basics of these techniques; it goes much further, arming participants with the knowledge needed to comprehend the related literature and to acquire an in-depth understanding of the broader issues on their own. The course aims to tear down the barrier that often stands between applied statistics textbooks and the consumers of the techniques they describe: a barrier created by the lack of an appropriate foundation in the relevant areas of statistics, and by a lack of understanding of what problems these advanced techniques solve and why they are crucial to producing solid scientific work.
This course used to be two weeks, but this year I cut the number of topics and made the course more intensive. This year, with the expansion of the methods school, we are offering entire classes on some of the omitted topics.
The class focuses on the following assumptions of regression models: random sampling, independence, and the absence of measurement error. After the first day’s overview of the class, the assumptions of regression models are reviewed in depth on the second day. The course will cover what happens when these assumptions are violated, how to test them, and, in the easy cases, how to correct your analysis so that no assumption is violated.
Tuesday’s class will revolve around the issue of heterogeneity. Regression models assume that observations (more specifically, the residuals of the observations after the control variables are accounted for) are independent of each other. This assumption is often hard to meet. If any heterogeneity present among the observations is not accounted for in the model, the model coefficients and significance tests will be biased. How the independent-observations assumption can be met is the topic of the class. We discuss the explicit modeling of known heterogeneity with control variables, with fixed and random effects, and through the explicit development of multilevel models designed to deal with this specific issue. If time permits, I will briefly mention the modeling of unobserved heterogeneity. Mixture models can inductively derive subgroups of the observations and estimate separate regression results for the subgroups, all in a way that maximizes model fit. These mixture models are not only useful for eliminating latent heterogeneity from the regression model; they can also produce sub-classifications of our population whose members differ in their characteristics under our specified model. Which cases belong in which subgroup can become a research question of its own. But these models come with a set of problems that are difficult to overcome, so practical use of the approach is rare and limited.

Part of this class will also be devoted to overcoming measurement error. Measurement is often an under-appreciated part of quantitative social science, despite the fact that the problem unites the qualitative and quantitative paradigms. Poor measurement biases regression estimates, making them appear weaker and less significant than they truly are. In Tuesday’s class we will consider theories of measurement and ways to assess the quality of measurements in practice.
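The attenuating effect of measurement error mentioned above is easy to demonstrate by simulation. The following is a minimal illustrative sketch (not course material; all numbers are made up): adding random noise to a predictor shrinks its estimated OLS slope toward zero.

```python
# Illustrative sketch: attenuation bias from measurement error.
# True model: y = 2*x + e. We then observe x with added noise and
# re-estimate the slope; it shrinks toward zero by the reliability
# ratio var(x) / (var(x) + var(noise)) = 1 / (1 + 1) = 0.5 here.
import random
import statistics

def ols_slope(x, y):
    """Simple-regression OLS slope: cov(x, y) / var(x)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

random.seed(1)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + random.gauss(0, 1) for xi in x]

# Observe x with error of variance 1 (a noisy measurement of x).
x_obs = [xi + random.gauss(0, 1) for xi in x]

clean = ols_slope(x, y)           # close to the true slope of 2.0
attenuated = ols_slope(x_obs, y)  # close to 2.0 * 0.5 = 1.0
print(round(clean, 2), round(attenuated, 2))
```

The reliability ratio in the comment is the standard errors-in-variables attenuation factor for a single noisy predictor; with more predictors or correlated errors the bias can go in either direction.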
Wednesday we cover bootstrapping, with a focus on using the technique to estimate confidence intervals. Bootstrapped confidence intervals are more robust to some regression assumption violations than methods that derive confidence intervals from the standard errors. This is especially true for smaller samples, in the presence of heteroskedasticity, and when outliers are present. Bootstrapped confidence intervals are also more robust to violations of linearity and of correct model specification.
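As a minimal sketch of the core idea (illustrative data, not course material), a percentile bootstrap confidence interval for a regression slope can be built by resampling whole (x, y) observations with replacement and re-estimating the slope on each resample:

```python
# Illustrative sketch: percentile bootstrap CI for an OLS slope.
import random
import statistics

def ols_slope(x, y):
    """Simple-regression OLS slope: cov(x, y) / var(x)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return num / sum((a - mx) ** 2 for a in x)

random.seed(7)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.5 * xi + random.gauss(0, 2) for xi in x]  # true slope 1.5
data = list(zip(x, y))

reps = 2000
boot_slopes = []
for _ in range(reps):
    # Resample observation pairs (not residuals) with replacement.
    sample = [random.choice(data) for _ in range(n)]
    xs, ys = zip(*sample)
    boot_slopes.append(ols_slope(xs, ys))

# The 2.5th and 97.5th percentiles of the bootstrap distribution
# form the 95% percentile confidence interval.
boot_slopes.sort()
lo = boot_slopes[int(0.025 * reps)]
hi = boot_slopes[int(0.975 * reps) - 1]
print(f"95% percentile CI for the slope: [{lo:.2f}, {hi:.2f}]")
```

Resampling pairs rather than residuals is what makes the interval robust to heteroskedasticity; refinements such as the BCa interval follow the same resampling logic.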
Thursday’s class will consider the use of regression weights. Weights can be incorporated into regression models for a multitude of reasons; they can be used, for example, to correct for sampling error or for survey (unit) nonresponse. The class will address the debate over whether weights are useful and whether they should be used at all. In addition to demonstrating the use of weights in regressions, the class will also show how to avoid common mistakes when using them. As a related topic, this class is also devoted to missing data. What do we do when our regression suffers from item-missing data? The class will cover the various theories of missing data that should be considered when devising a solution to the problem. In practice, the two most commonly used modern approaches to correcting for missing data are imputation and direct estimation using full information. The class will also demonstrate commonly used methods that are best left alone, never to be used.
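The simplest case of a design weight correcting for the sampling design can be shown with a stylized example (all numbers hypothetical, not course data): when a subgroup is oversampled, weighting each case by the inverse of its selection probability recovers the population quantity that the unweighted estimate misses.

```python
# Stylized sketch: inverse-probability design weights.
# Population: group A (90 cases, value 10) and group B (10 cases, value 20),
# so the true population mean is (90*10 + 10*20) / 100 = 11.
group_a = [10.0] * 90
group_b = [20.0] * 10

# Sample design: draw half the sample from each group, so B is oversampled.
sample = group_a[:50] + group_b * 5            # 50 A cases, 50 B cases
# Weight = (group size) / (cases sampled from the group),
# i.e. the inverse of each case's selection probability.
weights = [90 / 50] * 50 + [10 / 50] * 50

unweighted = sum(sample) / len(sample)         # 15.0, pulled toward group B
weighted = (sum(w * v for w, v in zip(weights, sample))
            / sum(weights))                    # recovers the mean of 11
print(unweighted, round(weighted, 6))
```

The same inverse-probability logic underlies weighted regression, where each observation's contribution to the estimates is scaled by its weight; whether doing so helps or merely inflates variance is exactly the debate the class takes up.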
Finally, on Friday, we introduce modern methods designed to aid the choice among alternative model specifications, and we also discuss model averaging.
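One common flavor of model averaging (the course may of course cover others, such as Bayesian model averaging) weights each candidate model by its Akaike weight, w_i proportional to exp(-ΔAIC_i / 2). A minimal sketch with hypothetical AIC values and predictions:

```python
# Illustrative sketch: Akaike weights and a model-averaged prediction.
# All AIC values and predictions below are hypothetical.
import math

aics = {"m1": 100.0, "m2": 102.0, "m3": 110.0}

# Akaike weight: w_i proportional to exp(-(AIC_i - AIC_best) / 2).
best = min(aics.values())
raw = {m: math.exp(-(a - best) / 2) for m, a in aics.items()}
total = sum(raw.values())
akaike_weights = {m: r / total for m, r in raw.items()}

# Model-averaged prediction: weight each model's prediction by its w_i.
preds = {"m1": 3.0, "m2": 3.4, "m3": 5.0}
averaged = sum(akaike_weights[m] * preds[m] for m in aics)
print({m: round(w, 3) for m, w in akaike_weights.items()}, round(averaged, 2))
```

Note how the clearly worse model (m3, ΔAIC = 10) contributes almost nothing to the average, while the two competitive models share most of the weight.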