ECPR

SD207 - Advanced Topics in Applied Regression

Instructor Details


Levente Littvay

Institution:
Central European University

Instructor Bio

Levente Littvay researches survey and quantitative methodology, twin and family studies and the psychology of radicalism and populism.

He is an award-winning teacher of graduate courses in applied statistics with a topical emphasis in electoral politics, voting behaviour, political psychology and American politics.

He is one of the Academic Convenors of ECPR’s Methods School, and is Associate Editor of Twin Research and Human Genetics and head of the survey team at Team Populism.

@littvay

Course Dates and Times

Monday 1 to Friday 5 August 2016
Generally classes are either 09:00-12:30 or 14:00-17:30
15 hours over 5 days

Prerequisite Knowledge

A solid understanding of linear and logistic regression at the level covered in the following texts:

Michael Lewis-Beck (1980). Applied Regression: An Introduction. Newbury Park, CA: Sage

John Fox (1991). Regression Diagnostics. Newbury Park, CA: Sage

Fred C. Pampel (2000). Logistic Regression: A Primer. Newbury Park, CA: Sage

(All three books are from Sage's Quantitative Applications in the Social Sciences series, a.k.a. the 'little green books'.)

You should also be comfortable with basic data management, data import and export, and the analyses described in the listed books in at least one statistical package of your choice, and you should be open to learning other statistical packages. In this course we will use R. The first lab session includes a quick review of how to run regressions in R, but it would be helpful if you already knew R or at least took the pre-session class.

This course starts where the ECPR Summer School courses Multiple Regression Analysis: Estimation, Diagnostics and Modelling (SD105 - Week 1) and Intro to GLM: Binary, Ordered and Multinomial Logistic, and Count Regression Models (SD106 - Week 2) end. If you do not feel prepared for Advanced Topics, I recommend you take these classes first.

Short Outline

Once a researcher becomes comfortable with regression, a question often arises: what next? Building on the assumptions regression models make (especially independence and the absence of measurement error), this course offers an overview of the multitude of ways these assumptions can be relaxed. In the process, the course trains researchers to think carefully about these assumptions and to become better data analysts and social scientists at the same time. Relaxing regression assumptions allows us to look at the world from a new angle and to ask novel research questions.

The course offers an introduction to many statistical techniques that either complement or build on regression analysis. These include fixed and random effects, the ideas behind multilevel modeling, measurement, reliability and validity, missing data, and a deeper understanding of model fit and model selection.

On a practical note: most of this class will take place in the classroom. I may demonstrate some techniques using R (and if you are a proficient R user, you may be able to follow along on your laptop if you bring it), but the purpose of the course is not to do practicals; it is to teach you the methods. The practical work you can do at home, and if you get stuck, we have consultations. Scripts to do what we learn in class will be provided.

Long Course Outline

Once a person becomes comfortable with basic statistics and learns to use regression, a new question often arises: what next? While the possible answers are endless, this course offers one of them. Building on the assumptions regression models make (which are reviewed extensively in the course), it offers an overview of the multitude of ways these assumptions can be relaxed. In the process, the course trains researchers to think carefully about these assumptions and to become better data analysts and social scientists at the same time. Relaxing regression assumptions allows us to look at the world from a new angle and to ask novel research questions that do not always follow the logic, familiar from regression models, of one dependent and multiple independent variables.

Since many of the assumptions of regression models can be relaxed in a large number of ways, the course offers an introduction to many statistical techniques that either complement or build on regression analysis. Many of these techniques would deserve their own course (one of them, Multilevel Regression Modelling, I teach at the ECPR Winter School in Methods). Despite the number of topics covered, the course not only allows students to master the basics of these techniques; it goes further, arming participants with the foundation needed to comprehend the related literature and to acquire an in-depth understanding of the broader issues on their own. The course aims to tear down the barrier that often stands between applied statistics textbooks and the consumers of these techniques, a barrier that exists because readers lack the appropriate foundation in the specific areas of statistics and an understanding of what problems these advanced techniques solve and why they are crucial for producing solid scientific work.

This course used to run for two weeks, but this year I have cut the number of topics and made the course more intensive. With the expansion of the Methods School, we now offer entire courses on some of the omitted topics.

The class focuses on the following assumptions of regression models: random sampling, independence, and the absence of measurement error. After the first day's overview of the class, the assumptions of regression models are reviewed in depth on the second day. The course covers what happens when these assumptions are violated, how to test them and, in the easier cases, how to correct your analysis to avoid violating them.

Tuesday's class revolves around the issue of heterogeneity. Regression models assume that observations (more specifically, the residuals of observations after the control variables) are independent of each other. This assumption is often hard to meet: if there is heterogeneity among observations that the model does not account for, the coefficients and significance tests will be biased. How to meet the independent-observations assumption is the topic of the class. We discuss the explicit modeling of known heterogeneity with control variables, with fixed and random effects, and with multilevel models developed specifically to deal with this issue.

If time permits, I will briefly mention the modeling of unobserved heterogeneity. Mixture models can inductively derive subgroups of the observations and estimate different regression results for each subgroup, all in a way that maximizes model fit. These models are not only useful for eliminating latent heterogeneity in a regression model; they also produce sub-classifications of the population with different characteristics under the specified model, and which cases belong in which subgroup can become a research question of its own. But mixture models come with a set of problems that are difficult to overcome, so practical use of the approach is rare and limited.

Part of this class is also devoted to overcoming measurement error. Measurement is an often under-appreciated part of quantitative social science, despite the fact that the problem unites the qualitative and quantitative paradigms. Poor measurement biases regression estimates by attenuating them, making them appear weaker and less significant than they are. We will consider theories of measurement and ways to assess the quality of measurements in practice.
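As a rough illustration of the fixed-effects idea (an aside, not course material: it uses Python rather than R, invented data, and hand-rolled estimators), the sketch below simulates grouped data in which the group effects are correlated with the predictor. Pooled OLS is then biased, while demeaning x and y within each group, i.e. the within (fixed-effects) estimator, recovers the true slope:

```python
import random
from collections import defaultdict

random.seed(1)

# Invented data: 50 groups, 20 observations each. The group intercept
# (alpha) is correlated with x, which biases a pooled regression.
true_slope = 2.0
xs, ys, gs = [], [], []
for g in range(50):
    alpha = 0.1 * g                      # group-level heterogeneity
    for _ in range(20):
        x = random.gauss(0.05 * g, 1)    # x correlated with the group effect
        y = alpha + true_slope * x + random.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        gs.append(g)

def ols_slope(x, y):
    """Simple-regression slope: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return num / sum((a - mx) ** 2 for a in x)

# Fixed effects via the within transformation: subtract each group's
# mean from x and y, then run OLS on the demeaned data.
sums = defaultdict(lambda: [0.0, 0.0, 0])
for x, y, g in zip(xs, ys, gs):
    s = sums[g]
    s[0] += x
    s[1] += y
    s[2] += 1
xd = [x - sums[g][0] / sums[g][2] for x, g in zip(xs, gs)]
yd = [y - sums[g][1] / sums[g][2] for y, g in zip(ys, gs)]

pooled = ols_slope(xs, ys)   # biased upward by the omitted group effects
within = ols_slope(xd, yd)   # close to the true slope of 2.0
```

The same within transformation is what including a dummy variable for every group would accomplish; demeaning is just the cheaper route.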

 

On Wednesday we cover bootstrapping, with a focus on estimating confidence intervals with the technique. Bootstrapped confidence intervals are more robust to some regression assumption violations than methods that derive confidence intervals from the standard errors. This is especially true for smaller samples, in the presence of heteroskedasticity, and when outliers are present. Bootstrapped confidence intervals are also more robust to violations of linearity and correct model specification.
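As a rough sketch of the percentile bootstrap (illustrative only, in Python rather than R, with made-up data): resample whole (x, y) pairs with replacement, re-estimate the slope on each resample, and read the confidence interval off the percentiles of the resulting distribution rather than off a standard error:

```python
import random

random.seed(7)

# Invented small sample: y = 1 + 2x + noise, n = 40.
n, true_slope = 40, 2.0
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 + true_slope * xi + random.gauss(0, 1) for xi in x]

def slope(xs, ys):
    """Simple-regression slope: cov(x, y) / var(x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)

# Percentile bootstrap: resample (x, y) PAIRS with replacement,
# re-estimate, and take the 2.5th and 97.5th percentiles.
boot = []
for _ in range(2000):
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(slope([x[i] for i in idx], [y[i] for i in idx]))
boot.sort()
ci_lo, ci_hi = boot[49], boot[1949]   # 95% percentile interval
```

Because no normality of the sampling distribution is assumed, the interval can be asymmetric around the point estimate, which is exactly what makes it more robust in the situations listed above.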

 

Thursday's class considers the use of regression weights. Weights can be incorporated into regression models for a multitude of reasons: they can correct for sampling error or for survey (unit) nonresponse. The class addresses the debate over whether weights are useful and whether they should be used at all. In addition to demonstrating the use of weights in regressions, the class shows how to avoid common mistakes when using them.

As a related topic, this class is also devoted to missing data. What do we do when our regression has item-level missing data? The class covers the theories of missing data that should be considered when devising a solution to the problem. In practice, the two most commonly used modern approaches to missing data are imputation and direct estimation using full information. The class also demonstrates commonly used methods that are best left alone.
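A minimal sketch of how design weights enter a regression (illustrative only, in Python rather than R; the strata, population shares, and weights are all invented): when one stratum is oversampled, weighting each observation by population share / sample share pulls the slope estimate back toward its population value:

```python
import random

random.seed(3)

# Invented stratified sample: stratum A is 80% of the population but only
# 50% of the sample, so design weights w = pop_share / sample_share are
# needed to recover population-level estimates.
data = []   # (x, y, w) triples
for slope_s, w in [(1.0, 0.8 / 0.5),    # stratum A: true slope 1
                   (3.0, 0.2 / 0.5)]:   # stratum B: true slope 3
    for _ in range(200):
        x = random.gauss(0, 1)
        y = slope_s * x + random.gauss(0, 1)
        data.append((x, y, w))

def wls_slope(data):
    """Weighted least squares slope for y ~ x with per-case weights."""
    W = sum(w for _, _, w in data)
    mx = sum(w * x for x, _, w in data) / W
    my = sum(w * y for _, y, w in data) / W
    num = sum(w * (x - mx) * (y - my) for x, y, w in data)
    den = sum(w * (x - mx) ** 2 for x, _, w in data)
    return num / den

unweighted = wls_slope([(x, y, 1.0) for x, y, _ in data])  # near 2.0
weighted = wls_slope(data)  # near the population slope of 0.8*1 + 0.2*3 = 1.4
```

The unweighted estimate reflects the sample's composition; the weighted one reflects the population's. Which of the two you actually want is precisely the debate the class takes up.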

 

Finally, on Friday, we introduce modern methods designed to aid selection among alternative model specifications, and we also discuss model averaging.
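As an illustrative aside (Python rather than R, invented data), one common approach to both tasks uses the AIC: candidate models are ranked by a fit term penalised for parameter count, and the AIC differences can be turned into Akaike weights for model averaging:

```python
import math
import random

random.seed(5)

# Invented data with a real effect of x: y = 1 + 2x + noise, n = 100.
n = 100
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]

def rss_linear(x, y):
    """Residual sum of squares for y ~ intercept + slope * x."""
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    return sum((yi - (a0 + b * xi)) ** 2 for xi, yi in zip(x, y))

def aic(rss, k):
    # Gaussian-likelihood AIC up to an additive constant: n*log(RSS/n) + 2k
    return n * math.log(rss / n) + 2 * k

rss_null = sum((yi - sum(y) / n) ** 2 for yi in y)  # intercept-only model
aic_null = aic(rss_null, 1)
aic_lin = aic(rss_linear(x, y), 2)   # lower AIC = preferred model

# Akaike weight of the linear model relative to the null model:
delta = aic_null - aic_lin
weight_lin = 1.0 / (1.0 + math.exp(-delta / 2))
```

Here the model including x wins decisively despite its extra parameter, and its Akaike weight is close to 1; with more evenly matched candidates, the weights would spread out and averaging over models becomes attractive.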

Day-to-Day Schedule

Day-to-Day Reading List

Software Requirements

R -- http://www.r-project.org  (Newest version - FREE)

Hardware Requirements

If you bring your laptop (not a must): 2 GB RAM (4 GB preferred). An Intel Atom CPU is sufficient; anything produced after 2008 is fine.

Literature

Allison, Paul D. (2001). Missing Data. Thousand Oaks, CA: Sage.

Enders, Craig K. (2010). Applied Missing Data Analysis. New York: The Guilford Press.

Fox, John. (2008). Applied Regression Analysis and Generalized Linear Models. Thousand Oaks, CA: Sage.

Luke, Douglas. (2004). Multilevel Modeling. Thousand Oaks, CA: Sage.

Raudenbush, Stephen W. and Anthony S. Bryk. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods, 2nd ed. Thousand Oaks, CA: Sage.

The following other ECPR Methods School courses could be useful in combination with this one in a ‘training track’.
Recommended Courses Before

Interpreting Binary Logistic Regression Models

Multiple Regression Analysis: Estimation, Diagnostics and Modelling

Intro to GLM: Binary, Ordered and Multinomial Logistic, and Count Regression Models

Data analysis course (introductory)

 

Recommended Courses After

Multilevel Regression Modelling

Structural Equation Modeling

Handling Missing Data

Causal Analysis

Panel Data Analysis

Introduction to Bayesian Inference

Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed in due time.

Note from the Academic Convenors

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.

