Monday 17 – Friday 21 February 2019, 14:00 – 17:30 (finishing slightly earlier on Friday)
15 hours over five days
Even though the buzzwords of our times are 'big data' and 'computer/data science', the foundation of statistical analyses in the social sciences is still classical linear regression. This basic analytical technique is robust and versatile enough to apply to a wide range of research problems, and the knowledge and experience gained in OLS regression are transferable to other methods. If learning statistics should start from somewhere, it should be linear regression.
This course will teach you how to apply, evaluate and interpret the results of linear regression models in R.
We start off very briefly with the prerequisites – a revision of the essential knowledge needed prior to running a regression – and move on from basic specifications of the model to more complex problems and interpretations.
We will go through regression assumptions and problems with assumption violations, look at how to use dummy variables and interactions in regression models and how the framework of linear regression can also accommodate non-linear associations.
The class ends with a focus on the presentation of regression results through tables and plots.
Tasks for ECTS Credits
2 credits (pass/fail grade)
Attend at least 90% of course hours, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.
3 credits (to be graded)
As above, plus complete a take-home assignment which involves fitting and interpreting regression models on the basis of pre-given data and model specifications.
4 credits (to be graded)
As above, plus complete a final paper that should be structured like an academic journal/conference article, with the exception that the literature review section can be just 2–3 paragraphs where you present the puzzle. Identify a few hypotheses you are interested in testing, and test them based on data of your choosing.
Martin Mölder (PhD in comparative politics) is a researcher at the Johan Skytte Institute of Political Studies at the University of Tartu, Estonia.
His main research focus is political parties, their ideological and political positions, and the functioning of party systems. He also teaches, among other things, quantitative methods.
Martin has an extensive background in the use of R for data management and statistical analysis in the social sciences.
He has taught the following courses at the ECPR Summer School in Methods & Techniques:
If you want to move on to complex analyses and statistical models, you need to get the simple things right first. This course will familiarise you with the basic statistical concepts that will enable you to fully and correctly use the framework of linear regression.
By the end of the course you will have the theoretical and practical skills to responsibly run multivariate linear regressions on a variety of data configurations. This includes estimating multiple model specifications in R, presenting results in tables or in a graphical format and interpreting the coefficients for the reader. It also implies assessing the appropriateness of OLS regression for certain kinds of data and learning to make suitable corrections and adjustments when there is a mismatch between model requirements and data characteristics.
The basic introductory course in statistics usually gets to regression at the very end (if it goes further than that, then the speed of the course was probably too high). This course is for those who have got to that point but have not moved on much further. We will not focus too much on theory, but put the emphasis on correct application and interpretation. You don't need to know the mathematics that happens behind the scenes to understand what a regression model does and what it is capable of doing.
The course is also suitable for those who have briefly encountered OLS regression as part of a statistics class, but now wish to better understand how it works, where it breaks down, and how it can be applied in a thorough way. Because we focus constantly on the application of linear models, the course is unsuitable for those who want an introductory course in general statistics. During one of the sessions we briefly cover some basic statistical concepts and tests, but this is only so that we can all delve into the topic of linear models on an equal footing. This cannot be considered a substitute for a good coverage of introductory statistics.
Day 1
We start with a condensed review of some fundamental concepts in basic statistics: the z and t distributions, hypothesis testing, confidence intervals and correlation. This overview is intended to provide a solid foundation from which to advance in the following days. We begin to discuss a few basics of regression, such as how it goes beyond correlation, and for what type of questions it is helpful. In the lab session, we will go through a few of the basic data manipulation procedures commonly required before running any regression: data cleaning and recoding, transformations of data, etc. This is a good opportunity for you to get familiar, if need be, with working with syntax files in R and with the RStudio interface.
Day 2
We delve fully into the fundamentals of Ordinary Least Squares (OLS) regression: how the estimation is carried out, and how we interpret the coefficients for simple (one predictor) and multiple regression (two or more predictors). I will present some basic formulas, but the goal will be to gain an intuitive understanding of how the estimation process functions, and what the results mean. In the lab session we put this newly-gained knowledge to the test, by running a few examples of linear models in R. We will interpret the output and the model fit.
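To give a flavour of the Day 2 lab, here is a minimal sketch of a simple and a multiple regression in R. It assumes the built-in `mtcars` data purely for illustration (it is not the course data), and the choice of variables is arbitrary.

```r
# Illustrative only: mtcars ships with base R and is not the course data.
data(mtcars)

# Simple regression: fuel efficiency (mpg) on weight (wt, in 1000 lbs).
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Multiple regression: add horsepower (hp) as a second predictor.
fit_multiple <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficients, standard errors, t statistics and p-values.
print(summary(fit_multiple))

# Each slope is the expected change in mpg for a one-unit increase in that
# predictor, holding the other predictor constant.
print(coef(fit_multiple))

# Model fit: share of the variance in mpg explained by each model.
r2_simple <- summary(fit_simple)$r.squared
r2_multiple <- summary(fit_multiple)$r.squared
print(c(simple = r2_simple, multiple = r2_multiple))
```

Comparing the `wt` coefficient across the two models is a quick way to see how an estimate changes once another predictor is controlled for.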
Day 3
We advance in our understanding of OLS by focusing on model specifications that almost always appear in empirical research, like models with dummy variables and with interactions between variables. We discuss the interpretation of such models and how you ought to communicate it to your audience. In the lab component we learn how to run these model specifications with R.
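As a rough sketch of the kind of specification we work with on Day 3, the following fits a model with a dummy variable and an interaction. Again it assumes the built-in `mtcars` data for illustration only; the variable name `transmission` is made up for this example.

```r
# Illustrative only: mtcars ships with base R and is not the course data.
data(mtcars)

# Turn the 0/1 transmission indicator into a labelled factor. lm() then
# creates the dummy variable, with "automatic" as the reference category.
mtcars$transmission <- factor(mtcars$am, levels = c(0, 1),
                              labels = c("automatic", "manual"))

# wt * transmission expands to wt + transmission + wt:transmission.
fit_int <- lm(mpg ~ wt * transmission, data = mtcars)
print(summary(fit_int))

b <- coef(fit_int)

# Slope of weight for automatic cars (the reference category).
slope_automatic <- b[["wt"]]

# Slope of weight for manual cars: main effect plus the interaction term.
slope_manual <- b[["wt"]] + b[["wt:transmissionmanual"]]

print(c(automatic = slope_automatic, manual = slope_manual))
```

The point of the last step is that, in a model with an interaction, the coefficient on `wt` alone describes only the reference group; the effect for the other group has to be computed from two coefficients.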
Day 4
This day is devoted to preventing abuse in the estimation of linear models. As with the vast majority of statistical procedures, a series of assumptions underpins OLS regression. If these are not met, our results may deceive us. In this session we go over these assumptions, how they influence the results when they are not met, and what strategies we have to overcome this situation. In the lab we turn to these issues from a practical perspective. We run a test regression in R, assess whether the assumptions are met, and correct for assumption violations that exist (if possible). Through a step-by-step process, you will see how your estimates and model fit change when engaging in such a process.
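A minimal sketch of what such a check can look like in base R is given below. It assumes the built-in `mtcars` data for illustration, and the auxiliary regression is only one simple way to probe for non-constant error variance (it follows the idea behind the Breusch-Pagan test in its studentised form); it is not necessarily the procedure used in the lab.

```r
# Illustrative only: mtcars ships with base R and is not the course data.
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residuals vs fitted values: curvature hints at non-linearity,
# a funnel shape at heteroskedasticity.
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals.
qqnorm(resid(fit))
qqline(resid(fit))

# Auxiliary regression of squared residuals on the predictors.
# Under constant error variance, n * R^2 is approximately chi-squared
# with degrees of freedom equal to the number of predictors (here 2).
mtcars$sq_resid <- resid(fit)^2
aux <- lm(sq_resid ~ wt + hp, data = mtcars)
bp_stat <- nrow(mtcars) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 2, lower.tail = FALSE)
print(c(statistic = bp_stat, p_value = bp_p))

# One possible response to curvature: add a squared term and compare fit.
fit_quad <- lm(mpg ~ wt + I(wt^2) + hp, data = mtcars)
print(c(linear = summary(fit)$adj.r.squared,
        quadratic = summary(fit_quad)$adj.r.squared))
```

Refitting the model after each adjustment and comparing the estimates side by side is the step-by-step process described above.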
Day 5
On the last day we look into the various ways regression results can be presented. A properly formatted and thought through regression table is a must, but sometimes that is not enough. In fact, I would say that almost always it is not enough for an effective presentation of your models and your conclusions. For that it is necessary to visualise your results. This is especially true for interactions and non-linear effects. While tables of coefficients are still the dominant way of presenting results in academic journals, graphs and predicted values tend to be preferred in reports and analyses for larger, non-technical audiences. I believe strongly that you should be familiar with both types and should tailor the delivery of your results to the audience.
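The sketch below shows one way to produce both kinds of output with base R alone: a compact coefficient table and a plot of predicted values. It assumes the built-in `mtcars` data for illustration; holding `hp` at its mean is an arbitrary choice made for this example.

```r
# Illustrative only: mtcars ships with base R and is not the course data.
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Compact coefficient table: estimates with 95% confidence intervals.
coef_table <- cbind(estimate = coef(fit), confint(fit))
print(round(coef_table, 2))

# Predicted mpg across the observed range of weight, holding hp at its mean.
new_data <- data.frame(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
  hp = mean(mtcars$hp)
)
pred <- predict(fit, newdata = new_data, interval = "confidence")

# Predicted values (solid line) with the 95% confidence band (dashed lines).
plot(new_data$wt, pred[, "fit"], type = "l", ylim = range(pred),
     xlab = "Weight (1000 lbs)", ylab = "Predicted mpg")
lines(new_data$wt, pred[, "lwr"], lty = 2)
lines(new_data$wt, pred[, "upr"], lty = 2)
```

The table suits a technical readership, while the plot communicates the same relationship to readers who do not want to read coefficients.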
This course presumes a basic knowledge of fundamental statistical concepts such as hypothesis testing and comparison of means (t-tests).
If you have no background in statistics, you should also take Florian Weiler's course Introduction to Statistics for Political and Social Scientists.
The class will be carried out in R. Therefore, you should have a basic knowledge of R as a statistical programming language and of RStudio.
The class assumes that you know how to read in data, that you have basic data management skills, and that you are familiar with basic plotting commands.
If you have no experience with R, you should also take Thorsten Schnapp's short entry-level Introduction to R course.
Day | Topic | Details |
---|---|---|
Day 1 | From correlation to regression: revisiting the basics | We cover a few foundational concepts in statistics: correlation, standard error, t test, t and z distributions. We also make our first forays into the regression setup. In the lab part, we get familiar with R and RStudio, and try a few basic data manipulation and transformation tasks. All of these tasks habitually have to be performed before running a regression. |
Day 2 | OLS fundamentals, coefficients and model fit | We go through the estimation of OLS models and the interpretation of coefficients for simple and multiple regression. In the lab session, we run a few regressions in R, and go through interpreting coefficients and measures of model fit once more. |
Day 3 | Dummy variables, interactions, non-linear associations | We discuss slightly more complex model specifications which include dummy variables, interactions between variables and non-linear effects of predictors. In the lab, we go through such models and their interpretations. |
Day 4 | Regression assumptions: violations and remedies | This session covers the assumptions underpinning OLS regression, what the implications of assumption violations are, and how to correct for them, if possible. The lab session will offer practical strategies to identify assumption violations. We also see how estimates and model fit statistics change when correcting for some of these violations. |
Day 5 | Recap and presentation of regression models through regression tables and plots of coefficients and predicted values | In this last session, we review a few of the most important ideas covered in the past four days, based on participants’ requests. I show a few of the ways in which regression results can be presented to the audience, and discuss the strengths and weaknesses of each. In the lab I show code for the presentation of results, and also allow for a recap of any topics participants feel we should cover again. |
Day | Readings |
---|---|
Day 1 | Revisiting the basics: Field, Andy, Jeremy Miles, and Zoë Field. 2012. I assume that many of the topics these chapters cover are familiar to you, so they should at least to some extent be a refresher. Skim through as necessary. Advanced optional: Fox, J. (2008) |
Day 2 | OLS fundamentals: Field, Andy, Jeremy Miles, and Zoë Field. 2012. Advanced optional: Fox, J. (2008) |
Day 3 | Dummy variables, interactions, non-linear associations: Hardy, M. A. (1993); Brambor, T., Clark, W. R., & Golder, M. (2005). Advanced optional: Fox, J. (2008) |
Day 4 | Regression assumptions: Fox, J. (1991). Advanced optional: Fox, J. (2008) |
Day 5 | Presentation of regression results: Gelman, A., Pasarica, C., & Dodhia, R. (2002); Breheny, P., & Burchett, W. (2017) |

The primary textbook for the course is: Field, Andy, Jeremy Miles, and Zoë Field. 2012. It is an engaging introductory-level statistics textbook that also covers the basics of regression and provides examples in R. For each topic I will also indicate additional readings that take you further into the topics.
Up-to-date versions of R and RStudio.
Please bring your laptop.
Literature on regression is ubiquitous. It is part of every introductory textbook and there is myriad literature that can take you into more advanced topics on regression. Before I get to some of those, I'd like to mention two textbooks that give a good overview of topics about and around regression. The first is a very simple textbook, the second more advanced:
Gravetter, F.J. and Wallnau, L.B. (2016)
Statistics for the Behavioral Sciences
Cengage Learning
Agresti, A and Finlay, B (2008)
Statistical Methods for the Social Sciences
Prentice Hall
Below is a by no means exhaustive list of literature that will take your knowledge of regression further.
Belsley, D. A., Kuh, E., & Welsch, R. E. (2004)
Regression Diagnostics: Identifying Influential Data and Sources of Collinearity
New York: Wiley
Berry, W. D. (1993)
Understanding Regression Assumptions. Quantitative Applications in the Social Sciences
Thousand Oaks, CA: Sage Publications
Braumoeller, B. F. (2004)
Hypothesis Testing and Multiplicative Interaction Terms
International Organization, 58(4), 807–820
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003)
Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd ed.
Mahwah, NJ: Lawrence Erlbaum Associates
Jaccard, J., & Turrisi, R. (2003)
Interaction Effects in Multiple Regression (2nd ed.)
London: Sage Publications.
Kaufman, R. L. (2013)
Heteroskedasticity in regression: Detection and correction (Vol. 172)
Sage Publications
Lewis-Beck, M. S. (1980)
Applied Regression: An Introduction. Quantitative Applications in the Social Sciences Series
London: Sage
Motulsky, H. J., & Ransnas, L. A. (1987)
Fitting curves to data using nonlinear regression: a practical and nonmathematical review
The FASEB Journal, 1(5), 365–374
Ritz, C., & Streibig, J. C. (2008)
Nonlinear Regression with R
New York: Springer
Ryan, T. P. (2008)
Modern Regression Methods (2nd ed.)
Hoboken, NJ: Wiley
Sheather, S. J. (2009)
A Modern Approach to Regression with R
New York: Springer
Weisberg, S. (2005)
Applied Linear Regression (3rd ed.)
Hoboken, NJ: Wiley-Interscience
Summer School
Introduction to R
Introduction to Inferential Statistics: What you need to know before you take regression
Winter School
Introduction to R
Introduction to Statistics for Political and Social Scientists
Summer School
Introduction to General Linear Models: Binary, Ordered and Multinomial Logistic, and Count Regression
Winter School
Interpreting Binary Logistic Regression Models