Constantin Manuel Bosancianu is a postdoctoral researcher in the Institutions and Political Inequality unit at Wissenschaftszentrum Berlin.
His work focuses on the intersection of political economy and electoral behaviour: how to measure political inequalities between citizens of developed and developing countries, and what the linkages between political and economic inequalities are.
He is interested in statistics, data visualisation, and the history of Leftist parties. Occasionally, he teaches methods workshops on regression, multilevel modelling, or R.
Monday 25 February – Friday 1 March, 14:00–17:30 (finishing slightly earlier on Friday)
15 hours over 5 days
You should have a thorough understanding of basic statistical concepts such as mean, median, variance, standard deviation and standard error.
You should also be familiar with very basic statistical tests and analyses, such as t tests and ANOVA, at least at a theoretical level.
The class will be carried out primarily in R, but with Stata examples and scripts. You should have basic knowledge of at least one of these software packages: reading in data, basic data recoding, and very basic plotting commands, as well as basic familiarity with working with syntax files.
This course will teach you the rigorous application of linear regression models. We will estimate these models, interpret their results and judge how well the models fit the data.
We will gradually explore more complex specifications, learning how to deal with dichotomous predictors and interactions. We also focus on the assumptions on which OLS models are based, how to check for these in the data at hand, and how to handle situations when they are not met.
Throughout the course, we emphasise presenting results as intuitively as possible, either through graphs or predicted values. This format should serve those interested in a thorough coverage of linear models, for immediate use and as a stepping stone to more advanced statistical procedures.
The class will be conducted primarily in R, but I will also share Stata code for all procedures and models.
Tasks for ECTS Credits
2 credits (pass/fail grade): Attend at least 90% of course hours, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.
3 credits (to be graded): As above, plus complete a take-home assignment, sent out on Tuesday 26 February, with a Thursday afternoon deadline. I will provide you with some data, along with a few model specifications which need to be estimated with this data. I expect you to interpret a few coefficients from these models, and their uncertainty, and to make a few qualitative decisions as to which model is the best and what recommendations you can make based on the results.
4 credits (to be graded): As above, plus complete a final paper resembling a conference paper, with the exception that the literature review section can be just 2–3 paragraphs where you present the puzzle. Identify a few hypotheses you are interested in testing, and test them based on data of your choosing.
The main parts I am interested in are the variable description (not more than a couple of pages), the analyses, and the interpretation of the results. You will be assessed on:
The deadline for this assignment will be 15–20 days after the end of the Winter School. For both assignments, I will provide more details about the tasks and requirements during the course itself.
It is frequently quipped that regression is the most used and abused method in the social sciences. My goal in this class is to expose you to the abuse-free application of linear regression models to social science data.
By the end of the course, you will have all the required theoretical and practical skills to responsibly run multivariate linear regressions on a variety of data configurations. This includes estimating multiple model specifications in R or Stata, presenting results in tables or in a graphical format, and interpreting the coefficients for the reader. It also implies assessing the appropriateness of OLS regression for certain kinds of data distributions, and learning to make suitable corrections and adjustments when there is a mismatch between model requirements and data characteristics.
The course should appeal most to those who had statistical training at an introductory level as part of their undergraduate studies, and now wish to deepen it with a rigorous coverage of linear models. Although we will not be computing quantities by hand, there will be a few simple formulas as part of the lectures. In this sense, the class is also suitable for those of you who have briefly encountered OLS regression as part of a statistics class, but now wish to better understand how it works, where it breaks down, and how it can be applied in a thorough way. Due to the need to constantly focus on the application of linear models, the class is unsuitable for those who want an introductory course in general statistics. During one of the sessions we briefly cover some basic statistical concepts and tests, but this is only so that we can all delve into the topic of linear models on an equal footing. This cannot be considered a substitute for a good coverage of introductory statistics.
We start with a condensed review of some fundamental concepts in basic statistics: the z and t distributions, hypothesis testing, confidence intervals and correlation. This overview is intended to provide a solid foundation from which to advance in the following days. We begin to discuss a few basics of regression, such as how it goes beyond correlation, and for what type of questions it is helpful. In the lab session, we will go through a few of the basic data manipulation procedures commonly required before running any regression: data cleaning and recoding, transformations of data etc. This is a good opportunity for you to get familiar, if need be, with working with syntax files in R and Stata, and with the Stata or RStudio interfaces.
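To make the step from correlation to regression concrete, here is a minimal sketch. It is written in plain Python purely for illustration (the course itself works in R and Stata), and the toy data and the function names `pearson_r` and `ols_slope` are hypothetical. It shows that the slope of a one-predictor regression is the correlation rescaled by the two standard deviations, so the slope carries units while the correlation does not.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation: sample covariance divided by both standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical toy data: hours of study and a test score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r = pearson_r(hours, score)
slope = ols_slope(hours, score)

# The slope is the correlation rescaled into units of y per unit of x.
slope_from_r = r * stdev(score) / stdev(hours)

# Rescaling the outcome (here, to tenths of a point) changes the slope but not r.
score_tenths = [10 * s for s in score]
r_rescaled = pearson_r(hours, score_tenths)
slope_rescaled = ols_slope(hours, score_tenths)

print(f"r = {r:.3f}, slope = {slope:.2f}, slope from r = {slope_from_r:.2f}")
print(f"after rescaling: r = {r_rescaled:.3f}, slope = {slope_rescaled:.1f}")
```

The correlation only says how tightly the points follow a line; the slope says how much the outcome is expected to change per unit of the predictor, which is why regression is the more useful tool for substantive questions.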
We delve fully into the fundamentals of Ordinary Least Squares (OLS) regression: how the estimation is carried out, and how we interpret the coefficients for simple (one predictor) and multiple regression (two or more predictors). I will present some basic formulas, but the goal will be to gain an intuitive understanding of how the estimation process functions, and what the results mean. In the lab session we put this newly-gained knowledge to the test, by running a few examples of linear models in R or Stata. We will interpret the output and the model fit, and generate predictions based on the model, to present effects in an intuitive way.
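As a rough illustration of what the estimation does, the sketch below solves the least-squares normal equations directly. It uses plain Python only so that it is self-contained; in the course you would call the regression routines of R or Stata instead. The data are hypothetical and constructed without noise, and `solve_linear_system`, `fit_ols` and `predict` are illustrative names, not part of any library.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(predictors, y):
    """Return [intercept, b1, b2, ...] minimising the sum of squared residuals.

    predictors is a list of columns, one list per predictor.
    """
    rows = [[1.0, *vals] for vals in zip(*predictors)]  # add the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, *values):
    """Predicted outcome for one set of predictor values."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], values))

# Hypothetical data, constructed so that y = 1 + 2*x1 + 3*x2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

simple = fit_ols([x1], y)        # y on x1 only
multiple = fit_ols([x1, x2], y)  # y on x1 and x2

print("simple:  ", [round(c, 3) for c in simple])
print("multiple:", [round(c, 3) for c in multiple])
print("prediction at x1=3, x2=4:", round(predict(multiple, 3, 4), 3))
```

Because `x1` and `x2` are correlated in this toy data, the slope on `x1` in the one-predictor model also absorbs part of the effect of `x2`. The multiple regression coefficient is read as the expected change in the outcome per unit of `x1` with `x2` held constant.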
We advance in our understanding of OLS by tackling uncertainty of estimates, as well as some model specifications that almost always appear in empirical research. In the latter case I refer to dummy indicators (categorical predictors). We will discuss where estimated uncertainty comes from, how it impacts on your results, what influences uncertainty, and how you ought to communicate it to your audience. In the lab component we learn how to run these model specifications with R/Stata.
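The following sketch, again in plain Python for illustration only and with hypothetical data, shows two of these ideas at once. With a single dummy predictor, the intercept is the mean of the reference group and the slope is the difference between the two group means. The slope's standard error comes from the residual variance divided by the spread of the predictor. The critical value `T_CRIT` is taken from a t table for 6 degrees of freedom, and `fit_simple_ols` is an illustrative name.

```python
from math import sqrt
from statistics import mean

def fit_simple_ols(x, y):
    """Return intercept, slope, and the slope's standard error."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = ss_res / (n - 2)  # residual variance with n - 2 degrees of freedom
    return intercept, slope, sqrt(sigma2 / sxx)

# Hypothetical data: an outcome for four untreated (0) and four treated (1) units.
treated = [0, 0, 0, 0, 1, 1, 1, 1]
outcome = [10, 12, 11, 13, 15, 17, 14, 18]

intercept, slope, se = fit_simple_ols(treated, outcome)

# Group means, for comparison with the regression coefficients.
mean_0 = mean(o for t, o in zip(treated, outcome) if t == 0)
mean_1 = mean(o for t, o in zip(treated, outcome) if t == 1)

T_CRIT = 2.447  # 97.5th percentile of the t distribution, 6 degrees of freedom
ci = (slope - T_CRIT * se, slope + T_CRIT * se)

print(f"intercept = {intercept:.2f} (mean of group 0 = {mean_0:.2f})")
print(f"slope = {slope:.2f} (difference in means = {mean_1 - mean_0:.2f})")
print(f"standard error = {se:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The formula for the standard error also hints at what influences uncertainty: noisier residuals widen the interval, while more observations and more variation in the predictor narrow it.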
This session is devoted to preventing abuse in the estimation of linear models. As with the vast majority of statistical procedures, a series of assumptions underpins OLS regression. If these are not met we have little reason to put our faith in the results we obtain. In this session we go over these assumptions, how they influence the results when they are not met, and what strategies we have to overcome this situation. In the lab we turn to these issues from a practical perspective. We run a test regression in R, assess whether the assumptions are met, and correct for the assumption violations that exist. Through a step-by-step process, you will see how your estimates and model fit change when engaging in such a process.
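One such check and remedy can be sketched as follows, under strong simplifying assumptions. The example is in plain Python for illustration only, the data are hypothetical and noise-free, and it covers a single violation (a non-linear relationship) with a single remedy (a log transformation of the outcome). Other assumptions call for other diagnostics. The function names are illustrative.

```python
from math import exp, log
from statistics import mean

def fit_simple_ols(x, y):
    """Return the intercept and slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def residuals(x, y, intercept, slope):
    """Observed minus fitted values."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def r_squared(x, y, intercept, slope):
    """Share of the outcome's variance accounted for by the fitted line."""
    my = mean(y)
    ss_res = sum(e ** 2 for e in residuals(x, y, intercept, slope))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: exponential growth in y, which violates linearity.
x = list(range(7))
y = [2 * exp(0.5 * xi) for xi in x]

# Step 1: fit on the raw scale and inspect the residuals in order of x.
raw_intercept, raw_slope = fit_simple_ols(x, y)
raw_resid = residuals(x, y, raw_intercept, raw_slope)
r2_raw = r_squared(x, y, raw_intercept, raw_slope)

# Step 2: refit with a log-transformed outcome.
log_y = [log(yi) for yi in y]
log_intercept, log_slope = fit_simple_ols(x, log_y)
r2_log = r_squared(x, log_y, log_intercept, log_slope)

print("raw residuals:", [round(e, 2) for e in raw_resid])
print(f"R-squared raw = {r2_raw:.3f}, after log transformation = {r2_log:.3f}")
print(f"slope on the log scale = {log_slope:.3f}")
```

On the raw scale the residuals are positive at both ends and negative in the middle, a systematic pattern that signals a curved relationship. After the transformation the pattern disappears, but the slope now describes change in the logarithm of the outcome, so its interpretation changes too.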
We recap the most important elements in the regression framework, based on participants’ needs and requests. We also delve into the various ways regression results can be presented, depending on the audience: quantitative scholars, policymakers, or a general readership. I conclude with a presentation on how interactions can expand considerably the range of hypotheses you can test with regression. In the lab I will present code for producing tables and graphs with regression estimates, and will demonstrate how an interaction effect between predictors can be set up, estimated, and evaluated.
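To give a sense of what an interaction estimates, consider a model with a continuous predictor x, a dummy d, and their product: y = b0 + b1·x + b2·d + b3·x·d. For the point estimates, fitting this model is equivalent to fitting a separate line in each group, which is what the minimal sketch below does. It is plain Python for illustration only, the data are hypothetical, and the shortcut applies to coefficients, not to standard errors. The function names are illustrative.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return the intercept and slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

# Hypothetical data: predictor x and outcome y in two groups, d = 0 and d = 1.
x = [1, 2, 3, 4, 1, 2, 3, 4]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [3.1, 4.9, 7.2, 8.8, 2.2, 5.9, 9.8, 14.1]

def group(values, code):
    """Keep the values belonging to the group with d == code."""
    return [v for v, di in zip(values, d) if di == code]

int_0, slope_0 = fit_simple_ols(group(x, 0), group(y, 0))
int_1, slope_1 = fit_simple_ols(group(x, 1), group(y, 1))

# Coefficients of the interaction model, recovered from the group-specific lines.
b0 = int_0              # intercept in the reference group (d = 0)
b1 = slope_0            # effect of x when d = 0
b2 = int_1 - int_0      # shift in the intercept when d = 1
b3 = slope_1 - slope_0  # how much the effect of x changes when d = 1

def effect_of_x(d_value):
    """Effect of a one-unit change in x, conditional on the value of d."""
    return b1 + b3 * d_value

def predict(x_value, d_value):
    """Predicted outcome from the interaction model."""
    return b0 + b1 * x_value + b2 * d_value + b3 * x_value * d_value

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}, b3 = {b3:.2f}")
print(f"effect of x: {effect_of_x(0):.2f} when d = 0, {effect_of_x(1):.2f} when d = 1")
```

The key point for interpretation is that b1 is no longer "the" effect of x: it is the effect only in the reference group, and the effect in the other group is b1 + b3.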
Throughout the course, we focus on graphical methods and intuitive quantities when presenting results from linear models. I emphasise graphs over tables, and predicted values with their uncertainty over coefficients and standard errors. While tables of coefficients are still the dominant way of presenting results in academic journals, graphs and predicted values tend to be preferred in reports and analyses for larger, non-technical audiences. I believe strongly that you should be familiar with both types, and should tailor the delivery of your results to the audience.
Day 1 – From correlation to regression: revisiting the basics
We cover a few foundational concepts in statistics: correlation, standard error, t test, t and z distributions. We also make our first forays into the regression setup.
In the lab part, we get familiar with R or Stata, and try a few basic data manipulation and transformation tasks. All of these tasks habitually have to be performed before running a regression.
Day 2 – OLS fundamentals, coefficients, and graphical displays: coefficients and model fit
We go through the estimation of OLS models and the interpretation of coefficients for simple and multiple regression.
In the lab session, we run a few regressions in R or Stata, based on the code supplied by the instructor, and go through interpreting coefficients and measures of model fit once more. We also introduce a way of presenting effect sizes based on predictions, using the model estimates.
Day 3 – Dummy variables and uncertainty of estimates
We discuss slightly more complex model specifications which include dummy variables. The bulk of the class, though, is devoted to understanding and interpreting uncertainty in our regression estimates.
In the lab, we go through additional regression models, which involve dummies. Most of our empirical efforts, though, will be allocated to understanding where uncertainty in estimates comes from, how we can minimize it, and how we can responsibly present it to the audience.
Day 4 – Regression assumptions: violations and remedies
This session covers the assumptions underpinning OLS regression, what the implications of assumption violations are, and how to correct for them.
The lab session will offer practical strategies of identifying assumption violations, and overcoming some of them through data transformations. We also see how estimates and model fit statistics change when correcting for some of these violations.
Day 5 – Recap, multiplicative interactions in regression, and graphical presentations
In this last session, we review a few of the most important ideas covered in the past four days, based on participants’ requests. I also introduce a way to test more sophisticated hypotheses, about how effects of a predictor vary, through the use of interactions. Finally, I show a few of the ways in which regression results can be presented to the audience, and discuss the strengths and weaknesses of each.
In the lab I show code for interactions, graphical presentations of results, and also allow for a recap of any topics participants feel we should cover again.
(NB: For Moore et al. (2009), there is no need to read each of the chapters carefully. Please only focus on the topics that you feel you might need a brush-up on. The rest of the topics can be merely skimmed. If the 4 chapters from Moore et al. (2009) seem too intimidating, please at least check the 2 chapters from Field et al. (2012) below. For Fox (2008), focus more on sections 1 and 3 of the chapter – even there, not the sophisticated terms, just the general ideas and logic of the procedure.)
Two primary textbooks are assigned for this course: John Fox’s book, as a more rigorous yet demanding text, and Andy Field and co-authors’ book, as a backup for the situations when Fox’s discussion seems too advanced. I would recommend that you proceed by using Fox’s book, and then if the discussion in certain sections seems too complex, to use Field et al.’s book for clarifications.
R 3.5.2 or any newer version.
Stata 13.1 or any newer version.
RStudio 1.2.747 or any newer version.
Any computer or laptop bought within the last 4-5 years should be sufficient. 2 GB of RAM and 200-300 MB of free space on the hard drive are enough for running the tasks we will attempt.
I have tried, as much as possible, to assign chapters from the same textbook, so as to minimize disruptions in logic and in the way the topics are approached. However, if you encounter difficulties in tracking down the literature above, please try the sources below as well. Some are more advanced, though, and choose to present the topic in a more mathematical way.
For assistance with running regressions in R/Stata, please try the following books:
Summer School: Introduction to R; Introduction to Stata; Introduction to Inferential Statistics: What you need to know before you take regression
Winter School: Introduction to R; Introduction to Stata; Introduction to Statistics for Political and Social Scientists
Summer School: Intro to GLM: Binary, Ordered and Multinomial Logistic, and Count Regression
Winter School: Interpreting Binary Logistic Regression Models
This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, or group size). Registered participants will be informed in due time.
By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.