
SB107B - Causal Inference in the Social Sciences

Instructor Details

Elias Dinas

European University Institute

Course Dates and Times

Monday 8 to Friday 12 August 2016
Generally classes are either 09:00-12:30 or 14:00-17:30
15 hours over 5 days

Prerequisite Knowledge

Participants are expected to be familiar with the OLS regression estimator. Although the notions of unconditional and conditional expectation will be introduced in the first session, prior exposure to these terms will help in fully grasping the potential outcomes notation. We will work mainly in Stata, but also in R. Full code will be provided, and no prerequisite knowledge of any specific software is required.

Short Outline

The course will introduce participants to an authoritative framework of causal inference in the social sciences. The objective is to learn how statistical methods can help us draw causal claims about phenomena of interest. By the end of the course, participants will be in a position to 1) critically evaluate statements about causal relationships based on analyses of data; and 2) apply a variety of design-based, easy-to-implement methods that will help them draw causal inferences in their own research.

One of the key goals of empirical research is to test causal hypotheses. This task is notoriously difficult without the luxury of experimental data. This course will introduce you to methods that allow you to make convincing causal claims even when experimental data are unavailable. By the end of the course, you will know how to estimate causal effects using the following designs:

  1. Matching
  2. Instrumental Variables
  3. Regression Discontinuity Design
  4. Difference-in-Differences

You can only learn statistics by doing statistics. This is why the course includes a laboratory component, in which you will learn to apply these techniques to the analysis of discipline-specific data.

Long Course Outline

Whenever, looking at my watch, I see the hand has reached the figure X, I hear the bells beginning to ring in the church close by. But from the fact that the watch hands point to ten whenever the bells begin to ring, I have not the right to infer that the position of the hands of my watch is the cause of the vibration of the bells.

Leo Tolstoy, War and Peace, trans. Constance Garnett

(New York: Modern Library Classics, 2002), p. 939.


Do hospitals make people healthier? Is it a problem that more people die in hospitals than in bars? Does an additional year of schooling increase future earnings? Do parties that enter parliament enjoy vote gains in subsequent elections? The answers to these questions (and many others that affect our daily life) involve the identification and measurement of causal links: an old problem in philosophy and statistics. To address this problem we either use experiments or try to mimic them by collecting information on potential factors that may affect both treatment assignment and potential outcomes. Customary ways of doing this in the past entailed the specification of sophisticated versions of multivariate regressions. However, it is by now well understood that causality can only be dealt with at the design stage, not during estimation. The goal of this workshop is to familiarize participants with the logic of causal inference and the theory underlying it, and to introduce research methods that help us approach experimental benchmarks with observational data. This is therefore a highly applied course, which aims to provide participants with ideas for strong research designs in their own work and with the knowledge needed to derive and interpret causal estimates based on these designs.

We will start by discussing the fundamental problem of causal inference. After that, we will introduce the potential outcomes framework, within which we will examine closely the most important quantities of interest, i.e. the different types of causal effects (ATE, ATT, ATC). We will then illustrate how the selection problem biases naïve estimators of these quantities in observational data, and we will see how randomization solves the problem of selection bias. The remaining sessions will be dedicated to the methods through which we can approach the experimental benchmark using observational data.
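As a preview of the notation, with a binary treatment $D_i$ and potential outcomes $Y_i(1), Y_i(0)$, the quantities of interest and the selection problem sketched above can be written as:

```latex
\begin{aligned}
\text{ATE} &= E[Y_i(1) - Y_i(0)], \qquad
\text{ATT} = E[Y_i(1) - Y_i(0) \mid D_i = 1],\\[4pt]
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{naive comparison}}
&= \text{ATT}
+ \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}.
\end{aligned}
```

Randomization makes $D_i$ independent of the potential outcomes, so the selection-bias term vanishes and the naive comparison recovers the causal effect.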


The first method we will examine is matching. Although matching is not itself a design for causal inference but rather a family of techniques for ensuring balance on a series of observables (and is thus based on the conditional-on-observables assumption), it is very useful as a first application of the potential outcomes language. We will discuss the logic behind matching and its identification assumptions, and we will see how it differs from standard regression methods. We will learn various balancing methods, but will focus mainly on two of them: genetic matching and entropy balancing.
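As a minimal illustration of the matching logic (not the genetic matching or entropy balancing covered in the course, and in Python rather than the Stata/R used in the lab), the sketch below simulates data with a single confounder and applies plain 1:1 nearest-neighbour matching; all names and numbers are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: one confounder x drives both treatment and outcome.
n = 1000
x = rng.normal(size=n)
d = (x + rng.normal(size=n) > 0).astype(int)   # selection into treatment
y = 2.0 * d + x + rng.normal(size=n)           # true treatment effect = 2

# Naive difference in means is biased by selection on x.
naive = y[d == 1].mean() - y[d == 0].mean()

# 1:1 nearest-neighbour matching on x (with replacement):
# for each treated unit, find the control with the closest x.
treated, controls = np.where(d == 1)[0], np.where(d == 0)[0]
matches = controls[np.abs(x[treated][:, None] - x[controls][None, :]).argmin(axis=1)]
att_hat = (y[treated] - y[matches]).mean()

print(round(naive, 2), round(att_hat, 2))
```

The naive comparison overstates the effect because treated units have systematically higher x; matching each treated unit to a similar control removes most of that selection bias.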


After matching we will turn to the three designs. We will start with instrumental variables. We will motivate the discussion with the use of causal diagrams. We will then employ a running example that will help us first unpack the identification assumptions under which IVs can deliver unbiased causal estimates. We will then focus on estimation issues and on applications. We will see both the Wald estimator and its covariate extension, the 2SLS estimator. By way of extension, we will also discuss compliers and introduce a flexible estimator, the Local Average Response Function (LARF). The LARF estimator will allow us to relax the constant treatment effects assumption when including covariates in both stages of the IV estimation.
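For a binary instrument $Z_i$ and binary treatment $D_i$, the Wald estimand mentioned above is simply the reduced form divided by the first stage:

```latex
\beta_{\text{Wald}}
= \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}
       {E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}
```

Under the usual IV assumptions (relevance, exclusion, and monotonicity), this ratio identifies the local average treatment effect for the compliers; 2SLS generalizes it to settings with covariates.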


The next design we will examine is the regression discontinuity design (RD). We will motivate the discussion with a series of examples from various subfields of sociology, political science and economics. These examples will help everyone grasp the intuition behind the design. Then we will clarify the assumptions upon which identification is based: under what assumptions does the RD generate unbiased causal estimates? Moreover, which causal quantity of interest is estimated? Having addressed these questions, we will spend considerable time explaining how exactly these effects can be estimated. We will cover both parametric and non-parametric estimation. We will discuss inference, including robust confidence intervals for the point estimates. Special attention will be given to the procedure through which the bandwidth for the RD analysis is chosen. An important next step in this design is to discuss the plethora of robustness checks one needs to perform when using the RD. Before moving to the lab applications, we will also look at how the fuzzy RD operates. We will see the extra assumption this design requires and we will look at several examples to gauge the key intuition. Estimation with a fuzzy RD will also be discussed.
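As a minimal sketch of sharp RD estimation with a hand-picked bandwidth (the course covers data-driven bandwidth selection and robust inference; this Python example is only illustrative and all numbers are invented), the code below fits a local linear regression on each side of the cutoff:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sharp RD with simulated data: treatment switches on at the cutoff c = 0,
# and the true jump in the outcome at the cutoff is tau = 1.5.
n, tau, h = 5000, 1.5, 0.5           # h is a hand-picked bandwidth
r = rng.uniform(-1, 1, n)            # running variable
y = 0.8 * r + tau * (r >= 0) + rng.normal(scale=0.3, size=n)

# Local linear fit on each side of the cutoff within the bandwidth;
# the RD estimate is the difference of the two intercepts at r = 0.
def intercept_at_cutoff(mask):
    slope, icept = np.polyfit(r[mask], y[mask], 1)
    return icept

left = (r < 0) & (r > -h)
right = (r >= 0) & (r < h)
tau_hat = intercept_at_cutoff(right) - intercept_at_cutoff(left)
print(round(tau_hat, 2))
```

In a real application the bandwidth h would be chosen by a data-driven procedure and inference adjusted accordingly, as discussed in the session.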


The last estimator we will focus on is the difference-in-differences estimator (DD). We will motivate the discussion with a real-world example of how and whether the Atocha attack affected the Spanish general election in 2004. After explaining the logic of the method, we will see the key assumption needed to identify causal effects through the difference-in-differences estimator. We will then look at how you can estimate these effects, using a variety of designs, both with two groups and with multiple groups. We will then go back and discuss the parallel trends assumption in more detail, showing under what conditions one can examine whether it holds or not. We will also look at an extension of this design, namely the difference-in-differences-in-differences estimator. Numerous hands-on applications will be covered and one of them will be used as the main example for our applied session in the lab. Finally, we will very briefly introduce the Synthetic Control Method to discuss best practices when using the DD method without a good control unit.
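The two-group, two-period logic reduces to simple arithmetic; the numbers below are invented purely to illustrate the double difference:

```python
# Two-group, two-period difference-in-differences with made-up numbers.
# Under parallel trends, the control group's change identifies the
# counterfactual change for the treated group.
y_treat_pre, y_treat_post = 10.0, 16.0
y_ctrl_pre, y_ctrl_post = 8.0, 12.0

did = (y_treat_post - y_treat_pre) - (y_ctrl_post - y_ctrl_pre)
print(did)  # 2.0
```

The control group's change (12 − 8 = 4) stands in for the treated group's counterfactual change, so the estimated effect is 6 − 4 = 2.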


The lab sessions will draw on the discussion of each of the three designs. IVs will be covered in Stata. The LARF estimator will be shown in R. In the RD session, we will look at one example of a sharp and one example of a fuzzy design. We will use both R and Stata for the local linear regression estimator. Polynomial estimation will be shown only in Stata. The bandwidth selection process will also be presented in both Stata and R. Finally, the difference-in-differences estimator will be shown in both Stata and R. It is worth emphasizing that this is not a software-intensive course. I will not spend much time teaching you how Stata and R work in general. This is why full code will be provided, so that we can focus on the analysis. That said, if you already use either of the two programs, after this course you will be in a position to implement your analyses using any of the techniques covered in the course.

Day-to-Day Schedule

Day-to-Day Reading List

Software Requirements

We will make extensive use of Stata and R throughout the course.

We will use two R packages (“ebalance”, “genmatch”) and two add-on programs in Stata (“rdrobust” and “ebalance”).

Hardware Requirements

Participants may bring their own laptops but the course will be partially taught in a computer lab.



Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2009. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association.

Angrist, Joshua, Guido Imbens, and Donald Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association, 91(434): 444-55.


Abadie, Alberto. 2003. “Semiparametric Instrumental Variable Estimation of Treatment Response Models.” Journal of Econometrics, 113: 231-63.

Calonico, Sebastian, Matias Cattaneo & Rocio Titiunik. 2014. Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs. Unpublished Manuscript.

Diamond, Alexis, and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics.

Green, Donald P., et al. 2009. “Testing the accuracy of regression discontinuity analysis using experimental benchmarks.” Political Analysis 17(4): 400-417.

Imbens, Guido & Karthik Kalyanaraman. 2010. “Optimal Bandwidth Choice for the Regression Discontinuity Estimator.” Cemmap Working Paper, # CWP05/10.

McCrary, Justin. 2008. “Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test.” Journal of Econometrics, 142(2): 698-714.


Abadie, Alberto and Javier Gardeazabal. 2003. “The Economic Costs of Conflict: A Case-Control Study for the Basque Country.” American Economic Review 93(1).

Card, David and Alan B. Krueger. 1994. Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. The American Economic Review 84: 772-793.

Caughey, Devin, and Jasjeet S. Sekhon. "Elections and the regression discontinuity design: Lessons from close US House races, 1942–2008." Political Analysis 19.4 (2011): 385-408.

Dal Bó, Ernesto, Pedro Dal Bó, and Jason Snyder. "Political dynasties." The Review of Economic Studies 76.1 (2009): 115-142.

Dinas, Elias. "Does choice bring loyalty? Electoral participation and the development of party identification." American Journal of Political Science 58.2 (2014): 449-465.


Eggers, Andrew C. "Proportionality and Turnout: Evidence from French Municipalities." Comparative Political Studies, Forthcoming (2014).

Eggers, Andrew, et al. "On the validity of the regression discontinuity design for estimating electoral effects: New evidence from over 40,000 close races." Formerly MIT Political Science Department Working Paper Series 2013-26 (2014).

Hainmueller, Jens, and Holger Lutz Kern. "Incumbency as a source of spillover effects in mixed electoral systems: Evidence from a regression-discontinuity design." Electoral Studies 27.2 (2008): 213-227.

Kern, Holger Lutz, and Jens Hainmueller. "Opium for the masses: How foreign media can stabilize authoritarian regimes." Political Analysis (2009).

Montalvo, José. 2011. “Voting after the Bombings: A Natural Experiment on the Effect of Terrorist Attacks on Democratic Elections.” Review of Economics and Statistics, 93(4):1146-1154.

Rosenbaum, Paul R. 2002. Observational Studies. New York: Springer-Verlag 2nd edition.

The following other ECPR Methods School courses could be useful in combination with this one in a ‘training track’.
Recommended Courses Before

Research Designs

Multiple Regression Analysis and Generalized Linear Modeling

Multiple Regression Analysis: Estimation, Diagnostics and Modeling

Recommended Courses After

Advanced Topics in Applied Regression

Additional Information


This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed in due time.

Note from the Academic Convenors

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.
