Causal Inference in the Social Sciences II: Difference in Difference, Regression Discontinuity and Instruments

Course Dates and Times

Monday 5 – Friday 9 August 

09:00–10:30 / 11:00–12:30


Dániel Horn

Eötvös Loránd University

This course is strongly linked to Program Evaluation and Impact Assessment (SB112A), which is the basis for this one. However, the two courses can be taken separately.

The course will introduce you to methods of causal inference in the social sciences. The objective is to learn how statistical methods can help us draw causal claims about phenomena of interest.

By the end of the course, you will be able to

  • critically evaluate statements about causal relationships based on some analysis of data
  • apply a variety of design-based easy-to-implement methods that will help you draw causal inferences in your own research

One of the key goals of empirical research is to test causal hypotheses. This task is notoriously difficult without the luxury of experimental data. This course will introduce you to methods that allow you to make convincing causal claims without working with experimental data.

By the end of the course, you will know how to estimate causal effects using the following designs:

  1. Matching
  2. Instrumental Variables
  3. Regression Discontinuity Design
  4. Difference-in-Differences

You can only learn statistics by doing statistics. This is why this course includes a laboratory component, where you will learn to apply these techniques to the analysis of discipline-specific data.

ECTS credits for this course, and the tasks required for additional credits:

2 credits Complete the readings and take an active part in the course.

3 credits As above, plus replicate two papers:

  • one that uses an Instrumental Variables and a Difference-In-Difference Design, due Thursday morning
  • one that uses a Regression Discontinuity Design, due Friday morning.

4 credits As above, plus complete a second paper that uses an Instrumental Variables and a Difference-In-Difference Design. This should be emailed to the Instructor by Saturday morning.

Instructor Bio

Dániel Horn is a research fellow at the Centre for Economic and Regional Studies of the Hungarian Academy of Sciences, and associate professor at the Department of Economics, Eötvös Loránd University in Budapest.

Besides economics courses, he has taught statistics, introduction to Stata, and various public policy design and evaluation courses for over five years at PhD, MA and Bachelor levels.

He has been conducting educational impact assessment for over a decade. His research areas include education economics, social stratification and educational measurement issues.


‘I used to think correlation implied causation. Then I took a statistics class. Now I don’t.’

‘Sounds like the class helped!’

‘Well, maybe.’

This course is a thorough introduction to the most widely used methods of causal inference. It is an applied methods course, with equal emphasis on methods and their applications.

Do hospitals make people healthier?

Is it a problem that more people die in hospitals than in bars?

Does an additional year of schooling increase future earnings?

Do parties that enter the parliament enjoy vote gains in subsequent elections?

The answers to these questions (and many others which affect our daily life) involve the identification and measurement of causal links: an old problem in philosophy and statistics. To address this problem we either use experiments or try to mimic them by collecting information on potential factors that may affect both treatment assignment and potential outcomes.

Customary ways of doing this in the past entailed the specification of sophisticated versions of multivariate regressions. However, it is by now well understood that causality can only be dealt with during the design, not during the estimation process.

The goal of this course is to familiarise you with the logic of causal inference and the theory behind it, and to introduce research methods that help you approach experimental benchmarks with observational data. This will be a very applied course, giving you ideas for strong research designs in your own work, and the knowledge to derive and interpret causal estimates based on these designs.

We start by discussing the fundamental problem of causal inference. After that, I introduce the potential outcomes framework (covered thoroughly in SB112A). I then illustrate how the selection problem creates bias in naïve estimators of such quantities using observational data, and we will see how randomisation solves the problem of selection bias. The next steps will be dedicated to the methods through which we can approach the experimental benchmark using observational data.
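The selection problem can be sketched with a small simulation. This is a minimal illustration, not course material: the population, the outcome values and the constant effect `TRUE_EFFECT` are made-up assumptions, loosely echoing the hospital question above. Because we simulate both potential outcomes for every unit, we can compare a naïve comparison under self-selection with the same comparison under random assignment.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is reproducible

TRUE_EFFECT = 2.0  # assumed constant treatment effect (illustrative only)

# Hypothetical population: every unit carries both potential outcomes,
# y0 (health without hospital) and y1 (health with hospital).
units = []
for i in range(10_000):
    sick = i % 2 == 0
    y0 = 3.0 if sick else 7.0
    units.append({"sick": sick, "y0": y0, "y1": y0 + TRUE_EFFECT})


def difference_in_means(units, treated_flags):
    """Observed mean outcome of the treated minus that of the untreated."""
    treated = [u["y1"] for u, d in zip(units, treated_flags) if d]
    control = [u["y0"] for u, d in zip(units, treated_flags) if not d]
    return mean(treated) - mean(control)


# Self-selection: only the sick go to hospital.
naive_estimate = difference_in_means(units, [u["sick"] for u in units])

# Randomisation: a coin flip decides who is treated.
coin_flips = [random.random() < 0.5 for _ in units]
experimental_estimate = difference_in_means(units, coin_flips)

print(f"true effect:           {TRUE_EFFECT:.2f}")
print(f"naive (self-selected): {naive_estimate:.2f}")
print(f"randomised:            {experimental_estimate:.2f}")
```

Under self-selection the naïve difference even has the wrong sign, because the treated started out less healthy. Under randomisation the two groups have the same untreated outcomes on average, so the difference in means lands close to the true effect.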

The first method we will examine is matching. Although matching is not itself a design of causal inference but a family of techniques to ensure balance on a series of observables (and is thus based on the conditional-on-observables, or unconfoundedness, assumption), it is very useful as a first application of the potential outcomes language. We will discuss the logic behind matching, its identification assumptions and see how it differs from standard regression methods.
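To make the logic of matching concrete, here is a minimal sketch of exact matching on a single discrete covariate. The eight observations are invented for illustration, and the function names are hypothetical. Applied work typically uses propensity scores or nearest-neighbour matching on several covariates, which this sketch does not attempt.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (covariate x, treatment d, outcome y).
# Treated units are concentrated in the x = 1 stratum, where outcomes are higher.
data = [
    (0, 1, 2.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 12.0), (1, 1, 12.0), (1, 1, 12.0), (1, 0, 10.0),
]


def naive_difference(data):
    """Raw difference in mean outcomes, ignoring the covariate."""
    treated = [y for _, d, y in data if d == 1]
    control = [y for _, d, y in data if d == 0]
    return mean(treated) - mean(control)


def exact_matching_att(data):
    """Average effect on the treated, comparing each treated unit
    with the mean outcome of controls that share its covariate value."""
    controls_by_x = defaultdict(list)
    for x, d, y in data:
        if d == 0:
            controls_by_x[x].append(y)
    gaps = []
    for x, d, y in data:
        # Treated units without any matching control are dropped.
        if d == 1 and controls_by_x[x]:
            gaps.append(y - mean(controls_by_x[x]))
    return mean(gaps)


print(naive_difference(data))    # 7.0
print(exact_matching_att(data))  # 2.0
```

The naïve comparison mixes the treatment effect with the difference between strata. Matching removes that part, but only for covariates we observe, which is exactly what the unconfoundedness assumption requires.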

After matching we will switch to the three designs. We start with instrumental variables (IVs), motivating the discussion with causal diagrams. We then employ a running example to help us first unpack the identification assumptions upon which IVs can deliver unbiased causal estimates. We then focus on estimation issues and applications. We look at the Wald estimator and its covariate extension, i.e. the 2SLS estimator.
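The Wald estimator can be sketched as the ratio of two differences in means: the effect of the instrument on the outcome (reduced form) divided by its effect on treatment take-up (first stage). The data-generating process below is an assumption made purely for illustration: a binary instrument `z`, an unobserved confounder `u`, and a constant effect `TRUE_EFFECT`.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

TRUE_EFFECT = 2.0  # assumed constant treatment effect (illustrative only)
Z, D, Y = 0, 1, 2  # column positions within each row

rows = []
for _ in range(100_000):
    u = random.gauss(0, 1)                 # unobserved confounder
    z = 1 if random.random() < 0.5 else 0  # randomly assigned binary instrument
    d = 1 if z + u > 0.5 else 0            # take-up depends on z and on u
    y = TRUE_EFFECT * d + 3.0 * u + random.gauss(0, 1)
    rows.append((z, d, y))


def group_mean(rows, value_col, group_col, group_value):
    """Mean of one column among rows where another column equals a value."""
    values = [r[value_col] for r in rows if r[group_col] == group_value]
    return sum(values) / len(values)


# Naive comparison of treated and untreated units (confounded by u).
naive = group_mean(rows, Y, D, 1) - group_mean(rows, Y, D, 0)

# Wald estimator: reduced form divided by first stage.
reduced_form = group_mean(rows, Y, Z, 1) - group_mean(rows, Y, Z, 0)
first_stage = group_mean(rows, D, Z, 1) - group_mean(rows, D, Z, 0)
wald = reduced_form / first_stage

print(f"naive: {naive:.2f}  first stage: {first_stage:.2f}  Wald: {wald:.2f}")
```

In this simulation the naïve difference is far above the true effect because units with high `u` both take the treatment and have better outcomes, while the Wald ratio should come out close to `TRUE_EFFECT`. With covariates or several instruments, the same idea is implemented through 2SLS.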

Next we examine the regression discontinuity design (RD), motivating the discussion with examples from various subfields in sociology, political science and economics to help you grasp the intuition behind the design. We then clarify the assumptions on which identification is based: under what assumptions does the RD generate unbiased causal estimates, and which causal quantity of interest is estimated? I then explain how exactly these effects can be estimated, covering parametric and non-parametric estimation. We will discuss inference, including robust confidence intervals for the point estimates, and go through the procedure for choosing the bandwidth for the RD analysis. An important next step is to discuss the many robustness checks one needs to run when using the RD. Before moving to the lab applications, we will also look at how the fuzzy RD operates: the extra assumption this design needs, examples that convey the key intuition, and estimation with a fuzzy RD.
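As a rough sketch of non-parametric estimation in a sharp RD, the example below fits a separate straight line on each side of the cutoff, using only observations within a bandwidth, and takes the gap between the two fitted values at the cutoff. Everything here is an illustrative assumption: the simulated data, the jump `TRUE_JUMP`, and especially the fixed `BANDWIDTH`, which in applied work would be chosen by a data-driven procedure and combined with robust inference.

```python
import random

random.seed(2)  # fixed seed so the illustration is reproducible

CUTOFF = 0.0
TRUE_JUMP = 3.0   # assumed discontinuity at the cutoff (illustrative only)
BANDWIDTH = 0.25  # hypothetical fixed bandwidth, not data-driven

# Simulated sharp RD: treatment switches on exactly when x crosses the cutoff.
data = []
for _ in range(4_000):
    x = random.uniform(-1, 1)
    jump = TRUE_JUMP if x >= CUTOFF else 0.0
    y = 1.0 + 0.5 * x + jump + random.gauss(0, 0.5)
    data.append((x, y))


def fitted_value_at_cutoff(points):
    """OLS line through the points; returns its prediction at the cutoff."""
    xs = [x - CUTOFF for x, _ in points]
    ys = [y for _, y in points]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar  # intercept, i.e. prediction at x = CUTOFF


left = [(x, y) for x, y in data if CUTOFF - BANDWIDTH <= x < CUTOFF]
right = [(x, y) for x, y in data if CUTOFF <= x <= CUTOFF + BANDWIDTH]

rd_estimate = fitted_value_at_cutoff(right) - fitted_value_at_cutoff(left)
print(f"RD estimate of the jump at the cutoff: {rd_estimate:.2f}")
```

The estimate is local: it describes the effect for units at the cutoff, not for the whole sample. Rerunning the sketch with different values of `BANDWIDTH` is a simple version of one of the robustness checks mentioned above.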

The last estimator we focus on is the difference-in-differences estimator (DiD). After explaining the logic of the method, we see the key assumption needed to identify causal effects through the DiD estimator. We then look at how you can estimate these effects using a variety of designs, with two groups and with multiple groups. We then go back and discuss the parallel trends assumption in more detail, showing under what conditions one can examine whether it holds or not. We also look at an extension of this design, namely the difference-in-differences-in-differences estimator. Numerous hands-on applications will be covered, and one of them will be used as the main example for our applied session in the lab.
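With two groups and two periods, the DiD estimate is simple arithmetic on four cell means. The sketch below uses invented numbers to show the calculation: the change over time in the control group stands in for what would have happened to the treated group without treatment, which is the parallel trends assumption.

```python
from statistics import mean

# Hypothetical outcomes (made up for illustration), keyed by (group, period).
outcomes = {
    ("control", "pre"): [10.0, 12.0],
    ("control", "post"): [13.0, 15.0],
    ("treated", "pre"): [20.0, 22.0],
    ("treated", "post"): [28.0, 30.0],
}

cell_mean = {cell: mean(values) for cell, values in outcomes.items()}

# Change over time within each group.
change_treated = cell_mean[("treated", "post")] - cell_mean[("treated", "pre")]
change_control = cell_mean[("control", "post")] - cell_mean[("control", "pre")]

# DiD: the treated group's change, net of the common trend.
did_estimate = change_treated - change_control

print(change_treated)  # 8.0
print(change_control)  # 3.0
print(did_estimate)    # 5.0
```

Neither simple comparison recovers this number: the before-after change for the treated group (8) includes the common trend, and the post-period gap between groups (15) includes their pre-existing difference in levels. In a regression, the same estimate is the coefficient on the interaction of the group and period indicators.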

Lab sessions will draw on the discussion related to each of the three designs. All exercises will be covered in Stata in class, but I will also provide code for R. It is worth emphasising that this is not a software-intensive course. I will not spend much time teaching you how Stata works in general (and especially not R). Full code will be provided so we can focus on the analysis. That said, if you already use either of the two programs, after this course you will be in a position to implement your analyses using any of the techniques covered in the course.

Solid knowledge of statistics and regression analysis at undergraduate level.

Some knowledge of Stata (or R)

Familiarity with the OLS regression estimator.

We will be working mainly in Stata. Full code will be provided.

No knowledge of any specific software is required, but some knowledge of Stata is useful.

Day 1
Session 1: Introduction to the Potential Outcomes Framework (recap of SB112A)
Session 2: Matching: Intuition, Estimation, Applications

Day 2
Session 1 (Lab): Matching Lab Session
Session 2 (Lecture): Instrumental Variables: Identification & Estimation (Wald Estimator, 2SLS Estimator)

Day 3
Session 1 (Lab): IV Lab
Session 2 (Lecture): Regression Discontinuity Design (Motivation, Identification, Estimation Strategies)

Day 4
Session 1 (Lab): RD Lab
Session 2 (Lecture): Difference-in-Differences (Motivation, Identification, Estimation)

Day 5
Session 1 (Lab): DiD Lab
Session 2 (Student presentations): Extensions, Discussion with participants on how they could use these methods in their own research

Day Readings

Angrist, Joshua and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion, Princeton: Princeton University Press. Chapters 1 & 4.

Morgan Stephen L. and Christopher Winship. 2007. Counterfactuals and Causal Inference: Methods and Principles for Social Research, Cambridge: Cambridge University Press.  Chapters 1, 2 & 7.


Sovey, Allison & Donald Green. 2010. “Instrumental Variables Estimation in Political Science: A Readers’ Guide.” American Journal of Political Science, 55(1): 188-200.

Imbens, Guido. 2014. Instrumental Variables: An Econometrician’s Perspective. NBER Working Paper # 19983.


Angrist & Pischke Ch. 6

Imbens, Guido W., and Thomas Lemieux. "Regression discontinuity designs: A guide to practice." Journal of econometrics 142.2 (2008): 615-635.

Lee, David S., and Thomas Lemieux. Regression discontinuity designs in economics. No. w14723. National Bureau of Economic Research, 2009.


Angrist & Pischke Ch. 5

Pischke, Jörn-Steffen. 2007. “The Impact of Length of the School Year on Student Performance and Earnings: Evidence from the German Short School Years.” The Economic Journal 117(523):1216–42.



If you have an idea and/or data to which one of the methods can be applied, prepare a 5-minute presentation on how you would implement your analysis. Discuss potential threats and how you plan to address them.

(If you really do not have a project in mind, choose any of the readings from the section titled Applications and prepare a 5-minute presentation discussing the motivation of the paper and pointing to its identification strategy.)

Software Requirements

We will use Stata extensively throughout the course.

Hardware Requirements

Please bring your own laptop with Stata (or R) installed.



Impact Evaluation in Practice (World Bank, 2011)

Bound, John, David A. Jaeger, and Regina M. Baker. 1995. “Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogeneous Explanatory Variable Is Weak.” Journal of the American Statistical Association 90(430):443–50.

Caliendo, Marco and Sabine Kopeinig. 2005. Some Practical Guidance for the Implementation of Propensity Score Matching. Institute for the Study of Labor (IZA).

Jacob, Robin, Pei Zhu, Marie-Andrée Somers, and Howard Bloom. 2012. “A Practical Guide to Regression Discontinuity.” mdrc. Retrieved March 5, 2017 (

Applications

Matching



Angrist, Joshua D. and Victor Lavy. 2001. “Does Teacher Training Affect Pupil Learning? Evidence from Matched Comparisons in Jerusalem Public Schools.” Journal of Labor Economics 19(2):343–69.

Dolton, Peter and Jeffrey A. Smith. 2011. The Impact of the UK New Deal for Lone Parents on Benefit Receipt. Rochester, NY: Social Science Research Network.

Instrumental Variables

Angrist, J. D. and A. B. Krueger. 1991. “Does Compulsory School Attendance Affect Schooling and Earnings?” The Quarterly Journal of Economics 106(4):979–1014.

Angrist, Joshua D. 1998. “Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants.” Econometrica 66(2):249–88.

Angrist, Joshua D. and Victor Lavy. 1999. “Using Maimonides’ Rule to Estimate the Effect of Class Size on Scholastic Achievement.” The Quarterly Journal of Economics 114(2):533–75.

Regression Discontinuity Design

Lalive, Rafael. 2008. “How Do Extended Benefits Affect Unemployment Duration? A Regression Discontinuity Approach.” Journal of Econometrics 142(2):785–806.

Ludwig, Jens and Douglas L. Miller. 2007. “Does Head Start Improve Children’s Life Chances? Evidence from a Regression Discontinuity Design.” The Quarterly Journal of Economics 122(1):159–208.

Difference-in-Differences


Borjas, George J. 2015. The Wage Impact of the Marielitos: A Reappraisal. National Bureau of Economic Research.

Card, David. 1990. “The Impact of the Mariel Boatlift on the Miami Labor Market.” ILR Review 43(2):245–57.

Card, David and Alan B. Krueger. 1994. “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.” American Economic Review 84(4):772–93.

Card, David and Alan B. Krueger. 2000. “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Reply.” American Economic Review 90(5):1397–1420.

Pischke, Jörn-Steffen. 2007. “The Impact of Length of the School Year on Student Performance and Earnings: Evidence From the German Short School Years*.” The Economic Journal 117(523):1216–42.

Recommended Courses to Cover Before this One

Causal Inference in the Social Sciences I

Introduction to Stata

Multiple Regression Analysis: Estimation, Diagnostics, and Modelling