Methods of Modern Causal Analysis Based on Observational Data

Course Dates and Times

Monday 17 – Friday 21 February 2019, 09:00–12:30
15 hours over five days

Michael Gebel

michael.gebel@uni-bamberg.de

University of Bamberg

Estimating causal effects is a central aim of quantitative empirical analysis in social sciences.

Social scientists, however, often have to rely on non-experimental data, which suffer from the problems of self-selection based on (unobserved) heterogeneity and effect heterogeneity.

Regression models try to account for these problems by controlling for observable variables, which, however, is often not sufficient.

In recent social science literature, methods of modern causal analysis have become more and more popular. These methods build on clear concepts of causality – potential outcomes and directed acyclic graphs (DAGs) – and try to account for the above-mentioned problems in a rigorous way.

This course provides an introduction to methods of causal inference such as Propensity Score Matching (PSM), Instrumental Variables (IV) and Difference-in-Differences (DID) approaches in cross-sectional and longitudinal designs.

We will discuss empirical examples and apply methods in computer exercises using Stata and real-world data.

Tasks for ECTS Credits

2 credits (pass/fail grade): Attend at least 90% of course hours, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.

3 credits (to be graded): As above, plus complete one task (tbc).

4 credits (to be graded): As above, plus complete two tasks (tbc).


Instructor Bio

Michael Gebel is Full Professor of Methods of Empirical Social Research at the University of Bamberg.

He graduated in economics and social sciences, and earned his doctoral degree in sociology at the University of Mannheim.

Michael has received a European Research Council (ERC) Starting Grant for the project 'The socio-economic consequences of temporary employment: A comparative panel data analysis' (SECCOPA) for the period 2018–23.

His specific research interests include international comparative research, longitudinal data analysis and methods of modern causal analysis.

 @gebel_michael

The central aim of this course is to empower you to think about causality and to apply new tools of modern causal analysis in your own research.

Experiments tend to be seen as the gold standard for drawing causal inference because of the manipulation of the treatment and the random assignment to treatment groups. But they have several potential pitfalls that pose threats to internal and external validity.

For ethical and practical reasons, experiments are often not feasible in social sciences. This course will therefore focus on modern methods of causal inference based on non-experimental data.

The course is structured around four key topics:

  1. I present the general idea of causality based on the potential outcome framework and directed acyclic graphs (DAGs).
  2. I introduce linear regression and propensity score matching (PSM) as methods of modern causal analysis for cross-sectional data that rely on the crucial assumption of selection on observed variables.
  3. I present instrumental variables (IV) estimators that can deal with the problem of selection on unobserved variables.
  4. We discuss basic and advanced topics of the fixed-effect logic and difference-in-differences (DID) approaches that use the benefits of longitudinal data.

Rather than get lost in the details of mathematical proofs and philosophical debates, the course offers an applied introduction and hands-on experience in lab sessions.

We will discuss the strengths and limitations of each approach, and I will illustrate them using examples from the social science literature.

I Causality, Counterfactuals and Causal Graphs

How can we define causality in social science research? This course starts with one of the most important, basic questions of the philosophy of science.

You will learn to distinguish different kinds of causal hypotheses and reflect on the basic conditions considered important for making causal claims.

Researchers often try to avoid making causal inferences, but based on practical examples from applied research, I will show that any serious research hypothesis postulates, explicitly or at least implicitly, a causal relationship between two or more variables.

The first class will introduce Rubin’s notation of potential outcomes, which has become the backbone of modern causal analysis in social sciences because it clearly defines different types of causal effects (the ATT, ATNT and ATE) and allows for causal effect heterogeneity.

Based on this model, you will learn how to pose properly formulated questions about causal effects, and you will critically discuss the basic assumptions of counterfactuals, manipulability and the stable unit treatment value assumption (SUTVA).
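
To fix the notation, here is a minimal sketch of the potential-outcomes definitions discussed above; the symbols (a binary treatment D and potential outcomes Y^1, Y^0) are a common textbook convention and stand in for whatever notation the course materials use.

```latex
% Potential outcomes for unit i under a binary treatment D_i \in \{0,1\}:
% Y_i^1 is the outcome if treated, Y_i^0 the outcome if untreated;
% only one of the two is ever observed for any given unit.
\begin{align*}
  \delta_i      &= Y_i^1 - Y_i^0                               && \text{individual causal effect}\\
  \mathrm{ATE}  &= \mathrm{E}\left[Y^1 - Y^0\right]            && \text{average treatment effect}\\
  \mathrm{ATT}  &= \mathrm{E}\left[Y^1 - Y^0 \mid D = 1\right] && \text{effect on the treated}\\
  \mathrm{ATNT} &= \mathrm{E}\left[Y^1 - Y^0 \mid D = 0\right] && \text{effect on the non-treated}
\end{align*}
```

Effect heterogeneity simply means that the individual effect varies across units, so the ATE, ATT and ATNT need not coincide.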

Directed acyclic graphs (DAGs) offer an illustrative graphical approach to the problem of causal inference. We will use them to clarify the crucial difference between (self-)selection processes into the X-variable of interest based on observed variables versus unobserved variables.

We will discuss advanced topics of endogenous selection bias, common-cause confounding and overcontrol bias in the framework of DAGs.
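
As a schematic reminder, the three bias structures named above can be summarised in stylised DAG notation; the variable names (Z a common cause, M a mediator, C a collider) are illustrative placeholders.

```latex
% Stylised graphs for a causal effect of X on Y:
\begin{align*}
  \text{Common-cause confounding:}  &\quad X \leftarrow Z \rightarrow Y
      && \text{condition on } Z \text{ to block the back-door path}\\
  \text{Overcontrol bias:}          &\quad X \rightarrow M \rightarrow Y
      && \text{conditioning on the mediator } M \text{ cuts off part of the effect}\\
  \text{Endogenous selection bias:} &\quad X \rightarrow C \leftarrow Y
      && \text{conditioning on the collider } C \text{ opens a spurious path}
\end{align*}
```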

II Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM)

The most common non-experimental method is multiple linear regression (MLR) analysis. Its strategy is to condition on observable confounding variables in order to disentangle the causal effect of X on Y.

However, in practice, researchers often control for the wrong variables and neglect important control variables because they are not aware of modern causal analysis.

Using the principles of DAGs, you will learn how to select the right control variables in a linear regression, drawing on examples from applied research.

You will then apply this knowledge about counterfactuals to understand linear regression and its potential biases in the notation of potential outcomes.
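
For orientation, a minimal Stata sketch of this conditioning strategy; the dataset and variable names (y the outcome, x the treatment variable of interest, z1 and z2 observed confounders identified from the DAG) are hypothetical placeholders rather than the course's lab files.

```stata
* Load a prepared example dataset (placeholder file name)
use "example_data.dta", clear

* Naive bivariate regression: the estimated effect of x on y is confounded
regress y x

* Multiple linear regression: condition on the observed confounders z1 and z2,
* chosen because they block the back-door paths in the assumed DAG
regress y x z1 z2, vce(robust)
```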

Propensity Score Matching (PSM) is an alternative method for dealing with selection on observables, and it has several advantages over MLR.

Applying your knowledge of the counterfactual model and DAGs, you will learn how to implement the different steps of PSM.

I will explain how to estimate propensity scores, how to implement and choose between different matching and estimation options and how to test whether PSM succeeded in balancing the observed control variables. We will apply the different steps based on real-world data in a computer lab session. We will then briefly discuss differences between the PSM approach and other matching approaches.
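
As a rough illustration of these steps, a minimal sketch using Stata's built-in teffects commands; the variable names (binary treatment treat, outcome y, observed confounders z1–z3) are hypothetical, and the lab sessions may use different commands or options.

```stata
* Steps 1-2: estimate the propensity score with a logit model and match each
* treated unit to its nearest neighbour on the score; atet requests the ATT.
* generate() stores the matched-neighbour identifiers used by the balance
* diagnostics below.
teffects psmatch (y) (treat z1 z2 z3, logit), atet generate(nn_)

* Step 3: balancing test - compare standardised differences and variance ratios
* of the confounders in the raw and the matched samples
tebalance summarize
```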

We conclude with an outlook on model extensions such as multiple treatments or sensitivity analysis in terms of Rosenbaum bounds.

III Selection on Unobservables: The Instrumental Variable (IV) Approach

MLR and PSM will produce biased estimates of causal effects if there is selection into the X-variable of interest based on unobserved factors.

The instrumental variable (IV) approach is seen as a solution to this problem. Its underlying identification strategy is to find an instrumental variable that is correlated with the X-variable of interest without having an independent effect on the outcome Y-variable.

This course introduces IV estimators based on the counterfactual model and DAGs. You will critically discuss examples of instrumental variables from applied research to understand the problems of IV estimators (weak instruments, violation of exclusion restrictions, etc.).

This course will move beyond the classical textbooks on IV by introducing Angrist’s alternative interpretation of the IV estimator as identifying a Local Average Treatment Effect (LATE) in the context of heterogeneous effects, which is more reasonable for applications in social sciences.

In a lab session you will learn how to implement IV estimators in Stata based on real-world data, and how to interpret the results.
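
As a sketch of what such an implementation might look like with Stata's built-in ivregress command, using hypothetical variable names (outcome y, endogenous treatment x, instrument z, exogenous control w):

```stata
* Two-stage least squares: instrument the endogenous regressor x with z;
* the first option also reports the first-stage regression
ivregress 2sls y w (x = z), vce(robust) first

* Weak-instrument diagnostics for the first stage
estat firststage

* Test whether x is in fact endogenous (informative only if z is a valid instrument)
estat endogenous
```

With a binary instrument and a binary treatment, the resulting estimate is read, following Angrist, as the LATE for the compliers rather than as an average effect for the whole population.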

We will place special emphasis on critically discussing the validity of the IV assumptions in the context of practical research examples.

IV Using Longitudinal Data: The Fixed-Effect Logic and the Difference-in-Differences (DID) Approach

When prospective or retrospective longitudinal data, i.e. repeated measurements of the outcome Y-variable, are available, we can use other approaches to deal with the problem of selection on observable and unobservable variables.

The true strength of longitudinal data is that it allows us to observe the outcomes of the same observational unit over time.

In contrast to the random-effects approach, applying the logic of before-after or fixed-effects panel estimators easily removes time-constant observed and unobserved characteristics of the observational unit, and it also allows us to model anticipation effects and the impact function of the treatment.
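
As an illustration, a minimal fixed-effects sketch in Stata with hypothetical panel variables (id the unit identifier, year the panel wave, y the outcome, d a time-varying treatment indicator):

```stata
* Declare the panel structure
xtset id year

* Within (fixed-effects) estimator: removes all time-constant observed and
* unobserved unit characteristics; year dummies absorb common period effects
xtreg y d i.year, fe vce(cluster id)
```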

The difference-in-differences (DID) approach combines this fixed-effect logic with a control group comparison.

Comparing time trends in the outcome Y-variable between the so-called treatment and control groups eliminates not only time-constant individual effects but also common time trends.
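
In potential-outcome terms, the basic two-group, two-period DID estimand can be written as follows; this is a textbook formulation (T the treatment group, C the control group, t_0 before and t_1 after treatment) rather than the course's own notation.

```latex
% Difference-in-differences: the change over time in the treatment group
% minus the change over time in the control group.
\[
  \widehat{\mathrm{ATT}}_{\mathrm{DID}}
  = \bigl( \mathrm{E}[Y_{t_1} \mid T] - \mathrm{E}[Y_{t_0} \mid T] \bigr)
  - \bigl( \mathrm{E}[Y_{t_1} \mid C] - \mathrm{E}[Y_{t_0} \mid C] \bigr)
\]
```

This identifies the ATT under the common-trends assumption that, without treatment, the outcomes of both groups would have followed the same time trend.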

You will learn how to apply the DID estimator in a linear regression design, and how to combine the DID estimator with PSM to construct the control group in an innovative way.

You will apply your knowledge of DAGs in the longitudinal context to discuss advanced topics such as the potential biases induced by introducing a lagged dependent variable.

In a lab session you will learn how to implement DID-regression and DID-PSM estimators in Stata based on real-world data, and how to interpret the results.
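
A minimal sketch of the regression form of DID in Stata, again with hypothetical variables (treated marks the treatment group, post the periods after the intervention, id the unit identifier):

```stata
* Two-group, two-period DID as a linear regression;
* the coefficient on 1.treated#1.post is the DID estimate of the ATT
regress y i.treated##i.post, vce(cluster id)
```

The DID-PSM combination mentioned above follows the same logic, with the control group first constructed by matching on pre-treatment characteristics.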

You should be familiar with the basics of statistics and multiple linear regression analysis.

Empirical applications will be implemented in Stata, so you should have a basic knowledge of Stata.

Prepared datasets and full syntax codes will be provided.

Day Topic Details

Day 1
(I) Causality, Counterfactuals and Causal Graphs
  1. Posing causal research questions
  2. The counterfactual model of causality
  3. Introduction to directed acyclic graphs (DAGs) and advanced topics
90-minute lecture/discussion + 90-minute lecture/discussion
For a detailed description see the long course outline.

Day 2
(II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM)
  1. MLR and directed acyclic graphs (DAGs)
  2. MLR and potential outcomes
  3. PSM: Basic assumptions
  4. PSM: Matching algorithms
  5. PSM: Balancing tests
  6. PSM: Advanced topics
90-minute lecture/discussion + 90-minute lecture/discussion
For a detailed description see the long course outline.

Day 3
(II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM) (continued)
  7. PSM: Practical implementation and empirical examples in Stata
(III) Selection on Unobservables: The Instrumental Variable (IV) Approach
  1. IV: The classical IV estimator
  2. IV: The modern LATE interpretation
90-minute lecture/discussion + 90-minute lab session
For a detailed description see the long course outline.

Day 4
(III) Selection on Unobservables: The Instrumental Variable (IV) Approach (continued)
  3. IV: Practical implementation and empirical examples in Stata
(IV) Using Longitudinal Data: The Fixed-Effect Logic and the Difference-in-Differences (DID) Approach
  1. The benefits of longitudinal data
  2. The before-after/fixed-effect estimator
  3. The difference-in-differences (DID) estimator
90-minute lecture/discussion + 90-minute lab session
For a detailed description see the long course outline.

Day 5
(IV) Using Longitudinal Data: The Fixed-Effect Logic and the Difference-in-Differences (DID) Approach (continued)
  4. DID combined with PSM
  5. Modelling anticipation effects and the impact function of the treatment
  6. Advanced topics of DAGs in longitudinal design
  7. DID and DID-PSM: Practical implementation and empirical examples in Stata
90-minute lecture/discussion + 90-minute lab session
For a detailed description see the long course outline.

Day Readings
Day 1

Elwert / Winship (2014)

Morgan / Winship (2015), Ch. 2 'Counterfactuals and the Potential Outcome Model' (pp. 37–62 only) and Ch. 3 'Causal Graphs' (pp. 77–84 only)

Keele (2015)

Hernán (2018)

Day 2

Gangl (2014)

Caliendo / Kopeinig (2008)

Ho / Imai / King / Stuart (2007)

Day 3

Muller et al. (2014)

Angrist et al. (1996)

Day 4

Brüderl / Ludwig (2015)

Gangl (2010), Ch. 5.3 'Fixed-Effects and Difference-in-Differences Estimators'

Day 5

Lechner (2011)

For full references, see the Literature section below.

Software Requirements

Stata version 15 (or higher)

Hardware Requirements

No specific requirements

Literature

Abadie, A. and M. Cattaneo (2018). Econometric methods for program evaluation. Annual Review of Economics 10: 465–503.

Angrist, J., G. Imbens and D. Rubin (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91: 444–455.

Bertrand, M., E. Duflo and S. Mullainathan (2004). How much should we trust differences-in-differences estimates? Quarterly Journal of Economics 119(1): 249–275.

Bound, J., D. Jaeger and R. Baker (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association 90: 443–450.

Brüderl, J. and V. Ludwig (2015). Fixed-effects panel regression. In: Best, H. and C. Wolf (eds), The SAGE Handbook of Regression Analysis and Causal Inference, pp. 327–357. London: Sage Publications.

Caliendo, M. and S. Kopeinig (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys 22: 31–72.

Elwert, F. and C. Winship (2014). Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology 40: 31–53.

Gangl, M. (2010). Causal inference in sociological research. Annual Review of Sociology 36: 21–47.

Gangl, M. (2014). Matching estimators for treatment effects. In: Best, H. and C. Wolf (eds), The SAGE Handbook of Regression Analysis and Causal Inference, pp. 251–276. London: Sage Publications.

Heckman, J. and S. Navarro-Lozano (2004). Using matching, instrumental variables, and control functions to estimate economic choice models. Review of Economics and Statistics 86: 30–57.

Hernán, M. (2018). The C-word: Scientific euphemisms do not improve causal inference from observational data. American Journal of Public Health 108(5): 616–619.

Ho, D., K. Imai, G. King and E. Stuart (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15: 199–236.

Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association 81: 945–960.

Imbens, G. and D. Rubin (2015). Causal Inference for Statistics, Social, and Biomedical Sciences, Ch. 1 'Causality: The basic framework'. Cambridge: Cambridge University Press.

Keele, L. (2015). The statistics of causal inference: A view from political methodology. Political Analysis 23: 313–335.

Lechner, M. (2011). The estimation of causal effects by difference-in-difference methods. Foundations and Trends in Econometrics 4(3): 165–224.

Morgan, S. and C. Winship (2015). Counterfactuals and Causal Inference, Ch. 2 'Counterfactuals and the Potential Outcome Model' and Ch. 3 'Causal Graphs'. Cambridge: Cambridge University Press.

Morgan, S. and D. Harding (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research 35: 3–60.

Muller, C., C. Winship and S. Morgan (2014). Instrumental variables regression. In: Best, H. and C. Wolf (eds), The SAGE Handbook of Regression Analysis and Causal Inference, pp. 251–276. London: Sage Publications.

Sekhon, J. (2009). Opiates for the matches: Matching methods for causal inference. Annual Review of Political Science 12: 487–508.

Sovey, A. and D. Green (2010). Instrumental variables estimation in political science: A readers' guide. American Journal of Political Science 55(1): 188–200.

Stuart, E. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science 25: 1–21.

Wooldridge, J. (2015). Control function methods in applied econometrics. Journal of Human Resources 50: 420–445.

Recommended Courses to Cover Before this One

Summer School

Introduction to STATA

Multiple Regression Analysis: Estimation, Diagnostics, and Modelling

Linear Regression with R/Stata: Estimation, Interpretation and Presentation

Winter School

Introduction to STATA

Multiple Regression Analysis: Estimation, Diagnostics, and Modelling

Linear Regression with R/Stata: Estimation, Interpretation and Presentation

 

Recommended Courses to Cover After this One

Summer School

Introduction to Experimental Research in the Social Sciences

Winter School

Experimental Methods