
Methods of Modern Causal Analysis Based on Observational Data

Course Dates and Times

Monday 25 February – Friday 1 March, 09:00–12:30
15 hours over 5 days

Michael Gebel

michael.gebel@uni-bamberg.de

University of Bamberg

Estimating causal effects is a central aim of quantitative empirical analysis in social sciences.

Social scientists, however, often have to rely on non-experimental data, which suffer from the problems of self-selection based on (unobserved) heterogeneity and effect heterogeneity.

Linear regression tries to account for these problems by controlling for observable variables, but this strategy often still yields biased estimates.

In recent social science literature, new methods of modern causal analysis have become more and more popular. These methods build on clear concepts of causality – potential outcomes and directed acyclic graphs (DAGs) – and try to account for the above-mentioned problems in a rigorous way.

This course provides an introduction to new methods such as Propensity Score Matching (PSM), Instrumental Variables (IV), and Difference-in-Differences (DID) approaches in cross-sectional and longitudinal designs.

We will discuss empirical examples and apply methods in computer exercises using Stata and real-world data.

Tasks for ECTS Credits

2 credits (pass/fail grade): Attend at least 90% of course hours, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.

3 credits (to be graded): As above, plus complete one task (tbc).

4 credits (to be graded): As above, plus complete two tasks (tbc).


Instructor Bio

Michael Gebel is Full Professor of Methods of Empirical Social Research at the University of Bamberg.

He graduated in economics and social sciences, and earned his doctoral degree in sociology at the University of Mannheim.

Michael has received a European Research Council (ERC) Starting Grant for the project The socio-economic consequences of temporary employment: A comparative panel data analysis (SECCOPA) for the period 2018–23.

His specific research interests include international comparative research, longitudinal data analysis and methods of modern causal analysis.

 @gebel_michael

The central aim of this course is to empower you to think about causality and to apply new tools of modern causal analysis in your own research.

Experiments tend to be seen as the gold standard for drawing causal inferences because of the manipulation of the treatment and the random assignment to treatment groups. But they have several potential pitfalls that pose threats to internal and external validity.

For ethical and practical reasons, experiments are often not feasible in social sciences. This course will therefore focus on modern methods of causal inference based on non-experimental data.

The course is structured around four key topics:

  1. I present the general idea of causality based on the potential outcome framework and directed acyclic graphs (DAGs).
  2. I introduce linear regression and propensity score matching (PSM) as methods of modern causal analysis for cross-sectional data that rely on the crucial assumption of selection on observed variables.
  3. I present instrumental variables (IV) estimators that can deal with the problem of selection on unobserved variables.
  4. We discuss basic and advanced topics of the fixed-effect logic and difference-in-differences (DID) approaches that use the benefits of longitudinal data.

Rather than get lost in the details of mathematical proofs and philosophical debates, the course offers an applied introduction and hands-on experience in lab sessions.

We will discuss the strengths and limitations of each approach, and I will illustrate them using examples from the social science literature.

I Causality, Counterfactuals and Causal Graphs

How can we define causality in social science research? This course starts with one of the most important, basic questions of the philosophy of science.

You will learn to distinguish different kinds of causal hypotheses and reflect on the basic conditions considered important for making causal claims.

Researchers often try to avoid making causal inferences, but based on practical examples from applied research, I will show that any serious research hypothesis postulates, explicitly or at least implicitly, a causal relationship between two or more variables.

The first class will introduce Rubin’s notation of potential outcomes, which has become the backbone of modern causal analysis in social sciences because it clearly defines different types of causal effects (the ATT, ATNT and ATE) and allows for causal effect heterogeneity.
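For reference, in the standard potential-outcomes notation (with D a binary treatment indicator and Y^1, Y^0 the potential outcomes under treatment and non-treatment), these effects are defined as

  ATE  = E[Y^1 - Y^0]
  ATT  = E[Y^1 - Y^0 | D = 1]
  ATNT = E[Y^1 - Y^0 | D = 0]

Because only one of the two potential outcomes is observed for each unit, none of these quantities can be read off the data directly; this is the fundamental problem of causal inference that the methods in this course address.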

Based on this model, you will learn how to pose properly formulated questions about causal effects. You will also critically discuss the basic assumptions of counterfactuals, manipulability and the stable unit treatment value assumption (SUTVA).

Directed acyclic graphs (DAGs) offer an illustrative graphical approach to the problem of causal inference. We will use them to clarify the crucial difference between (self-)selection processes into the X-variable of interest based on observed variables versus unobserved variables.

We will discuss advanced topics of endogenous selection bias, common-cause confounding and overcontrol bias in the framework of DAGs.
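As a small illustration of the endogenous selection (collider) problem, a simulation along the following lines can be run in Stata; the variable names are purely illustrative and not part of the course materials:

  * Hypothetical sketch: conditioning on a collider induces a spurious association
  clear
  set seed 12345
  set obs 10000
  generate x = rnormal()            // explanatory variable of interest
  generate y = rnormal()            // outcome; the true effect of x on y is zero
  generate c = x + y + rnormal()    // collider: caused by both x and y
  regress y x                       // coefficient on x is close to zero, as it should be
  regress y x c                     // conditioning on the collider c biases the coefficient on x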

II Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM)

The most common non-experimental method is multiple linear regression (MLR) analysis. Its strategy is to condition on observable confounding variables in order to disentangle the causal effect of X on Y.

However, in practice, researchers often control for the wrong variables and neglect important control variables because they are not aware of modern causal analysis.

You will apply your new knowledge about counterfactuals to understand linear regression in the notation of potential outcomes.

Applying the principles of DAGs, you will learn how to select the right control variables in a linear regression based on examples of applied research.
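To fix ideas, the following hypothetical Stata simulation (illustrative variable names only) contrasts omitting a confounder, conditioning on it, and overcontrolling for a mediator:

  * Hypothetical sketch: choosing control variables following the DAG logic
  clear
  set seed 2024
  set obs 10000
  generate z = rnormal()                           // confounder: affects both x and y
  generate x = 0.8*z + rnormal()                   // explanatory variable of interest
  generate m = 0.5*x + rnormal()                   // mediator on the causal path from x to y
  generate y = 1.0*x + 0.7*m + 1.0*z + rnormal()   // total effect of x on y = 1 + 0.7*0.5 = 1.35
  regress y x        // confounder z omitted: biased estimate of the total effect
  regress y x z      // back-door path blocked via z: recovers the total effect (about 1.35)
  regress y x z m    // additionally conditioning on the mediator m: overcontrol,
                     //   only the direct effect of x (about 1.0) is recovered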

Propensity Score Matching (PSM) is an alternative method for dealing with selection on observables, and it has several advantages over MLR.

Applying your knowledge on the counterfactual model and DAGs, you will learn how to implement the different steps of PSM.

I will explain how to estimate propensity scores, how to implement and choose between different matching and estimation options and how to test whether PSM succeeded in balancing the observed control variables. We will apply the different steps based on real-world data in a computer lab session. We will then briefly discuss differences between the PSM approach and other matching approaches.
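To indicate what the lab session will look like, the core PSM steps can be run with Stata's built-in teffects commands; the sketch below uses made-up variable names and is not the course do-file:

  * Hypothetical sketch: propensity score matching with Stata's teffects commands
  * outcome: wage; treatment: training; observed confounders: age, educ, female
  teffects psmatch (wage) (training age educ i.female), atet        // ATT via nearest-neighbour matching on the propensity score
  tebalance summarize                                                // covariate balance before and after matching
  teffects psmatch (wage) (training age educ i.female, probit), atet nneighbor(5)   // variant: probit score, 5 nearest neighbours
  * the user-written psmatch2 package offers further matching algorithms and balance diagnostics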

We conclude with an outlook on model extensions such as multiple treatments or sensitivity analysis in terms of Rosenbaum bounds.

III Selection on Unobservables: The Instrumental Variable (IV) Approach

MLR and PSM will produce biased estimates of causal effects if there is selection into the X-variable of interest based on unobserved factors.

The instrumental variable (IV) approach is seen as a solution to this problem. Its underlying identification strategy is to find an instrumental variable that is correlated with the X-variable of interest without having an independent effect on the outcome Y-variable.

This course introduces IV estimators based on the counterfactual model and DAGs. You will critically discuss examples of instrumental variables from applied research to understand the problems of IV estimators (weak instruments, violations of the exclusion restriction, etc.).

This course will move beyond the classical textbooks on IV by introducing Angrist’s alternative interpretation of the IV estimator as identifying a Local Average Treatment Effect (LATE) in the context of heterogeneous effects, which is more reasonable for applications in social sciences.

In a lab session you will learn how to implement IV estimators in Stata based on real-world data, and how to interpret the results.
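As a pointer to what the lab session covers, the classical 2SLS estimator and its standard diagnostics are available through Stata's built-in ivregress command; the variable names below are illustrative:

  * Hypothetical sketch: two-stage least squares with instrument z for the endogenous variable x
  ivregress 2sls y control1 control2 (x = z), vce(robust) first   // 'first' also reports the first-stage regression
  estat firststage                                                 // first-stage F statistic as a weak-instrument diagnostic
  * estat overid                                                   // overidentification test (only with more instruments than endogenous regressors)
  * with a binary instrument, a binary treatment and heterogeneous effects,
  * the 2SLS coefficient is interpreted as a LATE, i.e. the effect for compliers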

We will place special emphasis on critical discussion of the validity of IV approach assumptions in the context of practical research examples.

IV Using Longitudinal Data: The Fixed-Effect Logic and the Difference-in-Differences (DID) Approach

When prospective or retrospective longitudinal data, i.e. repeated measurements of the outcome Y-variable, are available, we can use other approaches to deal with the problem of selection on observable and unobservable variables.

The true strength of longitudinal data is that they allow us to observe the outcomes of the same observational unit over time.

In contrast to the random-effects approach, the before-after and fixed-effects panel estimators easily remove time-constant observed and unobserved characteristics of the observational unit, and they allow us to model anticipation effects and the impact function of the treatment.
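A minimal sketch of the fixed-effects logic in Stata (panel and variable names are illustrative):

  * Hypothetical sketch: before-after/fixed-effects logic with panel data
  xtset id year                           // declare the panel structure (unit and time identifiers)
  xtreg y x i.year, fe vce(cluster id)    // within (fixed-effects) estimator: removes all time-constant unit characteristics
  xtreg y x i.year, re vce(cluster id)    // random-effects estimator for comparison; relies on stronger assumptions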

The difference-in-differences (DID) approach combines this fixed-effect logic with a control group comparison.

Comparing time trends in the outcome Y-variable in the so-called treatment and control group allows for eliminating not only time-constant individual effects but also common time trends.

You will learn how to apply the DID estimator in a linear regression design, and how to combine the DID estimator with PSM to construct the control group in an innovative way.

You will apply your knowledge of causal graph analysis in the longitudinal context to discuss advanced topics such as the potential biases induced by introducing a lagged dependent variable.

In a lab session you will learn how to implement DID-regression and DID-PSM estimators in Stata based on real-world data, and how to interpret the results.
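To give a flavour of this lab session, a basic two-period DID regression and one simple DID-PSM variant might look as follows; this is a hedged sketch with illustrative variable names, not the course do-file:

  * Hypothetical sketch: DID as an interaction term in a linear regression
  * treat = 1 for the treatment group, post = 1 for the period after treatment
  regress y i.treat##i.post, vce(cluster id)        // the coefficient on 1.treat#1.post is the DID estimate

  * Simple two-period DID-PSM variant: match on time-constant pre-treatment covariates
  * and use the within-unit change in y as the outcome (conditional DID logic)
  xtset id post
  generate dy = y - L.y                             // before-after change in the outcome within units
  teffects psmatch (dy) (treat x1 x2) if post == 1, atet   // ATT of treatment on the change in y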

I expect you to be familiar with the basics of statistics and multiple linear regression analysis.

Empirical applications will be implemented in Stata, so knowledge of Stata is helpful but not a prerequisite because pre-programmed data sets and full syntax codes will be provided.

Day Topic Details
Day 1: (I) Causality, Counterfactuals and Causal Graphs
  1. Posing causal research questions
  2. The counterfactual model of causality
  3. Introduction to directed acyclic graphs (DAGs) and advanced topics

90min lecture/discussions + 90min lecture/discussions

For detailed description see long course outline

Day 2: (II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM)
  1. MLR and directed acyclic graphs (DAGs)
  2. MLR and potential outcomes
  3. PSM: Basic assumptions
  4. PSM: Matching algorithms
  5. PSM: Balancing tests
  6. PSM: Advanced topics

90min lecture/discussions + 90min lecture/discussions

For detailed description see long course outline

Day 3: (II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM) (continued)
  7. PSM: Practical implementation and empirical examples in Stata
(III) Selection on Unobservables: The Instrumental Variable (IV) Approach
  1. IV: The classical IV estimator
  2. IV: The modern LATE interpretation

90min lecture/discussions + 90min lab session

For detailed description see long course outline

Day 4: (III) Selection on Unobservables: The Instrumental Variable (IV) Approach (continued)
  3. IV: Practical implementation and empirical examples in Stata
(IV) Using Longitudinal Data: The Fixed-Effect Logic and the Difference-in-Differences (DID) Approach
  1. The benefits of longitudinal data
  2. The before-after/fixed-effect estimator
  3. The difference-in-differences (DID) estimator

90min lecture/discussions + 90min lab session

For detailed description see long course outline

Day 5: (IV) Using Longitudinal Data: The Fixed-Effect Logic and the Difference-in-Differences (DID) Approach (continued)
  4. DID combined with PSM
  5. Modelling anticipation effects and the impact function of the treatment
  6. Advanced topics of DAGs in longitudinal design
  7. DID and DID-PSM: Practical implementation and empirical examples in Stata

90min lecture/discussions + 90min lab session

For detailed description see long course outline

Day Readings
Day 1

Keele (2015)

Morgan / Winship (2015) Ch. 2 'Counterfactuals and the Potential Outcome Model', pp. 37–62 only; Ch. 3 'Causal Graphs', pp. 77–84 only

Elwert (2013)

Day 2

Gangl (2014)

Caliendo / Kopeinig (2008)

Ho / Imai / King / Stuart (2007)

Day 3

Morgan / Winship (2015) Ch. 9 'Instrumental Variable Estimators of Causal Effects'

Angrist et al. (1996)

Day 4

Brüderl / Ludwig (2015)

Gangl (2010) Ch. 5.3 'Fixed-Effects and Difference-in-Differences Estimators'

Day 5

Winship and Morgan (1999) Ch. 'Longitudinal Methods', pp. 687–704

Lechner (2011)

For details on these references, see the Literature section below.

Software Requirements

Stata version 15 (or higher)

Hardware Requirements

No specific requirements

Literature

Angrist, J., G. Imbens and D. Rubin (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444-455.

Bertrand, M., E. Duflo and S. Mullainathan (2004). How much should we trust differences-in-differences estimates? Quarterly Journal of Economics, 249-275.

Bound, J., D. Jaeger and R. Baker (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association 90, 443-450.

Brüderl, J. and V. Ludwig (2015). Fixed-effects panel regression. In Best, H. and C. Wolf (Eds) The SAGE Handbook of Regression Analysis and Causal Inference. London: Sage Publications, pp. 327-357.

Caliendo, M. and S. Kopeinig (2008). Some practical guidance for the implementation of propensity score matching, Journal of Economic Surveys 22, 31-72.

DiPrete, T. and M. Gangl (2004). Assessing bias in the estimation of causal effects: Rosenbaum bounds on matching estimators and instrumental variables estimation with imperfect instruments. Sociological Methodology 34, 271-310.

Elwert, F. (2013). Graphical causal models. In Morgan, S. (Ed.) Handbook of Causal Analysis for Social Research. Dordrecht: Springer, pp. 245-273.

Elwert, F. and C. Winship (2014). Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable, Annual Review of Sociology, 40:31–53.

Gangl, M. (2010). Causal inference in sociological research. Annual Review of Sociology 36, 21-47.

Gangl, M. (2014). Matching estimators for treatment effects. In Best, H. and C. Wolf (Eds) The SAGE Handbook of Regression Analysis and Causal Inference. London: Sage Publications, pp. 251-276.

Heckman, J. and S. Navarro-Lozano (2004). Using matching, instrumental variables, and control functions to estimate economic choice models. Review of Economics and Statistics 86, 30-57.

Ho, D., Imai, K, King, G. and E. Stuart (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15, 199-236.

Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association 81, 945-960.

Imai, K., Keele, L., Tingley, D. and T. Yamamoto (2011). Unpacking the black box of causality: Learning about causal mechanisms from experimental and observational studies. American Political Science Review, 105, 765–789.

Imbens, G. and D. Rubin (2015). Causal inference for statistics, social, and biomedical sciences. Cambridge: Cambridge University Press. Ch. 1 'Causality: The basic framework'.

Keele, L. (2015). The statistics of causal inference: A view from political methodology. Political Analysis, 23, 313–335.

Lechner, M. (2011). The estimation of causal effects by difference-in-difference methods. Foundations and Trends in Econometrics, 4(3), 165-224.

Morgan, S. and C. Winship (2015). Counterfactuals and causal inference. Cambridge: Cambridge University Press.

Morgan, S. and D. Harding (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research 35, 3-60.

Muller, C., Winship, C. and S. Morgan (2014). Instrumental variables regression. In Best, H. and C. Wolf (Eds) The SAGE Handbook of Regression Analysis and Causal Inference. London: Sage Publications, pp. 251-276.

Rubin, D. (1986). Which ifs have causal answers? Journal of the American Statistical Association 81, 961-962.

Sekhon, J. (2009). Opiates for the matches: Matching methods for causal inference. Annual Review of Political Science 12, 487-508.

Sovey, A. and D. Green (2010). Instrumental variables estimation in political science: A readers’ guide. American Journal of Political Science, 55(1), 188-200.

Stuart, E. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science 25, 1-21.

Recommended Courses to Cover Before this One

Summer School

Introduction to STATA

Multiple Regression Analysis: Estimation, Diagnostics, and Modelling

Linear Regression with R/Stata: Estimation, Interpretation and Presentation

Winter School

Introduction to STATA

Multiple Regression Analysis: Estimation, Diagnostics, and Modelling

Linear Regression with R/Stata: Estimation, Interpretation and Presentation

 

Recommended Courses to Cover After this One

Summer School

Introduction to Experimental Research in the Social Sciences

Winter School

Experimental Methods