Methods of Modern Causal Analysis Based on Observational Data

Course Dates and Times

Monday 5 to Friday 9 March 2018
09:00-12:30
15 hours over 5 days

Michael Gebel

michael.gebel@uni-bamberg.de

University of Bamberg

Estimating causal effects is a central aim of quantitative empirical analysis in the social sciences. However, social scientists often have to rely on non-experimental data, which suffer from the problems of self-selection based on (unobserved) heterogeneity and from effect heterogeneity. Classical linear regression tries to account for these problems by controlling for observable variables, which, however, often leads to biased estimates. In the recent social science literature, new methods of modern causal analysis have become increasingly popular. These methods build on clear concepts of causality – potential outcomes and directed acyclic graphs (DAGs) – and address the above-mentioned problems in a rigorous way. This course provides an introduction to methods such as Propensity Score Matching (PSM), Instrumental Variables (IV), and Difference-in-Differences (DID) approaches. To guarantee a practical orientation of the workshop, empirical examples are discussed and the methods are applied in computer exercises using Stata and real-world data.

Tasks for ECTS Credits

  • Participants attending the course: 2 credits (pass/fail grade)
  • Participants attending the course and completing one task (see below): 3 credits (to be graded)
  • Participants attending the course and completing two tasks (see below): 4 credits (to be graded)

The workload for the calculation of ECTS credits is based on the assumption that students attend classes and carry out the necessary reading and/or other work prior to, and after, classes.

Instructor Bio

Michael Gebel is Full Professor of Methods of Empirical Social Research at the University of Bamberg.

He graduated in economics and social sciences, and earned his doctoral degree in sociology at the University of Mannheim.

Michael has received a European Research Council (ERC) Starting Grant for the project The socio-economic consequences of temporary employment: A comparative panel data analysis (SECCOPA) for the period 2018–23.

His specific research interests include international comparative research, longitudinal data analysis and methods of modern causal analysis.

@gebel_michael

The central aim of this course is to empower participants to think about causality and to apply new tools of modern causal analysis in their own research. Experiments are seen as the gold standard for drawing causal inference because of the manipulation of the treatment and the random assignment to treatment groups. However, experiments have several potential pitfalls that pose threats to internal and external validity. Moreover, since experiments are often not feasible in the social sciences for ethical and practical reasons, this course focuses on modern methods of causal inference based on non-experimental data. The course content is structured around four key topics: First, the general idea of causality is presented based on the potential outcome framework and directed acyclic graphs (DAGs). Second, linear regression and propensity score matching (PSM) are introduced as methods of modern causal analysis for cross-sectional data that rely on the crucial assumption of selection on observed variables. Third, instrumental variables (IV) estimators are presented that can deal with the problem of selection on unobserved variables. Fourth, difference-in-differences (DID) approaches are discussed that exploit the benefits of longitudinal data. Instead of getting lost in the details of mathematical proofs and philosophical debates, the course offers an applied introduction and hands-on experience in lab sessions. The strengths and limitations of each approach are discussed and illustrated with examples from the social science literature.

A detailed course outline follows:

(I) Causality, Counterfactuals and Causal Graphs

How can we define causality in social science research? The course begins with this basic question, one of the most important in the philosophy of science. Participants will learn to distinguish different kinds of causal hypotheses and will reflect on the basic conditions that are seen as important for making causal claims. Although researchers often try to avoid making causal inferences, practical examples from applied research will show that any serious research hypothesis postulates, explicitly or at least implicitly, a causal relationship between two or more variables.

The first class will introduce Rubin’s potential outcomes notation, which has become the backbone of modern causal analysis in the social sciences because it clearly defines different types of causal effects (the ATT, ATNT and ATE) and allows for causal effect heterogeneity. Based on this model, participants will also learn how to pose properly formulated questions about causal effects. Participants will also critically discuss the basic assumptions of counterfactuals, manipulability and the stable unit treatment value assumption (SUTVA). Moreover, directed acyclic graphs (DAGs) will be introduced because they offer an illustrative graphical approach to the problem of causal inference. They will be used to clarify the crucial difference between (self-)selection processes into the X-variable of interest based on observed versus unobserved variables. Advanced topics of endogenous selection bias, common-cause confounding and overcontrol bias will be discussed in the framework of DAGs.
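
As a compact point of reference, the causal estimands named above can be written in potential outcomes notation as follows (a minimal sketch; D denotes a binary treatment and Y(1), Y(0) the potential outcomes with and without treatment):

    \[
    \mathrm{ATE} = E[Y(1)-Y(0)], \qquad
    \mathrm{ATT} = E[Y(1)-Y(0)\mid D=1], \qquad
    \mathrm{ATNT} = E[Y(1)-Y(0)\mid D=0]
    \]

Only the observed outcome \(Y = D\,Y(1) + (1-D)\,Y(0)\) is available for each unit, which is why the missing counterfactual has to be identified through assumptions such as those discussed in the course.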

(II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM)

Regarding non-experimental methods, the most prominent approach is multiple linear regression (MLR) analysis. Its strategy is to condition on observable confounding variables in order to disentangle the causal effect of X on Y. In practice, however, researchers often control for the wrong variables and neglect important control variables because they are not aware of modern causal analysis. Participants will apply their new knowledge about counterfactuals to understand linear regression in the notation of potential outcomes. Moreover, applying the principles of DAGs, participants will learn how to select the right control variables in a linear regression based on examples of applied research.
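
To make the conditioning strategy concrete, a minimal Stata sketch might look as follows; the data set and variable names (y, x, conf1, conf2) are hypothetical placeholders, not course materials:

    * Minimal sketch of the conditioning-on-observables strategy (hypothetical data)
    use mydata.dta, clear            // hypothetical data set
    * conf1 and conf2 stand for confounders identified via the DAG (back-door paths)
    regress y x conf1 conf2
    * The coefficient on x has a causal interpretation only if all relevant
    * confounders are observed and included in the model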

Propensity Score Matching (PSM) is an alternative method that conditions on selection on observables and has several advantages compared to MLR. Applying their knowledge of the counterfactual model and DAGs, participants will learn how to implement the different steps of PSM. Specifically, it will be explained how to estimate propensity scores, how to implement and choose between different matching and estimation options, and how to test whether PSM succeeded in balancing the observed control variables. The different steps will be applied to real-world data in a computer lab session. Differences between the PSM approach and other matching approaches will be briefly discussed. This section concludes with an outlook on model extensions such as multiple treatments and sensitivity analysis in terms of Rosenbaum bounds.
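
As an illustration of these steps, a minimal PSM sketch in Stata could look like this (hypothetical variable names; treat is a binary treatment indicator, conf1 and conf2 are observed confounders):

    * Step 1: estimate the propensity score with a binary choice model
    logit treat conf1 conf2
    predict pscore, pr

    * Steps 2-3: nearest-neighbour matching on the propensity score and ATT estimation
    * (teffects psmatch re-estimates the score internally from the same specification)
    teffects psmatch (y) (treat conf1 conf2), atet

    * Step 4: balancing test on the matched sample
    tebalance summarize

The sketch uses the default nearest-neighbour matching; choosing between the available matching and estimation options is part of the course content.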

(III) Selection on Unobservables: The Instrumental Variable (IV) Approach

MLR and PSM will produce biased estimates of causal effects if there is selection into the X-variable of interest based on unobserved factors. Instrumental variables (IV) are seen as a solution to this problem. The underlying identification strategy is to find an instrumental variable that is correlated with the X-variable of interest without having an independent effect on the outcome Y-variable. This course introduces IV estimators based on the counterfactual model and DAGs. Participants will critically discuss examples of instrumental variables from applied research in order to understand the problems of IV estimators (weak instruments, violation of exclusion restrictions, etc.).
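
A minimal two-stage least squares sketch in Stata, with a hypothetical instrument z for the endogenous variable x, might read:

    * Minimal IV (2SLS) sketch with hypothetical variables;
    * z must affect y only through x (exclusion restriction)
    ivregress 2sls y conf1 conf2 (x = z), first
    * Weak-instrument diagnostics: inspect the first-stage F statistic
    estat firststage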

This course moves beyond the classical textbooks on IV by introducing Angrist’s alternative interpretation of the IV estimator as identifying a Local Average Treatment Effect (LATE) in the context of heterogeneous effects, which is more reasonable for applications in the social sciences. Moreover, model extensions in terms of the control function (CF) estimator can even identify the ATE by making distributional assumptions about the error terms. This approach explicitly models the selection process into the X-variable of interest. As Winship and Mare (1992) argue, selectivity is not only a source of bias in research but also a genuine theoretical idea in social science, because selectivity results naturally from human behavior such as individuals’ decisions and choices. In a lab session, participants will learn how to implement IV and CF estimators in Stata based on real-world data and how to interpret the results. A special emphasis will be placed on the critical discussion of the validity of the assumptions of both the IV and the CF approach in the context of practical research examples.
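
One way to sketch the treatment-effect selection model behind the CF approach in Stata is the built-in endogenous treatment-effects regression; the variables below (d as binary treatment, z as exclusion restriction) are hypothetical placeholders:

    * Minimal sketch of a linear outcome model with an endogenous binary treatment d,
    * estimated with the two-step control-function procedure (hypothetical variables)
    etregress y conf1 conf2, treat(d = z conf1 conf2) twostep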

(IV) Using Longitudinal Data: The Difference-in-Differences (DID) Approach

When prospective or retrospective longitudinal data, i.e. repeated measurements of the outcome Y-variable, are available, additional approaches can be employed to deal with the problem of selection on observable and unobservable variables. The true strength of longitudinal data lies in the possibility to observe the outcomes of the same observational unit over time. Thus, applying the logic of before-after or fixed-effects panel estimators, time-constant observed and unobserved characteristics of the observational unit can easily be removed. The difference-in-differences (DID) approach combines this fixed-effects logic with a control group comparison. Comparing time trends in the outcome Y-variable between the so-called treatment and control groups allows for eliminating not only time-constant individual effects but also common time trends. Participants will not only learn how to apply the DID estimator in a linear regression design but also how to combine the DID estimator with propensity score matching (PSM). Using the PSM logic is an innovative strategy to form an appropriate control group in the DID design. In a lab session, participants will learn how to implement the DID-regression and DID-PSM estimators in Stata based on real-world data and how to interpret the results.
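
In its simplest regression form, the DID estimator can be sketched in Stata as the interaction of a group and a period indicator (hypothetical variables; treated marks the treatment group, post the post-treatment period, id the panel unit):

    * Minimal DID regression sketch with clustered standard errors (hypothetical variables)
    regress y i.treated##i.post conf1, vce(cluster id)
    * The coefficient on 1.treated#1.post is the DID estimate; it removes time-constant
    * group differences and common time trends under the parallel-trends assumption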

 

Participants are expected to be familiar with the basics of statistics and with multiple linear regression analysis. Empirical applications will be implemented in Stata, so any knowledge of Stata is very helpful. However, it is not a prerequisite, because prepared data sets and full syntax files will be provided.

Day Topic Details

Day 1
(I) Causality, Counterfactuals and Causal Graphs
1. Posing causal research questions
2. The counterfactual model of causality
3. Introduction to directed acyclic graphs (DAG)

90min lecture/discussions + 90min lecture/discussions. For a detailed description see the long course outline.

Day 2
(I) Causality, Counterfactuals and Causal Graphs (continued)
4. Advanced topics of directed acyclic graphs (DAG)
(II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM)
1. MLR and directed acyclic graphs (DAG)
2. MLR and potential outcomes
3. PSM: Basic assumptions
4. PSM: Matching algorithms
5. PSM: Balancing tests

90min lecture/discussions + 90min lecture/discussions. For a detailed description see the long course outline.

Day 3
(II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM) (continued)
7. PSM: Advanced topics
8. PSM: Practical implementation and empirical examples in Stata

90min lecture/discussions + 90min lab session. For a detailed description see the long course outline.

Day 4
(III) Selection on Unobservables: The Instrumental Variable (IV) Approach
1. IV: The classical IV estimator
2. IV: The modern LATE interpretation
3. The control function (CF) estimator: The treatment effect selection model
4. IV and CF: Practical implementation and empirical examples in Stata

90min lecture/discussions + 90min lab session. For a detailed description see the long course outline.

Day 5
(IV) Using Longitudinal Data: The Difference-in-Differences (DID) Approach
1. The benefits of longitudinal data
2. The before-after/fixed-effect estimator
3. The difference-in-differences (DID) estimator
4. DID combined with PSM
5. DID and DID-PSM: Practical implementation and empirical examples in Stata

90min lecture/discussions + 90min lab session. For a detailed description see the long course outline.

Day Readings
1

Keele (2015)

Morgan/Winship (2015) Ch2 “Counterfactuals and the Potential Outcome Model” only pp. 37–62, Ch3 “Causal Graphs” only pp. 77–84

Elwert (2013)

2

Morgan/Harding (2006)

Ho/Imai/King/Stuart (2007)

3

Caliendo/Kopeinig (2008)

Gangl (2014)

4

Morgan/Winship (2015) Ch9 “Instrumental Variable Estimators of Causal Effects”

Sovey/Green (2010)

Angrist et al (1996)

Winship/Mare (1992)

5

Gangl (2010) Ch5.3 “Fixed-Effects and Difference-in-Differences Estimators”

Winship and Morgan (1999) Ch. “Longitudinal Methods”, pp. 687–704

Lechner (2011)

For full reference details, see the Literature section below.

Software Requirements

Stata version 14 (or higher)

Hardware Requirements

No specific requirements

Literature

Angrist, J., G. Imbens and D. Rubin (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444-455.

Bertrand, M., E. Duflo and S. Mullainathan (2004). How much should we trust differences-in-differences estimates? Quarterly Journal of Economics 119, 249-275.

Bound, J., D. Jaeger and R. Baker (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association 90, 443-450.

Caliendo, M. and S. Kopeinig (2008). Some practical guidance for the implementation of propensity score matching, Journal of Economic Surveys 22, 31-72.

DiPrete, T. and M. Gangl (2004). Assessing bias in the estimation of causal effects: Rosenbaum bounds on matching estimators and instrumental variables estimation with imperfect instruments. Sociological Methodology 34, 271-310.

Elwert, F. (2013). Graphical causal models. In Morgan, S. (Ed.) Handbook of Causal Analysis for Social Research. Dordrecht: Springer, pp. 245-273.

Elwert, F. and C. Winship (2014). Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable, Annual Review of Sociology, 40:31–53.

Gangl, M. (2010). Causal inference in sociological research. Annual Review of Sociology 36, 21-47.

Gangl, M. (2014). Matching estimators for treatment effects. In Best, H. and C. Wolf (Eds) The SAGE Handbook of Regression Analysis and Causal Inference. London: Sage Publications, pp. 251-276.

Muller, C., Winship, C. and S. Morgan (2014). Instrumental variables regression. In Best, H. and C. Wolf (Eds) The SAGE Handbook of Regression Analysis and Causal Inference. London: Sage Publications, pp. 251-276.

Heckman, J. and S. Navarro-Lozano (2004). Using matching, instrumental variables, and control functions to estimate economic choice models. Review of Economics and Statistics 86, 30-57.

Ho, D., Imai, K., King, G. and E. Stuart (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15, 199-236.

Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association 81, 945-960.

Imai, K., Keele, L., Tingley, D. and T. Yamamoto (2011). Unpacking the black box of causality. Learning about causal mechanisms from experimental and observational studies. American Political Science Review, 105, 765–789.

Imbens, G. and D. Rubin (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge: Cambridge University Press. Ch. 1 “Causality: The basic framework”.

Keele, L. (2015). The statistics of causal inference. A view from political methodology. Political Analysis, 23, 313–335.

Lechner, M. (2011). The estimation of causal effects by difference-in-difference methods. Foundations and Trends in Econometrics 4(3), 165-224.

Morgan, S. and C. Winship (2015). Counterfactuals and causal inference. Cambridge: Cambridge University Press.

Morgan, S. and D. Harding (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research 35, 3-60.

Rubin, D. (1986). Which ifs have causal answers? Journal of the American Statistical Association 81, 961-962.

Sekhon, J. (2009). Opiates for the matches: Matching methods for causal inference. Annual Review of Political Science 12, 487-508.

Sovey, A. and D. Green (2010). Instrumental variables estimation in political science: A readers’ guide. American Journal of Political Science 55(1), 188-200.

Stuart, E. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science 25, 1-21.

Winship, C. and R. Mare (1992). Models for sample selection bias. Annual Review of Sociology 18, 327-350.

Winship, C. and S. Morgan (1999). The estimation of causal effects from observational data. Annual Review of Sociology 25, 659-706.

Recommended Courses to Cover Before this One

Summer School

Introduction to STATA

Multiple Regression Analysis: Estimation, Diagnostics, and Modelling

Winter School

Introduction to STATA

Multiple Regression Analysis: Estimation, Diagnostics, and Modelling

 

Recommended Courses to Cover After this One

Summer School

Introduction to Experimental Research in the Social Sciences

Winter School

Experimental Methods