ECPR


Methods of Modern Causal Analysis Based on Observational Data

Michael Gebel
michael.gebel@uni-bamberg.de

University of Bamberg

Michael Gebel is Full Professor of Methods of Empirical Social Research at the University of Bamberg.

He graduated in economics and social sciences, and earned his doctoral degree in sociology at the University of Mannheim.

Michael received a European Research Council (ERC) Starting Grant for the project "The socio-economic consequences of temporary employment: A comparative panel data analysis" (SECCOPA) for the period 2018–23.

His specific research interests include international comparative research, longitudinal data analysis and methods of modern causal analysis.

@gebel_michael


Course Dates and Times

Monday 6 to Friday 10 March 2017
Classes are generally either 09:00–12:30 or 14:00–17:30
15 hours over 5 days

Prerequisite Knowledge

Participants are expected to be familiar with basic statistics and with multiple linear regression analysis. Empirical applications will be implemented in Stata, so prior knowledge of Stata is helpful. However, it is not a prerequisite, because pre-programmed data sets and full syntax files will be provided.


Short Outline

Estimating causal effects is a central aim of quantitative empirical analysis in the social sciences. However, social scientists often have to rely on non-experimental data, which suffer from problems of self-selection based on (unobserved) heterogeneity and from effect heterogeneity. Classical linear regression tries to account for these problems by controlling for observable variables, which often leads to biased estimates. In the recent social science literature, methods of modern causal analysis have become increasingly popular. These methods build on clear concepts of causality – potential outcomes and directed acyclic graphs (DAGs) – and address the above-mentioned problems in a rigorous way. This course provides an introduction to methods such as Propensity Score Matching (PSM), Instrumental Variables (IV), Control Functions (CF) and Difference-in-Differences (DID) approaches. To guarantee a practical orientation of the workshop, empirical examples are discussed and the methods are applied in computer exercises using Stata and real-world data.


Long Course Outline

The central aim of this course is to empower participants to think about causality and to apply the tools of modern causal analysis in their own research. The course content is structured around four key topics. First, the general idea of causality is presented based on the potential outcome framework and directed acyclic graphs (DAGs). Second, linear regression and propensity score matching (PSM) are introduced as methods of modern causal analysis for cross-sectional data that rely on the crucial assumption of selection on observed variables. Third, instrumental variable (IV) and control function (CF) approaches are presented, which can deal with the problem of selection on unobserved variables. Fourth, difference-in-differences (DID) approaches are discussed, which exploit the benefits of longitudinal data. Instead of getting lost in the details of mathematical proofs and philosophical debates, the course offers an applied introduction and hands-on experience in lab sessions. Strengths and limitations of each approach will be discussed and illustrated with examples from the social science literature.

In the following, a detailed course outline is provided:

(I) Causality, Counterfactuals and Causal Graphs

How can we define causality in social science research? With this basic question, the course opens with one of the most important issues in the philosophy of science. Participants will reflect on the basic conditions that are seen as important for making causal claims. Although researchers often try to avoid making causal inferences, practical examples from applied research will show that any serious research hypothesis postulates, explicitly or at least implicitly, a causal relationship between two or more variables.

The first class will introduce Rubin's notation of potential outcomes, which has become the backbone of modern causal analysis in the social sciences because it clearly defines different types of causal effects – the average treatment effect on the treated (ATT), the average treatment effect on the non-treated (ATNT) and the average treatment effect (ATE) – and allows for causal effect heterogeneity. Based on this model, participants will learn how to pose properly formulated questions about causal effects. Participants will also critically discuss the basic assumptions of counterfactuals, manipulability and the stable unit treatment value assumption (SUTVA). Moreover, directed acyclic graphs (DAGs) will be introduced because they offer an illustrative graphical approach to the problem of causal inference. They will be used to clarify the crucial difference between (self-)selection into the X-variable of interest based on observed versus unobserved variables.
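The course labs use Stata, but the potential-outcomes bookkeeping can be illustrated in any language. The following is a minimal, purely illustrative Python sketch with simulated data (the numbers and the selection mechanism are hypothetical): because the simulation generates both potential outcomes for every unit, the ATE, ATT and ATNT can be computed directly, and the bias of a naive mean comparison under self-selection becomes visible.

```python
import random

random.seed(1)

# Hypothetical simulation: we generate BOTH potential outcomes per unit,
# which is only possible in a simulation, never in real data
# (the "fundamental problem of causal inference").
n = 100_000
units = []
for _ in range(n):
    ability = random.gauss(0, 1)                        # unobserved confounder
    y0 = 10 + 2 * ability + random.gauss(0, 1)          # outcome without treatment
    y1 = y0 + 3 + ability                               # heterogeneous effect: 3 + ability
    d = 1 if ability + random.gauss(0, 1) > 0 else 0    # self-selection on ability
    units.append((y0, y1, d))

def mean(xs):
    return sum(xs) / len(xs)

# True effects, computable only because we see both potential outcomes:
ate  = mean([y1 - y0 for y0, y1, d in units])            # whole population
att  = mean([y1 - y0 for y0, y1, d in units if d == 1])  # treated only
atnt = mean([y1 - y0 for y0, y1, d in units if d == 0])  # non-treated only

# What observational data would give us: a naive mean comparison,
# biased here because high-ability units select into treatment.
naive = (mean([y1 for y0, y1, d in units if d == 1])
         - mean([y0 for y0, y1, d in units if d == 0]))

print(f"ATE={ate:.2f}  ATT={att:.2f}  ATNT={atnt:.2f}  naive={naive:.2f}")
```

In real data only one potential outcome per unit is observed, which is exactly why the identification strategies covered in the rest of the course are needed.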

We will then discuss the experimental method from the perspective of the counterfactual model. Experiments are seen as the gold standard for drawing causal inferences because of the randomized assignment to treatment. However, there are several potential pitfalls of experiments that threaten internal and external validity. Moreover, since experiments are often not feasible in the social sciences for ethical and practical reasons, the remainder of the course focuses on modern methods of causal inference based on non-experimental data.

(II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM)

Regarding non-experimental methods, the most prominent approach is multiple linear regression (MLR) analysis. Its strategy is to condition on observable confounding variables in order to disentangle the causal effect of X on Y. In practice, however, researchers often control for the wrong variables or neglect important control variables because they are not aware of modern causal analysis. Participants will apply their new knowledge about counterfactuals to understand linear regression in the notation of potential outcomes. Moreover, applying the principles of DAGs, participants will learn how to select the right control variables in a linear regression, based on examples from applied research.
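As a language-neutral illustration of conditioning on observables (the labs themselves use Stata), the following hypothetical Python sketch simulates a single observed confounder w and compares a regression of Y on X alone with one that also controls for w:

```python
import random

random.seed(6)

# Hypothetical sketch: when selection into x is driven by an OBSERVED
# confounder w, adding w to the regression removes the omitted-variable bias.
n = 20_000
w = [random.gauss(0, 1) for _ in range(n)]                          # observed confounder
x = [0.7 * w[i] + random.gauss(0, 1) for i in range(n)]             # x depends on w
y = [2 * x[i] + 1.5 * w[i] + random.gauss(0, 1) for i in range(n)]  # true effect of x: 2

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# OLS of y on x alone is biased (w is an omitted variable):
naive = cov(x, y) / cov(x, x)

# OLS of y on x AND w: solve the 2x2 normal equations by Cramer's rule.
sxx, sww, sxw = cov(x, x), cov(w, w), cov(x, w)
sxy, swy = cov(x, y), cov(w, y)
det = sxx * sww - sxw * sxw
beta_x = (sxy * sww - swy * sxw) / det   # coefficient on x, controlling for w

print(f"y on x only: {naive:.2f}   y on x and w: {beta_x:.2f}  (true: 2)")
```

The sketch only shows that controlling for the *right* observed confounder works; as discussed above, conditioning on the wrong variables (e.g. colliders or mediators identified via a DAG) can create rather than remove bias.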

Propensity Score Matching (PSM) is an alternative method for conditioning on observables, which has several advantages over MLR. Applying their knowledge of the counterfactual model and DAGs, participants will learn how to implement the different steps of PSM. Specifically, it will be explained how to estimate propensity scores, how to implement and choose between different matching and estimation options, and how to test whether PSM succeeded in balancing the observed control variables. These steps will be applied to real-world data in a computer lab session. The section concludes with a discussion of sensitivity analyses that can strengthen the claims made with PSM.
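The matching step itself can be sketched in a few lines. The following hypothetical Python example (the course labs use Stata) performs one-to-one nearest-neighbour matching with replacement to estimate the ATT; for brevity it matches on the true propensity score, whereas in practice the score would first be estimated, e.g. by a logit of treatment on the observed confounders:

```python
import math
import random

random.seed(2)

# Hypothetical simulation with one observed confounder w.
n = 4000
w = [random.gauss(0, 1) for _ in range(n)]            # observed confounder
p = [1 / (1 + math.exp(-wi)) for wi in w]             # true propensity score
d = [1 if random.random() < pi else 0 for pi in p]    # treatment assignment
# Outcome: constant treatment effect of 2, confounded by w.
y = [2 * d[i] + 1.5 * w[i] + random.gauss(0, 1) for i in range(n)]

treated  = [i for i in range(n) if d[i] == 1]
controls = [i for i in range(n) if d[i] == 0]

# ATT: for each treated unit, find the control with the closest propensity
# score (1-NN matching with replacement) and average the outcome differences.
att_hat = sum(
    y[i] - min((abs(p[i] - p[j]), y[j]) for j in controls)[1]
    for i in treated
) / len(treated)

# Naive mean comparison, biased upwards by selection on w:
naive = (sum(y[i] for i in treated) / len(treated)
         - sum(y[i] for i in controls) / len(controls))

print(f"naive={naive:.2f}  matched ATT={att_hat:.2f}  (true effect: 2)")
```

Real applications would add the steps the sketch omits: estimating the score, enforcing common support, choosing among matching algorithms (kernel, radius, caliper), and running balancing and sensitivity tests.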

(III) Selection on Unobservables: Instrumental Variable (IV) and Control Function (CF) Approaches

MLR and PSM will produce biased estimates of causal effects if there is selection into the X-variable of interest based on unobserved factors. Instrumental variables (IV) are seen as a solution to this problem. The underlying identification strategy is to find an instrumental variable that is correlated with the X-variable of interest without having an independent effect on the outcome Y-variable. The course introduces IV estimators based on the counterfactual model and DAGs. Participants will critically discuss examples of instrumental variables from applied research in order to understand the problems of IV estimators (weak instruments, violations of the exclusion restriction, etc.). The course moves beyond the classical textbook treatment of IV by introducing Angrist's alternative interpretation of the IV estimator as identifying a Local Average Treatment Effect (LATE) in the context of heterogeneous effects, which is more reasonable for applications in the social sciences.
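With a binary instrument, the classical IV estimator reduces to the Wald estimator: the reduced-form effect of Z on Y divided by the first-stage effect of Z on X. The following hypothetical Python sketch with simulated data (the labs use Stata) shows it recovering a constant treatment effect that a naive comparison gets wrong:

```python
import random

random.seed(3)

# Hypothetical setup: z is randomized, shifts treatment x, and has no
# direct effect on y (exclusion restriction); u confounds x and y.
n = 50_000
rows = []
for _ in range(n):
    u = random.gauss(0, 1)                      # unobserved confounder
    z = random.randint(0, 1)                    # randomized binary instrument
    x = 1 if u + 0.8 * z + random.gauss(0, 1) > 0 else 0
    y = 2 * x + 1.5 * u + random.gauss(0, 1)    # true effect of x: 2
    rows.append((z, x, y))

def mean(xs):
    return sum(xs) / len(xs)

# Naive comparison of treated vs untreated is confounded by u:
naive = (mean([y for z, x, y in rows if x == 1])
         - mean([y for z, x, y in rows if x == 0]))

# Wald estimator: reduced form divided by first stage.
reduced = (mean([y for z, x, y in rows if z == 1])
           - mean([y for z, x, y in rows if z == 0]))
first = (mean([x for z, x, y in rows if z == 1])
         - mean([x for z, x, y in rows if z == 0]))
wald = reduced / first

print(f"naive={naive:.2f}  Wald IV={wald:.2f}  (true effect: 2)")
```

The simulation uses a constant effect, so the Wald estimate equals the ATE; under effect heterogeneity the same ratio identifies only the LATE for compliers, which is exactly Angrist's reinterpretation discussed above.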

While IV estimators only identify the LATE in the case of heterogeneous effects, control function (CF) estimators can also identify other causal effects (the ATT, ATNT and ATE) by making distributional assumptions about the error terms. This approach explicitly models the selection process into the X-variable of interest. As Winship and Mare (1992) argue, selectivity is not only a source of bias in research but also a genuine theoretical idea in social science, because selectivity results naturally from human behavior such as individual decisions and choices. In a lab session, participants will learn how to implement IV and CF estimators in Stata using real-world data and how to interpret the results. Special emphasis will be placed on a critical discussion of the validity of the assumptions of both the IV and the CF approach in the context of practical research examples.
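The two-step logic of a linear control function can be sketched as follows (a hypothetical Python simulation; the labs use Stata): regress X on the instrument Z, then include the first-stage residual as an additional regressor in the outcome equation, so that, conditional on the residual, the coefficient on X recovers the causal effect:

```python
import random

random.seed(4)

# Hypothetical linear control-function sketch. The first-stage residual v
# absorbs the unobserved confounding in x; controlling for v makes the
# remaining variation in x (driven by z) exogenous.
n = 20_000
z = [random.gauss(0, 1) for _ in range(n)]                 # instrument
u = [random.gauss(0, 1) for _ in range(n)]                 # unobserved confounder
x = [0.8 * z[i] + u[i] + random.gauss(0, 1) for i in range(n)]
y = [2 * x[i] + 1.5 * u[i] + random.gauss(0, 1) for i in range(n)]  # true effect: 2

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# First stage: x = a + b*z + v  (simple OLS slope, then residuals).
b1 = cov(x, z) / cov(z, z)
a1 = mean(x) - b1 * mean(z)
v = [x[i] - a1 - b1 * z[i] for i in range(n)]

# Second stage: y on x and v; 2x2 normal equations via Cramer's rule.
sxx, svv, sxv = cov(x, x), cov(v, v), cov(x, v)
sxy, svy = cov(x, y), cov(v, y)
det = sxx * svv - sxv * sxv
beta_x = (sxy * svv - svy * sxv) / det      # causal effect of x
beta_v = (svy * sxx - sxy * sxv) / det      # selection (control-function) term

naive = cov(x, y) / cov(x, x)               # biased OLS ignoring selection
print(f"naive OLS={naive:.2f}  CF estimate={beta_x:.2f}  (true effect: 2)")
```

In this fully linear case the CF estimate coincides with 2SLS; the distributional assumptions mentioned above (as in Heckman-type selection models) only become binding in nonlinear settings, e.g. with a binary treatment.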

(IV) Using Longitudinal Data: The Difference-in-Differences (DID) Approach

When prospective or retrospective longitudinal data, i.e. repeated measurements of the outcome Y-variable, are available, additional approaches can be employed to deal with the problem of selection on observable and unobservable variables. The true strength of longitudinal data lies in the possibility of observing the outcomes of the same observational unit over time. Applying the logic of before-after or fixed-effects panel estimators, time-constant observed and unobserved characteristics of the observational unit can easily be removed. The difference-in-differences (DID) approach combines this fixed-effects logic with a control-group comparison. Comparing time trends in the outcome Y-variable between the so-called treatment and control groups eliminates not only time-constant individual effects but also common time trends. Participants will learn how to apply the DID estimator in a linear regression design and also how to combine it with propensity score matching (PSM): using the PSM logic is an innovative strategy for forming an appropriate control group in the DID design. In a lab session, participants will learn how to implement the DID-regression and DID-PSM estimators in Stata using real-world data and how to interpret the results.
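In the canonical two-group, two-period case, the DID estimator is simply a double difference of cell means. The following hypothetical Python sketch with simulated data (the labs use Stata) shows how group-specific time-constant effects and the common time trend cancel:

```python
import random

random.seed(5)

# Hypothetical DID simulation: two groups observed before (t=0) and
# after (t=1) a treatment that only the treatment group (g=1) receives.
n = 10_000
data = []  # (group, period, outcome)
for _ in range(n):
    g = random.randint(0, 1)                 # 1 = treatment group
    alpha = 1.0 * g + random.gauss(0, 1)     # time-constant unit/group effect
    for t in (0, 1):
        # common trend 0.5 per period; true treatment effect: 2
        y = alpha + 0.5 * t + 2 * (g * t) + random.gauss(0, 1)
        data.append((g, t, y))

def cell_mean(g, t):
    ys = [y for gi, ti, y in data if gi == g and ti == t]
    return sum(ys) / len(ys)

# Double difference: change in treatment group minus change in control group.
did = ((cell_mean(1, 1) - cell_mean(1, 0))
       - (cell_mean(0, 1) - cell_mean(0, 0)))

# A simple before-after comparison wrongly absorbs the common time trend:
before_after = cell_mean(1, 1) - cell_mean(1, 0)

print(f"before-after={before_after:.2f}  DID={did:.2f}  (true effect: 2)")
```

The double difference is unbiased here because the simulated groups share the same trend by construction; in real applications this common-trend assumption is exactly what the DID-PSM combination discussed above is meant to make more plausible.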

Day Topic Details

Day 1
(I) Causality, Counterfactuals and Causal Graphs: 1. Posing causal research questions; 2. The counterfactual model of causality; 3. Directed acyclic graphs (DAGs); 4. Randomization in experiments as the gold standard of causal inference?
(II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM): 1. MLR and potential outcomes; 2. MLR and directed acyclic graphs (DAGs)
Format: 90 min lecture/discussion + 90 min lecture/discussion. For a detailed description, see the long course outline.

Day 2
(II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM) (continued): 1. PSM: Basic assumptions; 2. PSM: Matching algorithms; 3. PSM: Balancing tests; 4. PSM: Sensitivity tests; 5. PSM: Practical implementation and empirical examples in Stata
Format: 90 min lecture/discussion + 90 min lab session. For a detailed description, see the long course outline.

Day 3
(II) Selection on Observables: Multiple Linear Regression (MLR) and Propensity Score Matching (PSM) (continued): 5. PSM: Practical implementation and empirical examples in Stata (continued)
(III) Selection on Unobservables: Instrumental Variable (IV) and Control Function (CF) Approaches: 1. IV: The classical IV estimator; 2. IV: The modern LATE interpretation
Format: 90 min lab session + 90 min lecture/discussion. For a detailed description, see the long course outline.

Day 4
(III) Selection on Unobservables: Instrumental Variable (IV) and Control Function (CF) Approaches (continued): 1. CF: Heckman’s selection correction model; 2. CF: The treatment effect selection model; 3. IV and CF: Practical implementation and empirical examples in Stata
Format: 90 min lecture/discussion + 90 min lab session. For a detailed description, see the long course outline.

Day 5
(IV) Using Longitudinal Data: The Difference-in-Differences (DID) Approach: 1. The benefits of longitudinal data; 2. The before-after/fixed-effect estimator; 3. The difference-in-differences (DID) estimator; 4. DID combined with PSM; 5. DID and DID-PSM: Practical implementation and empirical examples in Stata
Format: 90 min lecture/discussion + 90 min lab session. For a detailed description, see the long course outline.
Day Readings

Day 1: Keele (2015); Morgan/Winship (2015), Ch. 2 “Counterfactuals and the Potential Outcome Model”, pp. 37–62 only; Ch. 3 “Causal Graphs”, pp. 77–84 only; Ch. 4 “Models of Causal Exposure and Identification Criteria for Conditioning Estimators”, pp. 105–130 only; Ch. 6 “Regression Estimators of Causal Effects”, pp. 188–206 only
Day 2: Gangl (2015); Morgan/Harding (2006); Caliendo/Kopeinig (2008); Ho/Imai/King/Stuart (2007)
Day 3: Morgan/Winship (2015), Ch. 9 “Instrumental Variable Estimators of Causal Effects”; Sovey/Green (2010); Angrist et al. (1996)
Day 4: Winship/Mare (1992); Vella (1998), pp. 127–139 only
Day 5: Gangl (2010), Section 5.3 “Fixed-Effects and Difference-in-Differences Estimators”; Winship/Morgan (1999), section “Longitudinal Methods”, pp. 687–704; Lechner (2011)

For full references, see the “Literature” section below.

Software Requirements

Stata version 13 (or higher)

Hardware Requirements

No specific requirements

Literature


Angrist, J., G. Imbens and D. Rubin (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444-455.

Bertrand, M., E. Duflo and S. Mullainathan (2004). How much should we trust differences-in-differences estimates? Quarterly Journal of Economics 119, 249-275.

Bound, J., D. Jaeger and R. Baker (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association 90, 443-450.

Caliendo, M. and S. Kopeinig (2008). Some practical guidance for the implementation of propensity score matching, Journal of Economic Surveys 22, 31-72.

DiPrete, T. and M. Gangl (2004). Assessing bias in the estimation of causal effects: Rosenbaum bounds on matching estimators and instrumental variables estimation with imperfect instruments. Sociological Methodology 34, 271-310.

Gangl, M. (2010). Causal inference in sociological research. Annual Review of Sociology 36, 21-47.

Gangl, M. (2015). Matching estimators for treatment effects. In Best, H. and C. Wolf (Eds) The SAGE Handbook of Regression Analysis and Causal Inference. London: Sage Publications, pp. 251-276.

Heckman, J. and S. Navarro-Lozano (2004). Using matching, instrumental variables, and control functions to estimate economic choice models. Review of Economics and Statistics 86, 30-57.

Ho, D., Imai, K., King, G. and E. Stuart (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15, 199-236.

Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association 81, 945-960.

Imai, K., Keele, L., Tingley, D. and T. Yamamoto (2011). Unpacking the black box of causality: Learning about causal mechanisms from experimental and observational studies. American Political Science Review 105, 765–789.

Imbens, G. and D. Rubin (2015). Causal inference for statistics, social, and biomedical sciences. Cambridge: Cambridge University Press. Ch. 1 “Causality: The basic framework”.

Keele, L. (2015). The statistics of causal inference: A view from political methodology. Political Analysis 23, 313–335.

Lechner, M. (2011). The estimation of causal effects by difference-in-difference methods. Foundations and Trends in Econometrics 4(3), 165-224.

Morgan, S. and D. Harding (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research 35, 3-60.

Morgan, S. and C. Winship (2015). Counterfactuals and causal inference. Cambridge: Cambridge University Press.

Muller, C., Winship, C. and S. Morgan (2015). Instrumental variables regression. In Best, H. and C. Wolf (Eds) The SAGE Handbook of Regression Analysis and Causal Inference. London: Sage Publications, pp. 251-276.

Rubin, D. (1986). Which ifs have causal answers? Journal of the American Statistical Association 81, 961-962.

Sekhon, J. (2009). Opiates for the matches: Matching methods for causal inference. Annual Review of Political Science 12, 487-508.

Sovey, A. and D. Green (2010). Instrumental variables estimation in political science: A readers’ guide. American Journal of Political Science 55(1), 188-200.

Stuart, E. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science 25, 1-21.

Vella, F. (1998). Estimating models with sample selection bias: A survey. The Journal of Human Resources 33, 127-169.

Winship, C. and R. Mare (1992). Models for sample selection bias. Annual Review of Sociology 18, 327-350.

Winship, C. and S. Morgan (1999). The estimation of causal effects from observational data. Annual Review of Sociology 25, 659-706.

Recommended Courses to Cover Before this One

Summer School
SA106 Introduction to STATA
SA104 Basics of Inferential Statistics for Political Scientists or WB106 Introduction to Statistics for Political and Social Scientists
SB101 Research Designs or WB-101 Research Designs Fundamentals

Winter School
SA104 Basics of Inferential Statistics for Political Scientists or WB106 Introduction to Statistics for Political and Social Scientists
SB101 Research Designs or WB-101 Research Designs Fundamentals

Recommended Courses to Cover After this One

Winter School
WB105 Experimental Methods


Additional Information

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc). Registered participants will be informed in due time.

Note from the Academic Conveners

By registering for this course, you confirm that you possess the knowledge required to follow it. The instructor will not teach these prerequisite items. If in doubt, contact the instructor before registering.