ECPR

Event History and Survival Analysis

Course Dates and Times

Monday 17 – Friday 21 February 2019, 09:00–12:30
15 hours over five days

Janez Stare

janez.stare@mf.uni-lj.si

University of Ljubljana

In event history analysis (or survival analysis, the name used mostly in the biosciences, where the methods were first applied) we are interested in the time intervals between successive state transitions or events. Typical examples are: duration of unemployment, duration of marriage, recidivism in criminology, duration of political systems, time from diagnosis to death, and so on.

The most distinctive feature of time to event data is that the event is often not observed at the time of analysis. Applying standard statistical methods to such data leads to severe bias or loss of information.

Special methods are therefore needed to extract the information we are accustomed to getting with standard methods (formally, this means estimating the distribution function and incorporating predictive variables into that estimation).

Further complications arise when covariates change in time, when times between recurring events are correlated, when there are competing risks, or when effects change in time.

In this course we will thoroughly study the situation in which there is only one event per subject, but we will also briefly review the extensions, in enough depth for you to be able to continue your work in the area.

Roughly a third of the time will be devoted to practical examples, for which we will use the software package R. Familiarity with R is not assumed, but you will receive a short introduction to the package before the course begins.

While it is impossible to avoid all formulas, I will focus on the concepts in my lectures, but will support the lectures with more rigorous written material.

Tasks for ECTS Credits

2 credits (pass/fail grade): Attend at least 90% of course hours, participate fully in in-class activities, and carry out the necessary reading and/or other work prior to, and after, class.

3 credits (to be graded): As above, plus complete two daily assignments, to be given on Tuesday and Thursday and returned on Wednesday and Friday.

4 credits (to be graded): As above, plus complete a take-home assignment.

 


Instructor Bio

Janez Stare graduated from the Faculty of Mathematics, University of Ljubljana, then gained a Master's Degree and PhD in Biostatistics from the University of Ljubljana's Faculty of Medicine.

He is currently a full Professor of Biostatistics and Head of the Institute of Biostatistics and Medical Informatics, Faculty of Medicine, Ljubljana, and Head of the Doctoral Programme in Statistics at the University of Ljubljana.

His research interests are: explained variation in survival analysis, predictive ability of regression models in survival analysis, frailties, random effects in survival models, relative survival, goodness of fit of regression models, and scientometrics.

Say we are interested in how long people keep their first job. We start our study at some point in time and include a sample of people who obtained their first job after the study started.

After some time, say, a number of years, the study stops and we want to analyse the data. Some people have lost their job in the meantime, some have changed it, but some are still working and we do not have complete data on how long they stay in the job. If somebody has become unable to work (accident, death), we also do not know what their event time would have been had they still been able to work.

When the event is not observed at the time of analysis we say that censoring has occurred. With such data we cannot even calculate the mean, or draw a histogram, let alone use linear regression or similar methods.

Special methods are therefore needed, and most of them use the hazard (or intensity) function. Since this is defined via the conditional probability of the event occurring in some time interval, given that it has not occurred before, the hazard can be estimated even in the presence of censored data.

As we shall see, knowing the hazard function is equivalent to knowing the distribution function, which is the main goal of any analysis.

In survival analysis, and consequently in event history analysis, it has become customary to talk about the survival function, which is simply one minus the distribution function.
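The relationships described above can be written compactly. The notation below is a common textbook convention and not necessarily the one used in the course material: T is the event time, F its distribution function, f its density (assumed to exist), h the hazard, H the cumulative hazard, and S the survival function.

```latex
\begin{align}
h(t) &= \lim_{\Delta t \to 0}
        \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)}, \\
H(t) &= \int_0^t h(u)\,du, \\
S(t) &= 1 - F(t) = P(T > t) = \exp\{-H(t)\}.
\end{align}
```

The last line is the sense in which knowing the hazard is equivalent to knowing the distribution: each of h, H, S and F determines the others.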

I will first illustrate usage of logistic regression for event history data, and explain why such an approach is not satisfactory.

Then we will deal with estimating the survival and the hazard function (parametrically and non-parametrically), some measures of central tendency commonly used, and learn how to write down the likelihood function in the presence of censoring.
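As an illustration of non-parametric estimation, here is a minimal sketch of the Kaplan-Meier estimator of the survival function. It is written in plain Python for the sake of a self-contained example (the course itself uses R, where the `survfit` function of the survival package does this job). The function name `kaplan_meier` and the small data set are hypothetical, invented for illustration only.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  -- observed durations (event time or censoring time)
    events -- 1 if the event was observed, 0 if the observation is censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)   # every subject leaves the risk set at its time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Conditional probability of surviving past t, given at risk at t.
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        # Subjects censored at t are counted as at risk at t, removed after.
        at_risk -= exits[t]
    return curve


# Hypothetical first-job durations in years; 0 marks a censored observation
# (for example, a person still in the job when the study ended).
durations = [1, 2, 2, 3, 4, 5]
observed = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(durations, observed)
for t, s in km_curve:
    print(f"S({t}) = {s:.3f}")
```

Censored subjects contribute to the number at risk up to their censoring time, which is how the estimator uses the partial information they carry instead of discarding them.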

To continue our example, we might then want to find out whether there are any differences in the ability to keep a job between men and women, between people with different levels of education, among different working environments, and so on. In short: do some covariates influence the time a person stays in their first job?

We will therefore learn about tests for comparing survival functions and discuss two of the most commonly used parametric models for inclusion of covariates.
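One widely used test for comparing two survival functions is the log-rank test, which appears in the Tuesday schedule. The sketch below is a plain Python illustration under stated assumptions, not the course's own code: the function name `logrank_test` and the data are hypothetical, there are exactly two groups coded 0 and 1, and the p-value relies on the large-sample chi-square approximation with one degree of freedom, which is rough for a sample this small. In R, `survdiff` from the survival package performs this test.

```python
import math


def logrank_test(times, events, groups):
    """Two-sample log-rank test; groups holds 0 or 1 for each subject.

    Returns (chi-square statistic, approximate p-value on 1 df).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0   # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        n = sum(1 for u in times if u >= t)                      # at risk
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no time point with both groups at risk")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical durations: group 0 loses the job early, group 1 later.
durations = [1, 2, 3, 4, 5, 6]
observed = [1, 1, 1, 1, 1, 1]
group = [0, 0, 0, 1, 1, 1]
chi2, p_value = logrank_test(durations, observed, group)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

At every event time the test compares the number of events seen in one group with the number expected if both groups shared the same hazard, so censored observations are handled through the risk sets.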

The main focus of the course will be on the Cox proportional hazards model, which is by far the most frequently used model in the analysis of time to event data. While the model is very simple, it is also very flexible, and an experienced statistician can make it fit almost any data.

We will learn the basics of the estimation procedure, interpretation, testing, checking the modelling assumptions and relaxing them, and some extensions such as the stratified model, time varying effects and frailties.
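For orientation, the Cox model and its partial likelihood can be written as follows. This is the standard textbook form, assuming no tied event times, and the notation is an assumption, not taken from the course material: x_i is the covariate vector of subject i, δ_i = 1 if the event was observed, and R(t_i) is the set of subjects still at risk just before t_i.

```latex
\begin{align}
h(t \mid x) &= h_0(t)\,\exp(\beta^\top x), \\
L(\beta) &= \prod_{i:\,\delta_i = 1}
            \frac{\exp(\beta^\top x_i)}
                 {\sum_{j \in R(t_i)} \exp(\beta^\top x_j)}.
\end{align}
```

The baseline hazard h_0(t) cancels from the partial likelihood, so β can be estimated without specifying it, and exp(β_k) is interpreted as the hazard ratio for a one-unit increase in the k-th covariate with the others held fixed.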

So far we have not distinguished between a person losing their job and a person changing their job. We also have not considered studying several spells for one person (one person can change or lose jobs several times in the study period). These problems fall under the headings of competing risks, multistate models and recurring events. Essentially, such data can be analysed using the methods learned in this course. On the last day I will briefly review the basic approaches.

Topic list

Univariate event history analysis

  1. Censoring
  2. Survival function
  3. Hazard and cumulative hazard function
  4. Mean time, mean residual time, median time
  5. Likelihood function for censored time to event data
  6. Parametric models for the survival function (exponential, Weibull)
  7. Non-parametric estimation of the survival function (Kaplan-Meier and Nelson-Aalen estimators)
  8. Variance of the survival function, confidence intervals
  9. Comparison of survival functions
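Items 4 to 6 of the list above can be tied together with the simplest parametric case. Under an exponential model with constant hazard λ and non-informative right censoring, the log-likelihood is d·log(λ) − λ·Σtᵢ, where d is the number of observed events and Σtᵢ the total observed time, so the maximum likelihood estimate is d / Σtᵢ. The sketch below is a hedged illustration in plain Python; the function name `exponential_mle` and the data are hypothetical.

```python
import math


def exponential_mle(times, events):
    """MLE of a constant hazard: observed events / total time at risk."""
    total_time = sum(times)    # censored subjects still contribute time
    n_events = sum(events)     # but only observed events are counted
    return n_events / total_time


# Hypothetical durations in years; 0 marks a censored observation.
durations = [1, 2, 2, 3, 4, 5]
observed = [1, 1, 0, 1, 0, 1]

rate = exponential_mle(durations, observed)
mean_time = 1 / rate                 # mean of an exponential distribution
median_time = math.log(2) / rate     # median of an exponential distribution

# Naive alternative: average only the fully observed durations.
event_times = [t for t, e in zip(durations, observed) if e]
naive_mean = sum(event_times) / len(event_times)

print(f"hazard = {rate:.3f}, mean = {mean_time:.2f}, "
      f"median = {median_time:.2f}, naive mean = {naive_mean:.2f}")
```

Dropping the censored subjects gives a noticeably smaller mean here, which illustrates the bias that standard methods produce with censored data.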

Regression models for time to event data

  1. Parametric models (exponential, Weibull)
  2. Cox model (proportional hazards model):
    • Estimation (partial likelihood)
    • Interpretation
    • Testing the null hypothesis (Wald, score, and likelihood ratio test)
    • Some model fitting techniques
    • Categorical variables in the model
    • Relaxing the linearity assumption for continuous variables using splines
    • Checking the model assumptions
    • Goodness-of-fit and explained variation
    • Stratified model
    • Time varying covariates
    • Frailties
    • Time varying effects
  3. Competing risks
  4. Recurring events
  5. Multistate models

THIS IS THE WINTER SCHOOL 2019 OUTLINE

You should have some working knowledge of linear regression models and be familiar with the basics of inferential statistics.

I understand that you may not have strong mathematical skills, but for those who do, the written material contains a more rigorous treatment of the subject.

Even though I use formulas only to explain the concepts, I suggest that you dust off the maths buried in your memory, preferably including the notion of the integral. Not being afraid of formulas is an advantage and certainly helps you understand the subject better.

I assume some practical experience with statistical software. R is the preferred package but not a requirement.

Day Topic Details
Monday
Topics:
  • Introduction to Event History Analysis
  • Using logistic regression to analyze survival data
  • Event History and Social Science
  • Event history data structures
  • Basic definitions
Details:
  • topics of course
  • course goals
  • overview of course schedule
  • illustration of using logistic regression for survival data
  • examples
  • censoring
  • survival function
  • hazard and cumulative hazard function

2.5 hours lecture, 30 minutes examples in R

Tuesday
Topics:
  • Parametric and nonparametric descriptive methods
  • Comparison of survival functions
  • Parametric regression models for single-spell duration data
  • Methods to check parametric assumptions
Details:
  • Mean survival time, mean residual time, median time
  • Exponential and Weibull distribution
  • Kaplan-Meier estimator
  • Life tables
  • Log rank test
  • Exponential and Weibull regression model

2 hours lecture, 1 hour examples in R

Wednesday
Topics:
  • Cox model
Details:
  • Estimation (partial likelihood)
  • Interpretation
  • Some model fitting techniques
  • Categorical variables in the model

2 hours lecture, 1 hour examples in R

Thursday
Topics:
  • Cox model (continued)
Details:
  • Relaxing the linearity assumption for continuous variables using splines
  • Checking the model assumptions
  • Stratified model
  • Time varying covariates
  • Frailties
  • Time varying effects

2 hours lecture, 1 hour examples in R

Friday
Topics:
  • Competing risks and multiple events
Details:
  • Cox model for competing risks, repeated events and multistate models

1.5 hours lecture, 1.5 hours examples in R

Day Readings

I have chosen two books to be used as supplementary reading:

  1. Janet M. Box-Steffensmeier, Bradford S. Jones
    Event History Modeling: A Guide for Social Scientists
    Cambridge University Press 2004
  2. Hans-Peter Blossfeld, Götz Rohwer
    Techniques of Event History Analysis
    Lawrence Erlbaum Associates, London 2002

Monday: Box-Steffensmeier 1, 2; Blossfeld 2

Tuesday: Blossfeld 3.1, 3.2, 3.3; Box-Steffensmeier 1, 2

Wednesday: Box-Steffensmeier 4, 6

Thursday: Box-Steffensmeier 7, 9; Blossfeld 10.1

Friday: Box-Steffensmeier 10

Software Requirements

We will use the software package R for illustrations and exercises. Experience with R is not essential, but some familiarity would be useful.

Download R for free 

You will not be required to work practically with R, but you will benefit from some hands-on experience. Those doing home assignments to get more ECTS credits should have R, Stata, or similar software installed.

If you have problems installing R, we can help you when you are in Bamberg.

Download the R survival package

New versions of R appear regularly, and we will let you know in advance if you need to install a new version. We cannot help with other software, but you can of course use something else on your own.

Hardware Requirements

Bring your own laptop with R (or something else) installed.

Literature

In the biostatistical field there are a lot of good books on Event History and Survival Analysis. The following are good examples:

Collett D.
Modelling Survival Data in Medical Research
Chapman and Hall/CRC; 2nd edition (March 30, 2003)

Hosmer D.W., Lemeshow S., May S.
Applied Survival Analysis: Regression Modeling of Time to Event Data (Wiley Series in Probability and Statistics)
Wiley-Interscience; 2nd edition (March 7, 2008)

Recommended Courses to Cover Before this One

Summer School

Stats Refresher

Winter School

Introduction to R (entry level)