Say we are interested in how long people keep their first job. We start our study at some point in time and include a sample of people who obtained their first job after the study started. After some time, say a number of years, the study stops and we want to analyse the data. Some people have lost their job in the meantime, some have changed it, but some are still working, and we do not have complete data on their time in the job. Further, if somebody has stopped working because of his inability to work (accident, death), we also do not know what his event time would have been had he still been able to work. When the event has not been observed by the time of analysis, we say that censoring has occurred. With such data we cannot even calculate the mean time in the job or draw a histogram, let alone use linear regression or similar methods. Special methods are therefore needed, and most of them use the hazard (or intensity) function. The hazard is defined via the conditional probability of the event occurring in a short time interval, given that it has not occurred before. It can therefore be estimated even in the presence of censored data, because a censored person simply contributes information for as long as he is observed. As we shall see, knowing the hazard function is equivalent to knowing the distribution function, which is usually the main goal of the analysis. In survival analysis, and consequently in event history analysis, it has become customary to talk about the survival function, which is simply one minus the distribution function.
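For a continuous, positive event time T with density f and distribution function F, the standard definitions behind this paragraph can be written out as follows (the notation is ours, not necessarily the one used in the course):

```latex
\begin{aligned}
S(t) &= P(T > t) = 1 - F(t), \\
h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t), \\
H(t) &= \int_0^t h(u)\,du, \qquad S(t) = \exp\{-H(t)\}.
\end{aligned}
```

The last identity follows by integrating the expression for h(t) from 0 to t, using S(0) = 1. It is what makes the hazard and the distribution function carry the same information.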
On the first day I will illustrate the use of logistic regression for event history data and explain why such an approach is not satisfactory.
Then we will estimate the survival and hazard functions (parametrically and non-parametrically), look at some commonly used measures of central tendency, and learn how to write down the likelihood function in the presence of censoring.
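The following is a minimal Python sketch of the ideas above. It is not the course's software, and the function names and the data are hypothetical. It shows why censoring is no obstacle for hazard-based estimation. At each observed event time the hazard is estimated as (number of events) / (number still at risk). The Nelson-Aalen estimator sums these increments, and the Kaplan-Meier estimator multiplies the conditional survival probabilities 1 - d/n. For the parametric side, recall that under independent censoring each person contributes f(t_i) if the event was observed (δ_i = 1) and S(t_i) if censored (δ_i = 0). The likelihood is therefore ∏ f(t_i)^δ_i S(t_i)^(1-δ_i) = ∏ h(t_i)^δ_i S(t_i). With a constant hazard λ (the exponential model) this is maximized at λ = Σδ_i / Σt_i, the number of events divided by the total time at risk.

```python
import math
from collections import Counter


def km_nelson_aalen(times, events):
    """Kaplan-Meier survival and Nelson-Aalen cumulative hazard estimates.

    times  -- observed time for each person (event time or censoring time)
    events -- 1 if the event was observed, 0 if the observation is censored
    Returns a list of (t, n_at_risk, n_events, S_km, H_na), one row per
    distinct observed event time.
    """
    n_events = Counter(t for t, e in zip(times, events) if e == 1)
    surv, cumhaz = 1.0, 0.0
    rows = []
    for t in sorted(n_events):
        # At risk at t: everyone still under observation just before t
        # (people censored exactly at t count as at risk, the usual convention).
        n = sum(1 for u in times if u >= t)
        d = n_events[t]
        hazard = d / n          # estimated conditional probability of the event at t
        cumhaz += hazard        # Nelson-Aalen: running sum of hazard increments
        surv *= 1.0 - hazard    # Kaplan-Meier: product of conditional survival probabilities
        rows.append((t, n, d, surv, cumhaz))
    return rows


def exponential_rate_mle(times, events):
    """MLE of a constant hazard under censoring: observed events / total time at risk."""
    return sum(events) / sum(times)


if __name__ == "__main__":
    # Hypothetical years spent in the first job; event 0 marks a censored person.
    times = [1, 2, 2, 3, 4, 5, 5, 6]
    events = [1, 1, 0, 1, 0, 1, 1, 0]
    for t, n, d, s, h in km_nelson_aalen(times, events):
        print(f"t={t}  at risk={n}  events={d}  S_KM={s:.3f}  "
              f"H_NA={h:.3f}  exp(-H)={math.exp(-h):.3f}")
    print("exponential rate:", round(exponential_rate_mle(times, events), 4))
```

The column exp(-H) is close to, but not identical to, the Kaplan-Meier estimate. The identity S(t) = exp{-H(t)} holds exactly only for continuous distributions, while both estimators here are step functions.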
To continue our example, we might then be interested in whether there are differences in how long men and women keep their jobs, between people with different levels of education, among different working environments, and so on. In short, we want to know whether some covariates influence the time a person stays in her/his first job.
We will therefore learn about tests for comparing survival functions and discuss the two most commonly used parametric models (exponential and Weibull) for including covariates.
The focus in the second week will be on the Cox proportional hazards model, which is by far the most often used model in the analysis of time-to-event data. While the model is very simple, it is also very flexible, and an experienced statistician can make it fit almost any data. We will learn the basics of the estimation procedure, interpretation, testing, checking the modelling assumptions and relaxing them, and some extensions such as the stratified model, frailties and time-varying effects.
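In its standard form (notation ours), the Cox model leaves the baseline hazard h_0(t) unspecified. Covariates x act multiplicatively through regression coefficients β. The hazard ratio between two covariate patterns does not depend on t, and this is the proportional hazards assumption. β is estimated by maximizing the partial likelihood. It is shown here assuming no tied event times, where δ_i = 1 marks an observed event and R(t_i) is the risk set of people still under observation just before t_i:

```latex
\begin{aligned}
h(t \mid x) &= h_0(t)\,\exp(\beta^\top x), \qquad
\frac{h(t \mid x_1)}{h(t \mid x_2)} = \exp\{\beta^\top (x_1 - x_2)\}, \\
L(\beta) &= \prod_{i:\,\delta_i = 1}
  \frac{\exp(\beta^\top x_i)}{\sum_{j \in R(t_i)} \exp(\beta^\top x_j)}.
\end{aligned}
```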
So far we have not distinguished between a person losing his job and a person changing it. Nor have we considered several spells for one person (a person can change jobs several times during the study period). These problems fall under the headings of competing risks and recurrent events. The last two days will be devoted to these topics, with more time spent on recurrent events, as these are quite common in the political and social sciences (wars and governments are two obvious examples).
Here is a list of topics:
Univariate event history analysis
- Survival function
- Hazard and cumulative hazard function
- Mean time, mean residual time, median time
- Likelihood function for censored time-to-event data
- Parametric models for the survival function (exponential, Weibull)
- Non-parametric estimation of the survival function (Kaplan-Meier and Nelson-Aalen estimators)
- Variance of the estimated survival function, confidence intervals
- Comparison of survival functions
Regression models for time-to-event data
- Parametric models (exponential, Weibull)
- Cox model (proportional hazards model):
  - Estimation (partial likelihood)
  - Testing the null hypothesis (Wald, score, and likelihood ratio tests)
  - Some model fitting techniques
  - Categorical variables in the model
  - Relaxing the linearity assumption for continuous variables using splines
  - Checking the model assumptions
  - Goodness-of-fit and explained variation
  - Stratified model
  - Time-varying covariates
  - Time-varying effects
- Competing risks
- Recurring events
- Multistate models