Member rate £492.50
Non-Member rate £985.00
Save £45 Loyalty discount applied automatically*
Save 5% on each additional course booked
*If you attended our Methods School in the last calendar year, you qualify for £45 off your course fee.
Monday 1 to Friday 5 August and Monday 8 to Friday 12 August 2016
Generally classes are either 09:00-12:30 or 14:00-17:30
30 hours over 10 days
In event history analysis (and survival analysis, the name used mostly in the biosciences, where the methods were first applied) we are interested in the time intervals between successive state transitions or events. Typical examples are the duration of unemployment, the duration of marriage, recidivism in criminology, the duration of political systems, and the time from diagnosis to death. The most distinctive feature of time-to-event data is that the event is often not yet observed at the time of analysis. Applying standard statistical methods to such data leads to severe bias or loss of information. Special methods are therefore needed to extract the information we are used to obtaining with standard methods (formally, this means estimating the distribution function and incorporating predictive variables into that estimation). Further complications arise when covariates change over time, when times between recurring events are correlated, when there are competing risks, or when effects change over time.
In this course we will thoroughly study the situation in which there is only one event per subject, but we will also review the extensions in enough depth for students to be able to continue working in the area. Roughly half of the time will be devoted to practical exercises, for which R will be used. Familiarity with R is not assumed, but students will receive short introductory material on R before the summer school begins.
While formulas cannot be avoided entirely, my lectures will focus on the concepts, and I will support them with more rigorous written material.
Janez Stare graduated from the Faculty of Mathematics, University of Ljubljana, and then gained a Master's degree and a PhD in Biostatistics from the University of Ljubljana's Faculty of Medicine.
He is currently a full Professor of Biostatistics and Head of the Institute of Biostatistics and Medical Informatics, Faculty of Medicine, Ljubljana, and Head of the Doctoral Programme in Statistics at the University of Ljubljana.
His research interests include explained variation in survival analysis, the predictive ability of regression models in survival analysis, frailties, random effects in survival models, relative survival, goodness of fit of regression models, and scientometrics.
Say we are interested in how long people keep their first job. We start our study at some point in time and include a sample of people who obtained their first job after the study started. After some time, say a number of years, the study stops and we want to analyse the data. Some people have lost their job in the meantime, some have changed it, but some are still working, and we do not have complete data on their time in the job. Further, if somebody has stopped working because he is no longer able to work (accident, death), we also do not know what his event time would have been had he still been able to work. When the event is not observed at the time of analysis, we say that censoring has occurred. With such data we cannot even calculate the mean or draw a histogram, let alone use linear regression or similar methods. Special methods are therefore needed, and most of them use the hazard (or intensity) function. Since the hazard is defined via the conditional probability of the event occurring in some time interval, given that it has not occurred before, it can be estimated even in the presence of censored data. As we shall see, knowing the hazard function is equivalent to knowing the distribution function, which is the main goal of any analysis. In survival analysis, and consequently in event history analysis, it has become customary to talk about the survival function, which is simply one minus the distribution function.
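In the conventional notation of the survival literature (the symbols are an assumption, not taken from the course hand-outs), with $T$ the event time, $F$ its distribution function and $f$ its density, the quantities above are linked as follows. The last line is what makes the hazard "equivalent" to the distribution function: integrating $h$ gives $H$, hence $S$, hence $F = 1 - S$.

```latex
\begin{aligned}
F(t) &= P(T \le t), \qquad S(t) = 1 - F(t) = P(T > t),\\
h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)},\\
H(t) &= \int_0^t h(u)\,du, \qquad S(t) = \exp\{-H(t)\}.
\end{aligned}
```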
On the first day I will illustrate the use of logistic regression for event history data and explain why such an approach is not satisfactory.
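One common way of feeding event history data to logistic regression is the discrete-time "person-period" layout; whether the course uses exactly this set-up is an assumption. Each spell is split into one row per period at risk, with a binary outcome equal to 1 only in the period in which the event happens. The helper `person_period` below is hypothetical and written for illustration only.

```python
from typing import List, Tuple


def person_period(spells: List[Tuple[int, int, int]]) -> List[Tuple[int, int, int]]:
    """Expand (id, duration, event) spells into (id, period, y) rows.

    duration: number of whole periods observed (1, 2, ...), an assumed discretisation
    event:    1 if the spell ended with the event, 0 if it was censored
    y is 1 only in the period in which the event occurred, 0 otherwise.
    """
    rows = []
    for pid, duration, event in spells:
        for period in range(1, duration + 1):
            y = 1 if (event == 1 and period == duration) else 0
            rows.append((pid, period, y))
    return rows


if __name__ == "__main__":
    # Hypothetical data: person 1 loses the job in year 3,
    # person 2 is still employed (censored) after 2 years.
    for row in person_period([(1, 3, 1), (2, 2, 0)]):
        print(row)
```

A logistic regression of `y` on period indicators and covariates, fitted to these rows, estimates the conditional probability of the event in each period given survival up to it (a discrete-time hazard). Censored people contribute the periods in which they were observed instead of being dropped, but the approach depends on an arbitrary grouping of time into periods.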
Then we will deal with estimating the survival and hazard functions (parametrically and non-parametrically) and some commonly used measures of central tendency, and learn how to write down the likelihood function in the presence of censoring.
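The sketch below shows, under simplifying assumptions, one non-parametric and one parametric estimate from censored data: the Kaplan-Meier estimator of the survival function (with the median read off it) and the maximum likelihood estimate of a constant (exponential) hazard. It uses plain Python rather than R's survival package, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate: (t, S(t)) at each distinct event time.

    S(t) is the product over event times t_j <= t of (1 - d_j / n_j), where
    d_j is the number of events at t_j and n_j the number at risk just before
    t_j. Censored subjects (event == 0) stay in the risk set up to and
    including their censoring time, then leave without contributing an event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0   # events observed at time t
        leaving = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


def km_median(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Median survival time: first event time with S(t) <= 0.5 (None if never reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def exponential_mle(times: List[float], events: List[int]) -> float:
    """MLE of a constant hazard lambda under right censoring.

    The censored-data likelihood is prod f(t_i)^d_i * S(t_i)^(1 - d_i); with
    f(t) = lambda * exp(-lambda * t) and S(t) = exp(-lambda * t) it becomes
    lambda^(sum d_i) * exp(-lambda * sum t_i), maximised at
    lambda_hat = (number of events) / (total time at risk).
    """
    return sum(events) / sum(times)


if __name__ == "__main__":
    # Hypothetical job durations in years; event = 1 job lost, 0 censored.
    t = [2, 3, 3, 5, 7, 8]
    d = [1, 1, 0, 1, 0, 1]
    curve = kaplan_meier(t, d)
    print(curve)
    print("median:", km_median(curve))
    print("lambda_hat:", exponential_mle(t, d))
```

On these toy data the estimated survival drops to 5/6, 2/3, 4/9 and 0 at t = 2, 3, 5 and 8, so the median is 5 years, and the exponential hazard estimate is 4 events / 28 years at risk, about 0.143 per year. Note that censored people enter the likelihood only through $S(t_i)$, the probability of surviving beyond their censoring time.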
To continue our example, we might then be interested in whether there are any differences in keeping the job between men and women, between people with different levels of education, among different working environments, and so on; in short, whether some covariates influence the time a person stays in her/his first job.
We will therefore learn about tests for comparing survival functions and discuss the two most commonly used parametric models for including covariates.
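As an illustration of a test for comparing survival functions, here is a minimal sketch of the two-sample log-rank test (listed in the Thursday 4 schedule) in plain Python. The function name `logrank_test` and the 0/1 group coding are assumptions made for this example; in the course itself the R survival package would be used.

```python
import math
from typing import List, Tuple


def logrank_test(times: List[float], events: List[int], groups: List[int]) -> Tuple[float, float]:
    """Two-sample log-rank test; groups holds 0/1 labels.

    At each distinct event time t_j, compare the observed number of events in
    group 1 (d1) with its expectation under equal survival, d * n1 / n, using
    the hypergeometric variance. Returns (chi-square statistic, p-value).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for tj in event_times:
        at_risk = [g for t, _, g in data if t >= tj]  # still under observation
        n = len(at_risk)
        n1 = sum(at_risk)  # number at risk in group 1 (labels are 0/1)
        d = sum(1 for t, e, _ in data if t == tj and e == 1)
        d1 = sum(1 for t, e, g in data if t == tj and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the groups")
    stat = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical data: group 0 loses jobs at t = 1, 2; group 1 at t = 3, 4.
    print(logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 1, 1]))
```

For the four toy observations the statistic is 49/17, about 2.88, with a p-value of roughly 0.09: with so few people, even complete separation of the two groups' event times is not significant at the 5% level.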
The focus in the second week will be on the Cox proportional hazards model, which is by far the most widely used model in the analysis of time-to-event data. While the model is very simple, it is also very flexible, and an experienced statistician can make it fit almost any data. We will learn the basics of the estimation procedure, interpretation, testing, checking the modelling assumptions and relaxing them, and some extensions such as the stratified model, frailties and time-varying effects.
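To make "estimation by partial likelihood" concrete, the following sketch fits a Cox model with a single covariate by Newton-Raphson, using the Breslow handling of tied event times. It is a teaching toy under stated assumptions (one covariate, no check for a monotone likelihood); in practice one would use the survival package in R, as the course does. The name `cox_fit_1d` is hypothetical.

```python
import math
from typing import List, Tuple


def cox_fit_1d(times: List[float], events: List[int], x: List[float],
               tol: float = 1e-10, max_iter: int = 50) -> Tuple[float, float]:
    """Cox model with one covariate, fitted by Newton-Raphson on the
    log partial likelihood (Breslow handling of ties).

    l(beta) = sum over events i of [beta * x_i - log(sum_{j in R_i} exp(beta * x_j))],
    where R_i = {j : t_j >= t_i} is the risk set. Returns (beta_hat, standard error).
    """
    n = len(times)
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of l(beta)
        info = 0.0   # minus the second derivative of l(beta)
        for i in range(n):
            if events[i] != 1:
                continue
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            mean_x = s1 / s0  # risk-weighted mean of x in the risk set
            score += x[i] - mean_x
            info += s2 / s0 - mean_x ** 2
        if info <= 0:
            raise ValueError("covariate carries no information (constant in every risk set)")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    # Hypothetical toy data: x = 1 for one group, 0 for the other.
    beta, se = cox_fit_1d([1, 2, 3], [1, 1, 1], [1, 0, 1])
    print(f"beta = {beta:.4f}, se = {se:.4f}, hazard ratio = {math.exp(beta):.4f}")
```

The baseline hazard never appears: each event contributes only the relative weight $e^{\beta x_i}$ of the person who had the event against everyone still at risk. For the three toy observations the score equation reduces to $2e^{2\beta} = 1$, so $\hat\beta = -\ln(2)/2 \approx -0.347$ and the hazard ratio is $1/\sqrt{2} \approx 0.71$.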
So far we have not distinguished between a person losing his job and a person changing his job. Nor have we considered several spells for one person (a person can change jobs several times during the study period). These problems fall under the headings of competing risks and recurrent events. The last two days will be devoted to these, with more time spent on recurrent events, as these seem to be quite common in the political and social sciences (wars and governments are two obvious examples).
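For competing risks such as "job lost" versus "job changed", one standard descriptive quantity is the cumulative incidence function of each event type, $\mathrm{CIF}_k(t) = \sum_{t_j \le t} \hat S(t_{j-1})\, d_{kj} / n_j$, where $\hat S$ is the Kaplan-Meier estimate with any event counted. The plain-Python sketch below is illustrative only; the cause coding (0 = censored) and the name `cumulative_incidence` are assumptions. Treating the competing event as censoring and reporting one minus Kaplan-Meier would, in general, overstate these probabilities.

```python
from typing import Dict, List, Tuple


def cumulative_incidence(times: List[float], causes: List[int]) -> Dict[int, List[Tuple[float, float]]]:
    """Non-parametric cumulative incidence for each competing event type.

    causes: 0 = censored, 1, 2, ... = type of event.
    Returns, for each cause k, (t, CIF_k(t)) at every distinct event time.
    """
    data = sorted(zip(times, causes))
    kinds = sorted({c for c in causes if c != 0})
    cif = {k: 0.0 for k in kinds}
    curves: Dict[int, List[Tuple[float, float]]] = {k: [] for k in kinds}
    surv = 1.0  # Kaplan-Meier survival (any event) just before the current time
    n_at_risk = len(data)
    i = 0
    while i < len(data):
        t = data[i][0]
        d_by_cause = {k: 0 for k in kinds}
        leaving = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1] != 0:
                d_by_cause[data[i][1]] += 1
            leaving += 1
            i += 1
        d_total = sum(d_by_cause.values())
        if d_total > 0:
            for k in kinds:
                # Probability of reaching t event-free, times the cause-k hazard at t.
                cif[k] += surv * d_by_cause[k] / n_at_risk
                curves[k].append((t, cif[k]))
            surv *= 1.0 - d_total / n_at_risk
        n_at_risk -= leaving
    return curves


if __name__ == "__main__":
    # Hypothetical data: 1 = job lost, 2 = job changed, 0 = still in first job.
    print(cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 0, 1, 2]))
```

At every time point the cause-specific curves add up to one minus the overall Kaplan-Meier survival; on the toy data both end at 0.5.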
Here is a list of topics:
Univariate event history analysis
Regression models for time to event data
Participants should have some working knowledge of linear regression models and be familiar with the basics of inferential statistics. Those who are not familiar with the latter are strongly recommended to take a short course in inferential statistics. As for mathematics, participants are not expected to have much mathematical skill, but for those who do, the written material contains a more rigorous treatment of the subject. Even though I use formulas to explain the concepts, I suggest that participants clear the dust from the mathematics buried in their memory, preferably including the notion of the integral. Not being afraid of formulas is an advantage and certainly helps in understanding the subject better.
Day | Topic | Details |
---|---|---|
Monday 1 | Introduction to Event History Analysis; Using logistic regression to analyze survival data | 90’ lecture; 90’ lecture: illustration of using logistic regression for survival data |
Tuesday 2 | Event History and Social Science; Event history data structures; Basic definitions | Censoring; Survival function; Hazard and cumulative hazard function; 2 hours lecture, 1 hour lab |
Wednesday 3 | Parametric and nonparametric descriptive methods | Mean survival time, mean residual time, median time; Exponential and Weibull distribution; Kaplan-Meier estimator; Life tables; 2 hours lecture, 1 hour lab |
Thursday 4 | Comparison of survival functions; Parametric regression models for single-spell duration data; Methods to check parametric assumptions | Log-rank test; Exponential and Weibull regression model; 90’ lecture, 90’ lab |
Friday 5 | Introduction to Cox model | Estimation (partial likelihood); Interpretation; 90’ lecture, 90’ lab |
Monday 8 | Participants present their own data; Cox model (cont.) | Some model fitting techniques; Categorical variables in the model; 90’ lecture (together with presentations), 90’ lab |
Tuesday 9 | Cox model (cont.) | Relaxing the linearity assumption for continuous variables using splines; Checking the model assumptions; Stratified model; 90’ lecture, 90’ lab |
Wednesday 10 | Cox model (cont.) | Time-varying covariates; Frailties; Time-varying effects; 90’ lecture, 90’ lab |
Thursday 11 | Competing risks; Models for multiple events | Cox model for competing risks, repeated events and multistate models; 90’ lecture, 90’ lab |
Friday 12 | Models for multiple events (labs only) | Cox model for repeated events and multistate models; 3 hours lab |
Day | Readings |
---|---|
Monday 1 | Box-Steffensmeier 1, 2 |
Tuesday 2 | Blossfeld 2 |
Wednesday 3 | Blossfeld 3.1, 3.2 |
Thursday 4 | Box-Steffensmeier 1, 2, Blossfeld 3.3 |
Friday 5 | Box-Steffensmeier 4 |
Monday 8 | Box-Steffensmeier 6 |
Tuesday 9 | Box-Steffensmeier 7, 9, Blossfeld 10.1 |
Wednesday 10 | Box-Steffensmeier 10 |
Thursday 11 | Box-Steffensmeier 10 |
Friday 12 | None |

Note: I have chosen two books that could be used as supplementary reading:
The first is less technical, the second more technical. I am listing books that have Event History in the title, but there are better books under the name Survival Analysis. David Collett’s is one, and Hosmer and Lemeshow’s is another (see below). Most of the material covered in the course will be contained in my own hand-outs, which participants will be able to download from the web one month before the course begins.
R will be used. R is freely available at http://cran.r-project.org/. Experience with R is not essential, but some familiarity is welcome, and the crash course offered at the summer school just prior to this course is recommended.
If there are any problems with installing R, we will help you when you are in Budapest. On the first day we will spend half an hour or so checking your installations. In particular, the survival package will have to be downloaded. New versions of R are released often, and we will let you know in advance if you need to install a new version (usually this is not needed).
Participants need to bring their own laptop with R installed.
Hosmer D.W., Lemeshow S., May S. Applied Survival Analysis: Regression Modeling of Time to Event Data, 2nd edition. Wiley Series in Probability and Statistics. Wiley-Interscience, 2008.
Basics of Inferential Statistics for Political Scientists (refresher course)
Introduction to R (refresher course)
Linear Algebra and Calculus (refresher course)