I am conducting a descriptive study on patients with liver transplantation. My mentor asks me to do survival analysis. What is survival analysis? And what are the statistical methods to estimate survival time from data?
Thanks Omer, but I have been told that there are statistical methods to estimate the survival data, would you please describe them in simple words and when I should use each test?
Survival analysis can be approached in several ways, from nonparametric to fully parametric methods. The Kaplan-Meier method is the simplest and most common, but parametric or semi-parametric approaches are useful if you want to explore the relationship between covariates and time-to-event data.
Try reading the tutorials below, both in R. Even if you do not use R, read them! =D
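To make the Kaplan-Meier idea concrete, here is a minimal sketch in plain Python (no libraries). The function names, the follow-up times in months and the coding 1 = death observed, 0 = censored are all assumptions made up for illustration; in practice you would use a tested statistical package rather than this.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time of each patient (hypothetical units: months)
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)   # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # deaths and censored patients both drop out
    return curve

def median_survival(curve):
    """First time at which S(t) falls to 0.5 or below (None if never reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Made-up example: 8 patients, 4 deaths, 4 censored observations
times = [6, 12, 12, 20, 24, 30, 36, 48]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
print("median survival:", median_survival(curve))  # 30 for this toy data
```

Censored patients never lower the curve themselves; they only shrink the number at risk for later event times, which is how the method uses incomplete follow-up.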
If you have a variable that changes over time (e.g. prothrombin index) and you measure each patient several times, you can use the joint analysis of longitudinal data and survival.
Some references:
A book:
Rizopoulos, D. (2012). Joint Models for Longitudinal and Time-to-Event Data With Applications in R. Chapman and Hall/CRC.
Hi Fahmi, excellent question. Survival analysis, also known as event history analysis or, in one common form, proportional hazards modeling, is a way of looking at the time to a dichotomous event. Like a lot of other statistical topics, you can probably find plenty to read just by Googling it; also try Google Scholar. I haven't checked, but there are usually excellent resources at UCLA. I believe it is also covered in the very readable text by Judy Singer and John Willett called Applied Longitudinal Data Analysis (Oxford University Press), which I am sure you can get from Amazon and other sources. You might find its treatment of other sorts of longitudinal models useful in your work, too.
So the idea is that as time passes, more and more individuals pass from one state to the other. In the case of true "survival" they go from being alive to being dead. As mentioned, one feature of these data is censoring, which means that by the time you have finished collecting data, not everyone has changed state. In other words, if you are studying survival and your study is five years long, some of the people you were studying might still be alive when the study ends. Without going into the details, this approach solved problems with earlier methods of computing life expectancy, for example, when the data were censored. It used to be that you had to run a model for each year (or other time period) and total up the individual hazards yourself, but now a lot of software will run all the statistics for you. In some cases it may be an extra-cost module for a particular piece of software. In other cases it may be a macro or user-developed routine you can download from the Internet.
There is a somewhat complex (because it is multilevel and retrospective) example on tobacco initiation on my profile called "Estimating multi-level discrete-time hazard models using cross-sectional data: Neighborhood effects on the onset of adolescent cigarette use." There's another one about puberty called "Socioeconomic Status, Race, and Girls' Pubertal Maturation: Results From the Project on Human Development in Chicago Neighborhoods." I am sure you will find some excellent pages with Google to get started. Bob
Not to go into details: the responses before mine have already given sources for a basic understanding of survival analysis. For liver transplantation, you may like to search Google or PubMed for studies that apply survival analysis (or Cox regression) to related conditions such as kidney transplantation or liver cancer. There has been extensive work on survival in cancer: look up the publications and statistics on cohort analysis on the IARC website (free PDF). For your project, it is important to understand what the research question is. The survival analysis method is then the process for getting answers. Are you looking at predictors of some outcome (e.g. survival after transplant)? What is the quality of your data, and how detailed is it? Will you have biostatistician support for the project? If so, have a detailed discussion with them. In the analysis, also look out for post-modelling tests of your assumptions, e.g. the goodness of fit of your model. A very good basic text on multivariate modelling strategies is Hosmer and Lemeshow. The UCLA website, using Stata, illustrates these principles very well.
Chan raises some excellent points, Fahmi. In your question you mention that you are doing a "descriptive study", yet your mentor apparently wants more than that, specifically survival analysis. Before proceeding, it is important that you know what specific information the mentor would like you to obtain about your sample. This information could be in the form of time-to-event estimates, the raw number of patients whose status (any number of pieces of information come to mind, such as comorbidity) might change, etc. I'd recommend you ask your mentor about his/her goal for the survival analysis. There is a variety of procedures of this type, so knowing the goal of the analysis is of paramount importance, especially as you work out which specific methodology and software to use.
Survival analysis is a branch of statistics which deals with survival of organisms in relation to a particular treatment or environmental circumstance that can cause mortality.
More generally, survival analysis involves the modeling of time to event data; in this context, mortality is considered an "event" or expected endpoint in such studies.
For a start, I'd recommend that you establish a time interval for your study and use a probit plot to examine patient mortality along a specified time scale.
Survival analysis is important in biology and medicine, and many survival analysis methods are available, among them Kaplan-Meier analysis. In surgery, for example, it is very useful for comparing the duration of analgesia of a particular anaesthetic drug with that of another drug, or with that of the same drug plus an additional agent.
Like Omer said, Survival Analysis typically focuses on time to event data.
In my case, I often use survival analysis in the field of adherence to therapy. The response variable is "time to discontinuation of therapy". Kaplan-Meier curves are very simple to calculate and help us determine the median time to event. If you want to compare several curves, for example by sex or age group, you can perform a log-rank test. Cox models are also important for identifying variables that may contribute significantly to your time to event, expressed as hazard ratios. But one should carefully analyse variables that have non-proportional hazards and include them in the model with dummy variables. Several statistical packages can help you perform this analysis.
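Also don't forget censored data, usually coded in the models as a 0/1 variable (1 for individuals who did not experience the event in the time period). Check which convention your software expects: many packages use the opposite coding, with 1 meaning the event was observed.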
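As a rough illustration of the log-rank comparison mentioned above, here is a self-contained two-group sketch in plain Python. The function name and the example data are hypothetical, and the sketch codes events as 1 = event observed, 0 = censored, which is the reverse of the censoring indicator described above. The p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value).

    events_* use 1 = event observed, 0 = censored (an assumption of this sketch).
    """
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # numbers still at risk just before time t
        n_a = sum(1 for u, _, g in data if u >= t and g == "a")
        n_b = sum(1 for u, _, g in data if u >= t and g == "b")
        # events occurring exactly at time t
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == "a")
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == "b")
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # hypergeometric variance of the event count in group a
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("not enough information to compare the groups")
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 degree of freedom
    return chi2, p_value

# Made-up example: group a discontinues therapy much earlier than group b
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
print(round(chi2, 2), round(p, 4))
```

A small p-value suggests the two survival curves differ; it does not say by how much, which is where hazard ratios from a Cox model come in.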
Survival analysis is the study of statistical techniques that deal with time-to-event data. By time-to-event data we mean the time until a specified event, normally called a failure, occurs. One should not be confused by the term failure: it simply means the event of interest. For example, in a coin-tossing experiment, failure may be defined as the occurrence of a head. In your case the survival time may be the time the patient survives after liver transplantation.
Before estimating the survival time, it is better to find the survival function, the hazard (failure) rate, the mean residual life and the mean survival time. These are functions that describe the time to event. After that, try to draw the hazard curve so that you can see the pattern of failure. Usually with this kind of transplantation the risk is highest in the early period, i.e. the failure rate is high at first and falls as time passes.
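For reference, the quantities named above have the following standard definitions for a continuous survival time T with density f(t). The symbols are my own choice of notation, not taken from the posts.

```latex
\begin{align}
S(t) &= P(T > t) && \text{survival function} \\
h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)} && \text{hazard (failure) rate} \\
H(t) &= \int_0^t h(u)\,du = -\log S(t) && \text{cumulative hazard} \\
\mu  &= E[T] = \int_0^\infty S(t)\,dt && \text{mean survival time} \\
m(t) &= E[T - t \mid T > t] = \frac{1}{S(t)} \int_t^\infty S(u)\,du && \text{mean residual life}
\end{align}
```

A hazard that is high soon after transplantation and then declines corresponds to a survival curve that drops steeply at first and then flattens.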
To model this kind of data we generally use the log-normal or log-logistic distribution. Try these first; later you will be able to use various estimation procedures such as the Kaplan-Meier estimator (mainly used for nonparametric estimation), the Nelson-Aalen estimator, the Cox proportional hazards model, etc. But before that, one should have a good understanding of the basic terms of survival analysis mentioned above.
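Of the estimators listed above, the Nelson-Aalen estimator of the cumulative hazard is short enough to sketch in plain Python. The function name, the example data and the coding 1 = event observed, 0 = censored are assumptions for illustration only.

```python
from collections import Counter

def nelson_aalen(times, events):
    """Return [(t, H(t))]: the estimated cumulative hazard at each event time.

    events -- 1 if the event was observed, 0 if censored (assumed coding)
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)
    at_risk = len(times)
    cumulative = 0.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # add (events at t) / (number at risk just before t)
            cumulative += d / at_risk
            curve.append((t, cumulative))
        at_risk -= leaving[t]
    return curve

# Made-up follow-up times (months) for 8 patients
times = [6, 12, 12, 20, 24, 30, 36, 48]
events = [1, 1, 0, 1, 0, 1, 0, 0]
for t, h in nelson_aalen(times, events):
    print(t, round(h, 3))
```

Plotting H(t) against time gives a quick look at the failure pattern: a curve that rises steeply early on and then levels off points to a hazard that is high at first and decreases later.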
It is time-to-event data (e.g. time to first AMI, time to death, time to first diagnosis, or time to recovery from diarrhoea), so your time starts at the transplant and runs until the death of the individual within the specified study period. The statistics used here are Kaplan-Meier estimation, the log-rank test, the hazard function and hazard ratio, and further tests for comparing groups.
The link provides a really nice introduction to survival analysis in SAS. You can do it in other programs too. You might check the UCLA website for your program of choice.
Survival analysis is generally described as a set of methods for analyzing data where the outcome variable is the time until the occurrence of an event of interest. The event can be death, occurrence of disease, marriage, divorce, etc. In survival analysis, subjects are usually followed over a specific time period, and the focus is on the time (t) at which the event of interest occurs. Methods used in survival analysis include the Kaplan-Meier method, the log-rank test, the life table method, the Cox model, etc. The link below may give you further insight into the scope of the discussion. Hope this helps.
Survival analysis is designed for situations where you are interested in the time to an event. Its distinguishing characteristic is censoring, i.e. incompletely observed event times; without censoring you could use other common regression methods.
Oumer has explained it well, so my advice is that you should first know your variable of interest. You have said that it is patients with liver transplantation, so what are you going to look at in the patients who have received this service? Do you want to study the time to recurrence, the time to death after liver transplantation, the time to developing an infection, or something else? That is the main question. The other point is that you are looking for descriptive statistics; that is fine, and it is the simplest analysis you could be advised to do. So you can follow Oumer's advice. In general, the method of analysis depends on the data: it can be nonparametric, semi-parametric or parametric, depending on what you assume about the distribution. But in your case, since you need to work on the descriptive part, go for the nonparametric methods.