R 简明教程

R - Survival Analysis

生存分析用于预测特定事件发生的的时间。它也称为失效时间分析或死亡时间分析。例如,预测患有癌症的人的存活天数或预测机械系统何时失效。

Survival analysis deals with predicting the time when a specific event is going to occur. It is also known as failure time analysis or analysis of time to death. For example predicting the number of days a person with cancer will survive or predicting the time when a mechanical system is going to fail.

名为 survival 的 R 包用于执行生存分析。此包包含函数 Surv() ,该函数将输入数据作为 R 公式,并在选定的变量中创建生存对象以进行分析。然后,我们使用函数 survfit() 为分析创建绘图。

The R package named survival is used to carry out survival analysis. This package contains the function Surv() which takes the input data as a R formula and creates a survival object among the chosen variables for analysis. Then we use the function survfit() to create a plot for the analysis.

Install Package

install.packages("survival")

Syntax

在 R 中创建生存分析的基本语法是 −

The basic syntax for creating survival analysis in R is −

Surv(time,event)
survfit(formula)

以下是所用参数的描述 -

Following is the description of the parameters used −

  1. time is the follow up time until the event occurs.

  2. event indicates the status of occurrence of the expected event.

  3. formula is the relationship between the predictor variables.

Example

我们将在上述安装的生存包中考虑名为“pbc”的数据集。它描述了患有原发性胆汁性肝硬化 (PBC) 的人的生存数据点。在数据集中存在的众多列中,我们主要关注字段“时间”和“状态”。时间表示患者登记到患者接受肝移植或死亡之间发生的事件的时间,以较早者为准。

We will consider the data set named "pbc" present in the survival packages installed above. It describes the survival data points about people affected with primary biliary cirrhosis (PBC) of the liver. Among the many columns present in the data set we are primarily concerned with the fields "time" and "status". Time represents the number of days between registration of the patient and earlier of the event between the patient receiving a liver transplant or death of the patient.

# Load the library.
library("survival")

# Print first few rows.
print(head(pbc))

当我们执行上述代码时,它会产生以下结果和图表:

When we execute the above code, it produces the following result and chart −

  id time status trt      age sex ascites hepato spiders edema bili chol
1  1  400      2   1 58.76523   f       1      1       1   1.0 14.5  261
2  2 4500      0   1 56.44627   f       0      1       1   0.0  1.1  302
3  3 1012      2   1 70.07255   m       0      0       0   0.5  1.4  176
4  4 1925      2   1 54.74059   f       0      1       1   0.5  1.8  244
5  5 1504      1   2 38.10541   f       0      1       1   0.0  3.4  279
6  6 2503      2   2 66.25873   f       0      1       0   0.0  0.8  248
  albumin copper alk.phos    ast trig platelet protime stage
1    2.60    156   1718.0 137.95  172      190    12.2     4
2    4.14     54   7394.8 113.52   88      221    10.6     3
3    3.48    210    516.0  96.10   55      151    12.0     4
4    2.54     64   6121.8  60.63   92      183    10.3     4
5    3.53    143    671.0 113.15   72      136    10.9     3
6    3.98     50    944.0  93.00   63       NA    11.0     3

从以上数据中,我们考虑时间和状态来进行分析。

From the above data we are considering time and status for our analysis.

Applying Surv() and survfit() Function

现在,我们开始将 Surv() 函数应用到以上数据集,并创建一个用于显示趋势的绘图。

Now we proceed to apply the Surv() function to the above data set and create a plot that will show the trend.

# Load the library.
library("survival")

# Create the survival object.
survfit(Surv(pbc$time,pbc$status == 2)~1)

# Give the chart file a name.
png(file = "survival.png")

# Plot the graph.
plot(survfit(Surv(pbc$time,pbc$status == 2)~1))

# Save the file.
dev.off()

当我们执行上述代码时,它会产生以下结果和图表:

When we execute the above code, it produces the following result and chart −

Call: survfit(formula = Surv(pbc$time, pbc$status == 2) ~ 1)

      n  events  median 0.95LCL 0.95UCL
    418     161    3395    3090    3853
survival

上图中的趋势帮助我们预测在一定数量的日期结束时存活的概率。

The trend in the above graph helps us predicting the probability of survival at the end of a certain number of days.