In this vignette, we demonstrate how to use the simu function in the rocTree package to generate simulated data under various scenarios.

Notations & Simulation Settings

Let \(Z\) be a \(p\)-dimensional vector of possibly time-dependent covariates and \(\beta\) be the vector of regression coefficients. The function simu generates survival times (\(T\)) under the following scenarios:

Time-independent covariate

Scenario 1.1, proportional hazards model:

Survival times are generated from the hazard function \[\lambda(t|Z) = \lambda_0(t)\exp\{-0.5Z_1 + 0.5Z_2 - 0.5Z_3 + \ldots + 0.5Z_{10}\},\] with \(\lambda_0(t)=2t\).

Scenario 1.2, proportional hazards model with noise variables:

Survival times are generated from the hazard function \[\lambda(t|Z) = \lambda_0(t)\exp\{2Z_1 + 2Z_2 + 0\cdot Z_3 + 0\cdot Z_4 + \ldots + 0\cdot Z_{10}\},\] with \(\lambda_0(t)=2t\).
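Because \(\lambda_0(t)=2t\) gives the cumulative baseline hazard \(\Lambda_0(t)=t^2\), survival times under the proportional hazards scenarios can be drawn in closed form by inverse-transform sampling. The following is a minimal sketch for Scenario 1.2, not the rocTree implementation. It assumes the covariates are independent Uniform(0, 1) draws (the vignette does not state their distribution), and the names lp and surv_time are illustrative.

```r
# Sketch: inverse-transform sampling for Scenario 1.2 (not the rocTree source).
set.seed(1)
n <- 1000
p <- 10

# Assumption: covariates are independent Uniform(0, 1) draws.
Z <- matrix(runif(n * p), nrow = n, ncol = p)
beta <- c(2, 2, rep(0, p - 2))   # z3, ..., z10 are noise variables
lp <- drop(Z %*% beta)           # linear predictor for each subject

# With lambda_0(t) = 2t the cumulative hazard is t^2 * exp(lp), so
# S(t | Z) = exp(-t^2 * exp(lp)). Solving S(T | Z) = V for T gives:
V <- runif(n)
surv_time <- sqrt(-log(V) / exp(lp))
```

Plugging surv_time back into the survival function recovers V, which is a quick way to check the inversion.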

Scenario 1.3, proportional hazards model with nonlinear covariate effects:

Survival times are generated from the hazard function \[\lambda(t|Z) = \lambda_0(t) \exp\{2\sin(2\pi Z_1) + 2 |Z_2 - 0.5|\}, \] with \(\lambda_0(t)=2t\).

Scenario 1.4, accelerated failure time model:

Survival times are generated from \[\log(T) = -2 + 2Z_1 + 2Z_2 + \epsilon, \] where \(\epsilon\sim\mbox{N}(0, 0.5^2)\).
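The accelerated failure time model above translates directly into code. This is a minimal sketch rather than the package source, again assuming that \(Z_1\) and \(Z_2\) are independent Uniform(0, 1) draws.

```r
# Sketch: Scenario 1.4, log(T) = -2 + 2 * Z1 + 2 * Z2 + eps.
set.seed(1)
n <- 1000

# Assumption: Z1 and Z2 are independent Uniform(0, 1) draws.
z1 <- runif(n)
z2 <- runif(n)

eps <- rnorm(n, mean = 0, sd = 0.5)   # N(0, 0.5^2), so the sd is 0.5
surv_time <- exp(-2 + 2 * z1 + 2 * z2 + eps)
```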

Scenario 1.5, generalized gamma family:

Survival times are generated from \[T = e^{\sigma\omega}, \] where \(\omega = \log(Q^2g)/Q\), \(g\) follows gamma\((Q^{-2}, 1)\), \(\sigma = 2Z_1\), \(Q=2Z_2\).
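The construction above can be sketched step by step in base R. The sketch reads gamma\((Q^{-2}, 1)\) as a gamma distribution with shape \(Q^{-2}\) and rate 1, and assumes \(Z_1\) and \(Z_2\) are independent Uniform(0, 1) draws; both readings are assumptions, not statements about the package source.

```r
# Sketch: Scenario 1.5, T = exp(sigma * omega).
set.seed(1)
n <- 1000

# Assumption: Z1 and Z2 are independent Uniform(0, 1) draws.
z1 <- runif(n)
z2 <- runif(n)

sigma <- 2 * z1
Q <- 2 * z2

# Assumption: gamma(Q^-2, 1) means shape = Q^-2 and rate = 1.
g <- rgamma(n, shape = Q^(-2), rate = 1)
omega <- log(Q^2 * g) / Q
surv_time <- exp(sigma * omega)
```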

Time-dependent covariate

Scenario 2.1, dichotomous time-dependent covariate with at most one change in value:

Survival times are generated from the hazard function \[\lambda(t|Z(t)) = e^{2Z_1(t) + 2Z_2}, \] where \(Z_1(t) = \theta I(t\ge U_0) + (1 - \theta)I(t<U_0)\), \(\theta\) is a Bernoulli random variable with success probability 0.5, and \(U_0\) follows a uniform distribution over \([0,1]\).
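Since \(Z_1(t)\) changes value at most once, the hazard is piecewise constant and the cumulative hazard can be inverted by hand. The sketch below follows the formula for \(Z_1(t)\) exactly as displayed, solves \(\Lambda(T) = E\) for a standard exponential \(E\), and assumes \(Z_2\) is Uniform(0, 1). It is an illustration, not the rocTree implementation.

```r
# Sketch: Scenario 2.1 by inverting a piecewise-constant cumulative hazard.
set.seed(1)
n <- 1000

z2 <- runif(n)                              # assumption: Uniform(0, 1)
theta <- rbinom(n, size = 1, prob = 0.5)
u0 <- runif(n)

# Z1(t) equals 1 - theta before U0 and theta from U0 onward.
h_before <- exp(2 * (1 - theta) + 2 * z2)   # hazard on [0, U0)
h_after  <- exp(2 * theta + 2 * z2)         # hazard on [U0, Inf)

# T solves Lambda(T) = E with E ~ Exp(1).
E <- rexp(n)
surv_time <- ifelse(E <= h_before * u0,
                    E / h_before,
                    u0 + (E - h_before * u0) / h_after)
```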

Scenario 2.2, dichotomous time-dependent covariate with multiple jumps:

Survival times are generated from the hazard function \[\lambda(t|Z(t)) = e^{2Z_1(t) + 2Z_2}, \] where \(Z_1(t) = \theta\left[I(U_1 \le t < U_2) + I(U_3\le t)\right] + (1 - \theta)\left[I(t < U_1) + I(U_2\le t < U_3)\right]\), \(\theta\) is a Bernoulli random variable with success probability 0.5, and \(U_1\le U_2\le U_3\) are the first three event times of a stationary Poisson process with rate 10.
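The jump times and the resulting covariate path can be sketched as follows. The first three event times of a rate-10 Poisson process are cumulative sums of independent exponential inter-arrival times with rate 10. The helper z1_path is a hypothetical name introduced here for illustration; it is not part of rocTree.

```r
# Sketch: covariate path Z1(t) for one subject in Scenario 2.2.
set.seed(1)

# U1 <= U2 <= U3: cumulative sums of Exp(10) inter-arrival times.
u <- cumsum(rexp(3, rate = 10))
theta <- rbinom(1, size = 1, prob = 0.5)

# Hypothetical helper: evaluates Z1(t) following the displayed formula.
z1_path <- function(t, theta, u) {
  n_jumps <- findInterval(t, u)   # number of jump times U_j <= t
  odd <- n_jumps %% 2 == 1        # TRUE on [U1, U2) and [U3, Inf)
  ifelse(odd, theta, 1 - theta)
}

# Evaluate the path on a grid of time points.
z1_path(seq(0, 1, by = 0.1), theta, u)
```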

Scenario 2.3, proportional hazards model with a continuous time-dependent covariate:

Survival times are generated from the hazard function \[\lambda(t|Z(t)) = 0.1 e^{Z_1(t) + Z_2}, \] where \(Z_1(t)=kt+b\), and \(k\) and \(b\) are independent uniform random variables over \([1, 2]\).
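For Scenario 2.3, the hazard can be integrated in closed form, which suggests one way to draw survival times by inverting the cumulative hazard. The derivation below is a sketch that takes \(E\) to be a standard exponential random variable; it is not taken from the package source.

\[\Lambda(t|Z) = \int_0^t 0.1 e^{ks + b + Z_2}\,ds = \frac{0.1 e^{b + Z_2}}{k}\left(e^{kt} - 1\right),\]

so solving \(\Lambda(T|Z) = E\) gives

\[T = \frac{1}{k}\log\left\{1 + \frac{kE}{0.1 e^{b + Z_2}}\right\}.\]

Since \(k \ge 1\) and \(E > 0\), the argument of the logarithm exceeds 1 and \(T\) is always positive.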

Scenario 2.4, non-proportional hazards model with a continuous time-dependent covariate:

Survival times are generated from the hazard function \[\lambda(t|Z(t)) = 0.1 \cdot\left[1 + \sin\{Z_1(t) + Z_2\}\right],\] where \(Z_1(t)=kt+b\), and \(k\) and \(b\) are independent uniform random variables over \([1, 2]\).

Scenario 2.5, non-proportional hazards model with a nonlinear time-dependent covariate:

Survival times are generated from the hazard function \[\lambda(t|Z(t)) = 0.1 \cdot\left[1 + \sin\{Z_1(t) + Z_2\}\right],\] where \(Z_1(t)=2kt\cdot\{I(t>5) - 1\}\), and \(k\) and \(b\) are independent uniform random variables over \([1, 2]\).

The simu function

The simu function generates survival times under the scenarios above. Its complete argument list is as follows:

> library(rocTree)
> args(simu)
function (n, cen, scenario, summary = FALSE) 
NULL

The arguments are:

  • n is an integer value indicating the number of subjects.
  • cen is a numeric value indicating the censoring percentage, given as a proportion (e.g., cen = 0.25 for 25%); three levels, 0%, 25%, and 50%, are allowed.
  • scenario can be either a numeric value or a character string. This indicates the simulation scenario noted above.
  • summary is a logical value indicating whether a brief data summary will be printed.

The simu function returns the simulated data as a data frame with the following columns:

  • id is the subject id.
  • Time is the observed follow-up time.
  • death is the death indicator; death = 1 if an event (death) occurs and death = 0 if censored.
  • z1, ..., z10 are the possibly time-dependent covariates.
  • k, b, U are the latent variables used to generate \(Z_1(t)\) in Scenarios 2.1–2.5.

Example 1

We first generate a small dataset with n = 5 and a 25% censoring rate under Scenario 1.2.

> set.seed(2019)
> dat1 <- simu(n = 5, cen = 0.25, scenario = 1.2, summary = TRUE)
Summary results:
Number of subjects: 5
Number of subjects experienced death: 4
Number of covariates: 10
Time independent covaraites: z1 z2 z3 z4 z5 z6 z7 z8 z9 z10
Number of unique observation times: 5
Median survival time: 0.4231
> dat1
   id      Time death         z1          z2        z3         z4
1   1 0.0931342     0 0.76990155 0.043218804 0.7698180 0.63545297
2   1 0.1464104     0 0.76990155 0.043218804 0.7698180 0.63545297
3   1 0.3397603     0 0.76990155 0.043218804 0.7698180 0.63545297
4   1 0.4231000     1 0.76990155 0.043218804 0.7698180 0.63545297
5   2 0.0931342     0 0.71283973 0.820176206 0.6605425 0.06812013
6   2 0.1464104     0 0.71283973 0.820176206 0.6605425 0.06812013
7   2 0.3397603     1 0.71283973 0.820176206 0.6605425 0.06812013
8   3 0.0931342     0 0.30336020 0.009614496 0.2169243 0.70031486
9   3 0.1464104     0 0.30336020 0.009614496 0.2169243 0.70031486
10  3 0.3397603     0 0.30336020 0.009614496 0.2169243 0.70031486
11  3 0.4231000     0 0.30336020 0.009614496 0.2169243 0.70031486
12  3 1.2479563     1 0.30336020 0.009614496 0.2169243 0.70031486
13  4 0.0931342     0 0.61823636 0.102491504 0.1950175 0.37479527
14  4 0.1464104     0 0.61823636 0.102491504 0.1950175 0.37479527
15  5 0.0931342     1 0.05048374 0.608572199 0.6947276 0.46909425
          z5         z6          z7         z8        z9       z10
1  0.4059750 0.01103272 0.644595051 0.08595205 0.6123926 0.4948702
2  0.4059750 0.01103272 0.644595051 0.08595205 0.6123926 0.4948702
3  0.4059750 0.01103272 0.644595051 0.08595205 0.6123926 0.4948702
4  0.4059750 0.01103272 0.644595051 0.08595205 0.6123926 0.4948702
5  0.5814981 0.59250859 0.009253307 0.37178613 0.2572539 0.9159192
6  0.5814981 0.59250859 0.009253307 0.37178613 0.2572539 0.9159192
7  0.5814981 0.59250859 0.009253307 0.37178613 0.2572539 0.9159192
8  0.9472358 0.94830891 0.809916487 0.44626045 0.8335848 0.9305124
9  0.9472358 0.94830891 0.809916487 0.44626045 0.8335848 0.9305124
10 0.9472358 0.94830891 0.809916487 0.44626045 0.8335848 0.9305124
11 0.9472358 0.94830891 0.809916487 0.44626045 0.8335848 0.9305124
12 0.9472358 0.94830891 0.809916487 0.44626045 0.8335848 0.9305124
13 0.1710192 0.62405627 0.170301794 0.01181090 0.6711542 0.2094706
14 0.1710192 0.62405627 0.170301794 0.01181090 0.6711542 0.2094706
15 0.1318687 0.45561955 0.038503417 0.78263931 0.8071130 0.1992966
> class(dat1)
[1] "data.frame"

In this scenario, the covariate information was observed at Time = 0.0931, 0.146, 0.340, and 0.423 for subject #1, who died (death = 1) at Time = 0.423. Since the covariates are time-independent, their values do not change over time.

Example 2

The following code generates a small dataset with n = 5 and a 50% censoring rate under Scenario 2.1.

> set.seed(2019)
> dat2 <- simu(n = 5, cen = 0.5, scenario = 2.1, summary = TRUE)
Summary results:
Number of subjects: 5
Number of subjects experienced death: 1
Number of covariates: 2
Time independent covaraites: z1.
Time dependent covaraites: z2.
Number of unique observation times: 5
Median survival time: NA
> dat2
   id        Time death z1         z2 e          u
1   1 0.008826172     0  0 0.76990155 0 0.10792722
2   1 0.101672586     1  0 0.76990155 0 0.10792722
3   2 0.008826172     0  1 0.71283973 1 0.06421699
4   2 0.101672586     0  0 0.71283973 1 0.06421699
5   2 0.105494961     0  0 0.71283973 1 0.06421699
6   2 0.136815371     0  0 0.71283973 1 0.06421699
7   3 0.008826172     0  0 0.30336020 0 0.30429404
8   3 0.101672586     0  0 0.30336020 0 0.30429404
9   3 0.105494961     0  0 0.30336020 0 0.30429404
10  4 0.008826172     0  0 0.61823636 0 0.05418119
11  5 0.008826172     0  1 0.05048374 1 0.43387271
12  5 0.101672586     0  1 0.05048374 1 0.43387271
13  5 0.105494961     0  1 0.05048374 1 0.43387271
14  5 0.136815371     0  1 0.05048374 1 0.43387271
15  5 0.474006872     0  0 0.05048374 1 0.43387271

In this scenario, the covariate information was observed at Time = 0.00883 and 0.102 for subject #1, who died (death = 1) at Time = 0.102. Similarly, the covariate information was observed at Time = 0.00883, 0.102, 0.105, and 0.137 for subject #2, who was censored (death = 0) at Time = 0.137. Moreover, z1 is a time-dependent covariate and its value changed from 1 (at Time = 0.00883) to 0 (at Time \(\ge\) 0.102) for subject #2.
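Because each subject contributes one row per observation time, a per-subject summary (follow-up time and event status) can be recovered from the last row of each subject. The sketch below rebuilds the id, Time, and death columns from the dat2 printout so that it runs without rocTree; it assumes rows are ordered by Time within each subject, as they are in the printout.

```r
# id, Time, and death columns copied from the dat2 printout.
dat <- data.frame(
  id = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5, 5),
  Time = c(0.008826172, 0.101672586,
           0.008826172, 0.101672586, 0.105494961, 0.136815371,
           0.008826172, 0.101672586, 0.105494961,
           0.008826172,
           0.008826172, 0.101672586, 0.105494961, 0.136815371, 0.474006872),
  death = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Keep the last row of each subject (assumes rows are sorted by Time within id).
last_row <- !duplicated(dat$id, fromLast = TRUE)
per_subject <- dat[last_row, ]
per_subject

# Number of observed deaths; agrees with the summary output (1 death).
sum(per_subject$death)
```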