Simulating recurrent event data

In this vignette, we demonstrate how to use the simGSC() function in reReg package to simulate recurrent event data from a general scale-change model. Since the scale-change model includes the Cox-type model and the accelerated mean model as special cases, simGSC() can also be used to generate data from these sub-models. The simGSC() function allows the censoring time to be non-informative (independent given covariate) or informative about the recurrent event process.

Notations

Let \(N(t)\) be the number of recurrent events occurring over the interval \([0, t]\) and \(D\) be the failure time of interest subjects to right censoring by \(C\). Define the composite censoring time \(Y = \min(D, C, \tau)\) and the failure event indicator \(\Delta = I\{D\le \min(C, \tau)\}\), where \(\tau\) is the maximum follow-up time. We assume the recurrent event process \(N(\cdot)\) is observed up to \(Y\). Let \(X\) be a \(p\)-dimensional covariate vector. Consider a random sample of \(n\) subjects, the observed data are independent and identically distributed (iid) copies of \(\{Y_i, \Delta_i, X_i, N_i(t), 0\le t\le Y_i\}\), where the subscript \(i\) denotes the index of a subject for \(i= 1, \ldots, n\). Let \(m_i = N_i(Y_i)\) be the number of recurrent events the \(i\)th subject experienced before time \(Y_i\), then the jump times of \(N_i(t)\) give the observed recurrent event times \(t_{i1}, \ldots, t_{im_i}\) when \(m_i > 0\). Thus, the observed data can also be expressed as iid copies of \(\{Y_i, \Delta_i, X_i, m_i, (t_{i1}, \ldots, t_{m_i})\}\). The primary interests in recurrent event data analysis often lie in making inference about the recurrent event process and the failure event and understanding the corresponding covariate effects. The built-in dataset, simDat, is a simulated example of a recurrent event data that is generated from the simGSC() function:

> library(reReg)
> packageVersion("reReg")

[1] '1.4.6'

> data(simDat)

The `simGSC` function

The function simGSC generates the recurrent times from the joint model \[\begin{equation} \begin{cases} \lambda(t) = \displaystyle Z\lambda_0(te^{X^\top\alpha})e^{X^\top\beta},&\\ h(t) = \displaystyle Zh_0(te^{X^\top\eta})e^{X^\top\theta}, &t\in[0, \tau], \end{cases} \end{equation}\] where \(\lambda_0(t)\) is the baseline rate function for the recurrent event process, \(h_0(t)\) is the baseline hazard function for the failure time, \(Z\) is a non-negative subject-specific latent frailty variable. The \(p\)-dimensional regression coefficients \((\alpha, \eta)\) and \((\beta, \theta)\) correspond to the shape and the size parameters, respectively. The arguments in simGSC are as follow

> args(simGSC)

function (n, summary = FALSE, para, xmat, censoring, frailty, 
    tau, origin, Lam0, Haz0) 
NULL

The arguments are as follows

n is the number of individual
summary is a logical value indicating whether a brief data summary will be printed.
para is a list of regression parameters. The list members alpha, beta, eta, and theta correspond to \(\alpha\), \(\beta\), \(\eta\), and \(\theta\) in (1), respectively.
xmat is the \(n\times p\) design matrix.
censoring is a \(n\)-dimensional numerical vector to specify the independent censoring time, \(C\).
frailty is a \(n\)-dimensional numerical vector to specify the frailty variable, \(Z\).
tau is the maximum follow-up time, \(\tau\).
origin is the time origin for each subject.
Lam0 is a single argument function used to specify the baseline cumulative rate function, \(\Lambda_0(t)\).
Haz0 is a single argument function used to specify the baseline cumulative hazard function, \(H_0(t)\).

At the default setting, the simGSC() function assumes \(p = 2\) and the regression parameters to be \(\alpha = \eta=(0, 0)^\top\), \(\beta = (-1, -1)^\top\), and \(\theta = (1, 1)^\top\). When the xmat and the censoring arguments are not specified, the simGSC() function assumes \(X_i\) is a two-dimensional vector \(X_i = (X_{i1}, X_{i2}), i = 1, \ldots, n\), where \(X_{i1}\) is a Bernoulli variable with rate 0.5 and \(X_{i2}\) is a standard normal variable. With the default xmat, the censoring time \(C\) is generated from an independent uniform distribution in \([0, 2\tau X_{i1} + 2Z^2\tau(1 - X_{i1})]\). Thus, the censoring distribution is covariate dependent and is informative when \(Z\) is not a constant. On the other hand, when xmat is specified by the users, the censoring time \(C\) is generated from an independent uniform distribution \([0, 2 \tau]\). When the frailty argument is not specified, the frailty variable \(Z\) is generated from a gamma distribution with a unit mean and a variance of 0.25. The default values for tau and origin are 60 and 0, respectively. When arguments Lam0 and Haz0 are left unspecified, the simGSC() function uses \(\Lambda_0(t) = 2\log(1 + t)\) and \(H_0(t) = \log(1 + t) / 5\), respectively. This is equivalent to setting Lam0 = function(x) 2 * log(1 + x) and Haz0 = function(x) log(1 + x) / 5. In summary, the default specifications generate the recurrent events and the terminal events from the model: \[\begin{cases} \lambda(t) = \displaystyle \frac{2Z}{1 + te^{-X_{i1} - X_{i2}}}, &\\ h(t) = \displaystyle \frac{Z}{5(1 + te^{X_{i1} + X_{i2}})}, & t\in[0, 60]. \end{cases}\]

Examples

`simDat`

The simGSC() function generates simulated data from the above specification and returns a data.frame in the same format as the built-in data set, simDat. Specifically, simDat was generated using the default settings of simGSC().

> data(simDat)
> set.seed(0); dat <- simGSC(200, summary = TRUE)

Call: 
simGSC(n = 200, summary = TRUE)

Summary:
Sample size:                                    200 
Number of recurrent event observed:             667 
Average number of recurrent event per subject:  3.335 
Proportion of subjects with a terminal event:   0.595 
Median time-to-terminal event:                  7.378

> identical(dat, simDat)

[1] FALSE

Simulating from general scale-change models

The simDat is an example that simulates recurrent event data when both the rate function and the hazard function are Cox-type model.

> set.seed(0); datCox <- simGSC(200, summary = TRUE)

Call: 
simGSC(n = 200, summary = TRUE)

Summary:
Sample size:                                    200 
Number of recurrent event observed:             667 
Average number of recurrent event per subject:  3.335 
Proportion of subjects with a terminal event:   0.595 
Median time-to-terminal event:                  7.378

The hypothesis test proposed by Xu et al. (2020) shows a strong evidence that the rate function has a Cox structure.

> fit <- reReg(Recur(t.start %to% t.stop, id, event, status) ~ x1 + x2, 
+              model = "gsc", B = 200, se = "sand", data = datCox)
> summary(fit, test = TRUE)

Call: 
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~ 
    x1 + x2, data = datCox, model = "gsc", B = 200, se = "sand")

Recurrent event process (shape):
   Estimate   StdErr z.value p.value
x1 0.306169 0.344301  0.8892  0.3739
x2 0.072077 0.286502  0.2516  0.8014

Recurrent event process (size):
   Estimate   StdErr z.value   p.value    
x1 -0.59885  0.22498 -2.6618  0.007773 ** 
x2 -1.11749  0.27864 -4.0106 6.057e-05 ***

Hypothesis tests:
Ho: shape = 0 (Cox-type model):
     X-squared = 0.793, df = 2, p-value = 0.6727
Ho: size = 0 (Accelerated rate model):
     X-squared = 18.5657, df = 2, p-value = 1e-04
Ho: shape = size (Accelerated mean model):
     X-squared = 24.8642, df = 2, p-value = 0

Similarly, the hypothesis test shows a strong evidence that the rate function has an accelerated mean or accelerated rate structure when the data is simulated under these.

When rate has an accelerated mean structure:

> set.seed(0); datAM <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(-1, -1)))
> summary(update(fit, data = datAM), test = TRUE)

Call: 
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~ 
    x1 + x2, data = datAM, model = "gsc", B = 200, se = "sand")

Recurrent event process (shape):
   Estimate   StdErr z.value  p.value   
x1 -1.88457  0.67925 -2.7745 0.005529 **
x2 -1.77292  0.56818 -3.1204 0.001806 **

Recurrent event process (size):
   Estimate   StdErr z.value   p.value    
x1 -1.68454  0.39280 -4.2886 1.798e-05 ***
x2 -1.09378  0.31676 -3.4530 0.0005543 ***

Hypothesis tests:
Ho: shape = 0 (Cox-type model):
     X-squared = 11.0149, df = 2, p-value = 0.0041
Ho: size = 0 (Accelerated rate model):
     X-squared = 32.2029, df = 2, p-value = 0
Ho: shape = size (Accelerated mean model):
     X-squared = 1.1139, df = 2, p-value = 0.573

When rate has an accelerated rate structure:

> set.seed(0); datAR <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(0, 0)))
> summary(update(fit, data = datAR), test = TRUE)

Call: 
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~ 
    x1 + x2, data = datAR, model = "gsc", B = 200, se = "sand")

Recurrent event process (shape):
   Estimate   StdErr z.value   p.value    
x1 -1.49782  0.44370 -3.3757 0.0007362 ***
x2 -0.92654  0.37955 -2.4411 0.0146407 *  

Recurrent event process (size):
   Estimate   StdErr z.value p.value  
x1 -0.49245  0.19487 -2.5271  0.0115 *
x2 -0.23990  0.16629 -1.4427  0.1491  

Hypothesis tests:
Ho: shape = 0 (Cox-type model):
     X-squared = 11.7142, df = 2, p-value = 0.0029
Ho: size = 0 (Accelerated rate model):
     X-squared = 7.8841, df = 2, p-value = 0.0194
Ho: shape = size (Accelerated mean model):
     X-squared = 8.5924, df = 2, p-value = 0.0136

Reference

Xu, Gongjun, Sy Han Chiou, Jun Yan, Kieren Marr, and Chiung-Yu Huang. 2020. “Generalized Scale-Change Models for Recurrent Event Processes Under Informative Censoring.” Statistica Sinica 30: 1773–95.

Sy Han (Steven) Chiou

2024-05-20

Notations

The `simGSC` function

Examples

`simDat`

Simulating from general scale-change models

Reference

Simulating recurrent event data

Sy Han (Steven) Chiou

2024-05-20

Notations

The simGSC function

Examples

simDat

Simulating from general scale-change models

Reference

The `simGSC` function

`simDat`