In this vignette, we demonstrate how to use the simGSC() function in reReg package to simulate recurrent event data from a general scale-change model. Since the scale-change model includes the Cox-type model and the accelerated mean model as special cases, simGSC() can also be used to generate data from these sub-models. The simGSC() function allows the censoring time to be non-informative (independent given covariate) or informative about the recurrent event process.

Notations

Let \(N(t)\) be the number of recurrent events occurring over the interval \([0, t]\) and \(D\) be the failure time of interest subjects to right censoring by \(C\). Define the composite censoring time \(Y = \min(D, C, \tau)\) and the failure event indicator \(\Delta = I\{D\le \min(C, \tau)\}\), where \(\tau\) is the maximum follow-up time. We assume the recurrent event process \(N(\cdot)\) is observed up to \(Y\). Let \(X\) be a \(p\)-dimensional covariate vector. Consider a random sample of \(n\) subjects, the observed data are independent and identically distributed (iid) copies of \(\{Y_i, \Delta_i, X_i, N_i(t), 0\le t\le Y_i\}\), where the subscript \(i\) denotes the index of a subject for \(i= 1, \ldots, n\). Let \(m_i = N_i(Y_i)\) be the number of recurrent events the \(i\)th subject experienced before time \(Y_i\), then the jump times of \(N_i(t)\) give the observed recurrent event times \(t_{i1}, \ldots, t_{im_i}\) when \(m_i > 0\). Thus, the observed data can also be expressed as iid copies of \(\{Y_i, \Delta_i, X_i, m_i, (t_{i1}, \ldots, t_{m_i})\}\). The primary interests in recurrent event data analysis often lie in making inference about the recurrent event process and the failure event and understanding the corresponding covariate effects. The built-in dataset, simDat, is a simulated example of a recurrent event data that is generated from the simGSC() function:

> library(reReg)
> packageVersion("reReg")
[1] '1.4.5'
> data(simDat)

The simGSC function

The function simGSC generates the recurrent times from the joint model \[\begin{equation} \begin{cases} \lambda(t) = \displaystyle Z\lambda_0(te^{X^\top\alpha})e^{X^\top\beta},&\\ h(t) = \displaystyle Zh_0(te^{X^\top\eta})e^{X^\top\theta}, &t\in[0, \tau], \end{cases} \end{equation}\] where \(\lambda_0(t)\) is the baseline rate function for the recurrent event process, \(h_0(t)\) is the baseline hazard function for the failure time, \(Z\) is a non-negative subject-specific latent frailty variable. The \(p\)-dimensional regression coefficients \((\alpha, \eta)\) and \((\beta, \theta)\) correspond to the shape and the size parameters, respectively. The arguments in simGSC are as follow

> args(simGSC)
function (n, summary = FALSE, para, xmat, censoring, frailty, 
    tau, origin, Lam0, Haz0) 
NULL

The arguments are as follows

  • n is the number of individual
  • summary is a logical value indicating whether a brief data summary will be printed.
  • para is a list of regression parameters. The list members alpha, beta, eta, and theta correspond to \(\alpha\), \(\beta\), \(\eta\), and \(\theta\) in (1), respectively.
  • xmat is the \(n\times p\) design matrix.
  • censoring is a \(n\)-dimensional numerical vector to specify the independent censoring time, \(C\).
  • frailty is a \(n\)-dimensional numerical vector to specify the frailty variable, \(Z\).
  • tau is the maximum follow-up time, \(\tau\).
  • origin is the time origin for each subject.
  • Lam0 is a single argument function used to specify the baseline cumulative rate function, \(\Lambda_0(t)\).
  • Haz0 is a single argument function used to specify the baseline cumulative hazard function, \(H_0(t)\).

At the default setting, the simGSC() function assumes \(p = 2\) and the regression parameters to be \(\alpha = \eta=(0, 0)^\top\), \(\beta = (-1, -1)^\top\), and \(\theta = (1, 1)^\top\). When the xmat and the censoring arguments are not specified, the simGSC() function assumes \(X_i\) is a two-dimensional vector \(X_i = (X_{i1}, X_{i2}), i = 1, \ldots, n\), where \(X_{i1}\) is a Bernoulli variable with rate 0.5 and \(X_{i2}\) is a standard normal variable. With the default xmat, the censoring time \(C\) is generated from an independent uniform distribution in \([0, 2\tau X_{i1} + 2Z^2\tau(1 - X_{i1})]\). Thus, the censoring distribution is covariate dependent and is informative when \(Z\) is not a constant. On the other hand, when xmat is specified by the users, the censoring time \(C\) is generated from an independent uniform distribution \([0, 2 \tau]\). When the frailty argument is not specified, the frailty variable \(Z\) is generated from a gamma distribution with a unit mean and a variance of 0.25. The default values for tau and origin are 60 and 0, respectively. When arguments Lam0 and Haz0 are left unspecified, the simGSC() function uses \(\Lambda_0(t) = 2\log(1 + t)\) and \(H_0(t) = \log(1 + t) / 5\), respectively. This is equivalent to setting Lam0 = function(x) 2 * log(1 + x) and Haz0 = function(x) log(1 + x) / 5. In summary, the default specifications generate the recurrent events and the terminal events from the model: \[\begin{cases} \lambda(t) = \displaystyle \frac{2Z}{1 + te^{-X_{i1} - X_{i2}}}, &\\ h(t) = \displaystyle \frac{Z}{5(1 + te^{X_{i1} + X_{i2}})}, & t\in[0, 60]. \end{cases}\]

Examples

simDat

The simGSC() function generates simulated data from the above specification and returns a data.frame in the same format as the built-in data set, simDat. Specifically, simDat was generated using the default settings of simGSC().

> data(simDat)
> set.seed(0); dat <- simGSC(200, summary = TRUE)
Call: 
simGSC(n = 200, summary = TRUE)

Summary:
Sample size:                                    200 
Number of recurrent event observed:             667 
Average number of recurrent event per subject:  3.335 
Proportion of subjects with a terminal event:   0.595 
Median time-to-terminal event:                  7.378 
> identical(dat, simDat)
[1] FALSE

Simulating from general scale-change models

The simDat is an example that simulates recurrent event data when both the rate function and the hazard function are Cox-type model.

> set.seed(0); datCox <- simGSC(200, summary = TRUE)
Call: 
simGSC(n = 200, summary = TRUE)

Summary:
Sample size:                                    200 
Number of recurrent event observed:             667 
Average number of recurrent event per subject:  3.335 
Proportion of subjects with a terminal event:   0.595 
Median time-to-terminal event:                  7.378 

The hypothesis test proposed by Xu et al. (2020) shows a strong evidence that the rate function has a Cox structure.

> fit <- reReg(Recur(t.start %to% t.stop, id, event, status) ~ x1 + x2, 
+              model = "gsc", B = 200, se = "sand", data = datCox)
> summary(fit, test = TRUE)
Call: 
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~ 
    x1 + x2, data = datCox, model = "gsc", B = 200, se = "sand")

Recurrent event process (shape):
   Estimate   StdErr z.value p.value
x1 0.306169 0.344301  0.8892  0.3739
x2 0.072077 0.286502  0.2516  0.8014

Recurrent event process (size):
   Estimate   StdErr z.value   p.value    
x1 -0.59885  0.22498 -2.6618  0.007773 ** 
x2 -1.11749  0.27864 -4.0106 6.057e-05 ***

Hypothesis tests:
Ho: shape = 0 (Cox-type model):
     X-squared = 0.793, df = 2, p-value = 0.6727
Ho: size = 0 (Accelerated rate model):
     X-squared = 18.5657, df = 2, p-value = 1e-04
Ho: shape = size (Accelerated mean model):
     X-squared = 24.8642, df = 2, p-value = 0

Similarly, the hypothesis test shows a strong evidence that the rate function has an accelerated mean or accelerated rate structure when the data is simulated under these.

When rate has an accelerated mean structure:

> set.seed(0); datAM <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(-1, -1)))
> summary(update(fit, data = datAM), test = TRUE)
Call: 
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~ 
    x1 + x2, data = datAM, model = "gsc", B = 200, se = "sand")

Recurrent event process (shape):
   Estimate   StdErr z.value  p.value   
x1 -1.88457  0.67925 -2.7745 0.005529 **
x2 -1.77292  0.56818 -3.1204 0.001806 **

Recurrent event process (size):
   Estimate   StdErr z.value   p.value    
x1 -1.68454  0.39280 -4.2886 1.798e-05 ***
x2 -1.09378  0.31676 -3.4530 0.0005543 ***

Hypothesis tests:
Ho: shape = 0 (Cox-type model):
     X-squared = 11.0149, df = 2, p-value = 0.0041
Ho: size = 0 (Accelerated rate model):
     X-squared = 32.2029, df = 2, p-value = 0
Ho: shape = size (Accelerated mean model):
     X-squared = 1.1139, df = 2, p-value = 0.573

When rate has an accelerated rate structure:

> set.seed(0); datAR <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(0, 0)))
> summary(update(fit, data = datAR), test = TRUE)
Call: 
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~ 
    x1 + x2, data = datAR, model = "gsc", B = 200, se = "sand")

Recurrent event process (shape):
   Estimate   StdErr z.value   p.value    
x1 -1.49782  0.44370 -3.3757 0.0007362 ***
x2 -0.92654  0.37955 -2.4411 0.0146407 *  

Recurrent event process (size):
   Estimate   StdErr z.value p.value  
x1 -0.49245  0.19487 -2.5271  0.0115 *
x2 -0.23990  0.16629 -1.4427  0.1491  

Hypothesis tests:
Ho: shape = 0 (Cox-type model):
     X-squared = 11.7142, df = 2, p-value = 0.0029
Ho: size = 0 (Accelerated rate model):
     X-squared = 7.8841, df = 2, p-value = 0.0194
Ho: shape = size (Accelerated mean model):
     X-squared = 8.5924, df = 2, p-value = 0.0136

Reference

Xu, Gongjun, Sy Han Chiou, Jun Yan, Kieren Marr, and Chiung-Yu Huang. 2020. “Generalized Scale-Change Models for Recurrent Event Processes Under Informative Censoring.” Statistica Sinica 30: 1773–95.