In this vignette, we demonstrate how to use the simGSC() function in reReg package to simulate recurrent event data from a general scale-change model. Since the scale-change model includes the Cox-type model and the accelerated mean model as special cases, simGSC() can also be used to generate data from these sub-models. The simGSC() function allows the censoring time to be non-informative (independent given covariate) or informative about the recurrent event process.

## Notations

Let $$N(t)$$ be the number of recurrent events occurring over the interval $$[0, t]$$ and $$D$$ be the failure time of interest subjects to right censoring by $$C$$. Define the composite censoring time $$Y = \min(D, C, \tau)$$ and the failure event indicator $$\Delta = I\{D\le \min(C, \tau)\}$$, where $$\tau$$ is the maximum follow-up time. We assume the recurrent event process $$N(\cdot)$$ is observed up to $$Y$$. Let $$X$$ be a $$p$$-dimensional covariate vector. Consider a random sample of $$n$$ subjects, the observed data are independent and identically distributed (iid) copies of $$\{Y_i, \Delta_i, X_i, N_i(t), 0\le t\le Y_i\}$$, where the subscript $$i$$ denotes the index of a subject for $$i= 1, \ldots, n$$. Let $$m_i = N_i(Y_i)$$ be the number of recurrent events the $$i$$th subject experienced before time $$Y_i$$, then the jump times of $$N_i(t)$$ give the observed recurrent event times $$t_{i1}, \ldots, t_{im_i}$$ when $$m_i > 0$$. Thus, the observed data can also be expressed as iid copies of $$\{Y_i, \Delta_i, X_i, m_i, (t_{i1}, \ldots, t_{m_i})\}$$. The primary interests in recurrent event data analysis often lie in making inference about the recurrent event process and the failure event and understanding the corresponding covariate effects. The built-in dataset, simDat, is a simulated example of a recurrent event data that is generated from the simGSC() function:

> library(reReg)
> packageVersion("reReg")
[1] '1.4.6'
> data(simDat)

## The simGSC function

The function simGSC generates the recurrent times from the joint model $$$\begin{cases} \lambda(t) = \displaystyle Z\lambda_0(te^{X^\top\alpha})e^{X^\top\beta},&\\ h(t) = \displaystyle Zh_0(te^{X^\top\eta})e^{X^\top\theta}, &t\in[0, \tau], \end{cases}$$$ where $$\lambda_0(t)$$ is the baseline rate function for the recurrent event process, $$h_0(t)$$ is the baseline hazard function for the failure time, $$Z$$ is a non-negative subject-specific latent frailty variable. The $$p$$-dimensional regression coefficients $$(\alpha, \eta)$$ and $$(\beta, \theta)$$ correspond to the shape and the size parameters, respectively. The arguments in simGSC are as follow

> args(simGSC)
function (n, summary = FALSE, para, xmat, censoring, frailty,
tau, origin, Lam0, Haz0)
NULL

The arguments are as follows

• n is the number of individual
• summary is a logical value indicating whether a brief data summary will be printed.
• para is a list of regression parameters. The list members alpha, beta, eta, and theta correspond to $$\alpha$$, $$\beta$$, $$\eta$$, and $$\theta$$ in (1), respectively.
• xmat is the $$n\times p$$ design matrix.
• censoring is a $$n$$-dimensional numerical vector to specify the independent censoring time, $$C$$.
• frailty is a $$n$$-dimensional numerical vector to specify the frailty variable, $$Z$$.
• tau is the maximum follow-up time, $$\tau$$.
• origin is the time origin for each subject.
• Lam0 is a single argument function used to specify the baseline cumulative rate function, $$\Lambda_0(t)$$.
• Haz0 is a single argument function used to specify the baseline cumulative hazard function, $$H_0(t)$$.

At the default setting, the simGSC() function assumes $$p = 2$$ and the regression parameters to be $$\alpha = \eta=(0, 0)^\top$$, $$\beta = (-1, -1)^\top$$, and $$\theta = (1, 1)^\top$$. When the xmat and the censoring arguments are not specified, the simGSC() function assumes $$X_i$$ is a two-dimensional vector $$X_i = (X_{i1}, X_{i2}), i = 1, \ldots, n$$, where $$X_{i1}$$ is a Bernoulli variable with rate 0.5 and $$X_{i2}$$ is a standard normal variable. With the default xmat, the censoring time $$C$$ is generated from an independent uniform distribution in $$[0, 2\tau X_{i1} + 2Z^2\tau(1 - X_{i1})]$$. Thus, the censoring distribution is covariate dependent and is informative when $$Z$$ is not a constant. On the other hand, when xmat is specified by the users, the censoring time $$C$$ is generated from an independent uniform distribution $$[0, 2 \tau]$$. When the frailty argument is not specified, the frailty variable $$Z$$ is generated from a gamma distribution with a unit mean and a variance of 0.25. The default values for tau and origin are 60 and 0, respectively. When arguments Lam0 and Haz0 are left unspecified, the simGSC() function uses $$\Lambda_0(t) = 2\log(1 + t)$$ and $$H_0(t) = \log(1 + t) / 5$$, respectively. This is equivalent to setting Lam0 = function(x) 2 * log(1 + x) and Haz0 = function(x) log(1 + x) / 5. In summary, the default specifications generate the recurrent events and the terminal events from the model: $\begin{cases} \lambda(t) = \displaystyle \frac{2Z}{1 + te^{-X_{i1} - X_{i2}}}, &\\ h(t) = \displaystyle \frac{Z}{5(1 + te^{X_{i1} + X_{i2}})}, & t\in[0, 60]. \end{cases}$

## Examples

### simDat

The simGSC() function generates simulated data from the above specification and returns a data.frame in the same format as the built-in data set, simDat. Specifically, simDat was generated using the default settings of simGSC().

> data(simDat)
> set.seed(0); dat <- simGSC(200, summary = TRUE)
Call:
simGSC(n = 200, summary = TRUE)

Summary:
Sample size:                                    200
Number of recurrent event observed:             667
Average number of recurrent event per subject:  3.335
Proportion of subjects with a terminal event:   0.595
Median time-to-terminal event:                  7.378
> identical(dat, simDat)
[1] FALSE

### Simulating from general scale-change models

The simDat is an example that simulates recurrent event data when both the rate function and the hazard function are Cox-type model.

> set.seed(0); datCox <- simGSC(200, summary = TRUE)
Call:
simGSC(n = 200, summary = TRUE)

Summary:
Sample size:                                    200
Number of recurrent event observed:             667
Average number of recurrent event per subject:  3.335
Proportion of subjects with a terminal event:   0.595
Median time-to-terminal event:                  7.378

The hypothesis test proposed by Xu et al. (2020) shows a strong evidence that the rate function has a Cox structure.

> fit <- reReg(Recur(t.start %to% t.stop, id, event, status) ~ x1 + x2,
+              model = "gsc", B = 200, se = "sand", data = datCox)
> summary(fit, test = TRUE)
Call:
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~
x1 + x2, data = datCox, model = "gsc", B = 200, se = "sand")

Recurrent event process (shape):
Estimate   StdErr z.value p.value
x1 0.306169 0.344301  0.8892  0.3739
x2 0.072077 0.286502  0.2516  0.8014

Recurrent event process (size):
Estimate   StdErr z.value   p.value
x1 -0.59885  0.22498 -2.6618  0.007773 **
x2 -1.11749  0.27864 -4.0106 6.057e-05 ***

Hypothesis tests:
Ho: shape = 0 (Cox-type model):
X-squared = 0.793, df = 2, p-value = 0.6727
Ho: size = 0 (Accelerated rate model):
X-squared = 18.5657, df = 2, p-value = 1e-04
Ho: shape = size (Accelerated mean model):
X-squared = 24.8642, df = 2, p-value = 0

Similarly, the hypothesis test shows a strong evidence that the rate function has an accelerated mean or accelerated rate structure when the data is simulated under these.

When rate has an accelerated mean structure:

> set.seed(0); datAM <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(-1, -1)))
> summary(update(fit, data = datAM), test = TRUE)
Call:
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~
x1 + x2, data = datAM, model = "gsc", B = 200, se = "sand")

Recurrent event process (shape):
Estimate   StdErr z.value  p.value
x1 -1.88457  0.67925 -2.7745 0.005529 **
x2 -1.77292  0.56818 -3.1204 0.001806 **

Recurrent event process (size):
Estimate   StdErr z.value   p.value
x1 -1.68454  0.39280 -4.2886 1.798e-05 ***
x2 -1.09378  0.31676 -3.4530 0.0005543 ***

Hypothesis tests:
Ho: shape = 0 (Cox-type model):
X-squared = 11.0149, df = 2, p-value = 0.0041
Ho: size = 0 (Accelerated rate model):
X-squared = 32.2029, df = 2, p-value = 0
Ho: shape = size (Accelerated mean model):
X-squared = 1.1139, df = 2, p-value = 0.573

When rate has an accelerated rate structure:

> set.seed(0); datAR <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(0, 0)))
> summary(update(fit, data = datAR), test = TRUE)
Call:
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~
x1 + x2, data = datAR, model = "gsc", B = 200, se = "sand")

Recurrent event process (shape):
Estimate   StdErr z.value   p.value
x1 -1.49782  0.44370 -3.3757 0.0007362 ***
x2 -0.92654  0.37955 -2.4411 0.0146407 *

Recurrent event process (size):
Estimate   StdErr z.value p.value
x1 -0.49245  0.19487 -2.5271  0.0115 *
x2 -0.23990  0.16629 -1.4427  0.1491

Hypothesis tests:
Ho: shape = 0 (Cox-type model):
X-squared = 11.7142, df = 2, p-value = 0.0029
Ho: size = 0 (Accelerated rate model):
X-squared = 7.8841, df = 2, p-value = 0.0194
Ho: shape = size (Accelerated mean model):
X-squared = 8.5924, df = 2, p-value = 0.0136

## Reference

Xu, Gongjun, Sy Han Chiou, Jun Yan, Kieren Marr, and Chiung-Yu Huang. 2020. “Generalized Scale-Change Models for Recurrent Event Processes Under Informative Censoring.” Statistica Sinica 30: 1773–95.