In this vignette, we demonstrate how to use the simGSC()
function in reReg
package to simulate recurrent event data from a general scale-change model. Since the scale-change model includes the Cox-type model and the accelerated mean model as special cases, simGSC()
can also be used to generate data from these submodels. The simGSC()
function allows the censoring time to be non-informative (independent given covariate) or informative about the recurrent event process.
Let \(N(t)\) be the number of recurrent events occurring over the interval \([0, t]\) and \(D\) be the failure time of interest subjects to right censoring by \(C\). Define the composite censoring time \(Y = \min(D, C, \tau)\) and the failure event indicator \(\Delta = I\{D\le \min(C, \tau)\}\), where \(\tau\) is the maximum follow-up time. We assume the recurrent event process \(N(\cdot)\) is observed up to \(Y\). Let \(X\) be a \(p\)-dimensional covariate vector. Consider a random sample of \(n\) subjects, the observed data are independent and identically distributed (iid) copies of \(\{Y_i, \Delta_i, X_i, N_i(t), 0\le t\le Y_i\}\), where the subscript \(i\) denotes the index of a subject for \(i= 1, \ldots, n\). Let \(m_i = N_i(Y_i)\) be the number of recurrent events the \(i\)th subject experienced before time \(Y_i\), then the jump times of \(N_i(t)\) give the observed recurrent event times \(t_{i1}, \ldots, t_{im_i}\) when \(m_i > 0\). Thus, the observed data can also be expressed as iid copies of \(\{Y_i, \Delta_i, X_i, m_i, (t_{i1}, \ldots, t_{m_i})\}\). The primary interests in recurrent event data analysis often lie in making inference about the recurrent event process and the failure event and understanding the corresponding covariate effects. The built-in dataset, simDat
, is a simulated example of a recurrent event data that is generated from the simGSC()
function:
> library(reReg)
> packageVersion("reReg")
[1] '1.4.2'
> data(simDat)
simGSC
functionThe function simGSC
generates the recurrent times from the joint model \[\begin{equation}
\begin{cases}
\lambda(t) = \displaystyle Z\lambda_0(te^{X^\top\alpha})e^{X^\top\beta},&\\
h(t) = \displaystyle Zh_0(te^{X^\top\eta})e^{X^\top\theta}, &t\in[0, \tau],
\end{cases}
(\#eqn:sc|sc)
\end{equation}\] where \(\lambda_0(t)\) is the baseline rate function for the recurrent event process, \(h_0(t)\) is the baseline hazard function for the failure time, \(Z\) is a non-negative subject-specific latent frailty variable. The \(p\)-dimensional regression coefficients \((\alpha, \eta)\) and \((\beta, \theta)\) correspond to the shape and the size parameters, respectively. The arguments in simGSC
are as follow
> args(simGSC)
function (n, summary = FALSE, para, xmat, censoring, frailty,
tau, origin, Lam0, Haz0)
NULL
The arguments are as follows
n
is the number of individualsummary
is a logical value indicating whether a brief data summary will be printed.para
is a list of regression parameters. The list members alpha
, beta
, eta
, and theta
correspond to \(\alpha\), \(\beta\), \(\eta\), and \(\theta\) in @ref(eqn:sc|sc), respectively.xmat
is the \(n\times p\) design matrix.censoring
is a \(n\)-dimensional numerical vector to specify the independent censoring time, \(C\).frailty
is a \(n\)-dimensional numerical vector to specify the frailty variable, \(Z\).tau
is the maximum follow-up time, \(\tau\).origin
is the time origin for each subject.Lam0
is a single argument function used to specify the baseline cumulative rate function, \(\Lambda_0(t)\).Haz0
is a single argument function used to specify the baseline cumulative hazard function, \(H_0(t)\).At the default setting, the simGSC()
function assumes \(p = 2\) and the regression parameters to be \(\alpha = \eta=(0, 0)^\top\), \(\beta = (-1, -1)^\top\), and \(\theta = (1, 1)^\top\). When the xmat
and the censoring
arguments are not specified, the simGSC()
function assumes \(X_i\) is a two-dimensional vector \(X_i = (X_{i1}, X_{i2}), i = 1, \ldots, n\), where \(X_{i1}\) is a Bernoulli variable with rate 0.5 and \(X_{i2}\) is a standard normal variable. With the default xmat
, the censoring time \(C\) is generated from an independent uniform distribution in \([0, 2\tau X_{i1} + 2Z^2\tau(1 - X_{i1})]\). Thus, the censoring distribution is covariate dependent and is informative when \(Z\) is not a constant. On the other hand, when xmat
is specified by the users, the censoring time \(C\) is generated from an independent uniform distribution \([0, 2 \tau]\). When the frailty
argument is not specified, the frailty variable \(Z\) is generated from a gamma distribution with a unit mean and a variance of 0.25. The default values for tau
and origin
are 60 and 0, respectively. When arguments Lam0
and Haz0
are left unspecified, the simGSC()
function uses \(\Lambda_0(t) = 2\log(1 + t)\) and \(H_0(t) = \log(1 + t) / 5\), respectively. This is equivalent to setting Lam0 = function(x) 2 * log(1 + x)
and Haz0 = function(x) log(1 + x) / 5
. In summary, the default specifications generate the recurrent events and the terminal events from the model: \[\begin{cases}
\lambda(t) = \displaystyle \frac{2Z}{1 + te^{-X_{i1} - X_{i2}}}, &\\
h(t) = \displaystyle \frac{Z}{5(1 + te^{X_{i1} + X_{i2}})}, & t\in[0, 60].
\end{cases}\]
simDat
The simGSC()
function generates simulated data from the above specification and returns a data.frame
in the same format as the built-in data set, simDat
. Specifically, simDat
was generated using the default settings of simGSC()
.
> data(simDat)
> set.seed(0); dat <- simGSC(200, summary = TRUE)
Call:
simGSC(n = 200, summary = TRUE)
Summary:
Sample size: 200
Number of recurrent event observed: 674
Average number of recurrent event per subject: 3.37
Proportion of subjects with a terminal event: 0.59
Median time-to-terminal event: 6.975
> identical(dat, simDat)
[1] TRUE
The simDat
is an example that simulates recurrent event data when both the rate function and the hazard function are Cox-type model.
> set.seed(0); datCox <- simGSC(200, summary = TRUE)
Call:
simGSC(n = 200, summary = TRUE)
Summary:
Sample size: 200
Number of recurrent event observed: 674
Average number of recurrent event per subject: 3.37
Proportion of subjects with a terminal event: 0.59
Median time-to-terminal event: 6.975
The hypothesis test proposed by Xu et al. (2020) shows a strong evidence that the rate function has a Cox structure.
> fit <- reReg(Recur(t.start %to% t.stop, id, event, status) ~ x1 + x2,
+ model = "gsc", B = 200, se = "sand", data = datCox)
> summary(fit, test = TRUE)
Call:
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~
x1 + x2, data = datCox, model = "gsc", B = 200, se = "sand")
Recurrent event process:
Estimate StdErr z.value p.value
x1 -1.13602 0.13704 -8.2900 < 2.2e-16 ***
x2 -1.07493 0.14264 -7.5361 < 2.2e-16 ***
Similarly, the hypothesis test shows a strong evidence that the rate function has an accelerated mean or accelerated rate structure when the data is simulated under these.
When rate has an accelerated mean structure:
> set.seed(0); datAM <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(-1, -1)))
> summary(update(fit, data = datAM), test = TRUE)
Call:
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~
x1 + x2, data = datAM, model = "gsc", B = 200, se = "sand")
Recurrent event process:
Estimate StdErr z.value p.value
x1 -0.57478 0.12960 -4.4351 1e-05 ***
x2 -0.45066 0.18189 -2.4777 0.01322 *
When rate has an accelerated rate structure:
> set.seed(0); datAR <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(0, 0)))
> summary(update(fit, data = datAR), test = TRUE)
Call:
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~
x1 + x2, data = datAR, model = "gsc", B = 200, se = "sand")
Recurrent event process:
Estimate StdErr z.value p.value
x1 0.48918 0.10080 4.8530 <2e-16 ***
x2 0.41743 0.09635 4.3326 1e-05 ***
Xu, Gongjun, Sy Han Chiou, Jun Yan, Kieren Marr, and Chiung-Yu Huang. 2020. “Generalized Scale-Change Models for Recurrent Event Processes Under Informative Censoring.” Statistica Sinica 30: 1773–95.