vignettes/reReg-sims.Rmd
reReg-sims.Rmd
In this vignette, we demonstrate how to use the simGSC()
function in reReg
package to simulate recurrent event data from a general scale-change model. Since the scale-change model includes the Cox-type model and the accelerated mean model as special cases, simGSC()
can also be used to generate data from these sub-models. The simGSC()
function allows the censoring time to be non-informative (independent given covariate) or informative about the recurrent event process.
Let \(N(t)\) be the number of recurrent events occurring over the interval \([0, t]\) and \(D\) be the failure time of interest subjects to right censoring by \(C\). Define the composite censoring time \(Y = \min(D, C, \tau)\) and the failure event indicator \(\Delta = I\{D\le \min(C, \tau)\}\), where \(\tau\) is the maximum follow-up time. We assume the recurrent event process \(N(\cdot)\) is observed up to \(Y\). Let \(X\) be a \(p\)-dimensional covariate vector. Consider a random sample of \(n\) subjects, the observed data are independent and identically distributed (iid) copies of \(\{Y_i, \Delta_i, X_i, N_i(t), 0\le t\le Y_i\}\), where the subscript \(i\) denotes the index of a subject for \(i= 1, \ldots, n\). Let \(m_i = N_i(Y_i)\) be the number of recurrent events the \(i\)th subject experienced before time \(Y_i\), then the jump times of \(N_i(t)\) give the observed recurrent event times \(t_{i1}, \ldots, t_{im_i}\) when \(m_i > 0\). Thus, the observed data can also be expressed as iid copies of \(\{Y_i, \Delta_i, X_i, m_i, (t_{i1}, \ldots, t_{m_i})\}\). The primary interests in recurrent event data analysis often lie in making inference about the recurrent event process and the failure event and understanding the corresponding covariate effects. The built-in dataset, simDat
, is a simulated example of a recurrent event data that is generated from the simGSC()
function:
> library(reReg)
> packageVersion("reReg")
[1] '1.4.6'
> data(simDat)
simGSC
function
The function simGSC
generates the recurrent times from the joint model \[\begin{equation}
\begin{cases}
\lambda(t) = \displaystyle Z\lambda_0(te^{X^\top\alpha})e^{X^\top\beta},&\\
h(t) = \displaystyle Zh_0(te^{X^\top\eta})e^{X^\top\theta}, &t\in[0, \tau],
\end{cases}
\end{equation}\] where \(\lambda_0(t)\) is the baseline rate function for the recurrent event process, \(h_0(t)\) is the baseline hazard function for the failure time, \(Z\) is a non-negative subject-specific latent frailty variable. The \(p\)-dimensional regression coefficients \((\alpha, \eta)\) and \((\beta, \theta)\) correspond to the shape and the size parameters, respectively. The arguments in simGSC
are as follow
> args(simGSC)
function (n, summary = FALSE, para, xmat, censoring, frailty,
tau, origin, Lam0, Haz0)
NULL
The arguments are as follows
n
is the number of individualsummary
is a logical value indicating whether a brief data summary will be printed.para
is a list of regression parameters. The list members alpha
, beta
, eta
, and theta
correspond to \(\alpha\), \(\beta\), \(\eta\), and \(\theta\) in (1), respectively.xmat
is the \(n\times p\) design matrix.censoring
is a \(n\)-dimensional numerical vector to specify the independent censoring time, \(C\).frailty
is a \(n\)-dimensional numerical vector to specify the frailty variable, \(Z\).tau
is the maximum follow-up time, \(\tau\).origin
is the time origin for each subject.Lam0
is a single argument function used to specify the baseline cumulative rate function, \(\Lambda_0(t)\).Haz0
is a single argument function used to specify the baseline cumulative hazard function, \(H_0(t)\).At the default setting, the simGSC()
function assumes \(p = 2\) and the regression parameters to be \(\alpha = \eta=(0, 0)^\top\), \(\beta = (-1, -1)^\top\), and \(\theta = (1, 1)^\top\). When the xmat
and the censoring
arguments are not specified, the simGSC()
function assumes \(X_i\) is a two-dimensional vector \(X_i = (X_{i1}, X_{i2}), i = 1, \ldots, n\), where \(X_{i1}\) is a Bernoulli variable with rate 0.5 and \(X_{i2}\) is a standard normal variable. With the default xmat
, the censoring time \(C\) is generated from an independent uniform distribution in \([0, 2\tau X_{i1} + 2Z^2\tau(1 - X_{i1})]\). Thus, the censoring distribution is covariate dependent and is informative when \(Z\) is not a constant. On the other hand, when xmat
is specified by the users, the censoring time \(C\) is generated from an independent uniform distribution \([0, 2 \tau]\). When the frailty
argument is not specified, the frailty variable \(Z\) is generated from a gamma distribution with a unit mean and a variance of 0.25. The default values for tau
and origin
are 60 and 0, respectively. When arguments Lam0
and Haz0
are left unspecified, the simGSC()
function uses \(\Lambda_0(t) = 2\log(1 + t)\) and \(H_0(t) = \log(1 + t) / 5\), respectively. This is equivalent to setting Lam0 = function(x) 2 * log(1 + x)
and Haz0 = function(x) log(1 + x) / 5
. In summary, the default specifications generate the recurrent events and the terminal events from the model: \[\begin{cases}
\lambda(t) = \displaystyle \frac{2Z}{1 + te^{-X_{i1} - X_{i2}}}, &\\
h(t) = \displaystyle \frac{Z}{5(1 + te^{X_{i1} + X_{i2}})}, & t\in[0, 60].
\end{cases}\]
simDat
The simGSC()
function generates simulated data from the above specification and returns a data.frame
in the same format as the built-in data set, simDat
. Specifically, simDat
was generated using the default settings of simGSC()
.
> data(simDat)
> set.seed(0); dat <- simGSC(200, summary = TRUE)
Call:
simGSC(n = 200, summary = TRUE)
Summary:
Sample size: 200
Number of recurrent event observed: 667
Average number of recurrent event per subject: 3.335
Proportion of subjects with a terminal event: 0.595
Median time-to-terminal event: 7.378
> identical(dat, simDat)
[1] FALSE
The simDat
is an example that simulates recurrent event data when both the rate function and the hazard function are Cox-type model.
> set.seed(0); datCox <- simGSC(200, summary = TRUE)
Call:
simGSC(n = 200, summary = TRUE)
Summary:
Sample size: 200
Number of recurrent event observed: 667
Average number of recurrent event per subject: 3.335
Proportion of subjects with a terminal event: 0.595
Median time-to-terminal event: 7.378
The hypothesis test proposed by Xu et al. (2020) shows a strong evidence that the rate function has a Cox structure.
> fit <- reReg(Recur(t.start %to% t.stop, id, event, status) ~ x1 + x2,
+ model = "gsc", B = 200, se = "sand", data = datCox)
> summary(fit, test = TRUE)
Call:
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~
x1 + x2, data = datCox, model = "gsc", B = 200, se = "sand")
Recurrent event process (shape):
Estimate StdErr z.value p.value
x1 0.306169 0.344301 0.8892 0.3739
x2 0.072077 0.286502 0.2516 0.8014
Recurrent event process (size):
Estimate StdErr z.value p.value
x1 -0.59885 0.22498 -2.6618 0.007773 **
x2 -1.11749 0.27864 -4.0106 6.057e-05 ***
Hypothesis tests:
Ho: shape = 0 (Cox-type model):
X-squared = 0.793, df = 2, p-value = 0.6727
Ho: size = 0 (Accelerated rate model):
X-squared = 18.5657, df = 2, p-value = 1e-04
Ho: shape = size (Accelerated mean model):
X-squared = 24.8642, df = 2, p-value = 0
Similarly, the hypothesis test shows a strong evidence that the rate function has an accelerated mean or accelerated rate structure when the data is simulated under these.
When rate has an accelerated mean structure:
> set.seed(0); datAM <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(-1, -1)))
> summary(update(fit, data = datAM), test = TRUE)
Call:
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~
x1 + x2, data = datAM, model = "gsc", B = 200, se = "sand")
Recurrent event process (shape):
Estimate StdErr z.value p.value
x1 -1.88457 0.67925 -2.7745 0.005529 **
x2 -1.77292 0.56818 -3.1204 0.001806 **
Recurrent event process (size):
Estimate StdErr z.value p.value
x1 -1.68454 0.39280 -4.2886 1.798e-05 ***
x2 -1.09378 0.31676 -3.4530 0.0005543 ***
Hypothesis tests:
Ho: shape = 0 (Cox-type model):
X-squared = 11.0149, df = 2, p-value = 0.0041
Ho: size = 0 (Accelerated rate model):
X-squared = 32.2029, df = 2, p-value = 0
Ho: shape = size (Accelerated mean model):
X-squared = 1.1139, df = 2, p-value = 0.573
When rate has an accelerated rate structure:
> set.seed(0); datAR <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(0, 0)))
> summary(update(fit, data = datAR), test = TRUE)
Call:
reReg(formula = Recur(t.start %to% t.stop, id, event, status) ~
x1 + x2, data = datAR, model = "gsc", B = 200, se = "sand")
Recurrent event process (shape):
Estimate StdErr z.value p.value
x1 -1.49782 0.44370 -3.3757 0.0007362 ***
x2 -0.92654 0.37955 -2.4411 0.0146407 *
Recurrent event process (size):
Estimate StdErr z.value p.value
x1 -0.49245 0.19487 -2.5271 0.0115 *
x2 -0.23990 0.16629 -1.4427 0.1491
Hypothesis tests:
Ho: shape = 0 (Cox-type model):
X-squared = 11.7142, df = 2, p-value = 0.0029
Ho: size = 0 (Accelerated rate model):
X-squared = 7.8841, df = 2, p-value = 0.0194
Ho: shape = size (Accelerated mean model):
X-squared = 8.5924, df = 2, p-value = 0.0136