vignettes/reReg-reg.Rmd
reReg-reg.Rmd
In this vignette, we demonstrate how to use the reReg()
function in reReg
package to fit semiparametric regression models to recurrent event data.
where \(Z\) is a latent shared frailty variable to account for association between the two types of outcomes, \(\lambda_0(\cdot)\) is the baseline rate function, \(h_0(\cdot)\) is the baseline hazard function, and the regression coefficients \((\alpha, \eta)\) and \((\beta, \theta)\) correspond to the shape and size parameters of the rate function and hazard function, respectively. In contrast to many shared-frailty models that require a parametric assumption, following the idea of Wang, Qin, and Chiang (2001), the reReg()
function implements semiparametric estimation procedures that do not require the knowledge about the frailty distribution. As a result, the dependence between recurrent events and failure event is left unspecified and the proposed implementations accommodate informative censoring. The reReg()
function fits the recurrent event data under the above joint model setting. The arguments of reReg()
are as follows
reReg()
> library(reReg); args(reReg)
function (formula, data, B = 200, method = "cox", se = c("resampling",
"bootstrap", "NULL"), control = list())
NULL
formula
a formula object, with the response on the left of a "~" operator, and the predictors on the right. The response must be a recurrent event survival object as returned by function Recur
. See the vignette on Visualization of recurrent event data or Introduction to formula response function Recur()
for examples in creating Recur
objects.data
an optional data frame in which to interpret the variables occurring in the formula
.B
a numeric value specifies the number of resampling (or bootstrap) for variance estimation. When B = 0
, variance estimation will not be performed.method
a character string specifying the underlying model.se
a character string specifying the method for standard error estimation.control
a list of control parameters.method
Model \eqref{eq:joint} includes several popular semiparametric models as special cases, which can be specified via the method
argument with the rate function and hazard function separated by "|
". For examples, the joint Cox model of Huang and Wang (2004) is a special case of \eqref{eq:joint} when \(\alpha = \eta = 0\) and can be called by method = "cox|cox"
; the joint accelerated mean model of Xu et al. (2017) is a special case when \(\alpha = \beta\) and \(\eta = \theta\) and can be called by method = "am|am"
. Treating the terminal event as nuisances (\(\eta = \theta = 0\)), \eqref{eq:joint} reduces to the generalized scale-change model of Xu et al. (2019), called by method = "sc|."
. Moreover, users can mix the models depending on the application. For example, method = "cox|ar"
postulate a Cox proportional model for the recurrent event rate function and an accelerated rate model for the terminal event hazard function (\(\alpha = \theta = 0\) in \eqref{eq:joint}).
se
For inference, the reReg()
function provides several approaches for variance estimation. The default option is se = "resampling"
, which refers to the efficient resampling-based sandwich estimator. The general idea is to decompose the limiting covariance matrix in a sandwich form, and its components are estimated via perturbed estimating equations. Details of the resampling approach can be found in Zeng and Lin (2008), Xu et al. (2017), and Xu et al. (2019). The resampling approach is faster than the conventional bootstrap, which can be called with se = "bootstrap"
, as it only requires evaluating perturbed estimating equations rather than solving them. When se = "bootstrap"
, user has an option to carry out the bootstrap with parallel computing. This can be done by specifying parallel = TRUE
in the control
list. When parallel = TRUE
, the number of CPU cores can be specified with parCl
.
The complete control
list consists of the following parameters: tol
specifies the absolute error tolerance in solving the estimating equations a0, b0
specifies the initial guesses used for root search * solver
specifies the equation solver used for root search; the available options are BBsolve
, dfsane
, BBoptim
, and optim
(the first three options loads the corresponding equation solver from package BB
). * baseSE
is an logical value indicating whether the confidence bounds for the baseline functions will be computed. * parallel
is an logical value indicating whether parallel computation will be applied when se = "bootstrap"
is called. * parCl
is an integer value specifying the number of CPU cores to be used when parallel = TRUE
. The default value is half the CPU cores on the current host.
We will illustrate the usage of reReg
with simulated data generated from simSC
. Readers are referred to the vignette on Simulating recurrent event data for using simSC
to generate recurrent event data.
A simulated model following the joint Cox model of Huang and Wang (2004) can be generated by
> set.seed(1); datCox <- simSC(500, c(1, -1), c(-1, 1), summary = TRUE)
Summary results for number of recurrent event per subject:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 2.00 8.00 18.83 21.00 525.00
Number of failures: 119 (23.8%); Number of censored events: 381 (76.2%)
Number of x1 == 1: 236 (47.2%); Number of x1 == 0: 264 (52.8%)
Summary results for x2:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.23639 -0.74943 -0.07149 -0.05930 0.67217 2.68406
The underlying true model has the form: \[\lambda(t) = Z \lambda_0(t)e^{X_1 - X_2}; h(t) = Z h_0(t)e^{-X_1 + X_2}.\] The model fit is
> fit.cox <- reReg(Recur(t.stop, id, event, status) ~ x1 + x2, data = datCox) > summary(fit.cox)
Length Class Mode
alpha 2 -none- numeric
aconv 1 -none- numeric
log.muZ 1 -none- numeric
zi 500 -none- numeric
Lam0 1 -none- function
beta 2 -none- numeric
bconv 1 -none- numeric
Haz0 1 -none- function
recType 1 -none- character
temType 1 -none- character
alphaSE 2 -none- numeric
alphaVar 4 -none- numeric
varMat 9 -none- numeric
betaSE 2 -none- numeric
betaVar 4 -none- numeric
method 1 -none- character
reTb 59490 -none- numeric
DF 8 data.frame list
call 3 -none- call
varNames 2 -none- character
se 1 -none- character
The baseline functions can be plotted via plot()
:
> plot(fit.cox)
A simulated model following the joint accelerated mean model of Xu et al. (2017) can be generated by
> set.seed(1); datAM <- simSC(500, c(1, -1), c(-1, 1), type = "am", summary = TRUE)
Summary results for number of recurrent event per subject:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 2.00 5.00 6.34 9.00 52.00
Number of failures: 115 (23%); Number of censored events: 385 (77%)
Number of x1 == 1: 236 (47.2%); Number of x1 == 0: 264 (52.8%)
Summary results for x2:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.23639 -0.74943 -0.07149 -0.05930 0.67217 2.68406
The underlying true model has the form: \[\lambda(t) = Z \lambda_0(te^{X_1 - X_2})e^{X_1 - X_2}; h(t) = Z h_0(te^{-X_1 + X_2})e^{-X_1 + X_2}.\] The model fit is
> fit.am <- reReg(Recur(t.stop, id, event, status) ~ x1 + x2, data = datAM, method = "am") > summary(fit.am)
Length Class Mode
alpha 2 -none- numeric
aconv 1 -none- numeric
log.muZ 1 -none- numeric
zi 500 -none- numeric
Lam0 1 -none- function
beta 2 -none- numeric
bconv 1 -none- numeric
Haz0 1 -none- function
recType 1 -none- character
temType 1 -none- character
alphaSE 2 -none- numeric
alphaVar 4 -none- numeric
varMat 4 -none- numeric
betaSE 2 -none- numeric
betaVar 4 -none- numeric
method 1 -none- character
reTb 22020 -none- numeric
DF 8 data.frame list
call 4 -none- call
varNames 2 -none- character
se 1 -none- character
The baseline functions can be plotted via plot()
:
> plot(fit.am)
sc.XCYH
A simulated model following the generalized scale-change model of Xu et al. (2019) can be generated by
> set.seed(1); datSC <- simSC(500, a1 = c(1, -1), a2 = c(-1, 1), b1 = c(1, -1), b2 = c(-1, 1), type = "sc", summary = TRUE)
Summary results for number of recurrent event per subject:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 2.00 4.27 5.00 55.00
Number of failures: 127 (25.4%); Number of censored events: 373 (74.6%)
Number of x1 == 1: 236 (47.2%); Number of x1 == 0: 264 (52.8%)
Summary results for x2:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.23639 -0.74943 -0.07149 -0.05930 0.67217 2.68406
The underlying true model has the form: \[\lambda(t) = Z \lambda_0(te^{X_1 - X_2})e^{-X_1 + X_2}; h(t) = Z h_0(te^{X_1 - X_2})e^{-X_1 + X_2}.\] The model fit is
> fit.sc <- reReg(Recur(t.stop, id, event, status) ~ x1 + x2, data = datSC, method = "sc") > summary(fit.sc)
Length Class Mode
alpha 4 -none- numeric
aconv 2 -none- numeric
log.muZ 1 -none- numeric
zi 500 -none- numeric
Lam0 1 -none- function
beta 4 -none- numeric
bconv 1 -none- numeric
Haz0 1 -none- function
recType 1 -none- character
temType 1 -none- character
alphaSE 4 -none- numeric
alphaVar 16 -none- numeric
varMat 25 -none- numeric
betaSE 4 -none- numeric
betaVar 16 -none- numeric
method 1 -none- character
reTb 15810 -none- numeric
DF 8 data.frame list
call 4 -none- call
varNames 2 -none- character
se 1 -none- character
The baseline functions can be plotted via plot()
:
> plot(fit.sc)
A simulated model following the generalized scale-change model of Xu et al. (2019) can be generated by
> set.seed(1); datCoxar <- simSC(500, c(1, -1), c(-1, 1), type = "cox|ar", summary = TRUE)
Summary results for number of recurrent event per subject:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 2.00 7.00 13.17 16.25 205.00
Number of failures: 185 (37%); Number of censored events: 315 (63%)
Number of x1 == 1: 236 (47.2%); Number of x1 == 0: 264 (52.8%)
Summary results for x2:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.23639 -0.74943 -0.07149 -0.05930 0.67217 2.68406
The underlying true model has the form: \[\lambda(t) = Z \lambda_0(t)e^{X_1 - X_2}; h(t) = Z h_0(te^{-X_1 + X_2}).\] The model fit is
> fit.coxar <- reReg(Recur(t.stop, id, event, status) ~ x1 + x2, data = datCoxar, method = "cox|ar") > summary(fit.coxar)
Length Class Mode
alpha 2 -none- numeric
aconv 1 -none- numeric
log.muZ 1 -none- numeric
zi 500 -none- numeric
Lam0 1 -none- function
beta 2 -none- numeric
bconv 1 -none- numeric
Haz0 1 -none- function
recType 1 -none- character
temType 1 -none- character
alphaSE 2 -none- numeric
alphaVar 4 -none- numeric
varMat 9 -none- numeric
betaSE 2 -none- numeric
betaVar 4 -none- numeric
method 1 -none- character
reTb 42516 -none- numeric
DF 8 data.frame list
call 4 -none- call
varNames 2 -none- character
se 1 -none- character
The baseline functions can be plotted via plot()
:
> plot(fit.coxar)
Parallel computing is possible via specifying the control list.
> system.time(fitB1 <- reReg(Recur(t.stop, id, event, status) ~ x1 + x2, data = datCox, + method = "cox", se = "bootstrap"))
user system elapsed
149.464 0.008 149.490
> summary(fitB1)
Length Class Mode
alpha 2 -none- numeric
aconv 1 -none- numeric
log.muZ 1 -none- numeric
zi 500 -none- numeric
Lam0 1 -none- function
beta 2 -none- numeric
bconv 1 -none- numeric
Haz0 1 -none- function
recType 1 -none- character
temType 1 -none- character
alphaSE 2 -none- numeric
betaSE 2 -none- numeric
alphaVar 4 -none- numeric
betaVar 4 -none- numeric
SEmat 800 -none- numeric
B 1 -none- numeric
method 1 -none- character
reTb 59490 -none- numeric
DF 8 data.frame list
call 5 -none- call
varNames 2 -none- character
se 1 -none- character
> system.time(fitB2 <- reReg(Recur(t.stop, id, event, status) ~ x1 + x2, data = datCox, + method = "cox", se = "bootstrap", + control = list(parallel = TRUE, parCl = 8)))
user system elapsed
0.812 0.008 12.329
> summary(fitB2)
Length Class Mode
alpha 2 -none- numeric
aconv 1 -none- numeric
log.muZ 1 -none- numeric
zi 500 -none- numeric
Lam0 1 -none- function
beta 2 -none- numeric
bconv 1 -none- numeric
Haz0 1 -none- function
recType 1 -none- character
temType 1 -none- character
alphaSE 2 -none- numeric
betaSE 2 -none- numeric
alphaVar 4 -none- numeric
betaVar 4 -none- numeric
SEmat 800 -none- numeric
B 1 -none- numeric
method 1 -none- character
reTb 59490 -none- numeric
DF 8 data.frame list
call 6 -none- call
varNames 2 -none- character
se 1 -none- character
Some methods that assumes and requires independent censorings are also implemented in . These includes the methods proposed by D. Lin et al. (2000), Ghosh and Lin (2002), and Ghosh and Lin (2003), that can be called by specifying method = "cox.LWYY"
, method = "cox.GL"
, and method = "am.GL"
, respectively.
Ghosh, Debashis, and D Y Lin. 2003. “Semiparametric Analysis of Recurrent Events Data in the Presence of Dependent Censoring.” Biometrics 59 (4): 877–85.
Ghosh, Debashis, and DY Lin. 2002. “Marginal Regression Models for Recurrent and Terminal Events.” Statistica Sinica 12 (3): 663–88.
Huang, Chiung-Yu, and Mei-Cheng Wang. 2004. “Joint Modeling and Estimation for Recurrent Event Processes and Failure Time Data.” Journal of the American Statistical Association 99 (468): 1153–65.
Lin, DY, LJ Wei, I Yang, and Z Ying. 2000. “Semiparametric Regression for the Mean and Rate Functions of Recurrent Events.” Journal of the Royal Statistical Society: Series B 62 (4). Wiley Online Library: 711–30.
Wang, Mei-Cheng, Jing Qin, and Chin-Tsang Chiang. 2001. “Analyzing Recurrent Event Data with Informative Censoring.” Journal of the American Statistical Association 96 (455): 1057–65.
Xu, Gongjun, Sy Han Chiou, Chiung-Yu Huang, Mei-Cheng Wang, and Jun Yan. 2017. “Joint Scale-Change Models for Recurrent Events and Failure Time.” Journal of the American Statistical Association 112 (518). Taylor & Francis: 794–805.
Xu, Gongjun, Sy Han Chiou, Jun Yan, Kieren Marr, and Chiung-Yu Huang. 2019. “Generalized Scale-Change Models for Recurrent Event Processes Under Informative Censoring.” Statistica Sinica. doi:http://doi:10.5705/ss.202018.0224.
Zeng, Donglin, and D Y Lin. 2008. “Efficient Resampling Methods for Nonsmooth Estimating Functions.” Biostatistics 9 (2): 355–63.