examples.Rmd
In this vignette, we demonstrate the usage of the SSIndex
package. The implementations provided in the SSIndex
package are based on the working manuscript titled “Statistical inference on shape- and size-indexes for counting processes”.
Let \(N^\ast(t)\) be the number of events occurring at or before time \(t\) and \(X\) is a \(p\times1\) vector of baseline covariate. We consider a semi-parametric model for the conditional rate function of \(N^\ast(\cdot)\): \[\lambda(t|X) = f(t, \beta_0^\top X) g(\gamma_0^\top X), 0\le t \le \tau, \] where \([0, \tau]\) is the study period of interest, \(f(\cdot, a)\) is an unspecified density function for \(a\in\mathcal R\), and \(g(\cdot)\) is an unspecified function. We denote the regression parameters \(\beta_0\) and \(\gamma_0\) the shape and size parameters, respectively. Similar, we denote the linear combinations \(\beta_0^\top X\) and \(\gamma_0^\top X\) the shape-index and size-index, respectively. Let \(F(t, a) = \int_0^tf(u, a)du\), we impose the assumptions:
We use the simDat()
to generate simulated data descriptive in the manuscript and use the gsm()
to obtain the point estimators for \(\beta_0\) and \(\gamma_0\).
We use the simDat()
function to generate the simulated data used in the manuscript. The syntax of simDat()
is
function (n, model, frailty = FALSE, ...)
NULL
The arguments are:
n
is an integer value representing sample sizemodel
is a character string indicating the models described below.frailty
is a logical value indicating whether to generate the data under informative censoringWe consider generating recurrent events from the following rate functions as proposed in the simulation study of the manuscript.
(M1) \(\lambda(t|X,Z) = Z(1 + t)^{-1} + 0.5Z\exp{(b^\intercal X)}\), \(\beta_0 = -b\), \(\gamma_0 = b\);
(M2) \(\lambda(t|X,Z) =Z\exp\{-0.5t\exp({-b^\intercal X})\}\), \(\beta_0 = b\), \(\gamma_0 = -b\);
(M3) \(\lambda(t | X, Z) = 0.5Z\log[\exp(b^\intercal X) + 1](1 + 0.5t)^{\{\log[\exp(b^\intercal X) + 1] - 1\}}\), \(\beta_0 = -b\), \(\gamma_0 = b\);
(M4) \(\lambda(t|X,Z) = Zf_B(t,\beta_0^\intercal X)\exp(\gamma_0^\intercal X)\), for \(t \in[0, 1]\), where \(f_B(\cdot,a) \propto x(1-x)^{\exp(a)}\) is a Beta density function, \(\beta_0 = (0.6, 0.8)^\top\), and \(\gamma_0 = (0.28, 0.96)^\top\).
We set \(b = (0.6, 0.8)\) in (M1)-(M3). For each setting, we consider two covariates and \(X = (X_1, X_2)^\intercal\), where \(X_1\) and \(X_2\) are independent standard normal random variables. The censoring time was generated from an exponential distribution with mean \(10 (1 + |X_1|) / Z\), where the subject-specific latent variable \(Z\) was either set as \(Z \equiv 1\) (frailty = TRUE
) or generated from an exponential distribution with mean 1 (frailty = FALSE
). Thus, in the latter case, the censoring time is correlated with the recurrent event process through both \(X\) and \(Z\). The following code provides an simulated data set with 200 samples under model (M4) and the non-informative censoring assumption.
# A tibble: 6 x 7
id Time m x1 x2 event status
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.0537 10 0.326 1.26 1 0
2 1 0.170 10 0.326 1.26 1 0
3 1 0.208 10 0.326 1.26 1 0
4 1 0.225 10 0.326 1.26 1 0
5 1 0.254 10 0.326 1.26 1 0
6 1 0.259 10 0.326 1.26 1 0
The simDat()
returns a data.frame with 7 columns. The variables (columns) are defined below:
id
is subject’s identitytime
marks the recurrent event time when event = 1
or censoring time when event = 0
m
is the number of recurrent events observed by the subject during the study periodx1
and x2
are the covaraitesevent
is the recurrent event indicator; event = 1
denotes a recurrent event, and event = 0
otherwise.status
is 1 - event
The main function in SSIndex
package is ssfit()
. The arguments for the ssfit()
function are
function (formula, data, shp.ind = FALSE, bIni = NULL, rIni = NULL,
optMethod = NULL, smoothFirst = NULL, interval = c(0, 2 *
pi), h = NULL, weights = NULL)
NULL
formula
is a formula object, with the response on the left of a ~
operator.data
is an optional argument for data.shp.ind
is a character string indicating either to proceed with shape-independence assumption.B
is a numerical value specifying number of bootstrap in shape independence testing.bIni
is the initial value for estimating the shape parameterrIni
is the initial value for estimating the size parametersmoothFirst
estimate the smooth estimator first and use it to guide? Default: FALSE
when \(p \le 2\), TRUE
when \(p > 2\)
opt
how many optimization methods to try? 1: optim
2: optim
+ BB::spg
Default: 1 when \(p \le 2\), 2 when \(p > 2\)
weights
is an optional weight vector used for multiplier bootstrap.Since the proposed estimating procedure is a two-step rank-based estimation procedure that esitmates the shape parameter first, setting shp.ind = TRUE
would assume \(\beta = 0\). The default initial values used in the estimation are 0-vectors and other initial values could be specified via bIni
and rIni
. The proposed estimating procedure involves optimizing non-smooth objective functions. When smoothFirst = TRUE
, an induced smoothing technique is applied to the non-smooth objective functions and the optimization will be applied to the smoothed objective functions. The parameter estimates obtained from optimizing the smoothed objective functions will be used as the initial values in the optimization of the proposed non-smooth objective functions. In our simulation study, we found setting smoothFirst = TRUE
could yield a slightly more stable results when \(p > 1\).
The following gives an example code to fit the simulated data dat
:
> fm <- reSurv(time1 = Time, id = id, event = event, status = status) ~ x1 + x2
> fit <- ssfit(fm, data = dat, shp.ind = FALSE, smoothFirst = TRUE, optMethod = 2)
The above example defines the formula
using the reSurv()
function from the reReg
package. The reSurv()
is used to create a recurrent event object, see ?reReg::reSurv
. While reSurv()
serves the propose here, it is deprecated in reReg
version 1.1.6.
[1] "b0" "r0" "b00" "r00" "Fhat"
The ssfit()
function returns a list of five elements. They are
b0
is the shape parameter estimate obtained by optimizing the smooth objective functions; this is NULL
if smoothFirst = FALSE
.r0
is the size parameter estimate obtained by optimizing the smooth objective functions; this is NULL
if smoothFirst = FALSE
.b00
is the shape parameter estimate obtained by optimizing the nonsmooth objective functions using the smoothed estimator as the initial value (if it is available).b00
is the size parameter estimate obtained by optimizing the nonsmooth objective functions using the smoothed estimator as the initial value (if it is available).Fhat
is the estimate of \(\widehat F(Y_i, \widehat\beta^\top X_i)\), where \(Y_i\) is the censoring time for the \(i\)th subject and \(\widehat\beta\) is the shape parameter estimate from b00
.In this case, the point estimates are
> tab <- matrix(with(fit, c(b0, r0, b00, r00)), 4)
> rownames(tab) <- c("b1", "b2", "r1", "r2")
> knitr::kable(tab, col.names = c("smooth", "non-smooth"))
smooth | non-smooth | |
---|---|---|
b1 | 0.5430320 | 0.5628135 |
b2 | 0.8397120 | 0.8265839 |
r1 | 0.2804429 | 0.2831104 |
r2 | 0.9598707 | 0.9590873 |
Non-parametric bootstrap or multiplier boostratp can be applied to obtain the standard errors for the shape and size esitmates. For example, the following implementation can be used to obtain the standard errors with 100 bootstrap samples:
> bootSE <- function(dat) {
+ n <- length(unique(dat$id))
+ mm <- aggregate(event ~ id, dat, sum)[, 2]
+ ind <- sample(1:n, replace = TRUE)
+ dat0 <- dat[unlist(sapply(ind, function(x) which(dat$id %in% x))),]
+ dat0$id <- rep(1:n, mm[ind] + 1)
+ fit0 <- ssfit(fm, data = dat0, shp.ind = FALSE, bIni = fit$b00, rIni = fit$r00,
+ smoothFirst = TRUE, optMethod = 2)
+ with(fit0, c(b0, r0, b00, r00))
+ }
> b <- replicate(100, bootSE(dat))
> tab <- matrix(apply(b, 1, sd), 4)
> rownames(tab) <- c("b1", "b2", "r1", "r2")
> knitr::kable(tab, col.names = c("smooth", "non-smooth"))
smooth | non-smooth | |
---|---|---|
b1 | 0.0345190 | 0.0418746 |
b2 | 0.0218521 | 0.0269611 |
r1 | 0.0289695 | 0.0322166 |
r2 | 0.0084166 | 0.0092267 |
The multiplier bootstrap approach can be carried out by using the optional weights
argument.
> resampSE <- function(dat) {
+ n <- length(unique(dat$id))
+ w <- rexp(n)
+ fit0 <- ssfit(fm, data = dat, shp.ind = FALSE, bIni = fit$b00, rIni = fit$r00,
+ smoothFirst = TRUE, optMethod = 2, weights = w / mean(w))
+ with(fit0, c(b0, r0, b00, r00))
+ }
> m <- replicate(100, resampSE(dat))
> tab <- matrix(apply(m, 1, sd), 4)
> rownames(tab) <- c("b1", "b2", "r1", "r2")
> knitr::kable(tab, col.names = c("smooth", "non-smooth"))
smooth | non-smooth | |
---|---|---|
b1 | 0.0412433 | 0.0396837 |
b2 | 0.0267936 | 0.0256027 |
r1 | 0.0265176 | 0.0283423 |
r2 | 0.0075635 | 0.0082334 |