Fits a "rocTree" model.

rocTree(formula, data, id, subset, ensemble = TRUE, splitBy = c("dCON",
  "CON"), control = list())

Arguments

formula

is a formula object, with the response on the left of a '~' operator, and the terms on the right. The response must be a survival object returned by the 'Surv' function.

data

is an optional data frame in which to interpret the variables occurring in the 'formula'.

id

is an optional vector of subject identifiers used to group longitudinal observations by subject. The length of 'id' should equal the total number of observations. If 'id' is missing, each row of `data` is treated as a distinct subject and all covariates are treated as baseline covariates.

subset

is an optional vector specifying a subset of observations to be used in the fitting process.

ensemble

is an optional logical value. If TRUE (default), the ensemble method is fitted; otherwise, a single survival tree is fitted.

splitBy

is a character string specifying the splitting algorithm. The available options are 'CON' and 'dCON' corresponding to the splitting algorithm based on the total concordance measure or the difference in concordance measure, respectively. The default value is 'dCON'.

control

is a list of control parameters. See 'Details' for the available parameters and their default values.
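The usage line gives 'splitBy' a default of c("dCON", "CON"), and the description says 'dCON' is the default. In R this pattern is commonly resolved with match.arg(), which picks the first choice when the argument is omitted. The sketch below illustrates that idiom with a hypothetical helper, resolve_split(); it is an assumption about how the default is resolved, not code taken from rocTree.

```r
## Hypothetical helper, not part of rocTree: shows how a default such as
## splitBy = c("dCON", "CON") is typically resolved with match.arg().
resolve_split <- function(splitBy = c("dCON", "CON")) {
  match.arg(splitBy)
}

resolve_split()        # argument omitted: the first choice is used
resolve_split("CON")   # an explicit choice is returned unchanged
```

With this idiom, a value outside the listed choices raises an error rather than falling back silently.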

Value

An object of S4 class "rocTree" representing the fit, with the following components:

Details

The argument "control" defaults to a list with the following values:

tau

is the maximum follow-up time; default value is the 90th percentile of the unique observed survival times.

maxTree

is the number of survival trees to be used in the ensemble method (when ensemble = TRUE).

maxNode

is the maximum number of nodes allowed in the tree; the default value is 500.

numFold

is the number of folds used in the cross-validation. When numFold > 0, the survival tree will be pruned; when numFold = 0, the unpruned survival tree will be presented. The default value is 10.

h

is the smoothing parameter (bandwidth) used in the kernel; the default value is tau / 20.

minSplitTerm

is the minimum number of baseline observations in each terminal node; the default value is 15.

minSplitNode

is the minimum number of baseline observations in each splittable node; the default value is 30.

disc

is a logical vector specifying whether the covariates in formula are discrete (TRUE) or continuous (FALSE). The length of disc should be the same as the number of covariates in formula. When not specified, the rocTree() function assumes continuous covariates for all.

K

is the number of time points on which the concordance measure is computed. A coarser time grid (smaller K) generally runs faster, but a very small K is not recommended. The default value is 20.
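As an illustration of the defaults described above, the following base-R sketch computes tau and h from a made-up vector of survival times and spells the remaining documented defaults out in a control list. The vector obs_time and the two-covariate 'disc' setting are assumptions for the example; the calculation mirrors the description on this page and is not taken from the package source.

```r
## Hypothetical observed survival times (illustrative values only)
obs_time <- c(0.5, 1.2, 1.2, 2.0, 3.1, 3.1, 4.7, 5.0, 6.3, 8.9)

## tau: 90th percentile of the unique observed survival times
tau <- unname(quantile(unique(obs_time), probs = 0.9))

## h: kernel smoothing parameter, documented default tau / 20
h <- tau / 20

## Control list with the documented defaults written out explicitly.
## disc assumes a formula with two continuous covariates.
ctrl <- list(tau = tau, h = h, maxNode = 500, numFold = 10,
             minSplitTerm = 15, minSplitNode = 30,
             disc = c(FALSE, FALSE), K = 20)
```

A list like ctrl could then be passed as control = ctrl; only the entries that differ from the defaults need to be supplied.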

References

Sun, Y. and Wang, M.C. (2018+). ROC-guided classification and survival trees. Technical report.

See also

print.rocTree and plot.rocTree for printing and plotting a rocTree object, respectively.

Examples

library(rocTree)
library(survival)  # provides Surv()
data(simDat)

## Fitting a pruned survival tree
rocTree(Surv(Time, death) ~ z1 + z2, id = id, data = simDat, ensemble = FALSE)
#> ROC-guided survival tree
#>
#>  node), split
#>    * denotes terminal node
#>
#>  Root
#>  ¦--2) z1 <= 0.32338*
#>  °--3) z1 > 0.32338
#>      ¦--6) z2 <= 0.60199*
#>      °--7) z2 > 0.60199*
#>

## Fitting an unpruned survival tree
rocTree(Surv(Time, death) ~ z1 + z2, id = id, data = simDat, ensemble = FALSE,
        control = list(numFold = 0))
#> ROC-guided survival tree
#>
#>  node), split
#>    * denotes terminal node
#>
#>  Root
#>  ¦--2) z1 <= 0.32338
#>  ¦   ¦--4) z1 <= 0.16418*
#>  ¦   °--5) z1 > 0.16418*
#>  °--3) z1 > 0.32338
#>      ¦--6) z2 <= 0.60199
#>      ¦   ¦--12) z2 <= 0.22388*
#>      ¦   °--13) z2 > 0.22388*
#>      °--7) z2 > 0.60199*
#>

# NOT RUN {
## Fitting the ensemble algorithm (default)
rocTree(Surv(Time, death) ~ z1 + z2, id = id, data = simDat, ensemble = TRUE)
# }