vignettes/censCov-examples.Rmd
censCov-examples.RmdIn this vignette, we demonstrate how to use the thlm function in censCov package to fit linear regression model with censored covariate using the methods proposed in Qian et al. (2018).
Suppose the linear regression model: \[ Y = \alpha_0 + \alpha_1X + \boldsymbol{\alpha}_2^\top \bf Z + \epsilon, \] where \(Y\) is the response variable, \(X\) is the covariate of interest which is subject to right censoring, \(\bf Z\) is a \(p\times 1\) vector of additional covariates that are completely observed, \(\epsilon\) is the random error term with mean 0 and finite variance, and the \(\alpha\)’s are the corresponding regression coefficients. We assume \(epsilon\) is independent of \(X\), \(\bf Z\), and \(C\), where \(C\) is the right censoring that potentially censors \(X\). The primary scientific interest is in the parameter \(\alpha_1\), which captures the association between \(Y\) and \(X\). The thlm function provides
thlm functionThe thlm function presents the threshold regression approaches for linear regression models with a covariate that is subject to random censoring. The threshold regression methods allow for immediate testing of the significance of the effect of a censored covariate. The censCov package can be installed from CRAN with
> install.packages("censCov")or from GitHub with
> devtools::install_github("stc04003/censCov")The arguments of the thlm function are as follows:
function (formula, data, method = c("cc", "reverse", "deletion-threshold",
"complete-threshold", "all"), B = 0, subset, x.upplim = NULL,
t0 = NULL, control = thlm.control())
NULL
formula is a formula expression in the form of response ~ predictors. The response variable is assumed to be fully observed. The thlm function can accommodate at most one censored covariate, which is entered in the formula as an Surv object; see survival::Surv for more detail. If all the covariates are uncensored, the thlm function returns a lm object.data is an optional data frame, list or environment contains variables in the formula and the subset argument. If this is left unspecified, the variables are taken from the environment from which thlm is called.method is a character string specifying the threshold regression methods to be used. The following are permitted:
cc for complete-cases regressionreverse for complete-cases regressiondeletion-threshold for complete-cases regressioncomplete-threshold for complete-cases regressionall for complete-cases regressionB is a numeric value specifies the bootstrap size for estimating the standard error of the regression coefficient for the censored covariate when method = "deletion-threshold" or method = "complete-threshold". When B = 0 (default), only the beta estimate will be displayed.subset is an optional vector specifying a subset of observations to be used in the fitting process.x.upplim is an optional number value specifies the upper support of the censored covariate. When left unspecified, the maximum of the censored covariate will be used.t0 is an optional numeric value specifies the threshold when method = "deletion-threshold" or method = "complete-threshold". When left unspecified, an optimal threshold will be determined to optimize test power.control is a list of threshold parameters including:
t0.interval controls the end points of the interval to be searched for the optimal threshold when t0 is left unspecifiedt0.plot controls whether the objective function will be plotted. When t0.plot is true, both the raw t0.plot values and the smoothed estimates (using local polynomial regression fitting) are plotted.We will illustrate the usage of thlm with a simulated data that is generated as below:
> simDat <- function(n) {
+ X <- rexp(n, 3)
+ Z <- runif(n, 1, 6)
+ Y <- 0.5 + 0.5 * X - 0.5 * Z + rnorm(n, 0, .75)
+ cstime <- rexp(n, .75)
+ delta <- (X <= cstime) * 1
+ X <- pmin(X, cstime)
+ data.frame(Y = Y, X = X, Z = Z, delta = delta)
+ }
> set.seed(1)
> head(dat <- simDat(200), 10) Y X Z delta
1 -1.4004305 0.25172728 5.432294 1
2 -0.6842887 0.39388093 3.359654 1
3 -0.5541144 0.04856891 1.545505 1
4 0.1832982 0.04659842 2.666390 1
5 -2.5467871 0.14535621 5.187083 1
6 -0.6450906 0.49887130 2.384249 0
7 -2.0134650 0.40985402 3.935176 1
8 -2.5030175 0.17989428 5.183661 1
9 0.6904316 0.31885583 1.355770 1
10 -1.4071626 0.04901533 4.513894 1
The columns in the simulated data set, dat, are
Y is the response variableX is the covariate of interest when delta = 1 or the right censored value when delta = 0
Z is an additional covariate that is completely observeddelta is the censoring indicator for X
The average censoring percentage is about 20%.
The following methods for censored covariates are implemented in thlm.
When method = "cc", thlm removes rows with delta = 0 and fits the linear model via lm.
> thlm(Y ~ Surv(X, delta) + Z, data = dat, method = "cc")
Call: thlm(formula = Y ~ Surv(X, delta) + Z, data = dat, method = "cc")
Hypothesis test of association
H0: a1 = 0, p-value = 0.1028
or
> thlm(Y ~ X + Z, data = dat, subset = delta == 1)
Call: thlm(formula = Y ~ X + Z, data = dat, subset = delta == 1)
Hypothesis test of association
H0: a1 = 0, p-value = 0.1047
The two \(p\)-values differs a bit because the thlm returns the Wald \(p\)-value.
Call:
thlm(formula = Y ~ Surv(X, delta) + Z, data = dat, method = "rev")
Reverse survival regression
Coefficients:
Estimate StdErr z.value p.value
Y -0.18333 0.10056 -1.823 0.0683 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Call:
thlm(formula = Y ~ Surv(X, delta) + Z, data = dat, method = "del",
B = 100)
Deletion threshold regression
Optimal threshold is 0.394
Coefficients:
Estimate StdErr z.value p.value
a1:X 0.642742 0.235796 2.7258 0.006414 **
a2:Z -0.527387 0.039061 -13.5018 < 2.2e-16 ***
b1:X 0.307045 0.117716 2.6084 0.009098 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Under the assumption that X is independent of Z given X*. See ?thlm for more detail.

Call:
thlm(formula = Y ~ Surv(X, delta) + Z, data = dat, method = "com",
B = 100)
Complete threshold regression
Optimal threshold is 0.481
Coefficients:
Estimate StdErr z.value p.value
a1:X 0.368099 0.304016 1.2108 0.2260
a2:Z -0.521533 0.036569 -14.2617 <2e-16 ***
b1:X 0.185574 0.138638 1.3386 0.1807
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Under the assumption that X is independent of Z given X*. See ?thlm for more detail.
The thlm function allows method = "all" that runs the complete cases analysis, reverse survival regression, deletion threshold regression, and complete threshold regression at once.
> summary(thlm(Y ~ Surv(X, delta) + Z, data = dat, method = "all", B = 100,
+ control = list(t0.plot = FALSE)))Call:
thlm(formula = Y ~ Surv(X, delta) + Z, data = dat, method = "all",
B = 100, control = list(t0.plot = FALSE))
Complete-case regression
Coefficients:
Estimate StdErr z.value p.value
X 0.345912 0.212040 1.6314 0.1028
Z -0.526268 0.040216 -13.0861 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Reverse survival regression
Coefficients:
Estimate StdErr z.value p.value
Y -0.18333 0.10056 -1.823 0.0683 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Deletion threshold regression
Optimal threshold is 0.394
Coefficients:
Estimate StdErr z.value p.value
a1:X 0.642742 0.264088 2.4338 0.014941 *
a2:Z -0.527387 0.039061 -13.5018 < 2.2e-16 ***
b1:X 0.307045 0.117716 2.6084 0.009098 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Complete threshold regression
Optimal threshold is 0.481
Coefficients:
Estimate StdErr z.value p.value
a1:X 0.368099 0.308850 1.1918 0.2333
a2:Z -0.521533 0.036569 -14.2617 <2e-16 ***
b1:X 0.185574 0.138638 1.3386 0.1807
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Under the assumption that X is independent of Z given X*. See ?thlm for more detail.
Can thlm fit model with one censored covariate (\(X\)) without any fully observed covaraite (\(Z\))?
Yes. For example:
> summary(thlm(Y ~ Surv(X, delta), data = dat, method = "all", B = 100,
+ control = list(t0.plot = FALSE)))Call:
thlm(formula = Y ~ Surv(X, delta), data = dat, method = "all",
B = 100, control = list(t0.plot = FALSE))
Complete-case regression
Coefficients:
Estimate StdErr z.value p.value
X 0.10360 0.30109 0.3441 0.7308
Reverse survival regression
Coefficients:
Estimate StdErr z.value p.value
Y -0.059851 0.076203 -0.7854 0.4322
Deletion threshold regression
Optimal threshold is 0.394
Coefficients:
Estimate StdErr z.value p.value
a1:X 0.49723 0.34765 1.4303 0.1526
b1:X 0.23753 0.16906 1.4051 0.1600
Complete threshold regression
Optimal threshold is 0.481
Coefficients:
Estimate StdErr z.value p.value
a1:X 0.082171 0.363501 0.2261 0.8212
b1:X 0.041426 0.196624 0.2107 0.8331
Under the assumption that X is independent of Z given X*. See ?thlm for more detail.
Can thlm fit model with one censored covariate (\(X\)) and multiple fully observed covaraites (\(Z\))?
Yes. For example:
> dat$Z2 <- dat$Z^2
> summary(thlm(Y ~ Surv(X, delta) + Z + Z2, data = dat, method = "all", B = 100,
+ control = list(t0.plot = FALSE)))Call:
thlm(formula = Y ~ Surv(X, delta) + Z + Z2, data = dat, method = "all",
B = 100, control = list(t0.plot = FALSE))
Complete-case regression
Coefficients:
Estimate StdErr z.value p.value
X 0.350485 0.212926 1.6460 0.099754 .
Z -0.608191 0.216904 -2.8040 0.005048 **
Z2 0.011590 0.030151 0.3844 0.700687
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Reverse survival regression
Coefficients:
Estimate StdErr z.value p.value
Y -0.18087 0.10079 -1.7946 0.07272 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Deletion threshold regression
Optimal threshold is 0.394
Coefficients:
Estimate StdErr z.value p.value
a1:X 0.654823 0.253949 2.5786 0.009921 **
a2:Z -0.623451 0.211023 -2.9544 0.003132 **
a3:Z2 0.013604 0.029366 0.4633 0.643169
b1:X 0.312817 0.118646 2.6366 0.008375 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Complete threshold regression
Optimal threshold is 0.481
Coefficients:
Estimate StdErr z.value p.value
a1:X 0.3745097 0.2656319 1.4099 0.158575
a2:Z -0.5685414 0.1997397 -2.8464 0.004422 **
a3:Z2 0.0066448 0.0277542 0.2394 0.810784
b1:X 0.1888058 0.1396247 1.3522 0.176299
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Under the assumption that X is independent of Z given X*. See ?thlm for more detail.
Does the censored covariate (\(X\)) need to be specified first in the formula?
Not nesscary. For exmaple:
> dat$Z2 <- dat$Z^2
> summary(thlm(Y ~ Z + Surv(X, delta) + Z2, data = dat, method = "all", B = 100,
+ control = list(t0.plot = FALSE)))Call:
thlm(formula = Y ~ Z + Surv(X, delta) + Z2, data = dat, method = "all",
B = 100, control = list(t0.plot = FALSE))
Complete-case regression
Coefficients:
Estimate StdErr z.value p.value
X 0.350485 0.212926 1.6460 0.099754 .
Z -0.608191 0.216904 -2.8040 0.005048 **
Z2 0.011590 0.030151 0.3844 0.700687
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Reverse survival regression
Coefficients:
Estimate StdErr z.value p.value
Y -0.18087 0.10079 -1.7946 0.07272 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Deletion threshold regression
Optimal threshold is 0.394
Coefficients:
Estimate StdErr z.value p.value
a1:X 0.654823 0.274355 2.3868 0.016997 *
a2:Z -0.623451 0.211023 -2.9544 0.003132 **
a3:Z2 0.013604 0.029366 0.4633 0.643169
b1:X 0.312817 0.118646 2.6366 0.008375 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Complete threshold regression
Optimal threshold is 0.481
Coefficients:
Estimate StdErr z.value p.value
a1:X 0.3745097 0.2989135 1.2529 0.210241
a2:Z -0.5685414 0.1997397 -2.8464 0.004422 **
a3:Z2 0.0066448 0.0277542 0.2394 0.810784
b1:X 0.1888058 0.1396247 1.3522 0.176299
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Under the assumption that X is independent of Z given X*. See ?thlm for more detail.
Can thlm fit model with more than one censored covariate (\(X\))?
No.