Nonidentifiability in the presence of factorization for truncated data


A time to event, $X$, is left-truncated by $T$ if $X$ can be observed only if $T<X$. This often results in oversampling of large values of $X$, and necessitates adjustment of estimation procedures to avoid bias. Simple risk-set adjustments can be made to standard risk-set-based estimators to accommodate left truncation when $T$ and $X$ are quasi-independent. We derive a weaker factorization condition for the conditional distribution of $T$ given $X$ in the observable region that permits risk-set adjustment for estimation of the distribution of $X$, but not of the distribution of $T$. Quasi-independence results when the analogous factorization condition for $X$ given $T$ holds also, in which case the distributions of $X$ and $T$ are easily estimated. While we can test for factorization, if the test does not reject, we cannot identify which factorization condition holds, or whether quasi-independence holds. Hence we require an unverifiable assumption in order to estimate the distribution of $X$ or $T$ based on truncated data. This contrasts with the common understanding that truncation is different from censoring in requiring no unverifiable assumptions for estimation. We illustrate these concepts through a simulation of left-truncated and right-censored data.
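As a rough illustration of the risk-set adjustment mentioned above, the following sketch simulates left-truncated data and compares a naive empirical survival estimate with a product-limit estimate that uses delayed-entry risk sets. It is not the simulation design of the paper: the exponential distributions, the full independence of $T$ and $X$, the absence of right censoring, and the function names `simulate_left_truncated` and `truncation_adjusted_survival` are all assumptions made for this example.

```python
import numpy as np


def simulate_left_truncated(n, rng):
    """Draw independent T and X and keep only the observable pairs with T < X.

    Assumption for illustration: both T and X are Exp(1).
    """
    t = rng.exponential(scale=1.0, size=n)  # truncation times
    x = rng.exponential(scale=1.0, size=n)  # event times
    keep = t < x
    return t[keep], x[keep]


def truncation_adjusted_survival(t, x, grid):
    """Product-limit estimate of P(X > s) for each s in grid.

    The risk set at an event time u holds subjects with t_i < u <= x_i,
    i.e. those who have already entered and have not yet failed.
    """
    event_times, deaths = np.unique(x, return_counts=True)
    t_sorted = np.sort(t)
    x_sorted = np.sort(x)
    # Because t_i < x_i for every observed pair,
    # #{t_i < u <= x_i} = #{t_i < u} - #{x_i < u}.
    entered = np.searchsorted(t_sorted, event_times, side="left")
    failed = np.searchsorted(x_sorted, event_times, side="left")
    at_risk = entered - failed
    surv = np.cumprod(1.0 - deaths / at_risk)
    # S(s) is the product over event times u <= s; S(s) = 1 before the first.
    idx = np.searchsorted(event_times, grid, side="right")
    return np.concatenate(([1.0], surv))[idx]


rng = np.random.default_rng(0)
t_obs, x_obs = simulate_left_truncated(20000, rng)
grid = np.array([0.5, 1.0, 2.0])

naive = np.array([np.mean(x_obs > s) for s in grid])  # ignores truncation
adjusted = truncation_adjusted_survival(t_obs, x_obs, grid)
truth = np.exp(-grid)  # true survival under the assumed Exp(1) model

print("s        :", grid)
print("truth    :", truth)
print("naive    :", naive)
print("adjusted :", adjusted)
```

Under these assumptions the naive estimate should sit well above the true survival curve, reflecting the oversampling of large $X$, while the adjusted estimate should track it closely. The adjusted estimate is trustworthy here only because independence was built into the simulation; with real truncated data that assumption cannot be verified, which is the point of the abstract.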

In Biometrika