1618 Lin Y Y et al. Sci China Math September 2018 Vol. 61 No. 9
selector (see [3]), group Lasso (see [26]), adaptive Lasso (see [32]) and their variants. Variable selection
methods for censored outcomes have been thoroughly studied in [7, 14, 22, 27]. For moderate or large p, the
optimization problems associated with the penalized approaches can be solved effectively and quickly.
However, when p grows exponentially fast with n, the aforementioned penalized methods become
computationally burdensome for such ultra-high-dimensional data. Feature screening methods are
designed specifically to reduce the dimensionality to a moderate scale. Fan and Lv [8] proposed
the sure independence screening (SIS) and the iterated sure independence screening (ISIS) for linear
regression by ranking the marginal correlations of each predictor with the response variable. These
methods are clearly motivated and have been extended to generalized linear models by [9, 10]. However,
correlation is known not to be a robust measure of association, so the performance of these methods
may be affected by outliers in the response and predictors.
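As a rough illustration of the marginal-correlation ranking that underlies SIS, the following Python sketch ranks the predictors by the absolute sample correlation with the response and keeps the top d. The function name `sis_rank`, the simulated linear model, and the submodel size d = ⌊n / log n⌋ are assumptions made for this example only; they are not taken from the code of [8].

```python
import numpy as np

def sis_rank(X, y, d):
    """Hypothetical helper: indices of the d predictors with the largest
    absolute marginal sample correlation with y, plus all correlations."""
    Xc = X - X.mean(axis=0)          # centre each predictor column
    yc = y - y.mean()                # centre the response
    # Column-wise sample correlation between each predictor and the response.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc.T @ yc) / denom
    order = np.argsort(-np.abs(corr))  # largest |correlation| first
    return order[:d], corr

# Simulated data (assumption): only the first two predictors are active.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

d = int(n / np.log(n))               # assumed submodel size
selected, corr = sis_rank(X, y, d)
```

Because the ranking depends only on sample correlations, a few extreme values in `y` or in a column of `X` can change the order substantially, which is the robustness concern raised above.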
Recently, several important findings regarding model-free feature screening have been reported in the
literature (see [4, 17, 18, 20, 29–31] among many others). In particular, Zhu et al. [31] proposed a model-free
feature screening method under a unified model framework, which is indeed novel. Because it does not require
the specification of a particular model structure, the proposal of [31] is theoretically and practically appealing for
feature screening, especially when there is a huge number of candidate variables. He et al. [12] introduced
an intriguing quantile-adaptive model-free variable screening framework for high-dimensional heteroge-
neous data. This framework can be extended to handle survival data under a conditional independence
assumption of the response and censoring variable given the covariates. Song et al. [20] studied a model-
free rank independence screening based on an inverse probability weighted Kendall’s τ rank correlation
for high-dimensional survival data. Feature screening methods for high-dimensional survival data based
on Cox’s partial likelihood can be found in, for example, [11, 23, 28].
In this paper, partly motivated by the interesting work of Zhu et al. [31] and He et al. [12], we intro-
duce a unified and robust model-free feature screening approach to handle high-dimensional survival data.
There are several advantages of this method. First, it is a model-free screening approach based on an
inverse-probability-weighted correlation measure, and hence avoids the complication of specifying a working
model with a huge number of candidate variables. Second, the new method does not involve any nonparametric
estimation except the estimation of the conditional survival function of the censoring variable given
a predictor, where a local Kaplan-Meier estimator is used. Third, under very mild conditions that do not
require the existence of any moment of the response variable, we prove that the proposed method enjoys
the ranking consistency and sure screening properties in ultra-high dimensions. In particular, we impose
a conditional independence assumption of the response and the censoring variable given the covariates,
instead of the full independence of the censoring variable from the response and the covariates
that is commonly assumed in the literature. Moreover, we also propose a more robust variant of the
new procedure, which is proved to possess desirable theoretical properties without any finite moment
condition of the predictors and the response. Hence, the proposed methods are robust to outliers in the
response and predictors.
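To make the local Kaplan-Meier idea concrete, the sketch below computes a kernel-weighted (Beran-type) Kaplan-Meier estimate of the conditional survival function of the censoring variable given a single predictor. The precise estimator used in this paper is defined later and may differ; the function name `local_km_censoring`, the Gaussian kernel, the bandwidth `h`, and the assumption of no ties among the observed times are all illustrative choices.

```python
import numpy as np

def local_km_censoring(t, x0, X, Y, delta, h):
    """Hypothetical sketch: kernel-weighted Kaplan-Meier estimate of
    P(C > t | X = x0). Censoring 'events' are observations with delta == 0.
    Assumes no ties among the observed times Y."""
    # Gaussian kernel weights, normalised to sum to one.
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    w = w / w.sum()
    surv = 1.0
    for i in np.argsort(Y):          # visit observed times in increasing order
        if Y[i] > t:
            break
        if delta[i] == 0:            # a censoring time was observed here
            at_risk = w[Y >= Y[i]].sum()   # weighted size of the risk set
            if at_risk > 0:
                surv *= 1.0 - w[i] / at_risk
    return surv

# Toy data (assumption): identical predictor values give equal weights,
# so the estimate reduces to the ordinary Kaplan-Meier estimate.
Y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
delta = np.array([1, 0, 1, 0, 1])
X = np.zeros(5)
g_hat = local_km_censoring(2.5, 0.0, X, Y, delta, h=1.0)
```

The roles of the event and censoring indicators are swapped relative to the usual Kaplan-Meier estimator because the target is the distribution of the censoring time rather than of the event time.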
The rest of the paper is organized as follows. In Section 2, we introduce the proposed censored
model-free feature screening procedure and its more robust variant. Their theoretical properties are
also discussed in Section 2. Extensive simulation studies are carried out to verify the finite sample
performance of the new methods in Section 3. In Section 4, we demonstrate an application to a genetic
data set. Section 5 contains discussions and a few concluding remarks. All the technical proofs are
deferred to Appendixes A and B.
2 Methodology and main results
Let $T$ be the time to the event of interest, $C$ be the censoring time, and $x = (X_1, \ldots, X_p)^{\mathrm{T}}$ be the $p$-dimensional
predictor vector. We assume that $T$ is subject to random right censoring. The observed data
are $(x_i^{\mathrm{T}}, Y_i, \Delta_i)$ for $i = 1, \ldots, n$, independently and identically distributed copies of $(x^{\mathrm{T}}, Y, \Delta)$, where
$Y = \min(T, C)$, $\Delta = I(T \le C)$ and $I(\cdot)$ is the indicator function. Throughout this article, it is assumed