AIC, BIC in regression, etc. [26]. A search method is then needed to find the best variable subset among all 2^m possible subsets [5]. Because the number of candidate subsets is 2^m, it is difficult to search the variable subsets exhaustively for large m. This poses the challenge of reducing computational complexity while still extracting a compact yet effective set of variables.
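To make the scale of the problem concrete, the following is a minimal sketch of exhaustive best subset selection scored by BIC under a Gaussian OLS model; the function names and the choice of BIC (rather than AIC or another criterion) are illustrative assumptions, not the procedure of any particular reference.

```python
import itertools
import numpy as np

def bic(y, X_sub):
    """BIC of an OLS fit on the columns in X_sub (Gaussian errors assumed)."""
    n, k = X_sub.shape
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = np.sum((y - X_sub @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

def best_subset(X, y):
    """Exhaustive search: every one of the 2^m - 1 non-empty subsets is fitted."""
    m = X.shape[1]
    best_score, best_vars = np.inf, None
    for size in range(1, m + 1):
        for subset in itertools.combinations(range(m), size):
            score = bic(y, X[:, list(subset)])
            if score < best_score:
                best_score, best_vars = score, subset
    return best_vars, best_score
```

Already at m = 30 this double loop would require fitting over a billion models, which is why cheaper stepwise and ensemble strategies are attractive.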
It is well known that stepwise selection has lower computational complexity than best subset selection. However, stepwise selection actually selects a sub-optimal subset because of the nested sequence of models it produces [39]. To improve traditional stepwise selection, Xin and Zhu propose a stochastic stepwise ensemble (ST2E). This ensemble method randomly includes or excludes a group of variables at each step, where the group size is itself randomly determined [42]. For a fixed group size, the number of candidate groups can be quite large. Instead of searching all candidate groups exhaustively, they only need to evaluate a few randomly selected subsets, and the best one is chosen from those. As demonstrated by a special example, Xin and Zhu find that the globally optimal subset obtained by exhaustive search may include some junk variables; it is partly efficiency and partly luck that allows the ensemble method to find the best subset without an exhaustive search. This ensemble learning algorithm has attracted our considerable attention due to its potential to significantly improve the performance of the traditional procedure.
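The sketch below shows one stochastic step of this idea, assuming a set-valued state and a generic score_fn to be minimised; the 50/50 add-or-drop rule, the uniform group-size draw, and the number of sampled candidate groups are our illustrative choices, not necessarily those of Xin and Zhu's ST2E.

```python
import numpy as np

def st2e_step(selected, m, score_fn, n_candidates=20, rng=None):
    """One stochastic stepwise move: add or drop a randomly sized group,
    chosen as the best of a few randomly sampled candidate groups rather
    than of all possible groups of that size."""
    rng = rng or np.random.default_rng()
    add = (not selected) or rng.random() < 0.5      # include or exclude a group
    pool = sorted(set(range(m)) - selected) if add else sorted(selected)
    if not pool:                                    # nothing left to add or drop
        return selected, score_fn(selected)
    size = int(rng.integers(1, len(pool) + 1))      # random group size
    best_subset, best_score = selected, np.inf
    for _ in range(n_candidates):                   # sample a few groups only
        group = set(rng.choice(pool, size=size, replace=False).tolist())
        trial = selected | group if add else selected - group
        score = score_fn(trial)
        if score < best_score:
            best_subset, best_score = trial, score
    return best_subset, best_score
```

Because only n_candidates groups are scored per step, each run of the procedure is cheap, and the randomness makes different runs explore different regions of the subset space.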
Inspired by the great success of ensemble learning algorithms in solving forecasting problems [1,3,16,25], the idea of ensemble learning has recently been introduced into variable selection [24,32,40,45]. In general, variable selection ensembles (VSEs) allow each optimization path (i.e. each ensemble member) to generate a sub-optimal rather than an optimal solution, while making these solutions as different from one another as possible. In other words, each member does not need to search the variable subsets exhaustively; it only needs to perform a simple search, which yields a good strength-diversity tradeoff across the ensemble. According to this strength-diversity tradeoff, to improve a VSE, all the ensemble members must be as strong as possible as variable selectors, while at the same time disagreeing with each other as much as possible [44]. As stated in [21], improving the strength of a VSE's members while keeping their diversity will produce a better VSE. Based on this idea, we propose a novel ensemble learning framework that injects an information measurement criterion into ST2E with the aim of improving the strength of each ensemble member.
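As a concrete illustration of how a VSE turns many diverse, individually sub-optimal members into one decision, the sketch below fuses member subsets by selection frequency; this majority-vote style fusion is one common scheme in the VSE literature, not necessarily the aggregation rule of the framework proposed here.

```python
import numpy as np

def aggregate_vse(member_subsets, m, threshold=0.5):
    """Average the 0/1 selection indicators of the members and keep the
    variables selected by at least a `threshold` fraction of them."""
    importance = np.zeros(m)
    for subset in member_subsets:
        importance[list(subset)] += 1.0
    importance /= len(member_subsets)   # selection frequency per variable
    return importance, np.flatnonzero(importance >= threshold)
```

Diversity matters here because identical members add no information to the average, while weak members vote for junk variables; hence the strength-diversity tradeoff.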
In probability theory and information theory, measuring the dependence between two random variables is a fundamental and interesting problem, with many applications in statistics, signal processing, economics and so on. The most popular and classical measures of nonlinear and linear dependence are the mutual information and the correlation coefficient, respectively. Mutual information (MI) between two variables X and Y measures how similar the joint distribution p(X, Y) is to the product of the factored marginal distributions p(X)p(Y); the MI of two random variables is therefore a generalised measure of the variables' mutual dependence [33]. Unlike the correlation coefficient, it is not limited to linear relationships: it captures both linear and nonlinear dependence.
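A minimal plug-in estimate of MI for two continuous samples, using histogram binning, is sketched below; the bin count and the use of natural logarithms (MI in nats) are arbitrary illustrative choices.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram (plug-in) MI estimate: how far the joint p(X, Y) is
    from the product of marginals p(X)p(Y); MI is zero iff X and Y
    are independent."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # estimated joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of X
    py = pxy.sum(axis=0, keepdims=True)       # marginal of Y
    nz = pxy > 0                              # skip empty cells, avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

Plug-in estimates like this are sensitive to the bin count and the sample size, which is precisely the estimation-error issue raised at the end of this section.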
Kojadinovic measures the similarity for agglomerative hierarchical clustering of continuous variables by using the notion of mutual information [20]. Typically, MI-based variable selection algorithms build a filter method by estimating the MI between each variable candidate and the target variable (see the literature [2,11,38]). However, the performance of the above selection algorithms is degraded by large errors in estimating the mutual information