Artificial Intelligence 174 (2010) 597–618, doi:10.1016/j.artint.2010.04.018
Positive approximation: An accelerator for attribute reduction in rough set theory

Yuhua Qian a,c, Jiye Liang a,∗, Witold Pedrycz b, Chuangyin Dang c

a Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan, 030006, Shanxi, China
b Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada
c Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Hong Kong

∗ Corresponding author. Tel.: +86 0351 7018176; fax: +86 0351 7018176.
E-mail addresses: jinchengqyh@sxu.edu.cn (Y.H. Qian), ljy@sxu.edu.cn (J.Y. Liang), pedrycz@ee.ualberta.ca (W. Pedrycz), mecdang@cityu.edu.hk (C.Y. Dang).
Article info

Article history:
Received 15 July 2009
Received in revised form 6 April 2010
Accepted 7 April 2010
Available online 9 April 2010

Keywords:
Rough set theory
Attribute reduction
Decision table
Positive approximation
Granular computing

Abstract
Feature selection is a challenging problem in areas such as pattern recognition, machine learning and data mining. Considering a consistency measure introduced in rough set theory, the problem of feature selection, also called attribute reduction, aims to retain the discriminatory power of the original features. Many heuristic attribute reduction algorithms have been proposed; however, quite often, these methods are computationally time-consuming. To overcome this shortcoming, we introduce a theoretic framework based on rough set theory, called the positive approximation, which can be used to accelerate a heuristic process of attribute reduction. Based on the proposed accelerator, a general attribute reduction algorithm is designed. Through the use of the accelerator, several representative heuristic attribute reduction algorithms in rough set theory have been enhanced. Note that each of the modified algorithms can choose the same attribute reduct as its original version, and hence possesses the same classification accuracy. Experiments show that these modified algorithms outperform their original counterparts. It is worth noting that the performance improvement of the modified algorithms becomes more visible when dealing with larger data sets.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Feature selection, also called attribute reduction, is a common problem in pattern recognition, data mining and machine learning. In recent years, we encounter databases in which both the number of objects and their dimensionality (number of attributes) grow larger. Tens, hundreds, and even thousands of attributes are stored in many real-world application databases [6,12,37]. Attributes that are irrelevant to recognition tasks may deteriorate the performance of learning algorithms [44,45]. In other words, storing and processing all attributes (both relevant and irrelevant) could be computationally very expensive and impractical. To deal with this issue, as was pointed out in [20], some attributes can be omitted without seriously impacting the resulting classification (recognition) error. Therefore, the omission of some attributes can be not only tolerable but even desirable relative to the costs involved in such cases [32].
In feature selection, we encounter two general strategies, namely wrappers [16] and filters. The former employs a learning algorithm to evaluate the selected attribute subsets, while the latter selects attributes guided by significance measures such as information gain [23,46], consistency [6,41], distance [15], dependency [30], and others. These
measures can be divided into two main categories: distance-based measures and consistency-based measures [20]. Rough set theory, proposed by Pawlak [33–36], is a relatively new soft computing tool for the analysis of vague descriptions of objects, and has become a popular mathematical framework for pattern recognition, image processing, feature selection, neurocomputing, data mining and knowledge discovery from large data sets [7,11,31]. Attribute reduction in rough set theory offers a systematic theoretic framework for consistency-based feature selection, which does not attempt to maximize class separability but rather to retain the discriminating ability of the original features for the objects of the universe [13,14,53].
Generally speaking, one always needs to handle two types of data, viz. those that assume numerical values and those that assume symbolic values. For numerical values, there are two types of approaches: one relies on fuzzy rough set theory, and the other is concerned with the discretization of numerical attributes. In order to deal with numerical or hybrid attributes, several approaches have been developed in the literature. Pedrycz and Vukovich regarded features as granular rather than numerical [37]. Shen and Jensen generalized the dependency function of the classical rough set framework to the fuzzy case and proposed a fuzzy-rough QUICKREDUCT algorithm [13,14,48]. Bhatt and Gopal provided a concept of fuzzy-rough sets formed on a compact computational domain, which is utilized to improve the computational efficiency [3,4]. Hu et al. presented a new entropy to measure the information quantity in fuzzy sets [21] and applied this particular measure to reduce hybrid data [22]. Data discretization is another important approach to dealing with numerical values, in which we usually discretize numerical values into several intervals and associate the intervals with a set of symbolic values, see [5,28]. In "classical" rough set theory, the attribute reduction method treats all attributes as assuming symbolic values. Through preprocessing of the original data, one can use classical rough set theory to select a subset of features that is the most suitable for a given recognition problem.
In the last twenty years, many techniques of attribute reduction have been developed in rough set theory. The concept of the β-reduct proposed by Ziarko provides a suite of reduction methods in the variable precision rough set model [60]. An attribute reduction method was proposed for knowledge reduction in random information systems [57]. Five kinds of attribute reducts and their relationships in inconsistent systems were investigated by Kryszkiewicz [18], Li et al. [24] and Mi et al. [29], respectively. By eliminating some rigorous conditions required by the distribution reduct, a maximum distribution reduct was introduced by Mi et al. in [29]. In order to obtain all attribute reducts of a given data set, Skowron [49] proposed a discernibility matrix method, in which any two objects determine one feature subset that can distinguish them. From the discernibility matrix viewpoint, Qian et al. [42,43] and Shao et al. [47] provided techniques of attribute reduction for interval ordered information systems, set-valued ordered information systems and incomplete ordered information systems, respectively. Kryszkiewicz and Lasek [17] proposed an approach to the discovery of minimal sets of attributes functionally determining a decision attribute. The above attribute reduction methods are usually computationally very expensive, which is intolerable when dealing with large-scale, high-dimensional data sets. To support efficient attribute reduction, many heuristic attribute reduction methods have been developed in rough set theory, cf. [19,20,22,25,26,39,52,54–56]. Each of these attribute reduction methods can extract a single reduct from a given decision table.¹ For convenience, from the viewpoint of heuristic functions, we classify these attribute reduction methods into four categories: positive-region reduction, Shannon's entropy reduction, Liang's entropy reduction and combination entropy reduction. Hence, we review only four representative heuristic attribute reduction methods.

¹ The attribute reduct obtained preserves a particular property of a given decision table. However, as Prof. Bazan noted, from the viewpoint of the stability of attribute reducts, the selected reduct may be of poor quality [1,2]. To overcome this problem, Bazan developed a method of dynamic reducts to obtain a stable attribute reduct from a decision table. How to accelerate the method of dynamic reducts is an interesting topic for further work.
(1) Positive-region reduction
The concept of positive region was proposed by Pawlak in [33] and is used to measure the significance of a condition attribute in a decision table. The idea of attribute reduction using the positive region originated with J.W. Grzymala-Busse in [9] and [10], whose algorithm ignores the additional computation required for selecting significant attributes. Hu and Cercone [19] then proposed a heuristic attribute reduction method, called positive-region reduction, which keeps the positive region of the target decision unchanged. An extension of this positive-region reduction to hybrid attribute reduction in the framework of fuzzy rough sets was given in [20]. Owing to the consistency of the ideas and strategies of these methods, we regard the method from [19] as their representative. These reduction methods were the first attempts at heuristic attribute reduction algorithms in rough set theory.
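As a concrete illustration, the following is a minimal sketch of the positive region POS_B(D) that such methods preserve: the union of the B-lower approximations of all decision classes. The representation (partitions as lists of sets of object indices) and the function name are illustrative assumptions, not the paper's notation.

from typing import Iterable, Set

def positive_region(cond_partition: Iterable[Set[int]],
                    dec_partition: Iterable[Set[int]]) -> Set[int]:
    """POS_B(D): union of the B-lower approximations of the decision classes.

    A block of U/B lies in the positive region exactly when it is contained
    in a single decision class, i.e. all of its objects carry the same label.
    """
    pos = set()
    for block in cond_partition:
        if any(block <= dec_class for dec_class in dec_partition):
            pos |= block
    return pos

Roughly speaking, a positive-region reduction then greedily adds the condition attribute whose inclusion enlarges |POS_B(D)| the most, stopping once the positive region of the full condition attribute set is preserved.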
(2) Shannon’s entropy reduction
Entropy-based reducts were first introduced in 1993/1994 by Skowron in his lectures at Warsaw University. Based on this idea, Slezak introduced Shannon's information entropy to search for reducts in the classical rough set model [50–52]. Wang et al. [54] used the conditional form of Shannon's entropy to compute the relative attribute reduction of a decision information system. In fact, several authors have also used variants of Shannon's entropy or mutual information to measure uncertainty in rough set theory and to construct heuristic attribute reduction algorithms [22,55,56]. Here we select only the attribute reduction algorithm from [54] as their representative. This reduction method keeps the conditional entropy of the target decision unchanged.
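For reference, here is a minimal sketch of the Shannon conditional entropy H(D|B) that this method keeps unchanged, using the same illustrative partition-as-sets representation as above; the function name and signature are assumptions for illustration.

from math import log2
from typing import Iterable, Set

def shannon_conditional_entropy(cond_partition: Iterable[Set[int]],
                                dec_partition: Iterable[Set[int]],
                                n: int) -> float:
    """H(D|B) = -sum_{X in U/B} p(X) sum_{Y in U/D} p(Y|X) log2 p(Y|X),
    with p(X) = |X|/|U|, p(Y|X) = |X ∩ Y|/|X|, and n = |U|."""
    h = 0.0
    for block in cond_partition:
        p_x = len(block) / n
        for dec_class in dec_partition:
            inter = len(block & dec_class)
            if inter:
                p_y_given_x = inter / len(block)
                h -= p_x * p_y_given_x * log2(p_y_given_x)
    return h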
(3) Liang’s entropy reduction
Liang et al. [25] defined a new information entropy to measure the uncertainty of an information system and applied this entropy to remove redundant features [26]. Unlike Shannon's entropy, this information entropy can measure both the uncertainty of an information system and the fuzziness of a rough decision in rough set theory. This reduction method preserves the conditional entropy of a given decision table. In fact, the mutual information form of Liang's entropy can also be used to construct a heuristic function for an attribute reduction algorithm; for simplicity, we omit its discussion here.
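As an illustration, a sketch of the conditional form of Liang's (complement) entropy preserved by this method. We assume, following Liang et al. [25,26], that E(D|B) = Σ_i Σ_j (|X_i ∩ Y_j|/|U|)(|X_i − Y_j|/|U|) with U/B = {X_1, ..., X_m} and U/D = {Y_1, ..., Y_n}; the representation is the same illustrative one used above.

from typing import Iterable, Set

def liang_conditional_entropy(cond_partition: Iterable[Set[int]],
                              dec_partition: Iterable[Set[int]],
                              n: int) -> float:
    """E(D|B): sum over blocks X of U/B and Y of U/D of
    (|X ∩ Y|/|U|) * (|X − Y|/|U|), where n = |U|."""
    e = 0.0
    for block in cond_partition:
        for dec_class in dec_partition:
            inter = len(block & dec_class)
            e += (inter / n) * ((len(block) - inter) / n)
    return e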
(4) Combination entropy reduction
In rough set theory, the objects in an equivalence class cannot, in general, be distinguished from each other, whereas objects in different equivalence classes can. Therefore, in a broad sense, the knowledge content of a given attribute set can be characterized by the total number of pairs of objects on the universe that can be distinguished from each other. Based on this consideration, Qian and Liang [39] presented the concept of combination entropy for measuring the uncertainty of information systems and used its conditional form to select a feature subset. This reduction method obtains an attribute subset that distinguishes the same number of pairs of objects as the original decision table. This measure takes a completely different point of view, mainly based on the intuitive knowledge-content nature of information gain.
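Since the prose above characterizes knowledge content by the number of distinguishable object pairs, here is a small sketch of that count under the same illustrative representation: all pairs of U minus the pairs that share an equivalence class. The function name is an assumption for illustration.

from math import comb
from typing import Iterable, Set

def distinguishable_pairs(partition: Iterable[Set[int]]) -> int:
    """Number of object pairs that fall into different blocks of a partition."""
    blocks = list(partition)
    n = sum(len(block) for block in blocks)
    return comb(n, 2) - sum(comb(len(block), 2) for block in blocks)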
Each of the methods above preserves a particular property of a given information system or decision table. However, these methods are still computationally very expensive, which is intolerable when dealing with large-scale, high-dimensional data sets. In this paper, we are not concerned with how to discretize numerical attributes or how to construct a heuristic function for attribute reduction. Our objective is to improve the time efficiency of heuristic attribute reduction algorithms. We propose a new rough set framework, called the positive approximation. The main advantage of this approach stems from the fact that the framework is able to characterize the granulation structure of a rough set using a granulation order. Based on the positive approximation, we develop a common accelerator for improving the time efficiency of heuristic attribute reduction, which provides a vehicle for making rough-set-based feature selection algorithms faster. By incorporating the accelerator into each of the four representative heuristic attribute reduction methods above, we construct their modified versions. Numerical experiments show that each of the modified methods chooses the same attribute subset as its original counterpart while greatly reducing computing time. We would like to stress that the improvement becomes more visible as the data sets under discussion get larger.
The study is organized as follows. Some basic concepts in rough set theory are briefly reviewed in Section 2. In Section 3,
we establish the positive approximation framework and investigate some of its main properties. In Section 4, through
analyzing the rank preservation of four representative significance measures of attributes, we develop a general modified
attribute reduction algorithm based on the positive approximation. Experiments on nine public data sets show that these
modified algorithms outperform their original counterparts in terms of computational time. Finally, Section 5 concludes this paper with some remarks and discussion.
2. Preliminaries
In this section, we will review several basic concepts in rough set theory. Throughout this paper, we suppose that the universe U is a finite nonempty set.

Let U be a finite and nonempty set called the universe and R ⊆ U × U an equivalence relation on U. Then K = (U, R) is called an approximation space [33–36]. The equivalence relation R partitions the set U into disjoint subsets. This partition of the universe is called a quotient set induced by R, denoted by U/R. It represents a very special type of similarity between elements of the universe. If two elements x, y ∈ U (x ≠ y) belong to the same equivalence class, we say that x and y are indistinguishable under the equivalence relation R, i.e., they are equal with respect to R. We denote the equivalence class including x by [x]_R. Each equivalence class [x]_R may be viewed as an information granule consisting of indistinguishable elements [59]. The granulation structure induced by an equivalence relation is a partition of the universe.
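As a concrete illustration of the quotient set U/R induced by an attribute subset, here is a minimal sketch; the dictionary-of-rows data layout and the function name are assumptions for illustration, not the paper's notation.

from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set

def quotient_set(objects: Dict[int, Dict[str, Hashable]],
                 attrs: Iterable[str]) -> List[Set[int]]:
    """U/R for the indiscernibility relation of an attribute subset:
    two objects share an equivalence class exactly when they agree on
    every attribute in `attrs`. `objects` maps object ids to rows."""
    blocks = defaultdict(set)
    for x, row in objects.items():
        blocks[tuple(row[a] for a in attrs)].add(x)
    return list(blocks.values())

For instance, quotient_set({1: {'a': 0}, 2: {'a': 0}, 3: {'a': 1}}, ['a']) yields the two equivalence classes {1, 2} and {3}.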
Given an approximation space K = (U, R) and an arbitrary subset X ⊆ U, one can construct a rough set of X on the universe from elemental information granules according to the following definition:

\underline{R}X = \bigcup \{ [x]_R \mid [x]_R \subseteq X \},
\overline{R}X = \bigcup \{ [x]_R \mid [x]_R \cap X \neq \emptyset \},

where \underline{R}X and \overline{R}X are called the R-lower approximation and R-upper approximation of X with respect to R, respectively. The ordered pair \langle \underline{R}X, \overline{R}X \rangle is called a rough set of X with respect to the equivalence relation R. Equivalently, they can also be written as

\underline{R}X = \{ x \mid [x]_R \subseteq X \},
\overline{R}X = \{ x \mid [x]_R \cap X \neq \emptyset \}.
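Following the definitions above, a minimal sketch of the two approximations, reusing the illustrative partition representation from the previous snippet:

from typing import Iterable, Set, Tuple

def approximations(quotient: Iterable[Set[int]],
                   x_set: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (lower, upper): the union of the equivalence classes
    contained in X, and the union of those intersecting X."""
    lower, upper = set(), set()
    for block in quotient:
        if block <= x_set:
            lower |= block
        if block & x_set:
            upper |= block
    return lower, upper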