1. Introduction
Bayesian networks (BNs), which were introduced by Pearl [1], can encode dependencies among all
variables. Their success has led to a recent flurry of algorithms for learning BNs from data [2–5].
A BN ⟨N, A, Θ⟩ is a directed acyclic graph with a conditional probability distribution for each node, collectively represented by Θ, which quantifies how much a node depends on its parents. Each node n ∈ N
represents a domain variable, and each arc a ∈ A between nodes represents a probabilistic dependency.
A BN can be used as a classifier that characterizes the joint distribution P(x, y) of the class variable Y and a set of attributes X = {X_1, X_2, ..., X_n}, and predicts the class label with the highest conditional probability. (In the following discussion, lower-case letters denote specific values taken by the corresponding attributes; for instance, x_i represents the event that X_i = x_i.) Denoting the parent nodes of x_i by Pa(x_i), the joint distribution P_B(x, y) can be represented by factors over the network structure B, as follows:

$$
P_B(x, y) = \prod_{i=1}^{n} P(x_i \mid Pa(x_i)). \qquad (1)
$$
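To make the factorization concrete, the sketch below (not from the paper) evaluates Eq. (1) in log space and predicts the class with the highest joint probability. The names `parents`, `cpt` and `prior` are hypothetical stand-ins for a learned structure B and parameters Θ; the class prior P(y) is written out explicitly as the factor for the (parentless) class node.

```python
# Hedged sketch: evaluating the factorization in Eq. (1) and predicting the
# class label with the highest joint probability. The dictionaries `parents`,
# `cpt` and `prior` are hypothetical stand-ins for a learned BN.

import math

def joint_log_prob(x, y, parents, cpt, prior):
    """log P_B(x, y) = log P(y) + sum_i log P(x_i | Pa(x_i)).

    x       : dict mapping attribute name -> observed value
    y       : candidate class label
    parents : dict mapping attribute name -> list of parent attribute names
              (the class Y is assumed to be a parent of every attribute)
    cpt     : dict mapping (attribute, value, parent values, y) -> probability
    prior   : dict mapping class label -> P(y)
    """
    logp = math.log(prior[y])
    for attr, value in x.items():
        parent_vals = tuple(x[p] for p in parents[attr])
        logp += math.log(cpt[(attr, value, parent_vals, y)])
    return logp

def predict(x, classes, parents, cpt, prior):
    """Return the class label maximizing P_B(x, y)."""
    return max(classes, key=lambda y: joint_log_prob(x, y, parents, cpt, prior))
```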
Inference in a general BN has been shown to be NP-hard [6], even for approximate
solutions [7]. However, learning unrestricted BNs does not necessarily lead to a classifier with good
performance. For example, naive Bayes (NB) [8] is the simplest BN, which considers only the
dependence between each attribute X_i and the class variable Y. However, Friedman et al. [9] observed
that unrestricted BN classifiers do not outperform NB in a large sample of benchmark data sets. Many
BN classifiers have been proposed to overcome the limitation of NB. One practical approach for
structure learning is to impose some restrictions on the structures of BNs, for example, learning tree-like
structures. Sahami [10] proposed the k-dependence Bayesian (KDB) classifier, a general framework for describing limited dependence among variables. Friedman et al. [9] proposed tree-augmented naive Bayes (TAN), a structure-learning algorithm that learns a maximum spanning tree over the attributes. Both algorithms use conditional mutual information to measure the weight of arcs between predictive attributes. As the data size grows, KDB's richer representation of high-order dependencies helps it achieve better classification performance than TAN.
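As a rough illustration (not taken from either algorithm's original presentation), the sketch below estimates the conditional mutual information I(X_i; X_j | Y) from empirical frequencies, the quantity TAN and KDB use to weight candidate arcs between predictive attributes. The variable `data`, a list of (x_i, x_j, y) triples, is hypothetical, and smoothing is omitted for clarity.

```python
# Hedged sketch: empirical conditional mutual information I(X_i; X_j | Y),
# used by TAN and KDB as the weight of the arc between attributes X_i and X_j.

import math
from collections import Counter

def conditional_mutual_information(data):
    """I(X_i; X_j | Y) = sum_{x_i, x_j, y} P(x_i, x_j, y) *
       log [ P(x_i, x_j | y) / (P(x_i | y) P(x_j | y)) ]."""
    n = len(data)
    c_xyz = Counter(data)                              # counts of (x_i, x_j, y)
    c_iy = Counter((xi, y) for xi, xj, y in data)      # counts of (x_i, y)
    c_jy = Counter((xj, y) for xi, xj, y in data)      # counts of (x_j, y)
    c_y = Counter(y for _, _, y in data)               # counts of y

    cmi = 0.0
    for (xi, xj, y), n_xyz in c_xyz.items():
        p_xyz = n_xyz / n
        # P(x_i, x_j | y) / (P(x_i | y) P(x_j | y)) expressed with raw counts
        ratio = (n_xyz * c_y[y]) / (c_iy[(xi, y)] * c_jy[(xj, y)])
        cmi += p_xyz * math.log(ratio)
    return cmi
```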
The key differences between Bayesian classifiers are their structure-learning algorithms. Many
criteria, such as the Bayesian scoring function [11], minimal description length (MDL) [12] and the Akaike Information Criterion (AIC) [13], have been proposed to find a single global graph structure B_G that best characterizes the true distribution of the given data. Considering the time and space complexity overhead, only a limited number of conditional probabilities can be encoded in a BN. All credible dependencies must
be represented to obtain a more accurate estimation of the true joint distribution. However, these criteria
can only approximately measure the overall interdependencies between attributes, but cannot identify the
change of interdependencies when attributes take different values. Thus the candidate graph structures
may have very close score values and are non-negligible in the posterior sense [14]. To extend the limited
representation of B_G, some researchers have proposed aggregating several candidate BNs. Averaged one-dependence estimators (AODE), proposed by Webb et al. [15], aggregate the predictions of all qualified members of a restricted class of one-dependence estimators. Zheng et al. [16] proposed subsumption resolution (SR) to efficiently identify occurrences of the specialization-generalization relationship and eliminate generalizations at classification time. By introducing Functional Dependency (FD) analysis