k-Nearest Neighbor Classification

所需积分/C币:25 2015-03-10 15:01:31 973KB PDF
收藏 收藏 1

this paper, the problem of classifying an unseen pattern on the basis of its nearest neighbors in a recorded data set is addressed from the point of view of Dempster-Shafer theory. Each neighbor of a sample to be classified is considered as an item of evidence that supports certain hypotheses regard
N(6 JEEE TRANSAC IIONS ON SYSIEMS, MAN. AND CYBERNETICS. VOL, 25. NO. 4 MAY 1995 As demonstrated by Shafer [19 any one of the three functions degree of beliet--the most credible, or select the hypothesis an. Bcl and Pi is sufficient to recover the other two. This with the lowest degree of doubt-the most plausible. follows from the definition of Pl(A as 1-Bel(a)and This analysis can be extended to decision with costs. In the frame work of D-S theory, there is nothing strictly equivalent n(4)=∑(-1)4lBDe!(B 9)to Bayesian expected costs, leading unambiguously to a single Bcd decision. It is however possible to define lower and upper A BPA can also be viewed as determining a set of proba- bounds for these costs. in the following way [7,[2]. Let M bility distributions P over 2e satisfying be the number of hypotheses, and c be an M x M matrix such that U;i is th g hypo Be(1)<P(4)≤Pl(4) (10) is true. Then, for each simple hypothesis B: E 0. a lower d cost e a cted cost E*0, I car for all 4 Ce. For that reason. Bel and PI are also called be defined lower and upper probabilitics, respectively. This fundamental imprecision in the determinalion of the probabilities reflects E、6 ∑ In(A)inin Ci (14) ∈4 the weakness", or incompleteness of the available informa tion. The above inequalities reduce to equalities in the case of E6=∑m(A)axO (15) a bayesian belief function Given two belief functions Beh, and Bel2 over the same The lower(respectively: upper) expected cost can be seen as frame of discernment, but induced by two independent source being generated by a probability distribution compatible with of information, we Imust define a way by which, under sonlle and such that the densily of in(A)is concentrated at the conditions. these belief functions can be comhined into a single element of a with the lowest (respectively: highest) cost. Here one. Dempster's rule of combination is a convenient method for doing such pooling of evidence. First, Bel, and Bel, again, the choice is left open as to which criterion should used for the decision. Maximizing the upper expected c have to be combinable, i c. their cores must not be disjoint amounts to minimizing the worst possible consequence, and If 11 and IIl2 are the BPAs associated with Bel, and Bel2. therefore generally leads to more conservative decisions. Note respectively, this condition can also be expressed as that、 when U verifies ∑m(Am2(B)< (11) If Bel and Bely are combinable. then the function m: 24- ho. &i is the Kronecker symbol, the following equalities 0. 1. defined by E-,]=1-P({6,}) 17) 71()=0 ∑B=:1(4)mn2(B E团6,]=1-Btl(B;} (18) r(6)=1-∑B=m1()m2(b 4g(13)In the case of (0.1) costs, minimizing thc lower(respectively upper)expected cost is thus equivalent to selecting the hypoth is a BPA. The belief function Bel given by n is called the esis with the highest plausibility (respectively: credibility) orthogonal sum of Bel and Bel, and is denoted Bel Bel2 For convenience, 1l will also be denoted 1/.1 A 1/2. The core of Bel equals the intersection of the cores of Bell and Bel2 IL. THE METHOD Although Dempsters rule is hard to justify theoretically it has some attractive features, such as the following: it is A. The Decision Rule commutative and associative; given two belief functions Bel cI1 Let x=(: E n)=1.……, n, be a collection and Bcl2, if BelI is vacuous, then Bel B Bcl2= Bely; if on N P-dimcnsional training samples, andC=[Cl.. CAr BelI is bayesian, and if Bell e Bel exists, then it is also be a set of M classes. Each sample ' will first be assumed Bayesian o possess a class label L'E(1.. M indicating with The D-S formalism must also be considered in the perspec- certainty its membership to one class in C. The pair (t. C) tive of decision analysis [2]. As explained above, under d-s where L is the set of labels, constitutes a training set that can theory, a body of evidence about some sct of hypotheses be used to classify new patterns does not in general provide a unique probability distribution, Let s be an incoming sample to be classified using the but only a set of comparible probabilities bounded by a belief information contained in the training se. Classifying means function Bel and a plausibility function PI. An immediate assigning it to one class in C, i. e. deciding among a set of M consequence is that simple hypotheses can no longer be hypotheses: E CG, 9-I..M. Using the vocabulary of ranked according to their probability: in general, the rankings D-S theory, C can be called the frame of discernment of the produced by Bel and Pl will be different. This means that, as problem a result of lack of inforimlaliun, the decision is, to sore extent, Let us denote by the set of the k-nearest neighbors of N indeterminate. The theory does not make a choice between in a. according to some distance measure(e.g. the euclidian two distinct strategies: select the hypothesis with the greatest one). For any E the know ledge that /=can DENGEUX: k-NEAREST NEIGHBOR CLASSIFICATION RULE BASED ON DEMPSTER-SHAFER THEORY be regarded as a piece of evidence that increases our belief rule. Note that this is always possible, since all the associated that x also belongs to Ca. However, this piece of evidence belief functions have C as a focal element does not by itself provide 100% certainty. In D-S formalism, Let us first consider two elements z 'and T ] of y belonging belief is committed to C. Since the fact that l'=y does from the combination of m .and msj is given by. resulting his can be expressed by saying that only some part of our to the same class Ca The BPA m, (4 7! =m". Em, resulting not point to any other particular hypothesis, the rest of our belief cannot be distributed to anything else than C. the whole ({C})=1-(1-c02(d)(1-o2(”) frame of discernment. This item of evidence can therefore be represented by a BPA rm: verifying n:()(C)=(1-a0(d)(1-a094(F).(28) (19) If we denote by a the set of the ki-nearest neighbors of x n^(C)=1- (20) belonging to Ca, and assuring that sfv, the result of the mn(4)=0VA∈2°\{{C ( combination of the corresponding bPAs m2=,∈中;m2 with0<(<1. If .I:is far from: :8. as compared to distances between ms(CQ1)=1-I(1-Codg(ds): neighboring points in Ca. the class of: 'will be considered as providing very little information regarding the class of: I I1-0() in that case. (r must therefore take on a smal value on the r∈ contrary, if r is close to r one will be much more inclined to believe that r and r ' belong to the same class If g=0, then mg is simply the bpa associated with the consequence, it seems reasonable to postulate that c should vacuous belief function: mia(c)=I be a decreasing function of ds. the distance between n and Combining all the BPAs m, for each class, a global BPA c. Furthermore, if we note 9=1 Tlg is obtained as =(02(d) (22) m5(Cqb)IIms(C) where the index g indicates that the infuence of d. may m(C4}) 1.…,M(31) depend on the class of; t, the following additional conditions must be imposed on (yo and r (C) 0 n(0)=1 (24)where K is a normalizing factor lin cq(d)=0 (25) The first two conditions indicate that even if the case of K=∑m({ClmC+ a zero distance between a 'and one still does not have certainty that they belong to the same class. This results from (34) the fact that several classes can, in general, simultaneously ∑Ⅱm;C)+(1-M)m(C) q have non zero probability densities in some regions of the feature space. The third condition insures that in the limit. as The focal elements of the belief function associated with the distance between r: and gets infinitely large. the belief s are the classes represented among the k-nearest neighbors function given by m*, becomes vacuous, which means that of r and C. The credibility and plausibility of a given class one's belief concerning the class of y. is no longer affected C,are by ones knowledge of the class of Bel(Cal)=m(Cq) There is obviously an infinitely large number of decreasing P({C)=m({(g})+m(C (36) unctions verifying(24)and(25), and it is very difficult to find any a priori argument in favor of onc particular function Therefore, both criteria produce the same ranking of hypothe or anothcr. We suggcst to choose gas ses concerning a r If an M x M cost matrix U is givcn, where Ui, j is the 中n{d) (26) cost of as signing an incoming pattcrn to class i, if it actually with y>0ard3∈{1 belongs to class i, then lower and upper expected costs are n can be arbitrarily fixed to defined for each possible decision a small value (I or 2 ). Simple heuristics for the choice of (ro and ya will be presented later m°(4) Inin U (37) For each of the k-nearest neighbors of m". a BPa depending on both its class label and its distance to can therefore be defined. In order to make a decision regarding the class assign n (Cr G, r +mIC)min Cq,r (38) ment of: " these BPAs can be combined using Dempsters IEEE TRANSACTIONS ON SYSTEMS. MAN, AND CYBERNETICS, VOL 25, NO. 5, MAY 1995 m°(A)maxU C∈. erefore ({Cp})>m(Cq})分k一<k-(47) m"(C, )U4,r+ '(C)maxUS. /.(40) 台1;> (48) P=1 Note that minimizing the lower or upper expected cost do not necessarily lead to the same decision, as can be seen om the following example. Let us consider the problem B. Reject Options of assigning an incoming samplc : r: to onc of thrcc classes The decision rule D can easily be modified so as to include (M =3. It is assured that the consideration of the h ambiguity and distance reject options. The ambiguity reject nearest neighbors of r has produced a BPA 7). St option, as introduced by Chow [3 consists in postponing 0. r'RC31)=0.4 and decision-making when the conditional error of making a n (C)=0.4. The cost matrix is decision give f the feature space where the ong ove 101 between classes. In that case, an incoming sample I: s to be classified will generally be close to several training vectors belonging to different classes. Hence, this can be viewed as a The lower and upper expected costs are, in that case problem of conflicting information E【C1-0.4E(2]1-0.6E【C3]=0.2 The distance reject option discussed in 19] corresponds to C]=0.8EC2=10E[Cy=1.0 a different situation, where the point. r. sto be classified is far away from any previously recorded sample, and is therefore Thus. C:3 minimizes Ix. while CI minimizes E suspected of belonging to a class that is not represented in the However, in the case of (O. 1) costs, that will exclusively training set. The problem here no longer arises from contlict be considered henceforth, minimizing the lower(resp. upper) in the data, but from the weakness or scarcity of available expcctcd cost amounts to maximizing the plausibility (resp. information credibility ) In that case, and under the assumption that the In our framework, the first case will be characterized by true class membership of each training pattern is known, both a BPA i" that will be uniformly distributed among several riteria therefore lead to the same on rule classes. As a consequence. both the maximum plausibility qmax =arg maxm (Cpr;= D;: )=q (41) PI(Cans I )and the maximum credibility Bels(icamas where D(r)is the class label assigned to the probability mass will he concentrated on the whole frame Note that the consideration of the distances makes the of discernment C. As a consequence, only Bel"(Cy.))will probability of a tie taking place much smaller than in th takc on a small valuc: as the distance between r and its closest simple majority rulc, whose rclationship with D can also be neighbor gues to infinity, Be/(Ces)) actually goes to zero, described by the following th thile Pl (Csma) goes to Theorein / If the h: nearest neighbors of a data point: T As a result, it is possible to introduce ambiguity and distance re located at the distance d if reject options by imposing thresholds Plmin and Belmin on the ga:=o. then the decision rule d produces the same decision plausibility and credibility, respectively. The sample 3:swill lle be ambiguity rejected if P/(Cua)sPlmin, and it will be P denote by the distance between "and all e rejected if i°({(g})<乃 Note that in the of its h: nearest neighbors r'∈φ. For all y∈{1 case of 10. I] costs, these thresholds correspond to thresholds is defined by ' tmax and rnay on thc lowcr and uppcr expected costs, E 4({Cn}=1-(1-ano() (1 (43) 1-Pluin (50) (1-(1-00((C))(1 The determination of Pmin has lu be based onl d (rade off between the probabilities of error and reject, and must MJ (44) therefore be left to the designer of the system the choice of Belmn is more problematic, since nc unknown class is, by (C) 1-(x0 () (45) definition, initially included in the training set. A reasonable approach is to compute Bel(iCai.)for each r:'in th For any〃 and g in{1 M) such that m (Cq)>0. we training set using the leave-one-out method, and define a have distinct threshold Bel? for each class Caas ∥({C1})(1-m06()-2n-(1-cm0) (46) ({()(1-(0()-1-(1-0abi) BeRmit-xi∈t,L nin Bel(c DENGEUX: k-NEAREST NEIGHBOR CLASSIFICATION RULE BASED ON DEMPSTER-SHAFER THEORY C. Imperfect Labelling As in the standard k-nn procedure, the choice of ki is In soime applications. it may happen that one unly has difficult to make a priori. Although our metho eems to by mperfect knowledge concerning the class membership of far less sensitive to this parameter than the majority rule,a some training pattems. For example, three class problem, systematic search for the best value of k may be necessary in an expert may have some degree of belief that a sample t order to obtain optimal results belongs to a class C3, but still consider as possible that it For the choice of co and yy, several heuristics have been might rather belong to C1 or C2. Or, he may be sure that tested. Good results on average have been obtained with ao v'does not belong to C3, whilc being totally incapable of 0.95 and g determined seperately for each class as \Va deciding between C1 and C2. In D-S formalism. one's belief where do is the mean distance between two training vectors in the class membership of each training pattern ri can be belonging to class Cy. The value of B has been found to have represented by a BPA m over the frame of discernment C. very little influence on the performance of the method. A value For cxample, if the expert is sure that r 'does not belong to C3, of 4= I has been adopted in our simulations The following examples are intended to illustrate various the chance of his assessment being correct at so%. then his aspccts of our method, namely: the shape of the decision belief can be represented in the form of a BPa as boundaries and reject regions for simple two-dimensional dat sets, the relative performance as compared to the voting ar m'({C1.C2})=0. (52) distance-weighted ha-nN rules for different values of F, and m(C)=0.2 (53)the effect of imperfect labelling with all remaining rn ' (A) values equal to zero A. Experimen! I The approach described in abuve can easily be generalized The purpose of this experiment is to visualize the decision so as to make use of training patterns whose class membership boundary and the regions of ambiguity and distance reject for is represented by a BPA. If a: s is a sample to be classified onc's belief about the class of "induced by the knowledge two different two-dimensional data sets of moderate size. The that E q can be represented by a bpa n a. deduced from first data set is taken from two gaussian distributions with the m and di following characteristics 1)=a09(Cm(4)vA∈2\C(54) 0 4∈2CYC 0.251 where d is a monotonically decreasing function verifying(24)where I is the identity matrix 'Ihere are 40 training samples As before, the yu i,'can then be combined using Dempster's The second data set consists of two non-gaussian classes rule to form a global Bpa of 40 samples each separated by a non-linear boundar Both data sets ted in the figs. 1-4, together with (50)the lincs of cqual maximum credibility Bels(Ca.)and plausibility Pl(Cqa). for k=9. As expected, the regio Notc that, while the amount of computation needed to ot low plausibility is concentrated in each case around the implement Dempsters rule increases only linearly with the class boundary, allowing for ambiguity reject, whereas small number of classes when the belief functions given by the msi credibility values are obtained in the regions of low probabilit are simple support functions as considered in Scction IIl. A, the density. The distance reject regions, as defined in Section III B increase is exponential is the worst general case. However, are delimited by dotted lines more computationally efficient approximation methods suIch For the first data set the estimated error rate obtained usin as proposed in [21] could be used for very larger numbers of an independent test set of 1000 samples is 0.084, against 0.089 9-NN rule. The corres second data set and leave one out error estimation are 0.075 for both methods TV. EXPERIMENTS The approach described in this paper has been successfully B. Experimen/2 results of some of these experiments, practical issues related A comparison between the performances of the voting to the implementation of the procedure need to be addressed. h-NN procedure. the distance-weighted k-NN rule and our Leaving alone the rejection thresholds. for which a deter- method was performed using one artificial and two real-world ination method has alrcady bccn proposed, and assuming an classification problems. In the majority rule, ties were resolved exponential form for pa as described in(26), the following by randomly selecting one of the tied pattern classes parameters have to he fixed in order to allow the pratical use of the method: hi, (0. Ya. =1..M and B This heuristic was suggested to me by Lalla mcricm Zouhal IEEE TRANSACTIONS ON SYSTEMS. MAN, ANDCYBERNETICS. VOL, 25.NO. 5. MAY 995 3 0705 级炒 太 03 0.1 Fig. I. Lines of equal maximum credibility (Bel(Caax))tor k=9 Fig 3. Lines of equal r aximum credibility (De((Camax)))for k: =9 Gaussian data). Samples falling outside the region delimited by the dotted (non-gaussian data). Samples falling outside the region delimited by the dotted linc are distance rcjccicd 爻x 99 Ec 43210123 2 Fig. 2. Lines of equal maximum plausibility (P1(Cimas)) for A. =9 Fig 4. Lines of equal maximum plausibility iP5(C+))for k=9 Gaussian data). non-gaussian data) The fist proble implies three gaussian distributions in a training set, and the other four male and three female speakers hree-dimensional space, with the following characteristics were used for building a test set. After suitable preproccssing, 568 training patterns and 462 test patterns in a 10 dimensional 二 13 Fig9 shows the test error rates 0 for the three methods with k ranging from 1 to 30 The third task investigated concerns the classification of ∑3=21 radar retums from the ionosphere obtained by a radar syster Training sets of 30, 60. 120 and 180 Samples have been consisting of a phased array of 16 high-frequency antennas generated using prior probabilities(1/, 1/3, 1/3). Values of [17]. [20]. The targets were free electrons in the ionosphere h ranging from I to 25 have been investigated. A test set of Radar returns were manually classified as"good"or"bad 1000 samples has been used for error estimation. For each depending on whether or not they showed evidence of some pair(N ki). the reported error rates are averages over 5 trials type of structure in the ionosphere. Received signals were pertormed with 5 independent training sets. The results are processed using an autocorrelation function whose arguments presented Table I and Figs. 5-8 are the time of a pulse and the pulse number. This processing The second data set is composed of real-world data obtained yielded 34 continuous attributes for each of the 351 training y recording examples of the eleven steady state vowels of instances collected. The classification results for different English spoken by fifteen speakers[8], [181. Words containing valucs of h arc described in Fig, 10. The figures shown are each of these vowels were uttered once by the fifteen speakers. leave-one-out estimates of the error rates, computed using the Four male and four female speakers were used to build a training data DENGEUX: K-NEAREST NEIGHBOR CLASSIFICATION RULE BASED ON DEMPSTER-SHAFER THEORY 811 TABLE I Gaussian data(N=60) RESULIS OF THI. SEONI EXPERIMENT (GAUSSIAN DATA, 1000 TEST SAMPLES) .36 FOR THE VOTING A -NN RULE( -NN). THE DISTANCE-WEIGHTED K -NN RULE (WEIGHTED A'-NN)AND OUR METHOD(D-S): BEST ERROR RATES (MEANS OVER 5 RUNS)WITH CORRESPONDING VALL'ES OF K(UPPER NUMEERS)AND AVERAGE ERROR RATES INTEGRATED OVER THE DIFFERENT VALLIES OF: A (LOWER NL MBER) Classification rule k-NN k-NN Dempster-Shafer N=300.3265)0299(16 0267(15) 031 0.397 0.338 600309(8)0.93(21) 0.260(23) 0.335 0.314 0.284 N=1200.296(7)0.277(25) 0.254(22) 0.306 0.300 0.280 y=1800280(18) 0.249(23 g. 6. Mean classification error rates for the voting k-NN rule (-) 0.296 0.293 0.273 distance-weighted k-NN rule (-)and our method (-)as a function (Gaussian data,\=60) Gaussian data(N=120 0.55 045 Fig. 5. Mean classification error rates to the voting k-NN rule (-) the Fig. 7. Mean classification error rates the voting k-nN rule (-) the distance-weighted K-NN rule (- and our method (--)as a function of h distance-weighted k -NN rule (-)and our nethod (--)as a function of K Gaussian data. N= 30). Gaussian data,. Not surprisingly, the performances of the two methods experiment. For each training vector r, a number p has taking into account distance information are better than that been generated using a uniform distribution on [0, I with of the voting k-NN rulc, for the thrcc classification problems probability p. the label of s' has been changed (to any of the investigated. Whereas the error rate of the voting h -NN rule other two classes with equal probabilities). Denoting by L the passes by a minimuN for some problem-depenldenll nuInber new class label of a, and assuming that L=4, then the BPA of neighbors the results obtained by the two other methods m'describing the class membership of r? has been defincd as: appear to be much less sensitive to the value of k, provided hi is chosen large enough. Our method clearly outperforms the m2(C})=1 (57 distance-weighted approach on the Gaussian data sets and the task. Both methods m2,(C)=p (58) alent on the ionosphere dala 2'(A)=0 for all other A CC. Hence, m (C)is an indication of the reliability of the class label of r. Using the C. Experiment D-S formalisIn, it is possible to make use of this information, In order to study the behavior of our method in case by giving less importance to those training vectors whose class of imperfect labelling, the following simulation has been membership is uncertain. This property can be expected to performed. A data set of 120 training samples has been result in a distinctive advantage over the majority nile in a generated using the three gaussian distributions of the previous situation of this kind 812 IEEE TRANSACTIONS ON SYSTEMS. MAN. AND CYBERNETICS. VOL. 25. NO 5, MAY 1995 Gaussian data(N= 180) ionosphere data 0.24 0.22 0.18 03 0.16 0.14 012 026 0o5 Fig 8. Mean classification error rates for the voting k-NN rule (-) the Fig 10. Mean classification crror rates for the voting k-nn rule (-) thc distancc-wcightcd A -NN rule ( )and our mcthod (-)as a function of h distance-wcightod A -NN rulc (-)and our method ( )as a functicn of k i Gaussian data,.= 180) ionosphere Vowel data 043 Gaussian data (n=120 -Imperfect labelling 0 035 032488101214 Fig 9. Mean classitication error rates for the voting k-NN ule (-) the Fig. I 1. Mean classification error ates for the voting k-NN rule (- and our distance-weighted A-NN rule [ )and our method (--)as a function of k: method with consideration of uncertainty in class labels (--) as a function (Vowel data) f h( Gaussian dat 120 As can be seen from Fig. Il, our results support this each of the hi nearest neighbors of a alteiL to be classified as assumption. The two curves correspond to the voting h NN an item of evidence that modifies one's helief concerning the rule and our method with consideration of uncertainty in class class membership of that pattern. D-S theory then provides labels. as before the indicated error rates are averages over 5 a simple mechanism for pooling this evidence in ordcr to trials."'The lowest rates achieved, as estimated on an indepen: quantify the uncertainty attached to each simple or compound dent test set of 1000 samples, are 0.43 and 0.34, respectively. hypothesis. 'This approach has been shown to present several The percentages of performance degradation resulting from the advantages. It provides a natural way of modulating the Introduction of uncertainty in the class labels are respectively importance of training samples in the decision depending 54c and 21 c. These results indicate that the consideration on their nearness to the point to be classified. It allows for of the distances to the nearest neighbors and of the BPAs ot the introduction of ambiguity and distance reject options, that these neighbors both bring an improvement over the majority receive a unified interpretation using the concepts rule in that case and upper expected costs. Situations in which only imperfect knowledge is available concerning the class membership of V. CONCLUSION ome training pattens are easily dealt with by labelling each Based on the conceptual framework of D-S theory, a new recorded sample using basic probability numbers attached non parametric technigue for pattern classification has been to each subset of classes. Simulations using artificial and proposed. This technique essentially consists in considering real-world data sets of moderate sizes have illustrated the DENGECX: k-NEAREST NEIGHBOR CLASSIFICATION RULE BASED ON DEMPSTER-SHAFER THEORY different aspects, and have revealed in cach casc a supcriority (41 T. M. Cover and P E. Hart, "Nearest neighbor pattern classification," of the proposcd schcmc over the voting k-NN procedure in EEE Trans.lom.7为 2-27.1967 terms of classification performance. In two cases. the results [5 B. V. Dasarathiy, " Nosing around the neighborhood: A new Sy structure and classification rule for recognition in partially exposed oblained with our method were also better than those obtained environments IEEE Trans. Pattern Anal Machine intell. vol. PAMI-2 ith the distance-weighted ki-NN rule, while both method yielded similar results in a third experiment. It should be arest neighbor norms: NN pattern classification techniques TEEE CUrHter Surety Press, LUs Al IlLUs, CA,1991 noted that these results are obviously not sufficient to claim [7] A. P Dempster and A. ong, "Comment, "Stat. Sci., vol 2, no. 1,pp the superiority of our approach for all possible data scts [81 D. H. Deterding. "Speaker normalization for automatic speech recogni although no counterexample has been encountered up to now ion, Ph D. thesis, University of Cambridge, 1989 The comparison with the weighted or unweighted % -NN rules 191 B Dubuisson and M Masson,A statistical decision rule with incom in the infinite sample case is also an interesting but so far plete knowledge about classes, Pattern Recognition, vol 26, no. I 155-165,1993 unanswered question IIO S.A. Dudani, "The distance-weighted k-nearest-neighbor rule. IEEE Another particularity of the technique described in this Trans. Sysl. Man Cyber., vol 6, pp. 325-327, 1976 paper is the quantification of the uncertainty attached to the W E. Fix and 1. L. Hodges. " Discriminatory analysis, nonparametric uiscinuinativl: Coiaislency propcrties, Technical Report 4, USAF decisions, in a form that permits combination with the outputs School of Aviation Medicine, Randolph Field, TX, 1951 of complementary classifiers, possibly operating at different [12] M E Hellman, "The nearest neighbor classification rule with a reject levels of abstraction. For example, given three classes C1 C2 [13] A. Jozwik. "A learning scheme for a fuzzy k-NN rule. Pattern and C3, one classifier may discriminate between class C1(14)JM.Keller.M ol.1,pp.287-289,1983 and the other two, while another one may help to disct R. Gray, and J. A Givens. "A fuzzy k-NN neighbor algcritlIl, IEEE Trans. Sysl. Man Cyber, voL. 15, 10. 4, PP 580-585 C2 and C3. By combining the BPAs produced by each 985 these classifiers, Dempster's rule offers a way to assess the [151 J E. Macleod, A. Luk, and D. M. Titterington,"A re-examination of the reliability of the resulting classification. This approach is distance-weighted A -nearest neighbor classification rule, "IEEE Tran Syst. Man Cyber., voL. 17. no, 4.pp. 689-696.:987. expected to be particularly useful in data fusion applications. [161 R. L. Morin and D. E. Raeside, "A reappraisal of distance-weighted where decentralized decisions based on data coming from nearest-neighbor classification for pattern recognition with missing data,IEEE Trans. Sysf Man Cyber. voL. I L, n0. 3, Pp 241-243, 1981 disparate sensor sources need to be merged in order Lo achieve |171 P. M. Murphy and D. W. Aha, "UCI Repository of machine learning a final decision databases [Machine-ieadable data repository], "University uf CalifoilLid Department of Inforrnation and Computer Science, Irvine, CA, 1994 118 A. J. Robinson, Dynamic error propagation networks, " Ph. D. thesis ACKNOWLEDGMENT Cambridge university Engineering Department, 1989 1 G. Shafer. A Mathematical Theor of Evidence. Princeton, NJ: Prince The vowel data were obtained from the Carnegie mellon on University Press, i976 Universlly collection of neural net benchmarks maintained by 120) v.G. Sigillito, S P Wing, L V. Hutton, and K B Baker. "Classitication f radar returns fionl the ionosphere using neural netwoR ks, in Johns Matt White, under the supervision of Scott E FahLman. The Hopkins APL Technica! Digest, vol. 10, PP 262-266, 1989 ionosphere data were obtained from the UCI Repository of [211 B. Tessel. "Approxinaiuns for efficient compulation in the theory of evidence, Artificiai Intelligence, vol. 61, pp 315-329, 1993 machine learning databases maintained by Patrick M. Murphy and David w. Aha. The author wishes to express his thanks to the anonymous referees for their helpful comments and suggestions during the revision of this paper Thierry graduated in 1985 as an engineer ss from the ational des pe Ph. D. from the REFERENCE 98 worked at LIAC(Laboratoire Baily and A. K. Jain,"A note on distance-weighted k-nearest d'Informauique Avancee de Compiegne), a research neighbor rules, " IEEE Trans. Syst. Man Ciber, vol nter of Ly des eaux dumez, where he was in charge of a European research project on [21 W.F. Casclton and w. Luo. Decision making with inprecise piob the application of neural nctworks to forccastil abilities: Dempster- Shafer theory and application.“ Water Reso∥rces蠡 and diagnosis. He is currently an assistant professor Research,vo.28.mo.12,pp.3071-3081,1992 at the universite de Technologie de Compiegne [3]C.K Chow, "On optimum recognition error ard reject tradeoff, "IEEE His research interests include artificial neural networks, statistical pattern Trun.. Inform. Theor, vol. IT-16 pp. 41-46. 1970 recognition and data fusion

试读 10P k-Nearest Neighbor Classification
立即下载 低至0.43元/次 身份认证VIP会员低至7折
关注 私信 TA的资源
k-Nearest Neighbor Classification 25积分/C币 立即下载
k-Nearest Neighbor Classification第1页
k-Nearest Neighbor Classification第2页

试读结束, 可继续读1页

25积分/C币 立即下载 >