Data description toolbox
dd_tools 1.6.3
A Matlab toolbox for data description, outlier and novelty detection
June 3, 2008
D.M.J. Tax
[Figure: Banana Set (target class 1); axes: Feature 1, Feature 2]
Contents
1 This manual
2 Introduction
2.1 What is one-class classification?
2.2 Error minimization in one-class
2.3 Receiver Operating Characteristic curve
2.4 Introduction dd_tools
3 Datasets
3.1 Creating one-class datasets
3.2 Inspecting one-class datasets
4 Classifiers
4.1 Prtools classifiers
4.2 Creating one-class classifiers
4.3 Inspecting one-class classifiers
4.4 Available classifiers
4.5 Combining one-class classifiers
4.6 Note for programmers
5 Error computation
5.1 Basic errors
5.2 Precision and recall
5.3 Area under the ROC curve
5.4 Cost curve
5.5 Generating artificial outliers
5.6 Cross-validation
Chapter 1
This manual
The dd_tools Matlab toolbox provides tools, classifiers and evaluation
functions for research on one-class classification (or data description).
The dd_tools toolbox is an extension of the Prtools toolbox, in which Matlab
objects for mapping and dataset are defined. dd_tools uses these objects
and their methods, but extends (and sometimes restricts) them to one-class
classification. This means that before you can use dd_tools to its full
potential, you need to know a bit about Prtools. If you are completely
new to pattern recognition, Matlab or Prtools, please familiarize yourself
with them first (see http://www.prtools.org for more information on
Prtools).
This short document should give the reader some idea of what the data
description toolbox (dd_tools) for Prtools offers. It provides some
background information about one-class classification, discusses some
implementation issues, and gives some practical examples. It does not try
to be complete, though, because each new version of dd_tools will probably
include new commands and possibilities. The file Contents.m in the dd_tools
directory gives the up-to-date list of all functions and classifiers in the
toolbox. The most up-to-date information can be found on the dd_tools
webpage, currently at:
http://www-ict.ewi.tudelft.nl/~davidt/dd_tools.html
Note that this is not a cookbook that solves all your problems. It should
point out the basic philosophy of dd_tools. You should always have
a look at the help provided by each command (try help dd_tools). The help
should show all possible combinations of parameter arguments and output
arguments. When a parameter is listed in the Matlab code but not in the
help, it often indicates an undocumented feature, which means: be careful!
In that case I'm not 100% sure whether it will work, how useful it is, or
whether it will survive into the next dd_tools version.
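As a minimal sketch of the advice above, the commands below could be typed at the Matlab prompt. It assumes that dd_tools and Prtools are already on the Matlab path. The command name gauss_dd is used only as an assumed example of a toolbox command; substitute any name listed in Contents.m.

```matlab
% Assumption: dd_tools and Prtools have already been added to the Matlab path.

% Toolbox overview, as suggested in the text.
help dd_tools

% Help for a single command. gauss_dd is an assumed example name;
% replace it with any command listed in Contents.m.
help gauss_dd

% Show which file Matlab would run for that name, so the parameters in
% the code can be compared with those documented in the help.
which gauss_dd
```

Comparing the help text with the function file is a quick way to spot the undocumented parameters mentioned above.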
Chapter 2 gives a basic introduction to one-class classification/novelty
detection/outlier detection: what the goal is, and how the performance
is measured. You can skip it if you're familiar with one-class
classification. Section 2.4 gives the basic idea of dd_tools. Chapters 3
and 4 then show the specific use of datasets and classifiers. Chapter 5
explains the computation of the error, and finally chapter 6 gives some
general remarks.