Data description toolbox
dd_tools 2.0.0
A Matlab toolbox for data description, outlier and novelty detection
for PRTools 5.0
July 24, 2013
D.M.J. Tax
[Figure: Banana Set (target class 1), Feature 1 vs. Feature 2]
Contents
1 This manual
2 Introduction
2.1 Classification in Prtools
2.2 What is one-class classification?
2.3 Error minimization in one-class
2.4 Receiver Operating Characteristic curve
2.5 Introduction dd_tools
3 Datasets
3.1 Creating one-class datasets
3.2 Inspecting one-class datasets
4 Classifiers
4.1 Prtools classifiers
4.2 Creating one-class classifiers
4.3 Inspecting one-class classifiers
4.4 Available classifiers
4.5 Combining one-class classifiers
4.6 Multi-class classification using one-class classifiers
4.7 Note for programmers
5 Error computation
5.1 Basic errors
5.2 Precision and recall
5.3 Area under the ROC curve
5.4 Cost curve
5.5 Generating artificial outliers
5.6 Cross-validation
6 General remarks
7 Contents.m of the toolbox
Copyright: D.M.J. Tax, D.M.J.Tax@prtools.org
Faculty EWI, Delft University of Technology
P.O. Box 5031, 2600 GA Delft, The Netherlands
Chapter 1
This manual
The dd_tools Matlab toolbox provides tools, classifiers and evaluation
functions for research on one-class classification (or data description).
dd_tools is an extension of the Prtools toolbox, more specifically of
Prtools 5.0. Prtools defines Matlab objects for datasets and mappings,
called prdataset and prmapping. dd_tools uses these objects and their
methods, but extends (and sometimes restricts) them to one-class
classification. This means that before you can use dd_tools to its full
potential, you need to know a bit about Prtools. If you are completely
new to pattern recognition, Matlab or Prtools, please familiarize yourself
with them first (see http://www.prtools.org for more information on
Prtools).
This short document should give the reader some idea of what the data
description toolbox (dd_tools) for Prtools offers. It provides some
background information about one-class classification and about some
implementation issues, and it gives some practical examples. It does not
try to be complete, though, because each new version of dd_tools will
probably include new commands and possibilities. The file Contents.m in
the dd_tools directory gives the up-to-date list of all functions and
classifiers in the toolbox. The most up-to-date information can be found
on the dd_tools webpage, currently at:
http://prlab.tudelft.nl/david-tax/dd_tools.html.
Note that this is not a cookbook that solves all your problems. It should
point out the basic philosophy of dd_tools. You should always have a look
at the help provided by each command (try help dd_tools). The help texts
should show all possible combinations of input and output arguments. When
a parameter is listed in the Matlab code, but not in the help, it often
indicates an undocumented feature, which means: be careful! In that case
I am not 100% sure whether it will work, how useful it is, and whether it
will survive the next dd_tools version.
Chapter 2 gives a basic introduction to one-class classification (also
called novelty detection or outlier detection): what the goal is and how
the performance is measured. You can skip that chapter if you are familiar
with one-class classification. Section 2.5 presents the basic idea of
dd_tools. Chapters 3 and 4 then show the specific use of datasets and
classifiers. Chapter 5 explains the computation of the error, and finally
Chapter 6 gives some general remarks.