PRTools4
A Matlab Toolbox for Pattern Recognition
R.P.W. Duin, P. Juszczak, P. Paclik,
E. Pekalska, D. de Ridder, D.M.J. Tax, S. Verzakov
Version 4.1, August 2007
An introduction to the setup, definitions and use of PRTools is given. PRTools4 is extended and
enhanced with respect to version 3 and is therefore not fully compatible with it. This manual includes the
description of a further upgrade: PRTools4.1. Not all possibilities are fully exploited at the user
level yet, and some not at all. See the release notes on page 58 (sections 9 and 10). Readers are assumed
to be familiar with Matlab and should have a basic understanding of the field of statistical pattern recognition.
tel : +31 15 2786143
fax: +31 15 2781843
email: prtools@prtools.org
http://prtools.org/
Delft Pattern Recognition Research
Faculty EWI - ICT
Delft University of Technology
P.O. Box 5046, 2600 GA Delft
The Netherlands
Availability, licences, copyright, reference
PRTools can be downloaded from the PRTools website.
The use of PRTools is protected by a license. This license is free for non-commercial academic
research, non-commercial education and for personal inspection and evaluation. For commercial
usage special licenses are available.
The PRTools sources are copyright protected.
If PRTools is used for scientific or educational publications, the following reference will be
appreciated:
R.P.W. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. de Ridder, D.M.J. Tax, S. Verzakov
PRTools4.1, A Matlab Toolbox for Pattern Recognition, Delft University of Technology, 2007.
Table of Contents
1. Motivation 5
2. Essential concepts 6
3. Implementation 9
4. Advanced example 11
5. Some Details 13
5.1 Datasets 13
5.2 Datasets help information 15
5.3 Datafiles 18
5.4 Datafiles help information 19
5.5 Classifiers and mappings 20
5.6 Mappings help information 23
5.7 How to write your own mapping 27
6. References 31
7. A review of the toolbox 32
Datasets and Mappings 32
Data Generation 34
Linear and Higher Degree Polynomial Classifiers 35
Normal Density Based Classification 36
Nonlinear Classification 36
Feature Selection 37
Classifiers and Tests (general) 38
Mappings 39
Combining classification rules 40
Image operations 40
Clustering and Distances 41
Plotting 43
Examples 43
8. Examples 45
8.1 PREX_CLEVAL Learning curves 45
8.2 PREX_COMBINING PRTOOLS example of classifier combining 46
8.3 PREX_CONFMAT Confusion matrix, scatterplot and gridsize 47
8.4 PREX_DENSITY Various density plots 48
8.5 PREX_EIGENFACES Use of images and eigenfaces 49
8.6 PREX_MATCHLAB Clustering the Iris dataset 50
8.7 PREX_MCPLOT Multi-class classifier plot 51
8.8 PREX_PLOTC Dataset scatter and classifier plot 52
8.9 PREX_SPATM Spatial smoothing of image classification 53
8.10 PREX_COSTM PRTools example on cost matrices and rejection 54
8.11 PREX_LOGDENS Improving density based classifiers 56
9. PRTools 4.0 release notes 58
9.1 Datasets 58
9.2 Mappings 58
9.3 The user level 59
10. PRTools 4.1 release notes 60
10.1 Compatibility 60
10.2 Datafiles 60
10.3 Image processing routines 60
10.4 Multiple labels 60
10.5 Optimisation of complexity parameters and regularisation 61
10.6 Regression 61
10.7 Object and dataset annotation 61
10.8 Kernels 61
10.9 Support vector classifiers 61
10.10 Rejects 61
1. Motivation
In statistical pattern recognition one studies techniques for the generalization of examples to decision
rules to be used for the detection and recognition of patterns in experimental data. This area of
research has a strong computational character, demanding a flexible use of numerical programs for
data analysis as well as for the evaluation of the procedures. As new methods are still being proposed
in the literature, a programming platform is needed that enables a fast and flexible implementation.
Pattern recognition is studied in almost all areas of applied science. The use of a widely
available numerical toolset like Matlab may therefore be profitable both for the use of existing
techniques and for the study of new algorithms. Moreover, because of its general nature in comparison with
more specialized statistical environments, it offers an easy integration with the preprocessing of data
of any nature. This is further facilitated by the large set of toolboxes available in Matlab.
The roughly 200 pattern recognition routines and the additional 200 support routines offered by
PRTools in its present state represent a basic set that largely covers the area of statistical pattern
recognition. Many methods and proposals, however, are not yet implemented. Some choices are
accidental, as the routines were programmed by the developers for their own research, sometimes in a
way that suited their private purposes. The important field of neural networks has partially been
skipped, as Matlab already includes a very good toolbox in that area. PRTools offers just an interface
to some basic routines, to facilitate a comparison with traditional techniques.
PRTools has a few limitations. Due to the heavy memory demands of Matlab, very large problems
with learning sets of tens of thousands of objects cannot always be handled directly. Version 4.1 of
the toolbox includes some tools for using large sets of files on disk. In the present version, PRTools4,
the handling of missing data has been prepared, but hardly any routines have been implemented for it. The
use of symbolic data is not supported. Recently the possibility of soft (and thereby also fuzzy) labels
has been added, as well as the usage of multiple labels; just a few routines make use of them so far.
Multi-dimensional target fields are also allowed, but at this moment no procedure makes use of this
possibility. Finally, support for misclassification costs has been implemented, but this is still at an
experimental level.
In section 2 we present the basic philosophy of mappings and datasets. Section 3 presents the
actual implementation, which is illustrated by examples in section 4. In section 5 further details are
given, focussing on defining and using datasets and mappings. Section 7 lists the most important
procedures of the toolbox. The examples included in the distribution of PRTools are listed in section
8, together with their expected results. Finally, release notes for versions 4.0 and 4.1 are given in
sections 9 and 10, with a summary of changes that may be important for experienced
users of PRTools.