PRTools4
A Matlab Toolbox for Pattern Recognition
R.P.W. Duin, P. Juszczak, P. Paclik,
E. Pekalska, D. de Ridder, D.M.J. Tax
Version 4.0, February 2004
An introduction to the setup, definitions and use of PRTools is given. PRTools4 is extended and
enhanced with respect to version 3 and is therefore not fully compatible with it. Some new possibilities
are not yet fully exploited at the user level, or not at all. See the release notes on page 50. Readers are
assumed to be familiar with Matlab and should have a basic understanding of the field of statistical
pattern recognition.
tel : +31 15 2786143
fax: +31 15 2781843
email: prtools@prtools.org
http://prtools.org/
Delft Pattern Recognition Research
Faculty EWI - ICT
Delft University of Technology
P.O. Box 5046, 2600 GA Delft
The Netherlands
Availability, licences, copyright, reference
PRTools can be downloaded from the PRTools website.
The use of PRTools is protected by a license. This license is free for non-commercial academic
research, non-commercial education and for personal inspection and evaluation. For other usage a
one-time license fee has to be paid.
The PRTools sources are copyright protected.
If PRTools is used for scientific or educational publications, the following reference will be
appreciated:
R.P.W. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. de Ridder, D.M.J. Tax,
PRTools4, A Matlab Toolbox for Pattern Recognition, Delft University of Technology, 2004.
Table of Contents
1. Motivation 5
2. Essential concepts 6
3. Implementation 8
4. Advanced example 10
5. Some Details 12
5.1 Datasets 12
5.2 Datasets help information 13
5.3 Classifiers and mappings 16
5.4 Mappings help information 18
5.5 How to write your own mapping 23
6. References 26
7. A review of the toolbox 27
Datasets and Mappings 28
Data Generation 28
Linear and Higher Degree Polynomial Classifiers 29
Normal Density Based Classification 30
Nonlinear Classification 30
Feature Selection 31
Classifiers and Tests (general) 31
Mappings 33
Combining classification rules 34
Image operations 34
Clustering and Distances 35
Plotting 35
Examples 35
8. Examples 37
8.1 PREX_CLEVAL Learning curves 37
8.2 PREX_COMBINING PRTOOLS example of classifier combining 38
8.3 PREX_CONFMAT Confusion matrix, scatterplot and gridsize 39
8.4 PREX_DENSITY Various density plots 40
8.5 PREX_EIGENFACES Use of images and eigenfaces 41
8.6 PREX_MATCHLAB Clustering the Iris dataset 42
8.7 PREX_MCPLOT Multi-class classifier plot 43
8.8 PREX_PLOTC Dataset scatter and classifier plot 44
8.9 PREX_SPATM Spatial smoothing of image classification 45
8.10 PREX_COSTM PRTools example on cost matrices and rejection 46
8.11 PREX_LOGDENS Improving density based classifiers 48
9. PRTools 4.0 release notes 50
9.1 Datasets 50
9.2 Mappings 50
9.3 The user level 51
1. Motivation
In statistical pattern recognition one studies techniques for the generalization of datasets to decision
rules to be used for the recognition of patterns in experimental data sets. This area of research has a
strong computational character, demanding a flexible use of numerical programs for data analysis as
well as for the evaluation of the procedures. As new methods are still being proposed in the literature,
a programming platform is needed that enables a fast and flexible implementation. Pattern recognition
is studied in almost all areas of applied science. The use of a widely available numerical toolset like
Matlab may therefore be profitable both for the use of existing techniques and for the study of new
algorithms. Moreover, because of its general nature in comparison with more specialized statistical
environments, Matlab offers an easy integration with the preprocessing of data of any nature. This is
further facilitated by the large set of toolboxes available in Matlab.
The roughly 200 pattern recognition routines and the additional 200 support routines offered by
PRTools in its present state represent a basic set that largely covers the area of statistical pattern
recognition. Many methods and proposals, however, are not yet implemented. The selection is partly
accidental, as the routines were programmed by the developers for their own research, sometimes in a
way that suited their private purposes. The important field of neural networks has been partially
skipped, as Matlab already includes a very good toolbox in that area. PRTools offers only an interface
to some basic routines, to facilitate a comparison with traditional techniques.
PRTools has a few limitations. Due to the heavy memory demands of Matlab, very large problems
with learning sets of tens of thousands of objects cannot be handled on moderate machines. In the
present version, PRTools4, the handling of missing data has been prepared, but no routines are
implemented yet. The use of symbolic data is not supported. Recently the possibility of soft (and
thereby also fuzzy) labels has been added, but only a few routines make use of them so far.
Multi-dimensional target fields are also allowed, but at this moment no procedure makes use of this
possibility. Finally, support for misclassification costs has been implemented, but this is still at an
experimental level.
In section 2 we present the basic philosophy about mappings and datasets. Section 3 presents the
actual implementation, which is illustrated by examples in section 4. In section 5 further details are
given, focussing on defining and using datasets and mappings. Section 7 lists the most important
procedures of the toolbox.