iToolbox Manual
by
Lars Nørgaard
Email: lan@kvl.dk
Web: www.models.kvl.dk
April 2005
Contents
Important notes on the iToolbox for MATLAB
iToolbox – Getting Started
  A word on over fit and outliers
  Help
  Interval PLS – how to do it
  Backward interval PLS
  biPLS as preprocessing for genetic algorithms
  Moving window PLS
  Synergy interval PLS
  Interval PCA
  PLS models on selected intervals and prediction
  Validation
  Preprocessing
  Model structure
m-files alphabetically
  iToolbox directory
  Auxiliary files
Important notes on the iToolbox for MATLAB
Conditions
The toolbox is freeware. References to the implemented methods are given
below.
Authors
Lars Nørgaard, lan@kvl.dk
Chemometrics Group, Food Technology
The Royal Veterinary and Agricultural University
DK-1958 Frederiksberg
Denmark
&
Riccardo Leardi, riclea@dictfa.unige.it (bipls & dyn_bipls)
Department of Pharmaceutical and Food Chemistry and Technology
University of Genoa
Italy
References
L. Nørgaard, A. Saudland, J. Wagner, J.P. Nielsen, L. Munck and S.B. Engelsen, Interval
Partial Least Squares Regression (iPLS): A Comparative Chemometric Study with an
Example from Near-Infrared Spectroscopy, Applied Spectroscopy, 54, 413-419, 2000.
R. Leardi and L. Nørgaard, Sequential application of backward interval-PLS and Genetic
Algorithms for the selection of relevant spectral regions, Journal of Chemometrics, in
press.
Warranty
In short, no guarantees whatsoever are given for the quality of this toolbox or
for the consequences of its use.
Where does the toolbox work?
The toolbox has been tested with MATLAB 7 on Windows XP.
Setting up the toolbox
To install the toolbox, copy all the files to a directory (e.g. iToolbox) and
add that directory to your MATLAB path.
Support
We are very interested in, and dependent on, feedback from users (preferably
by e-mail). If you have problems running the toolbox, please include screen
dumps as well as the version numbers of the toolbox, MATLAB, and the operating
system when contacting us. We will do our utmost to help overcome the problems.
iToolbox – Getting Started
The iToolbox is for exploratory investigations of data sets with many collinear
variables (e.g. spectral data sets). The main methods in the iToolbox are:
interval PLS (iPLS): Splits the data set into a number of intervals
(variable-wise), calculates PLS models for each interval and presents the
results in one plot. This method is intended to give an overview of the data
and can be helpful in interpretation (e.g. for spectral assignments).
backward interval PLS (biPLS): As in interval PLS, the data set is split into
a given number of intervals, but now PLS models are calculated with each
interval left out in turn, i.e. if one chooses 20 intervals then each model is
based on 19 intervals. The first interval to be removed is the one whose
omission gives the best performing model with respect to RMSECV or RMSEP (Root
Mean Square Error of Cross Validation / Prediction), i.e. the interval that
contributes least. This procedure is continued until one interval remains. The
results are presented in a table.
moving window PLS (mwPLS): Calculates iPLS models based on a moving
window concept. For each variable a PLS model is calculated with the given
window size. The results are presented in a plot.
synergy interval PLS (siPLS): Splits the data set into a number of intervals
(variable-wise) and calculates PLS models for all possible combinations of two,
three or four intervals. The computation time can be long, depending on the
number of intervals and on how many intervals are combined. The results are
presented in a table.
interval PCA (iPCA): Splits the data set into a number of intervals
(variable-wise), calculates PCA models for each interval and presents the
results in multiple score plots. This method is intended to give an overview of
the data and can be helpful in exploratory studies and interpretation (e.g.
when looking for groupings among samples).
PLS models and prediction: To be used for developing PLS models on selected
intervals from iPLS and biPLS and for prediction based on new data sets.
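The interval-based methods above all start from a variable-wise split, and the
cost of siPLS grows with the number of interval combinations. The following is
a minimal sketch in base MATLAB of one way such a split could be made and of
how many models the different methods would need. It is not the toolbox's own
code: the variable names (nVars, nInt, widths, starts, stops) and the choice to
give the leftover variables to the first intervals are assumptions made for
illustration.

```matlab
% Sketch only: equal-width interval splitting in base MATLAB.
% NOT the toolbox's own code; nVars and nInt are illustrative values.
nVars = 926;   % number of variables (e.g. wavelengths)
nInt  = 20;    % number of intervals requested

% Assumed rule: leftover variables are spread over the first intervals
base   = floor(nVars / nInt);          % minimum width of an interval
extra  = mod(nVars, nInt);             % variables left over
widths = base * ones(1, nInt);
widths(1:extra) = widths(1:extra) + 1;

% First and last variable index of each interval
stops  = cumsum(widths);
starts = [1, stops(1:end-1) + 1];

% iPLS: one local model per interval
nModelsIpls = nInt;

% siPLS: one model per combination of 2, 3 or 4 intervals
nModelsSipls = [nchoosek(nInt, 2), nchoosek(nInt, 3), nchoosek(nInt, 4)];

fprintf('iPLS models: %d\n', nModelsIpls);
fprintf('siPLS models (2, 3, 4 intervals): %d, %d, %d\n', nModelsSipls);
```

With 20 intervals the number of four-interval combinations already runs into
the thousands, which is why the siPLS computation time depends so strongly on
both settings.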
Try out the demos to get an impression of how to use the methods: iplsdemo,
biplsdemo, mwplsdemo, siplsdemo, ipcademo.
A word on over fit and outliers
Please be aware that the methods implemented in this toolbox might overfit the
data. To be completely safe, e.g. when using the models for predictive purposes,
an independent test set should always be evaluated to check that the results
agree with those found using the toolbox. This caveat is not specific to this
toolbox; it applies to all variable selection methods.
A large number of PLS models are calculated in each of the methods and it is
important to be aware that outliers can distort the results as they can in
standard PLS modeling.
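The check against an independent test set comes down to computing a prediction
error (RMSEP) from reference values and model predictions, and comparing it
with the cross-validation error reported during variable selection. The sketch
below shows that calculation in base MATLAB. The numbers are made up purely for
illustration, and yRef and yPred are hypothetical names; in practice yPred
would come from applying the selected model to the test set.

```matlab
% Sketch only: RMSEP from reference and predicted values.
% The numbers below are made up for illustration, not real results.
yRef  = [10.2; 11.5; 9.8; 12.1];   % hypothetical reference values (test set)
yPred = [10.0; 11.9; 9.5; 12.4];   % hypothetical model predictions

residuals = yRef - yPred;
rmsep = sqrt(mean(residuals.^2));   % root mean square error of prediction

fprintf('RMSEP = %.4f\n', rmsep);
```

An RMSEP that is clearly larger than the RMSECV obtained during selection is a
sign that the selected intervals may be overfitted to the calibration data.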
Help
In general, you can type the name of a file to get brief help on its function.
For example, if you type ipls at the MATLAB prompt followed by [ENTER], you get
the following text:
Model=ipls(X,Y,no_of_lv,prepro_method,intervals,xaxislabels,
val_method,segments);
Example:
Model=ipls(X,Y,7,'mean',20,xaxis,'syst123',5);
If you type help ipls you get extended help, including an explanation of the
inputs and outputs (I/O); this applies to all relevant files.
If the directory containing the toolbox files is named, e.g., iToolbox, and it
has been added to the MATLAB path (File, Set Path in MATLAB), you can type
help iToolbox to get a list of all relevant files.
Interval PLS – how to do it
The data used for illustration are real extract measurements on beer samples
(ycal) as well as corresponding near infrared (NIR) spectroscopic data (Xcal) in
the range 400-2250 nm (every 2nd nm is recorded). In the calibration set the
number of samples is 40. An independent test set is available consisting of 20
samples (Xtest & ytest).
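The number of spectral variables follows directly from the wavelength range and
step stated above. The short sketch below works this out in base MATLAB without
needing the data file. The name wl is hypothetical, and it is an assumption
that the wavelength axis stored with the data follows exactly this grid.

```matlab
% Sketch only: wavelength grid implied by "400-2250 nm, every 2nd nm".
% wl is a hypothetical name; the data file is assumed to use this grid.
wl    = 400:2:2250;      % wavelengths in nm
nVars = numel(wl);       % number of spectral variables per sample

fprintf('Number of variables: %d\n', nVars);
```

If the assumption holds, Xcal should have 40 rows and as many columns as nVars,
which can be checked with size(Xcal) once the data are loaded.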
Load the data:
load nirbeer
Make a plot of the raw data:
plot(xaxis,Xcal)
The plot should look like this:
[Figure: plot of Xcal against xaxis; horizontal axis from 400 to 2400, vertical
axis from 0 to 4.]