ARESLab
Adaptive Regression Splines toolbox for Matlab
ver. 1.3.1
Gints Jekabsons
Institute of Applied Computer Systems
Riga Technical University
Meza 1/3, LV-1048, Riga, Latvia
URL: http://www.cs.rtu.lv/jekabsons/
Reference manual
April, 2010
Copyright © 2009-2010 Gints Jekabsons
CONTENTS
1. INTRODUCTION
2. AVAILABLE FUNCTIONS
2.1. Function aresbuild
2.2. Function aresparams
2.3. Function arespredict
2.4. Function arestest
2.5. Function arescv
2.6. Function arescvc
2.7. Function aresplot
2.8. Function areseq
3. EXAMPLES OF USAGE
3.1. Ten-dimensional function with noise
3.2. Noise-free two-dimensional function
4. REFERENCES
1. INTRODUCTION
What is ARESLab
ARESLab is a Matlab toolbox for building piecewise-linear and piecewise-cubic regression
models using the Multivariate Adaptive Regression Splines technique (also known as MARS). (The
term “MARS” is a registered trademark and is therefore not used in the name of the toolbox.) The original
author of the MARS technique is Jerome Friedman (Friedman 1991, Friedman 1993).
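In Friedman's formulation, a piecewise-linear MARS model is a weighted sum of basis functions. These are built from hinge (truncated linear) functions max(0, x - t) and max(0, t - x), where t is a knot. Interaction terms are products of such hinges. The Python sketch below only illustrates this idea and is not ARESLab code (ARESLab itself is written in Matlab). The knot and the coefficients are hypothetical values chosen by hand rather than found by the forward and backward phases.

```python
import numpy as np

def hinge(x, knot, direction):
    """Hinge (truncated linear) basis function.

    direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    """
    return np.maximum(0.0, direction * (x - knot))

def piecewise_linear_model(x):
    # Hypothetical one-variable model: an intercept plus a mirrored pair of
    # hinge functions sharing the knot t = 0.5.
    return 1.0 + 2.0 * hinge(x, 0.5, +1) - 3.0 * hinge(x, 0.5, -1)

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
print(piecewise_linear_model(x))
```

The mirrored pair lets the model have a different slope on each side of the knot. Here the slope is 3 to the left of t = 0.5 and 2 to the right.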
The toolbox allows building models (referred to as ARES models) with different settings,
testing them on a separate test set or with k-fold Cross-Validation, using them for prediction,
outputting their equations for deployment, plotting them, etc. The built models can also be used as
metamodels (also known as surrogate models) for design optimization tasks (e.g., see Chen et al.
2006, Kalnins et al. 2008, Kalnins et al. 2009, Jekabsons 2010).
This reference manual provides an overview of the functions available in ARESLab.
ARESLab can be downloaded at http://www.cs.rtu.lv/jekabsons/.
The toolbox code is licensed under the GNU GPL ver. 2 or any later version.
Some parts of the aresbuild and createList functions are derived from the ENTOOL toolbox
(Merkwirth & Wichard 2003, Norgaard 2000), which also falls under the GPL licence.
For any feedback on the toolbox, including bug reports, feel free to contact me.
Details
The ARESLab toolbox is written entirely in Matlab. I tried to implement the main functionality
of the MARS technique for regression as closely as possible to the description in Friedman's original
paper (Friedman 1991). While implementing the knot placement part (see remarks about
minSpan and endSpan in Section 2), I also took a look at the source code of the R Earth package
(Milborrow 2009) and implemented it very similarly to Earth version 2.4-0. At the moment, I believe
the only major difference is that the model building is not accelerated using “Fast MARS”
queuing (Friedman 1993) together with the “fast least-squares update technique” (Friedman 1991).
However, this difference mainly affects the execution speed of the algorithm rather than the
predictive performance of the built models.
The absence of “Fast MARS” queuing means that the code might be rather slow for large data
sets (however, see the function descriptions on how to make it faster by setting more conservative
values for the algorithm parameters). Note that a much faster implementation of Multivariate Adaptive
Regression Splines is included in the VariReg software tool (Jekabsons 2009, available at
http://www.cs.rtu.lv/jekabsons/), which can also be used from within the Matlab environment
(although with much less functionality). Another alternative is the Earth package for R, which
is very sophisticated but lacks the ability to create piecewise-cubic models.
Possible future updates for the toolbox:
• optional complete re-training of an existing model (for slightly changed data);
• setting the upper limit of interactivity for each input variable separately;
• automatic variable scaling;
• modelling for classification problems (although for two classes, one can code the output as
0/1, treat the problem as a regression, and use the current version of ARESLab).
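The two-class workaround in the last item can be illustrated roughly as follows. The Python sketch below codes hypothetical labels as 0/1, fits a regression model to them and thresholds the real-valued predictions at 0.5. Ordinary least squares on a single input stands in for an ARES model, purely to keep the example self-contained. The 0.5 threshold is a common choice, not something prescribed by ARESLab.

```python
import numpy as np

# Hypothetical two-class data; the output is coded as 0/1.
labels = np.array(["no", "no", "yes", "yes", "no", "yes"])
x = np.array([0.1, 0.2, 0.7, 0.9, 0.3, 0.8])
y = (labels == "yes").astype(float)

# Stand-in regression model: ordinary least squares on [1, x].
# (With ARESLab, an ARES model would be built on (x, y) instead.)
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predictions are real-valued; thresholding at 0.5 recovers class labels.
pred = X @ coef
pred_labels = np.where(pred >= 0.5, "yes", "no")
print(pred_labels)
```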
Some further aspects of MARS mentioned in Friedman’s papers but not implemented in
ARESLab:
• “Fast MARS” queuing;
• automatic handling of missing values;
• automatic handling of categorical input variables (with the current version of ARESLab, the
user must create a number of dummy variables in the usual way before building the model);
• model slicing.
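The Python sketch below shows one common way of creating dummy variables for a categorical input. A variable with k categories gets k - 1 indicator columns, and the first category (in sorted order) is the all-zeros reference level. The function name dummy_encode and the example data are hypothetical. One indicator column per category is another frequently used option. The resulting columns are appended to the numeric inputs before the model is built.

```python
import numpy as np

def dummy_encode(column):
    """Return the sorted category levels and k-1 columns of 0/1 dummies.

    The first level is the reference category and is encoded as all zeros.
    Requires at least two distinct categories.
    """
    column = np.asarray(column)
    levels = np.unique(column)  # sorted distinct categories
    if len(levels) < 2:
        raise ValueError("need at least two categories")
    dummies = np.column_stack([(column == lv).astype(float) for lv in levels[1:]])
    return levels, dummies

color = np.array(["red", "green", "blue", "green"])  # categorical input
size = np.array([1.5, 2.0, 0.5, 1.0])                # numeric input

levels, D = dummy_encode(color)
X = np.column_stack([size, D])  # numeric input followed by the dummies
print(levels)
print(X)
```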
Citing the ARESLab toolbox
Please give a reference to the webpage in any publication describing research performed using
the toolbox, e.g., like this:
Jekabsons G., ARESLab: Adaptive Regression Splines toolbox for Matlab, 2010, available at
http://www.cs.rtu.lv/jekabsons/
2. AVAILABLE FUNCTIONS
The ARESLab toolbox provides the following functions:
• aresbuild – builds an ARES model;
• aresparams – creates a configuration for ARES model building algorithm for further use
with aresbuild, arescv, or arescvc functions;
• arespredict – makes predictions using an ARES model;
• arestest – tests an ARES model on a test data set;
• arescv – tests ARES performance using k-fold Cross-Validation;
• arescvc – finds the “best” value for penalty c (Generalized Cross-Validation penalty per
knot) from a set of candidate values using k-fold Cross-Validation and MSE;
• aresplot – plots surface of an ARES model;
• areseq – outputs the ARES model in an explicit mathematical form.
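The penalty c enters through the Generalized Cross-Validation (GCV) criterion that is used to choose among pruned models. The Python sketch below assumes the convention of the R Earth package. A model with M basis functions (including the intercept) is treated as having enp = M + c*(M - 1)/2 effective parameters, where (M - 1)/2 approximates the number of knots, and GCV = RSS / (n*(1 - enp/n)^2). Whether ARESLab uses exactly this form is an assumption and should be checked against its source. Under this form, a larger c penalizes each knot more and therefore favours smaller models.

```python
import numpy as np

def gcv(y, y_hat, n_basis, c=3.0):
    """GCV score of a fitted model (Earth-style convention, assumed here).

    n_basis : number of basis functions including the intercept (M).
    c       : penalty per knot; (M - 1) / 2 approximates the number of knots.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    rss = float(np.sum((y - np.asarray(y_hat, dtype=float)) ** 2))
    enp = n_basis + c * (n_basis - 1) / 2.0  # effective number of parameters
    if enp >= n:
        return float("inf")                  # too many parameters for the data
    return rss / (n * (1.0 - enp / n) ** 2)

y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(gcv(y, y_hat, n_basis=1))  # intercept only: enp = 1
print(gcv(y, y_hat, n_basis=3))  # enp = 3 + 3*2/2 = 6 >= n = 4, so inf
```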
2.1. Function aresbuild
Purpose:
Builds a regression model using the Multivariate Adaptive Regression Splines technique.
Call:
[model, time] = aresbuild(Xtr, Ytr, trainParams, weights, modelOld, verbose)
All arguments of this function except the first two are optional. Empty values are also
accepted (the corresponding default values are then used).
Input:
Xtr, Ytr : Training data cases (Xtr(i,:), Ytr(i)), i = 1,...,n. Note that it is
recommended to pre-scale Xtr values to [0,1] (Friedman 1991) and to
standardize Ytr values (Milborrow 2009). This is because widely different
locations and scales for the input variables can cause instabilities that could
affect the quality of the final model. The MARS technique is (except for
numerics) invariant to the locations and scales of the input variables. It is
therefore reasonable to perform a transformation that causes resulting
locations and scales to be most favourable from the point of view of
numeric stability (Friedman 1991).
trainParams : A structure of training parameters for the algorithm. If not provided,
default values are used (see the function
aresparams for details).
weights : A vector of data case weights; if supplied, the algorithm calculates the
sum of squared errors by multiplying the squared residuals by the supplied
weights. The length of the weights vector must be the same as the number of
data cases (i.e., n). The weights must be nonnegative.
modelOld : If an already built ARES model is provided here, the forward phase is
skipped. Instead, this model is taken directly to the backward phase and
pruned. This is useful for fast selection of the “best” value of the penalty
trainParams.c using Cross-Validation, e.g., in the arescvc function.
verbose : Set to false to suppress verbose output. (default value = true)
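The Python sketch below illustrates the pre-scaling recommended for Xtr and Ytr. Each input column is mapped to [0,1] using its training minimum and range, and the output is standardized to zero mean and unit standard deviation. The function name prescale and its returned parameter tuple are hypothetical, not part of ARESLab. New inputs must be scaled with the same training parameters, and predictions must be mapped back with y = y_scaled*y_std + y_mean.

```python
import numpy as np

def prescale(Xtr, Ytr):
    """Map each input column to [0, 1] and standardize the output.

    Returns the scaled data and the parameters needed to transform new
    inputs and to map predictions back to the original output scale.
    """
    Xtr = np.asarray(Xtr, dtype=float)
    Ytr = np.asarray(Ytr, dtype=float)
    x_min = Xtr.min(axis=0)
    x_range = Xtr.max(axis=0) - x_min
    x_range[x_range == 0] = 1.0  # constant columns: avoid 0/0 (they become zeros)
    y_mean, y_std = Ytr.mean(), Ytr.std()
    if y_std == 0:
        y_std = 1.0              # constant output: only shift it
    X_scaled = (Xtr - x_min) / x_range
    Y_scaled = (Ytr - y_mean) / y_std
    return X_scaled, Y_scaled, (x_min, x_range, y_mean, y_std)

Xtr = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
Ytr = [1.0, 2.0, 3.0]
Xs, Ys, (x_min, x_range, y_mean, y_std) = prescale(Xtr, Ytr)
print(Xs)

# New inputs are scaled with the training parameters, not their own.
x_new = (np.array([[2.5, 15.0]]) - x_min) / x_range
```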
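The weighted error used with the weights argument can be written as SSE = sum_i w_i*(y_i - yhat_i)^2. The Python sketch below computes it and checks the two documented requirements: one weight per data case, and no negative weights. The name weighted_sse is hypothetical and is not an ARESLab function. A weight of 0 effectively removes a case from the error, and a weight of 2 counts its squared residual twice.

```python
import numpy as np

def weighted_sse(y, y_hat, weights=None):
    """Sum of squared residuals, each multiplied by its data case weight."""
    y = np.asarray(y, dtype=float)
    r2 = (y - np.asarray(y_hat, dtype=float)) ** 2
    if weights is None:
        return float(r2.sum())  # unweighted case
    w = np.asarray(weights, dtype=float)
    if w.shape != y.shape:
        raise ValueError("weights must have one entry per data case")
    if np.any(w < 0):
        raise ValueError("weights must be nonnegative")
    return float(np.sum(w * r2))

# squared residuals = [0, 1, 4]; weighted sum = 0*1 + 1*0 + 4*2 = 8.0
print(weighted_sse([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [1.0, 0.0, 2.0]))
```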
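Passing modelOld speeds up the selection of c because the expensive forward phase does not depend on c; only the backward (pruning) phase does. The Python sketch below illustrates this idea under simplifying assumptions and is not ARESLab's implementation:

- a hypothetical basis matrix of hinge functions stands in for the result of the forward phase;
- at each step, the basis function whose removal increases RSS the least is deleted;
- GCV uses the Earth-style effective number of parameters M + c*(M - 1)/2.

The same basis matrix is then pruned once for each candidate c.

```python
import numpy as np

def fit_rss(B, y):
    """Residual sum of squares of a least-squares fit of y on the columns of B."""
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    r = y - B @ coef
    return float(r @ r)

def gcv(rss, n, n_basis, c):
    """GCV score; assumes enp = M + c*(M - 1)/2 (Earth-style convention)."""
    enp = n_basis + c * (n_basis - 1) / 2.0
    return np.inf if enp >= n else rss / (n * (1.0 - enp / n) ** 2)

def backward_prune(B, y, c):
    """Backward phase: return the column indices of the best pruned model.

    Column 0 of B is the intercept and is never removed. At each step the
    column whose deletion increases RSS the least is dropped; the subset
    with the lowest GCV along this deletion sequence is returned.
    """
    n = len(y)
    cols = list(range(B.shape[1]))
    best_cols, best_gcv = cols[:], gcv(fit_rss(B, y), n, len(cols), c)
    while len(cols) > 1:
        rss, j = min((fit_rss(B[:, [k for k in cols if k != j]], y), j)
                     for j in cols[1:])
        cols.remove(j)
        score = gcv(rss, n, len(cols), c)
        if score < best_gcv:
            best_gcv, best_cols = score, cols[:]
    return best_cols

# Hypothetical "forward phase" result: intercept plus four hinge functions.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 4.0 * np.maximum(0.0, x - 0.5) + 0.05 * rng.standard_normal(50)
B = np.column_stack([np.ones_like(x),
                     np.maximum(0.0, x - 0.5), np.maximum(0.0, 0.5 - x),
                     np.maximum(0.0, x - 0.2), np.maximum(0.0, x - 0.8)])

# The forward phase is done once; only the pruning is repeated for each c.
kept_by_c = {c: backward_prune(B, y, c) for c in (0.0, 3.0)}
print(kept_by_c)
```

In this sketch the deletion sequence is the same for every c, so only the GCV comparison along it changes. As a result, a larger c can only select a model with the same number of basis functions or fewer.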