function [x, ind, r_act, A_o, x_o] = ols(A,b,r,threshold)
% OLS Orthogonal Least Squares.
%
% [x, ind] = OLS(A,b,r) gives the solution to the least squares problem
% using only the best r regressors chosen from the ones present in matrix A.
% This function also returns in the vector ind the indexes of the
% best r regressors (i.e., the best columns of A to use).
%
% If r is equal to n, the solution x given by OLS is the same as the solution
% given by A\b, but in ind we still have the regressors sorted by their
% importance. This means that one can perform feature selection by taking
% the first k entries of ind (k<r).
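% For example, a minimal feature-selection sketch (assuming A, b and the
% desired number of features k are already defined):
%
% [x, ind] = OLS(A, b, size(A,2)); % rank all the regressors
% A_red = A(:, ind(1:k));          % keep only the k most important columns
% x_red = A_red\b;                 % refit on the reduced set of regressors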
%
% THEORETICAL BACKGROUND
% Let us consider the Linear Least Squares Problem:
%
% min (A*x-b)'*(A*x-b)
%
% where A is an m-by-n matrix, b an m-dimensional vector and x, the unknown,
% an n-dimensional vector. The problem is well posed when m > n.
% A can be viewed as a set of columns, usually called regressors, while b typically
% is the vector of desired outputs.
% The solution to this problem can be computed in Matlab through the function
% MLDIVIDE as follows:
%
% x = A\b
%
% Although this function is very powerful, it gives no information on which
% regressors are more important than others.
% On the other hand, Orthogonal Least Squares (OLS) can extract this information.
% It is mainly based on the Gram-Schmidt orthogonalization algorithm.
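% As a reminder, a minimal (unnormalized) classical Gram-Schmidt sketch for
% orthogonalizing the columns of a matrix A looks as follows (illustrative
% only, not the exact implementation adopted below):
%
% W = A;
% for k = 2:size(A,2)
%     for j = 1:k-1
%         W(:,k) = W(:,k) - (W(:,j)'*A(:,k))/(W(:,j)'*W(:,j))*W(:,j);
%     end
% end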
%
% In pattern recognition problems, usually a training matrix A_tr is given,
% associated with the vector of desired outputs b_tr, plus a test set pair (A_ts, b_ts)
% on which the system has to be tested. In such cases the OLS function can be
% used as follows, with a value of r chosen by the user:
%
% [x, ind] = OLS(A_tr, b_tr, r);
%
% err_tr = (A_tr*x-b_tr)' *(A_tr*x-b_tr)/length(b_tr); % Mean Squared Error on training set.
% err_ts = (A_ts*x-b_ts)' *(A_ts*x-b_ts)/length(b_ts); % Mean Squared Error on test set.
%
% While a large value of r will probably give a low error on the training set,
% a smaller value may well give a lower error on the test set.
% In such cases, the value of r which minimizes the error on the test set
% should be chosen, e.g. through a loop like the sketch below.
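% A minimal sketch of this model selection loop (assuming the pairs
% (A_tr, b_tr) and (A_ts, b_ts) are given) could be:
%
% n = size(A_tr,2);
% err_ts = zeros(n,1);
% for r = 1:n
%     x = OLS(A_tr, b_tr, r);
%     err_ts(r) = (A_ts*x-b_ts)'*(A_ts*x-b_ts)/length(b_ts);
% end
% [err_min, r_best] = min(err_ts); % r_best minimizes the test set error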
%
% ALTERNATIVE SYNTAX
%
% [x, ind, r_act] = OLS(A,b,[], threshold) is another way to call the OLS function.
% This call requires a threshold in place of the number of regressors r.
% The threshold, which must be a real number in the interval [0,1],
% indicates the maximum relative error allowed between the approximated output b_r
% and the desired output b, when the number of regressors used is r.
% The output r_act gives the number of regressors actually used, i.e., the
% number necessary to reduce the error below the desired threshold.
% Please note that r_act is also equal to the length of the vector ind.
% The pseudocode for this call is the following:
%
% for r=1:n
%     x_r = OLS(A,b,r);
%     b_r = A*x_r;
%     err = 1-sum(b_r.^2)/sum(b.^2);
%     if err < threshold,
%         break;
%     end
% end
% r_act = r;
%
% When threshold is near 1, r tends to be 1 (a single regressor tends to be
% sufficient). When threshold is near 0, r tends to be n (all regressors tend
% to be needed to reach the desired accuracy). The parameter threshold is
% thus an indirect way of choosing the parameter r, as in the example below.
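% For instance, the following call (with an illustrative threshold of 0.01)
% keeps adding regressors until the relative error falls below 1%:
%
% [x, ind, r_act] = OLS(A, b, [], 0.01);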
%
% ALTERNATIVE SYNTAX
%
% OLS(A,b); with this syntax the OLS(A,b,r) function is called for r=1:n and
% the Error Reduction Ratio is collected and then plotted. This gives an idea
% of the possible values for threshold.
%
% ALTERNATIVE SYNTAX
%
% The function OLS computes new, orthogonal regressors in a transformed space. The new regressors
% can be extracted by using the following syntax:
%
% [x, ind, r_act, A_o] = OLS(A,b,[], threshold);
% or the following one:
%
% [x, ind, r_act, A_o] = OLS(A,b,r);
%
% The output A_o is an m-by-r_act matrix.
%
% To test the orthogonality of the newly generated regressors, use the
% following instruction:
%
% A_o(:,u)'*A_o(:,v)
%
% with u,v being integer values in {1,...,r_act}: for u ~= v the result
% should be close to zero.
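% A more compact check (a sketch, assuming A_o has already been computed) is
% to verify that all the off-diagonal entries of the Gram matrix are
% negligible:
%
% G = A_o'*A_o;
% max(max(abs(G - diag(diag(G))))) % should be close to machine precision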
%
% ALTERNATIVE SYNTAX
%
% [x, ind, r_act, A_o, x_o] = OLS(A,b,r); % This syntax can be used to extract the vector x_o,
% which is the vector of solutions in the orthogonally transformed space.
% This means that the length of the vector x_o is r_act, the same as the number of columns of A_o.
% Numerically, A*x is equal to A_o*x_o.
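% This equivalence can be verified numerically (up to rounding errors) with:
%
% norm(A*x - A_o*x_o) % should be close to zero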
%
% FINAL REMARK
%
% The implementation of this function follows the notation given in [1].
%
% ACKNOWLEDGEMENTS
%
% Special thanks to Eng. Bruno Morelli for his help in writing this function.
%
% REFERENCES
%
% [1] L. Wang, R. Langari, "Building Sugeno-type models using fuzzy
%     discretization and orthogonal parameter estimation techniques,"
%     IEEE Trans. Fuzzy Systems, vol. 3, no. 4, pp. 454-458, 1995.
% [2] S.A. Billings, M. Korenberg, S. Chen, "Identification of non-linear
%     output-affine systems using an orthogonal least-squares algorithm,"
%     Int. J. Syst. Sci., vol. 19, pp. 1559-1568, 1988.
% [3] M. Korenberg, S.A. Billings, Y.P. Liu, P.J. McIlroy, "Orthogonal
%     parameter estimation algorithm for nonlinear stochastic systems,"
%     Int. J. Control, vol. 48, pp. 193-210, 1988.
%
% See also MLDIVIDE, LSQLIN, PINV, SVD, QR.
% OLS, $Version: 0.81, August 2007
% Author: Marco Cococcioni
% Please report any bug to m.cococcioni <at> gmail.com
if nargin < 2, % Running in demo mode
    close all
    clc
    disp('Not enough input arguments provided. Running in demo mode.');
    disp('First demo');
    disp('--------------------------------------------------------------');
    m = 1000;
    n = 5;
    A = randn(m, n);
    b1_str = '(0.5*A(:,1)+0.07*A(:,2)+2*A(:,3)+0.004*A(:,4)+60*A(:,5)).^3';
    b1 = eval(b1_str);
    disp(sprintf('Observations (m): %d, variables (n): %d, random matrix of regressors (A): randn(m,n)', m, n));
    disp(sprintf('Vector of outputs b = %s', b1_str));
    disp(' ');
    disp('Given the previous vector b, it is evident that the regressors sorted by');
    disp('decreasing order of importance are: [5 3 1 2 4] (look at the coefficients used in b).');
    ols_demo(A,b1);
    close all
    clc
    disp('Second demo');
    disp('--------------------------------------------------------------');
    b2_str = '(A(:,1)+A(:,2)+A(:,3)+A(:,4)+A(:,5)).^3';
    b2 = eval(b2_str);
    disp(sprintf('In this second demo, we change the vector b as follows:\nb = %s', b2_str));
    disp('In this case, it is evident that all regressors have the same importance.');
    disp('This will be highlighted by the chart of the Error Reduction Ratio as a function of the');
    disp('number of regressors: it will be practically a straight line.');
    disp('Furthermore, the resulting order of importance is essentially random!');
    ols_demo(A,b2);
    clear all
    return
end
if nargin == 2,
    [m, n] = size(A);
    sse = zeros(n,1); % sum of squared errors, one entry per value of r
    err = zeros(n,1); % Error Reduction Ratio complement, one entry per value of r
    for r = 1:n
        [x{r}, ind{r}] = ols(A,b,r); % ind{r} contains the indexes of the r most important regressors
        sse(r) = (A*x{r}-b)'*(A*x{r}-b);
        b_r{r} = A*x{r};
        err(r) = 1-sum(b_r{r}.^2)/sum(b.^2);
    end
    plot(err,'.-b');
    ylabel('Error Reduction Ratio (ERR)');
    xlabel('Index of the most important regression variables');
    title(sprintf('Matrix A has %d rows (observations) and %d columns (regression variables)', m, n));
    set(gca,'xtick',1:n);
    set(gca,'xticklabel',ind{end});
    x = x{end};
    ind = ind{end};
    return
end
[m, n] = size(A);
if m <= n,
    error('OLS error: the number of observations must be strictly greater than the number of variables');
end
A = A';
cum_error = 0;
ind = [];
error_break = false;
if isempty(r),
    r = n; % In case the number of orthogonal regressors is not provided,
           % it will be set equal to the number of variables (regressors),
    error_break = true; % and the loop will break once the threshold is reached