the phase of finding the winning neuron. Afterwards, in the
weight-correction process both layers are treated as one. In the
test phase, after the winning neuron is found (using only the
Kohonen layer), the root-mean-square error of prediction
(RMSEP) is calculated using only the weights in the output
layer.
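The RMSEP figure of merit used here can be sketched in a few lines; this is a generic Python illustration of the quantity, not part of the Matlab program itself:

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction over a test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Example: only the third prediction is off, by 2 units.
print(rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```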
After the training is finished the Kohonen layer serves as a
pointing device. When a sample vector x_s is introduced to the
CPNN, it is compared with the weights of the Kohonen layer
and the position of the winning neuron is determined (this time
without further adjustment of the weights, since training was
finished earlier); the corresponding neuron in the output layer,
and the values of the weights stored in it, are selected as the
best match for the sample vector x_s. Even if some of the
neurons in the Kohonen layer were never excited by training
samples during the training phase, the interactions between the
neurons in this phase ensure that the neurons in the output layer
will have stored values even for samples that were not used
during the training. These properties of the CPNN, together
with a suitably selected final neighborhood radius, are
important for the development of a model with good
generalization performance for the interpolation of the modeled
properties.
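The recall behaviour described above, finding the winner in the Kohonen layer and reading the answer from the output layer, can be sketched as follows. This is a Python illustration of the scheme, not the som_counter_prop implementation; the use of Euclidean distance for the winner search is an assumption of the sketch:

```python
import numpy as np

def cpnn_predict(x, kohonen_w, output_w):
    """Recall phase of a counter-propagation network (illustrative sketch).

    kohonen_w : (n_neurons, n_inputs)  weights of the Kohonen (input) layer
    output_w  : (n_neurons, n_outputs) weights of the output layer
    """
    # Winning neuron: smallest distance to x, using the Kohonen layer only.
    winner = np.argmin(np.linalg.norm(kohonen_w - x, axis=1))
    # The prediction is read from the corresponding output-layer neuron.
    return output_w[winner]

# Toy example: two neurons; the second one matches the sample best.
kohonen_w = np.array([[0.0, 0.0], [1.0, 1.0]])
output_w = np.array([[10.0], [20.0]])
print(cpnn_predict(np.array([0.9, 1.1]), kohonen_w, output_w))  # → [20.]
```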
3. Data sets
As previously stated, the data sets used for the demonstration
of this program were taken from the literature [17–20].
The data set used for the development of the regression model
consists of 185 saturated acyclic compounds (ethers, diethers,
acetals and peroxides, as well as their sulfur analogs) [17,18].
Twelve calculated descriptors were used for the prediction of the
boiling points of these substances.
The Italian olive oils data set [19,20], which was used for
classification, consists of 572 samples of olive oils produced in
nine different regions of Italy (North Apulia, Calabria, South
Apulia, Sicily, Inner Sardinia, Coastal Sardinia, East Liguria,
West Liguria and Umbria). For each of the samples the per-
centage of the following fatty acids was determined: palmitic,
palmitoleic, stearic, oleic, linoleic, arachidic, linolenic and ei-
cosenoic. In this case, vectors y of length nine were used as the
dependent variables. If a sample belongs to the region labeled k,
then y_k = 1 and all the other elements of y are set to 0.
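The target-vector encoding just described can be sketched as follows (a Python illustration; the use of a 0-based region index is an assumption of this sketch):

```python
import numpy as np

def one_hot(region_index, n_regions=9):
    """Target vector for classification: y_k = 1 for region k, 0 elsewhere."""
    y = np.zeros(n_regions)
    y[region_index] = 1.0
    return y

print(one_hot(2))  # → [0. 0. 1. 0. 0. 0. 0. 0. 0.]
```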
In order to perform the analysis, the data sets were randomly
divided into training and test sets. In the case of the development
of the regression model, 40% of the structures were used as a test
set and the remaining structures were used as a training set. In the
case of the development of the classification model, the Italian
olive oil data set was divided into a training set consisting of 1/3
of the samples, while all the other samples were used as a test set.
In both cases, before the optimization started, the variables were
autoscaled.
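The splitting and autoscaling steps above can be sketched as below. This is a Python illustration only; the text does not specify whether the scaling statistics come from the training set alone, so that choice (and the fixed random seed) is an assumption of the sketch:

```python
import numpy as np

def random_split(X, test_fraction=0.4, seed=0):
    """Randomly divide the rows of X into a training and a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    return X[idx[n_test:]], X[idx[:n_test]]

def autoscale(X_train, X_test):
    """Autoscale: centre and scale each variable to unit variance,
    using statistics estimated from the training set."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0, ddof=1)
    return (X_train - mu) / sigma, (X_test - mu) / sigma

X = np.arange(20.0).reshape(10, 2)     # toy data: 10 samples, 2 variables
X_train, X_test = random_split(X, test_fraction=0.4, seed=1)
X_train_s, X_test_s = autoscale(X_train, X_test)
```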
4. Software specifications and requirements
The CPNN program was developed in Matlab 6.5 (Release 13)
[12] on the basis of the SOM Toolbox [9] developed by J. Vesanto et
al. All the features available for SOM in the toolbox are also
available in our CPNN program. The Matlab function which
executes the program is called som_counter_prop. The syntax for
the execution of the CPNN program can be found by typing help
som_counter_prop in Matlab's Command Window after the in-
stallation of the program. The required inputs as well as the outputs
are given in the help section of the som_counter_prop program. The
parameters used to define the shape of the CPNN and its
training are numerical. Only three input/output parameters are not
numerical; these are structured variables representing the
training data, the test data and the trained CPNN.
The supporting documentation and demo scripts available in
the SOM Toolbox [9] should be used in order to learn how to
extract the maximum information from the analyzed data with this
counter-propagation neural networks program.
The input data file format used in our program is the same as
the one used in the SOM Toolbox [9]; this data format is
described in detail in the SOM Toolbox documentation [9]. Our
CPNN program is capable of handling missing values in the
same way as the other functions available in the SOM Toolbox.
For this purpose the user should replace them with the label
“NaN”.
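Handling missing values via the NaN label typically amounts to ignoring the missing components when distances to the neurons are computed. A minimal sketch of that idea follows; whether the toolbox does exactly this is an assumption here, and the sketch is in Python rather than Matlab:

```python
import numpy as np

def nan_safe_distance(x, w):
    """Euclidean distance that ignores components where x is missing (NaN)."""
    mask = ~np.isnan(x)          # keep only the observed variables
    diff = x[mask] - w[mask]
    return np.sqrt(np.sum(diff ** 2))

# The second variable is missing, so only variables 1 and 3 contribute.
d = nan_safe_distance(np.array([1.0, np.nan, 3.0]),
                      np.array([1.0, 5.0, 0.0]))
print(d)  # → 3.0
```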
The training of the CPNN can be performed in two phases:
rough (with a large learning rate and a large neighborhood radius)
Fig. 4. Some of the neighbourhood functions available in the SOM Toolbox
(a: bubble, b: Gaussian, c: cut Gaussian).
Fig. 3. Commonly used learning rate functions (linear, power and inverse).
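The learning-rate functions named in the caption of Fig. 3 can be sketched as decay schedules of the training step t out of T total steps. The particular constants below follow common SOM Toolbox conventions and should be treated as assumptions of this Python illustration:

```python
def lr_linear(a0, t, T):
    """Linearly decreasing learning rate, from a0 down to 0."""
    return a0 * (1.0 - t / T)

def lr_power(a0, t, T):
    """Power-law decay, from a0 down to a small final value (0.005 assumed)."""
    return a0 * (0.005 / a0) ** (t / T)

def lr_inv(a0, t, T):
    """Inverse-of-time decay (the factor 100 is an assumed convention)."""
    return a0 / (1.0 + 100.0 * t / T)

# All three start at a0 and decrease monotonically over training.
a0, T = 0.5, 100
print(lr_linear(a0, 0, T), lr_linear(a0, T, T))   # → 0.5 0.0
```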
86 I. Kuzmanovski, M. Novič / Chemometrics and Intelligent Laboratory Systems 90 (2008) 84–91