these preselected samples. In addition, if the sampling interval is
extremely long, for example, one day per sample, the model cannot cope
with changes promptly. Therefore, the advantage of recursive methods is
not obvious, and the performance of the NIR model may deteriorate when
abrupt changes occur in a real production process during a sampling
interval.
LWR, which is also called just-in-time learning, has been used to
solve this problem [19]. LWR constructs a local model by prioritizing
samples in a database according to their similarity to a query sample.
For each query sample to be predicted, a local model is built and then
discarded. Here, the query sample refers to the spectrum whose chemical
properties need to be estimated by the calibration model. Thus, in
contrast to recursive methods, LWR can cope with abrupt changes as well
as gradual ones, and it can cope with nonlinearity because it builds a
new local model for every query [17,20].
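The local modeling idea can be illustrated with a minimal sketch. It is not the implementation used in the cited works: the Gaussian kernel on Euclidean distance, the `bandwidth` parameter, and the function name `lwr_predict` are assumptions made for illustration, and a plain weighted least-squares fit stands in for the PLS-type local models usually preferred for high-dimensional spectra.

```python
import numpy as np

def lwr_predict(X_cal, y_cal, x_query, bandwidth=1.0):
    """Predict the property of one query spectrum with a locally weighted linear model."""
    X_cal = np.asarray(X_cal, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Similarity between each database sample and the query
    # (assumed here: Gaussian kernel on Euclidean distance).
    dist = np.linalg.norm(X_cal - x_query, axis=1)
    weights = np.exp(-0.5 * (dist / bandwidth) ** 2)

    # Weighted least squares with an intercept column: scaling rows by
    # sqrt(weight) turns the weighted problem into an ordinary one.
    A = np.hstack([np.ones((X_cal.shape[0], 1)), X_cal])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_cal * sw, rcond=None)

    # The local model is used for this query only and then discarded.
    return float(coef[0] + x_query @ coef[1:])
```

Because the weights are recomputed for every query, samples far from the query contribute almost nothing, which is what allows a sequence of linear local models to follow a nonlinear relationship.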
In addition, wavelength selection is a major step in the NIR modeling
process [21,22] because removing uninformative wavelengths improves
prediction performance and reduces model complexity. Generally, the
selection criteria for wavelengths can be categorized into two groups
[23,24]. One is based on the information content of the wavelength,
such as the signal-to-noise ratio. This criterion is often used in
qualitative analysis but is not popular in quantitative analysis
because it considers only spectral information and ignores the output
information when wavelength selection is performed. The other is based
on statistics related to the model's performance, e.g., RMSECV; methods
in this group include uninformative variable elimination (UVE) [25],
interval partial least squares (iPLS) [26], Monte Carlo based UVE
(MC-UVE) [23], moving window partial least squares (MWPLS) [27], and
intelligent algorithms [28,29]. This category of methods has been
widely used for wavelength selection when a global model has to be
established. In recent years, a recursive wavelength selection strategy
has been proposed by Chen [18]. This strategy is based on the variable
importance in the projection (VIP) [30] algorithm and can recursively
update the wavelength structure when query samples are available.
However, during a sampling interval, the wavelength and model
structure remain unchanged.
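As an illustration of the second group of criteria, the sketch below computes RMSECV for a candidate wavelength subset so that two subsets can be compared. It is a simplified stand-in rather than any of the cited algorithms: the function name `rmsecv`, the contiguous fold split, and the ordinary least-squares model (instead of PLS) are assumptions made for brevity.

```python
import numpy as np

def rmsecv(X, y, n_folds=5):
    """Root mean square error of cross-validation for a linear least-squares model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Contiguous folds without shuffling (an assumption; shuffle if samples are ordered).
    folds = np.array_split(np.arange(n), n_folds)
    squared_error = 0.0
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Fit on the training folds, with an intercept column.
        A_train = np.hstack([np.ones((int(train.sum()), 1)), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Accumulate the squared prediction error on the held-out fold.
        A_test = np.hstack([np.ones((len(test_idx), 1)), X[test_idx]])
        squared_error += np.sum((A_test @ coef - y[test_idx]) ** 2)
    return float(np.sqrt(squared_error / n))
```

A subset of wavelengths `cols` would then be scored as `rmsecv(X[:, cols], y)`, with a lower value indicating a subset that predicts the property better under cross-validation.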
Traditionally, a wavelength selection algorithm is applied to an
NIR training data set before a global model is established, and
during a sampling interval, the selected wavelengths remain
unchanged. However, changes in process characteristics prevent
the selected wavelengths from being representative of future
process conditions; this problem has also been addressed in the
literature [18].
To address the aforementioned problems, this article proposes an
online updating NIR modeling method. In the proposed algorithm, for
each query sample, the calibration samples and wavelength structure
are updated successively, and then a local model is established. The
greatest advantage of this approach is that it adaptively adjusts both
the calibration samples and the wavelength structure to real process
variations. The performance of the proposed modeling method is
demonstrated on an NIR data set from a real gasoline blending and
optimal control process.
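The order of operations described above (update the calibration samples, update the wavelength structure, then build a local model) can be sketched as follows. This is only a schematic under stated assumptions and not the method detailed in Section 3: nearest-neighbor selection by Euclidean distance, ranking wavelengths by absolute correlation with the property, and an ordinary least-squares local model are all placeholders, and the names `predict_query`, `n_neighbors`, and `n_wavelengths` are hypothetical.

```python
import numpy as np

def predict_query(X_db, y_db, x_query, n_neighbors=20, n_wavelengths=5):
    """Per-query update of samples and wavelengths, followed by a local model."""
    X_db = np.asarray(X_db, dtype=float)
    y_db = np.asarray(y_db, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Step 1: update the calibration samples
    # (placeholder: the database samples closest to the query).
    dist = np.linalg.norm(X_db - x_query, axis=1)
    idx = np.argsort(dist)[:n_neighbors]
    X_loc, y_loc = X_db[idx], y_db[idx]

    # Step 2: update the wavelength structure on the local samples
    # (placeholder: absolute Pearson correlation with the property).
    Xc = X_loc - X_loc.mean(axis=0)
    yc = y_loc - y_loc.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    score = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    cols = np.sort(np.argsort(score)[-n_wavelengths:])

    # Step 3: establish the local model on the selected samples and wavelengths.
    A = np.hstack([np.ones((len(idx), 1)), X_loc[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y_loc, rcond=None)
    y_hat = float(coef[0] + x_query[cols] @ coef[1:])
    return y_hat, idx, cols
```

Both the sample set `idx` and the wavelength set `cols` are recomputed for every query, so neither is frozen between reference measurements.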
The rest of this paper is organized as follows. Section 2 investigates
the technical background and necessity of model updating. Section 3
details our proposed method, including the local modeling approach,
wavelength selection method, and adaptive modeling strategy. Section 4
presents the experimental method. Section 5 presents the experimental
results and discussion. Finally, Section 6 concludes the paper.
2. Method
In this section, the technical background of online NIR analysis and
the necessity of model updating are discussed in turn.
2.1. Technical background
In recent years, qualitative and quantitative applications of NIR
spectroscopy in various chemical fields, including the pharmaceutical,
food, and petrochemical industries, have grown dramatically. In gasoline
blending processes, NIR spectroscopy has been used to analyze gasoline
properties.
Gasoline is one of the most profitable products of refineries and can
account for as much as 60% to 70% of total profit [31,32]. The blending
recipe is traditionally calculated based on laboratory values. During
the blending process, the recipe remains unchanged, and different
components are pumped into the target tank one by one before blending.
However, the large numbers of orders, delivery dates, blenders, blend
components, tanks, and quality specifications, together with nonlinear
blending behavior, make this process highly complex and nonlinear [32].
To address these concerns, an online gasoline blending and optimal
control process is desirable.
In the online blending process, the blending recipe is updated
recursively depending on the feedback information from an NIR
spectroscopy analysis instrument. During the blending optimization
process, the properties of gasoline components and products analyzed by
an NIR spectrometer are sent to the recipe optimizing server, and a new
recipe is calculated. Then, the distributed control system (DCS) updates
the control strategy according to the recipe. Thus, an NIR spectrometer,
particularly NIR modeling technology, is crucial for recipe updating
during the online blending process.
The basic principle of the NIR spectroscopy technique is Beer's law,
according to which the relationship between the dependent variable y
(property) and the independent variables x (NIR spectrum) can be
expressed in linear form as
$$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m \tag{1}$$
where $a_i$ ($i = 0, 1, 2, \dots, m$) denotes the regression coefficient. Based on
Eq. (1), multivariate calibration models can be used in analyzing
multicomponent spectroscopic data. Statistical methods, such as MLR,
PLS, or PCR, and their robust versions are typically used to establish a
calibration model. However, these conventional NIR modeling methods are
based on global linear regression algorithms, and the model structures
remain unchanged during a sampling interval. Because of the large
variation in industrial processes, model performance worsens over
time. Thus, laborious model reconstruction is frequently required.
Meanwhile, if the sampling interval is extremely long (such as one or
two days per sample) and the model cannot track the process changes,
unqualified products are produced.
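For concreteness, the coefficients of Eq. (1) can be estimated by ordinary least squares, which corresponds to the MLR case. The sketch below is illustrative only: the function names are hypothetical, and for real NIR spectra, where the number of wavelengths m usually exceeds the number of calibration samples, PLS or PCR would normally replace the plain least-squares step.

```python
import numpy as np

def fit_linear_calibration(X, y):
    """Estimate a_0, a_1, ..., a_m of Eq. (1) by ordinary least squares (MLR)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept a_0.
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    a, *_ = np.linalg.lstsq(A, y, rcond=None)
    return a

def predict_property(a, x):
    """Evaluate Eq. (1) for one spectrum x = (x_1, ..., x_m)."""
    return float(a[0] + np.dot(a[1:], np.asarray(x, dtype=float)))
```

Once fitted, such a global model keeps the same coefficients for every later spectrum, which is the fixed structure that makes it sensitive to process changes.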
The aforementioned factors show that the most important problem
of current NIR models is model maintenance, i.e., how to cope with
changes in process characteristics and maintain high estimation
accuracy for a long time.
2.2. Necessity of model updating
The performance of an NIR model depends on the quality of both the
wavelength structure and the calibration samples. NIR spectra typically
consist of broad, weak, non-specific, and overlapping bands. Moreover,
NIR data sets may have thousands of wavelengths [7,30,33–36]. Therefore,
certain variables may be irrelevant for multivariate calibration. Better
quantitative calibration models may be obtained by selecting
characteristic wavelengths that carry sample-specific or
component-specific information instead of using the full-spectrum
information. Both experimental and theoretical studies have shown that
the performance of a calibration model can be improved by using selected
informative wavelengths rather than the full spectrum. Moreover, online
spectroscopic measurements are almost inevitably subject to fluctuations
and variations in physical conditions, such as temperature, pressure,
flow turbulence, sample compactness, particle size, and surface
topology [18], which can influence spectra in a nonlinear manner,
80 K. He et al. / Chemometrics and Intelligent Laboratory Systems 134 (2014) 79–88