Sensors 2018, 18, 433 2 of 20
and cooling fan speeds appropriately [4,5]. In addition, in order to mitigate thermal emergencies on
multi-core chips, only a fraction of cores can be simultaneously powered in the full performance mode,
while other cores (i.e., dark cores) need to be power gated. In this so-called dark silicon problem [6–9],
it is important to ensure thermal-safe operation for modern chips, i.e., operation in which the peak
temperature does not exceed the safe operating temperature; otherwise, the response mechanisms of
DTM are triggered.
The number of on-die thermal sensors keeps growing in very large scale integration (VLSI) systems
to enable the DTM of chip functionalities [10–21], as shown in Figure 1. The accuracy of on-chip sensor
readings has a great influence on the effectiveness and reliability of DTM. However, embedded thermal
sensors are inevitably affected by noise sources, including process variation, supply voltage fluctuations,
and cross-coupling, which cause the observed temperature readings to deviate from the actual
values. In the worst case, the temperature reading error of uncalibrated thermal sensors used in
IBM25PPC750L processors (International Business Machines Corporation (IBM), Armonk, New York,
United States of America) can be up to 34 °C (at an actual temperature of 95 °C) [22]. Therefore, blindly
trusting the thermal sensors to be ideal can lead DTM strategies to make inaccurate decisions that
result in false alarms or unnecessary responses.
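
The effect described above can be illustrated with a minimal sketch. It is not taken from the paper: the threshold value, the temperature trace, the Gaussian noise model, and all function names are hypothetical assumptions chosen only to show how a threshold-based DTM policy can raise false alarms when it trusts noisy readings.

```python
import random

def noisy_readings(true_temps, sigma, seed=0):
    """Add zero-mean Gaussian noise (an assumed noise model) to each true temperature."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, sigma) for t in true_temps]

def count_false_alarms(true_temps, readings, threshold):
    """Count samples where the reading reaches the threshold but the true temperature does not."""
    return sum(1 for t, r in zip(true_temps, readings)
               if r >= threshold and t < threshold)

def count_missed_alarms(true_temps, readings, threshold):
    """Count samples where the true temperature reaches the threshold but the reading does not."""
    return sum(1 for t, r in zip(true_temps, readings)
               if t >= threshold and r < threshold)

# Hypothetical DTM trigger threshold in degrees Celsius (not a value from the paper).
THRESHOLD_C = 85.0

# Hypothetical true temperature trace that stays between 80 and 84 degrees,
# i.e., always below the threshold.
true_temps = [80.0 + 4.0 * (i % 10) / 9.0 for i in range(200)]

ideal = list(true_temps)                        # ideal sensor: reading equals true value
noisy = noisy_readings(true_temps, sigma=3.0)   # assumed 3-degree standard deviation

print("false alarms (ideal sensor):", count_false_alarms(true_temps, ideal, THRESHOLD_C))
print("false alarms (noisy sensor):", count_false_alarms(true_temps, noisy, THRESHOLD_C))
```

With an ideal sensor, the policy never triggers because the true temperature stays below the threshold. With noisy readings, some samples cross the threshold even though the chip is thermally safe, which is the kind of unnecessary response that motivates calibrating the sensor data before it is used by DTM.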
[Figure 1: plot of the number of embedded thermal sensors (vertical axis, 0 to 40) versus year (2006 to 2014) for the 90 nm, 65 nm, 45 nm, and 22 nm technology nodes; data points are labeled with references [10]–[21].]
Figure 1. Trends in the number of embedded thermal sensors in VLSI systems.
Thermal monitoring and management in VLSI systems have been widely researched in recent
years [23–25]. Nowroz et al. [26] utilized frequency-domain signal representations to devise both static
and runtime thermal monitoring approaches. Unfortunately, this work does not consider the effect of
inaccurate and noisy sensors. Reda et al. [27] proposed a new direction to simultaneously identify
the thermal models and the fine-grain power consumption of a chip from just the measurements of
the thermal sensors and the total power consumption. Although they verified the accuracy of this
method and demonstrated its resilience to sensor noise, the problem of noise reduction for sensor
measurements was not addressed. Effective temperature calibration can compensate for inaccuracies
in temperature measurement and help to improve thermal sensing accuracy. Accurately estimating
temperatures from on-chip thermal sensors corrupted by noise therefore remains a
major challenge.
A number of studies have taken into account the noise issue associated with sensor readings,
such as the statistical methodology [28] and the multi-sensor collaborative calibration algorithm
(MSCCA) [29]. However, these techniques lack the real-time prediction capability required
for proactive DTM techniques [30]. In [31,32], the authors proposed a scheme to make online
temperature measurements significantly more accurate. They constructed an offline thermal equivalent
resistor–capacitor (RC) model and reduced its complexity by a projection-based model order reduction
method. This model can be used to convert the power dissipation to temperature in the prediction step
of the Kalman filter. However, the derivation of such an RC model is not trivial due to the complexity
of silicon materials. Unlike the above approach, we apply the polynomial fitting technique to convert
the oscillation frequency of noisy sensors to temperature data and use the smoothing filter to obtain