The main contributions of this paper are multi-fold, which
include:
1) We model the tradeoff between accuracy and energy
cost of distinguishing the uncertain objects for crowd-
sensing as an optimization problem.
2) We quantify the accuracy of distinguishing uncertain
objects.
3) We investigate our model by leveraging the hy-
percube to explain the optimal solution of feature
selection.
4) We evaluate our proposed schemes with real data of
multiple features, and the result proves its low-cost,
high-accuracy performance.
The rest of paper is structured as follows: Section II
presents a brief overview of related works. In section III,
we introduce optimization problem between multiple feature
selection and energy cost in crowdsensing. In section IV, we
explain algebra solution of multiple features with hypercube.
Section V evaluates our schemes through the data we collected
in realistic scenarios, and Section VI concludes the paper with
directions for future work.
II. RELATED WORK
A. Uncertain Data in Crowdsensing
Uncertain data is a notion of data that contains specific
uncertainty. It typically found in the area of sensor networks.
There are three main models that could explain the generation
of uncertain data, attribute uncertainty, correlated uncertainty
and tuple uncertainty [6]. In most cases, data may contain
errors or may only be partially complete because of sources
imprecision including data randomness, limitation of measur-
ing sensors. Besides, the data we collected may correspond to
objects which are only vaguely specified, therefore we need to
explore data representation uncertainty [3].Modeling uncertain
data has been studied a lot in [7]–[9]. A database that provides
incomplete information consists of a set of possible instances
of the database.
Tao et al. [10] try to describe query result of d-dimensional
point x that belongs to an uncertain object O with a probability
threshold p
q
∈ [0, 1]. They suppose that uncertain object O
is associated with a probability density function p(x) and d-
dimensional uncertainty region Ω
O
. Then they calculate the
integration of probability density function on hyperspace Ω
O
.
Finally they define object O as a result if the integration is
larger than the threshold, otherwise it is not a result. In [8],
the authors define probabilistic database as a pair < x, p >,
where x is a finite set of possible database instances consistent
with a given schema, and p(I) is the probability associated
with any instance I ∈ x.
B. Energy-aware Mobile Computing
Energy awareness attract a lot of attention recently. One
important reason is the uptime of battery-powered mobile
devices, such as smartphones and tablets. In [11], the authors
found that we can achieve different energy consumption if
we implement algorithms in alternative ways. Regarding the
multiple sensor-based applications of smartphone, we have to
balance the tradeoff between energy cost and accuracy. For
example, there are several choices of localization by resorting
to different sensors in smartphone. GPS could achieve a high
accuracy while may consume too much energy. Other alter-
native ways such as leveraging cellar tower(GSM), WiFi or
Buletooth, which could achieve different localization accuracy
by consuming different energy. Maximilian Schirmer et al.
[12], propose sensor substitution approaches to reduce the
energy consumption of utilizing smartphone sensors.
C. Hypercube
The applications of hypercube have been initially studied
in parallel and distributed computing [13] [14]. And most of
recent hypercube based researches focus on routing. In [15],
the authors utilize hypercube to design a team multicast routing
protocol to address the scalability in mobile ad hoc networks.
H. Huo et al. [16] defined a routing selection and maintenance
rules based on a logical hypercube structure. In [17], the
authors linked Bluetooth devices as a hypercube to construct
a parallel computation and communication environment. Also
trying to leverage hypercube properties, the authors in [18]
[19] mapped social features into hypercube to design routing
algorithm for HCNs(human contact networks). We also use
hypercube to analyze multiple features selection solution in
geometric view. And for future work, we would explore the
hypercube properties on objects prediction.
D. Relative Entropy
Bin Jiang et al. [20] propose a novel probability similarity
measurement, KL divergence, to deal with the data clustering
problem for uncertain data. With the KL divergence, which can
distinguish the probability distribution difference of two ob-
jects, they update the partitioning and density-based clustering
approaches. They also test their new clustering algorithms with
synthetic data. In [21] [22], KL divergence was treated as a
feature to differentiate two kind of sound. For example, S. Basu
et al. [21] first introduce the relative entropy(KL divergence)
as a feature of sound to distinguish speech and other ambient
sounds.
III. OPTIMIZATION PROCESS FEATURE SELECTION
In this section, we introduce the optimization problem
between multiple feature selection and energy cost in crowd-
sensing. We explain data uncertainty of crowdsensing as the
first step. Then we will define the accuracy as our optimization
target under the constraints of energy cost. Table I summarizes
a list of essential notations used throughout the paper. (Some
notations are introduced later.)
A. Uncertain Data
There are many studies on modeling uncertain data [7]–[9].
In [7], the possible world model has been proposed. In this
model, any legitimate combination of each tuple constitutes
an instance of a possible world. We can define uncertain
database in probabilistic way, which is a finite probability
space whose outcomes are all possible database instances
consistent with a given schema. This can be represented as the
pair < X, p >, where X is a finite set of possible database
instances consistent with a given schema, and p(x) is the
probability associated with any instance x ∈ X. We note that