Journal of Information & Computational Science 9: 3 (2012) 657–665
Available at http://www.joics.com
Using Apriori to Mine IoT Frequent Structures on Compute Cloud ⋆
Lin Guo a, Xiongfei Li b,∗
a College of Computer Science and Technology, Jilin University, Jilin 130012, China
b Symbol Computation and Knowledge Engineer of Ministry of Education, Jilin University, Jilin 130012, China
Abstract
This paper proposes an improved approach that combines cloud computing and the Internet of Things
(IoT) to mine frequent structures efficiently in massive IoT datasets. The proposed data processing
algorithm preprocesses and partitions the data according to the characteristics of IoT data, which
ensures good data parallelism. We improve the Apriori algorithm on the basis of the MapReduce model
so that it can process massive data in parallel. The method first eliminates the redundancy present
in IoT data and performs data abstraction. Second, it uses distributed computing to mine frequent
structures. Finally, it integrates the distributed results into the final sets of frequent structures
and then generates a rule set. The potential benefit is that frequent structures can be mined from
very large distributed datasets with little running time.
Keywords: Cloud Computing; Internet of Things; Apriori; Data Mining
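The three steps summarized in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: it assumes that a "frequent structure" can be treated as a frequent itemset, that redundancy elimination amounts to collapsing repeated items within a transaction, and that the map and reduce phases can be imitated with plain Python functions. All function names and the sample tag identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def deduplicate(readings):
    """Step 1 (assumed form): collapse repeated items in each transaction, drop empty ones."""
    return [frozenset(t) for t in readings if t]


def partition(transactions, n_parts):
    """Split the transactions into n_parts chunks that can be counted independently."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def map_count(chunk, candidates):
    """Map phase: count every candidate itemset inside one chunk."""
    counts = Counter()
    for t in chunk:
        for c in candidates:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce phase: sum the per-chunk counts and keep itemsets meeting min_support."""
    total = Counter()
    for pc in partial_counts:
        total.update(pc)
    return {c: n for c, n in total.items() if n >= min_support}


def next_candidates(frequent, k):
    """Apriori pruning: keep a size-k candidate only if all its (k-1)-subsets are frequent."""
    items = sorted({i for s in frequent for i in s})
    return [
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    ]


def mine(readings, min_support, n_parts=2):
    """Level-wise mining: one map/reduce round per itemset size."""
    chunks = partition(deduplicate(readings), n_parts)
    candidates = list({frozenset([i]) for c in chunks for t in c for i in t})
    result, k = {}, 1
    while candidates:
        # Steps 2 and 3: count per chunk, then integrate the partial results.
        frequent = reduce_counts([map_count(c, candidates) for c in chunks], min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result


if __name__ == "__main__":
    sample = [["tag1", "tag2", "tag2"], ["tag1", "tag2", "tag3"], ["tag1", "tag3"], ["tag2"]]
    for itemset, support in mine(sample, min_support=2).items():
        print(sorted(itemset), support)
```

In a real MapReduce deployment each call to `map_count` would run on a separate node over its own data partition, and `reduce_counts` would run after the shuffle; the sketch keeps both in one process only to show the data flow.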
1 Introduction
IoT connects objects to the Internet by using the identification, real-time status and location
information of objects, which is obtained from sensor devices, and thereby makes the identification,
tracking and management of objects possible. IoT data not only has the characteristics of traditional
real-time data, such as timeliness and high volume, but also has its own characteristics, such as
lightweight communication protocols and limited computing and storage resources [5]. Mining such
distributed, massive, real-time data is very challenging. The major challenges include:
(1) managing and mining RFID stream data, (2) querying, analyzing and mining
⋆ Project supported by the National Science and Technology Support Program Foundation of China under
Grant No. 2006BAK01A33, the Technology Development Program Foundation of Jilin Province of China under
Grant No. 20090704, and the Natural Science Foundation of Jilin Province of China under
Grant No. 201115020.
∗ Corresponding author.
Email address: xiongfei@jlu.edu.cn (Xiongfei Li).
1548–7741 / Copyright © 2012 Binary Information Press
March 2012