J.C.-W. Lin et al.
types of databases. Frequent itemset mining (FIM) [1]plays
an essential role in data mining since it can extract rele-
vant relationships between purchases made by customers
in transactional databases. In traditional FIM, only the
occurrence frequencies of item/sets are considered. But in
real-life, many other important factors must be considered to
determine if a pattern is interesting such as purchase quan-
tities, unit profits, and item weights. To reveal more useful
and meaningful information from transactional databases,
high-utility itemset mining (HUIM) was proposed by Yao
et al. [26]. It considers both the unit profits and purchase
quantities of items to find high-utility itemsets (HUIs), that
is itemsets that yield a high profit, or have a high impor-
tance. An item/set is considered as a HUI if its utility is
no less than a minimum utility count, which is set accord-
ing to the user’s preferences. Discovering HUIs can be seen
as an extension of the task of FIM. It has a wide range of
applications, such as sale promotion [5, 19, 28] and stock
investment [17]. To address the challenge that the down-
ward closure (DC) property of FIM is not applicable to the
utility measure in HUIM, the transaction-weighted utility
(TWU) model [14] was designed. It provides a transaction-
weighted downward closure (TWDC) property for finding
high-transaction-weighted utilization itemsets (HTWUIs).
This property can be used to avoid exploring the generally
very large search space of itemsets for HUIM, and thus dis-
cover HUIs efficiently. Several algorithms [7, 19–21]were
proposed to mine HUIs using the TWU model.
Although HUIM can reveal information that is often
more relevant than the one found by FIM, it has an impor-
tant limitation, which is that it does not take the length of
itemsets into account. This is a problem since the utility gen-
erally increases with the length of itemsets. For this reason,
the utility measure is an unfair measurement of the impor-
tance of itemsets for real-world applications. As a solution
to this issue, the task of high average-utility itemset min-
ing (HAUIM) has been introduced. By considering the size
of itemsets, it can reveal the high average-utility itemsets
(HAUIs). The average-utility of an item/set is the utility of
the itemset divided by its length (number of items). Hong
et al. first designed a two-phase average-utility (TPAU)
algorithm [11] to mine the HAUIs using an Apriori-like
approach. An average-utility upper bound (auub) property
was designed to ensure the completeness and correctness
of the algorithm. A projection-based PAI algorithm [12],
a tree-based high average-utility pattern (HAUP)-tree algo-
rithm [15], and the HAUI-tree [22] algorithm were designed
to efficiently mine HAUIs based on the TPAU algorithm.
The HAUI-Miner algorithm [18] was then developed to
directly mine HAUIs without generating candidates and
without scanning the database multiple times, thanks to a
special type of utility-list structure [20]. Nonetheless, a con-
siderable drawback of the HAUI-Miner algorithm is that
it repeatedly performs costly join operations for mining
HAUIs of different lengths since the TPAU model is based
on the auub upper bound, which is a loose upper bound on
the average-utility of itemsets.
To provide a more efficient algorithm for HAUI mining,
this paper presents a novel algorithm, which relies on sev-
eral pruning strategies to efficiently mine HAUIs. The major
contributions of this paper are summarized as follows.
1. In this paper, we design a depth-first search algo-
rithm for mining HAUIs, which includes three pruning
strategies. Moreover, a depth-first search version of the
algorithm is also considered.
2. The first pruning strategy stores the auub values of 2-
itemsets (item pairs) in a matrix, which is then used
to eliminate unpromising candidates early. Two more
pruning strategies are also designed based on a designed
item processing order, to respectively reduce the search
space for mining HAUIs and perform join operations
more efficiently.
3. Experiments are conducted to evaluate the efficiency
of the designed approach in terms of runtime, memory
usage, number of candidates, and scalability. Both real-
world and synthetic datasets are used. Moreover, the
effectiveness of the pruning strategies is also evaluated.
The rest of this paper is organized as follows. Related
work on HUIM and HAUIM is reviewed in Section 2.
Preliminaries and the problem statement are presented in
Section 3. The proposed algorithm, including three prun-
ing
strategies, is described in Section 4. A detailed example
showing how the designed algorithm is applied step-by-step
on an example database is given in Section 5. Results from
a series of experiments performed on various datasets are
discussed in Section 6. Finally, a conclusion is drawn in
Section 7.
2 Related work
Association-rule mining (ARM) is a fundamental research
area in data mining, which has been widely studied and
has many real-life applications. The Apriori algorithm [1]
was first developed to mine association rules (ARs) in two
phases. In the first phase, a set of frequent itemsets (FIs) is
discovered based on a minimum support threshold. Then,
in the second phase, the derived FIs are combined to obtain
a set of ARs respecting a minimum confidence thresh-
old. Let the size or length of an itemset be the number
of items that it contains. The Apriori algorithm is said to
use a level-wise approach to discover FIs since it discov-
ers itemsets by ascending order of their size. A limitation of
level-wise algorithms is that multiple database scans are typ-
ically performed to discover the desired itemsets, and that