Pruning Closed Itemset Lattices for Association Rules

Nicolas Pasquier, Yves Bastide, Rafik Taouil, Lotfi Lakhal
{pasquier,bastide,taouil}@libd1.univ-bpclermont.fr
lakhal@ucfma.univ-bpclermont.fr
Laboratoire d'Informatique (LIMOS)
Université Blaise Pascal - Clermont-Ferrand II
Complexe Scientifique des Cézeaux
24, av. des Landais, 63177 Aubière Cedex France
Résumé

Discovering association rules is one of the main problems of knowledge discovery in databases. Many efficient algorithms have been proposed, the most notable being Apriori, Mannila's algorithm, Partition, Sampling and DIC. These are all based on the Apriori search method: pruning of the subset lattice (itemset lattice). In this article, we propose an efficient algorithm based on a new search method: pruning of the closed set lattice (closed itemset lattice). This lattice, which is a sub-order of the subset lattice, is closely related to Wille's concept lattice in formal concept analysis. We experimentally compared Close to an optimized version of Apriori, and the results obtained show the great efficiency of Close in processing dense and/or correlated data such as census data (a difficult case). We also observed that Close gives reasonable response times when processing sales databases.

Keywords: knowledge discovery; association rules; lattices; algorithms.
Abstract

Discovering association rules is one of the most important tasks in data mining, and many efficient algorithms have been proposed in the literature. The most noticeable are Apriori, Mannila's algorithm, Partition, Sampling and DIC, which are all based on the Apriori mining method: pruning of the subset lattice (itemset lattice). In this paper we propose an efficient algorithm, called Close, based on a new mining method: pruning of the closed set lattice (closed itemset lattice). This lattice, which is a sub-order of the subset lattice, is closely related to Wille's concept lattice in formal concept analysis. Experiments comparing Close to an optimized version of Apriori showed that Close is very efficient for mining dense and/or correlated data such as census data, and performs reasonably well for market basket style data.

Keywords: data mining; knowledge discovery; association rules; lattices; algorithms.
hal-00467745, version 1 - 26 Apr 2010
Author manuscript, published in "BDA'1998 international conference on Advanced Databases, Hammamet : Tunisia (1998)"
1 Introduction

One of the most important tasks in data mining is the discovery of association rules, first introduced in [1]. The aim of association rule discovery is to identify relationships between items in very large databases. For example, given a market basket database, it would be interesting for decision support to know that 80% of customers who bought cereals and sugar also bought milk. In a census database, we could discover that 60% of persons who worked last year earned less than the average income, or in a medical database, that 70% of patients who have stiffnesses and fever also have headaches.
Agrawal's statement of the problem of discovering association rules in market basket databases is the following [1, 2]. Let ℐ = {i_1, i_2, ..., i_m} be a set of m literals called items. Let the database D = {t_1, t_2, ..., t_n} be a set of n transactions, each one consisting of a set of items I from ℐ and associated with a unique identifier called its TID. I is called a k-itemset, where k is the size of I. A transaction t ∈ D is said to contain an itemset I if I ⊆ t. The support of an itemset I is the percentage of transactions in D containing I: support(I) = |{t ∈ D | I ⊆ t}| / |{t ∈ D}|. An association rule is a conditional implication among itemsets, I ⇒ I', where itemsets I, I' ⊆ ℐ and I ∩ I' = ∅. The confidence of an association rule r: I ⇒ I' is the conditional probability that a transaction contains I', given that it contains I: confidence(r) = support(I ∪ I') / support(I). The support of r is defined as: support(r) = support(I ∪ I').
The problem of mining association rules in a database D is then traditionally defined as follows. Given user-defined thresholds for the permissible minimum support and confidence, find all the association rules that hold with more than the given minsupport and minconfidence. This problem can be broken into two subproblems [1]:

1. Finding all frequent itemsets in D, i.e. itemsets with support greater than or equal to minsupport. Frequent itemsets are also called large itemsets.

2. For each frequent itemset I_1 found, generating all association rules I_2 ⇒ I_1 − I_2, where I_2 ⊂ I_1, with confidence greater than or equal to minconfidence.
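Subproblem 2 can be illustrated with a small sketch (not code from the paper; `rules_from_itemset` is an assumed helper name, and the support values are precomputed by hand): it enumerates every non-empty proper subset I_2 of a frequent itemset I_1 and keeps the rules reaching minconfidence.

```python
# Sketch of subproblem 2: derive rules I2 => (I1 - I2) from one frequent
# itemset I1, given a table of precomputed supports.
from itertools import combinations

def rules_from_itemset(i1, supports, minconfidence):
    """Return (antecedent, consequent, confidence) for qualifying I2 subset of I1."""
    rules = []
    for size in range(1, len(i1)):
        for combo in combinations(sorted(i1), size):
            i2 = frozenset(combo)
            conf = supports[frozenset(i1)] / supports[i2]
            if conf >= minconfidence:
                rules.append((set(i2), set(i1) - i2, conf))
    return rules

# Supports computed by hand from the example database D used later
# in the paper (Figure 1):
supports = {
    frozenset({"B"}): 0.8,
    frozenset({"E"}): 0.8,
    frozenset({"B", "E"}): 0.8,
}
for rule in rules_from_itemset({"B", "E"}, supports, 0.9):
    print(rule)
```

Both B ⇒ E and E ⇒ B qualify here with confidence 0.8/0.8 = 1.0.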
The second subproblem can be solved in main memory in a straightforward manner once all frequent itemsets and their support are known. Hence, the problem of mining association rules is reduced to the problem of finding frequent itemsets. Many algorithms have been proposed in the literature [2, 3, 9, 8, 11, 12, 13]. Although they are very different from each other, they are all based on the Apriori mining method [2]: pruning of the subset lattice for finding frequent itemsets. This relies on the basic properties that all subsets of a frequent itemset are frequent and that all supersets of an infrequent itemset are infrequent. Algorithms based on this approach perform very well for weakly correlated data such as market basket data. However, performances drastically decrease for correlated data such as census data.
In this paper, we propose a new efficient algorithm called Close for mining association rules in very large databases. Close is based on the pruning of the closed itemset lattice, which is a sub-order of the subset lattice, and thus much smaller. Such a structure is closely related to Wille's concept lattice in formal concept analysis [5, 14, 15]. We show that this structure can be used as a formal framework for discovering association rules, given the basic properties that all sub-closed itemsets of a frequent closed itemset are frequent, that all sup-closed itemsets of an infrequent closed itemset are infrequent, and that the set of maximal frequent itemsets is identical to the set of maximal frequent closed itemsets. Empirical evaluations comparing Close to an optimized version of Apriori showed that Close performs reasonably well for weakly correlated data and performs very well for correlated data.
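As a preview of the closure mechanism behind Close (the Galois connection operators are defined formally in Section 3), the closure of an itemset can be computed as the intersection of all transactions containing it; closed itemsets are the fixpoints of this operator. The sketch below is illustrative, not the paper's algorithm:

```python
# Closure of an itemset I: intersection of all transactions containing I.
# Uses the five-transaction example database D of Figure 1.
D = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"},
     {"B", "E"}, {"A", "B", "C", "E"}]

def closure(itemset, db):
    """Intersect all transactions of db that contain itemset."""
    covering = [t for t in db if itemset <= t]
    out = set(covering[0])
    for t in covering[1:]:
        out &= t
    return frozenset(out)

print(sorted(closure({"A"}, D)))       # A occurs only together with C
print(sorted(closure({"B"}, D)))       # B occurs only together with E
print(sorted(closure({"B", "E"}, D)))  # BE is closed: closure(BE) = BE
```

Here {A} closes to {A, C} and {B} closes to {B, E}, so only the closed itemsets need to be explored, which is why the closed itemset lattice can be much smaller than the subset lattice.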
The rest of the paper is organized as follows. Section 2 reviews related work and exhibits the contribution of the paper. In Section 3, we define the semantics of association rules based on the Galois connection operators. In Section 4, we describe the Close algorithm. Section 5 gives experimental results on synthetic data(1) and census data using the PUMS file for Kansas USA(2), and Section 6 concludes the paper.

(1) http://www.almaden.ibm.com/cs/quest/syndata.html
(2) ftp://ftp2.cc.ukans.edu/pub/ippr/census/pums/pums90ks.zip
TID  Items
 1   A C D
 2   B C E
 3   A B C E
 4   B E
 5   A B C E

Figure 1: The transaction database D
2 Related Work and Contribution
In this section, we first present the subset lattice based approach for mining association rules. Then, we introduce the use of the closed itemset lattice as a formal framework in data mining, and we briefly describe the Close mining method.
2.1 A Common Approach for Mining Association Rules
Finding all frequent itemsets is a nontrivial problem because the number of possible frequent itemsets is exponential in the size of the set of items ℐ of the database. Given ‖ℐ‖ = m, there are possibly 2^m frequent itemsets, which form a lattice of subsets over ℐ with height equal to m.
Consider the example transaction database D given in Figure 1. The lattice of subsets associated with D is represented in Figure 2. This lattice contains 32 itemsets and its height is 6. However, depending on the data and the minsupport value, only a small fraction of the whole lattice space is frequent. For instance, assuming that minsupport is 2 (40%), only 15 itemsets of D are frequent. A naive approach consists of testing the support of every itemset in the lattice, which can be done in a single pass over the database. Clearly, this approach is impractical for large values of m. In the following, we describe the Apriori mining method used by all existing algorithms for finding frequent itemsets. The notation is given in Table 1.
Figure 2: Itemset lattice of D (all 32 subsets of {A, B, C, D, E}, from Ø up to ABCDE; frequent itemsets for minsupport = 2 are distinguished from infrequent ones)
C_k   Set of candidate k-itemsets (potentially frequent itemsets).
      Each element of this set has two fields: i) itemset and ii) support count.
L_k   Set of frequent k-itemsets (itemsets with minimum support).
      Each element of this set has two fields: i) itemset and ii) support count.

Table 1: Notation
Algorithm Apriori

In the Apriori algorithm, items are sorted in lexicographic order. The pseudo-code of the Apriori frequent itemset discovery is given in Algorithm 1. Frequent itemsets are computed iteratively, in the ascending order of their size. The process takes k iterations, where k is the size of the largest frequent itemsets. For each iteration i ≤ k, the database is scanned once and all frequent itemsets of size i are computed. The first iteration computes the set L_1 of frequent 1-itemsets. A subsequent iteration i consists of two phases. First, a set C_i of candidate i-itemsets is created by joining the frequent (i−1)-itemsets in L_{i−1} found in the previous iteration. This phase is realized by the Apriori-Gen function described below. Next, the database is scanned for determining the support of the candidates in C_i, and the frequent i-itemsets are extracted from the candidates. This process is repeated until no more candidates can be generated.
1)  L_1 = {Large 1-itemsets};
2)  for ( k = 2; L_{k−1} ≠ ∅; k++ ) do begin
3)      C_k = Apriori-Gen(L_{k−1});  // Generates candidate k-itemsets
4)      forall transactions t ∈ D do begin
5)          C_t = Subset(C_k, t);  // Candidates contained in t
6)          forall candidates c ∈ C_t do
7)              c.count++;
8)      end
9)      L_k = { c ∈ C_k | c.count ≥ minsupport }
10) end
11) Answer = ∪_k L_k;

Algorithm 1: Apriori frequent itemset discovery
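A compact Python transcription of this loop might look as follows (an illustrative sketch, not the authors' implementation; candidate generation is folded into a set comprehension that mimics the join-and-prune of Apriori-Gen, described next):

```python
# Sketch of Algorithm 1: iterative level-wise search for frequent itemsets.
from itertools import combinations

def apriori(db, minsupport):
    """Return {frozenset: count} of all itemsets with count >= minsupport."""
    items = sorted({i for t in db for i in t})
    # First iteration: count and filter the 1-itemsets (L1).
    counts = {frozenset([i]): sum(1 for t in db if i in t) for i in items}
    lk = {s for s, c in counts.items() if c >= minsupport}
    frequent = {s: counts[s] for s in lk}
    k = 2
    while lk:
        # Join L_{k-1} with itself, keeping unions of size k whose
        # (k-1)-subsets are all frequent (the Apriori pruning property).
        ck = {a | b for a in lk for b in lk
              if len(a | b) == k
              and all(frozenset(s) in lk for s in combinations(a | b, k - 1))}
        # One database pass to count the candidates, then extract L_k.
        counts = {c: sum(1 for t in db if c <= t) for c in ck}
        lk = {c for c, n in counts.items() if n >= minsupport}
        frequent.update({c: counts[c] for c in lk})
        k += 1
    return frequent

D = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"},
     {"B", "E"}, {"A", "B", "C", "E"}]
freq = apriori(D, 2)
print(len(freq))  # 15 frequent itemsets, as stated in Section 2.1
```

On the example database D with minsupport 2, this finds exactly the 15 frequent itemsets mentioned earlier.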
Apriori-Gen Candidate Generation

The function takes as argument the set L_{i−1} of frequent (i−1)-itemsets. It returns the set C_i of candidate i-itemsets, which is a superset of the set of all frequent i-itemsets. Two frequent itemsets of size i−1 with the same first i−2 items are joined, generating a new candidate itemset of size i:
insert into C_i
select p.item_1, p.item_2, ..., p.item_{i−1}, q.item_{i−1}
from L_{i−1} p, L_{i−1} q
where p.item_1 = q.item_1, ..., p.item_{i−2} = q.item_{i−2}, p.item_{i−1} < q.item_{i−1};
Then, the candidate set C_i produced is pruned by removing every candidate i-itemset c such that some (i−1)-subset of c is not in L_{i−1}:

forall candidate itemsets c ∈ C_i do begin
    forall (i−1)-subsets s of c do begin
        if ( s ∉ L_{i−1} ) then
            delete c from C_i;
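The join and prune steps can be transcribed directly (an illustrative sketch; the function name `apriori_gen` is assumed, and itemsets are kept as lexicographically sorted tuples as the algorithm requires):

```python
# Sketch of Apriori-Gen: join L_{i-1} with itself on the first i-2 items,
# then prune candidates having an infrequent (i-1)-subset.
from itertools import combinations

def apriori_gen(l_prev):
    """Generate candidate i-itemsets from the frequent (i-1)-itemsets."""
    i = len(next(iter(l_prev))) + 1
    # Join step: p and q agree on their first i-2 items, with
    # p.item_{i-1} < q.item_{i-1} to avoid duplicates.
    candidates = {p + (q[-1],)
                  for p in l_prev for q in l_prev
                  if p[:-1] == q[:-1] and p[-1] < q[-1]}
    # Prune step: keep c only if every (i-1)-subset of c is in L_{i-1}.
    return {c for c in candidates
            if all(s in l_prev for s in combinations(c, i - 1))}

# L2 over the example database D (minsupport = 2):
l2 = {("A", "B"), ("A", "C"), ("A", "E"),
      ("B", "C"), ("B", "E"), ("C", "E")}
print(sorted(apriori_gen(l2)))  # ABC, ABE, ACE, BCE
```

Starting from L_2 of the example database, the join produces ABC, ABE, ACE and BCE, and none of them is pruned since all their 2-subsets are frequent.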
Example. Figure 3 shows the execution of Apriori for a minimum support of 2 (40%) on the database D. This process takes four iterations, computing four sets of candidates and frequent itemsets and performing four database passes. The frequent itemsets found are outlined in the itemset lattice given in Figure 2.