15 September 2022 Data Mining: Concepts and Techniques 1
Chapter 5: Mining Frequent Patterns,
Association and Correlations
Basic concepts and a road map
Efficient and scalable frequent itemset mining
methods
Constraint-based association mining
Summary
What Is Frequent Pattern Analysis?
Frequent pattern: a pattern (a set of items, subsequences, substructures,
etc.) that occurs frequently in a data set
First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context
of frequent itemsets and association rule mining
Motivation: Finding inherent regularities in data
What products are often purchased together (e.g., beer and diapers)?
What are the subsequent purchases after buying a PC?
What kinds of DNA are sensitive to this new drug?
Can we automatically classify web documents?
Applications
Basket data analysis, cross-marketing, catalog design, sales campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.
Why Is Freq. Pattern Mining Important?
Discloses an intrinsic and important property of data sets
Forms the foundation for many essential data mining tasks
Association, correlation, and causality analysis
Sequential, structural (e.g., sub-graph) patterns
Pattern analysis in spatiotemporal, multimedia, time-
series, and stream data
Classification: associative classification
Cluster analysis: frequent pattern-based clustering
Data warehousing: iceberg cube and cube-gradient
Semantic data compression: fascicles
Broad applications
Basic Concepts: Frequent Patterns and
Association Rules
Itemset X = {x_1, …, x_k}
Find all the rules X → Y with minimum support and confidence
support, s: probability that a transaction contains X ∪ Y
confidence, c: conditional probability that a transaction having X also contains Y
Let sup_min = 50%, conf_min = 50%
Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
A → D (60%, 100%)
D → A (60%, 75%)
[Figure: customers who buy diapers, customers who buy beer, and customers who buy both]
Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F
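The support and confidence figures above can be checked with a short brute-force sketch. It assumes the five transactions from the table; the function names (`support_count`, `support`, `confidence`, `frequent_itemsets`) are illustrative choices, not from the slides, and the exhaustive enumeration is only reasonable for a toy database like this one.

```python
from itertools import combinations

# Transaction database from the table (transaction id -> items bought).
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}


def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for items in transactions.values() if wanted <= items)


def support(itemset):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset) / len(transactions)


def confidence(lhs, rhs):
    """Conditional probability that a transaction with `lhs` also has `rhs`."""
    return support_count(set(lhs) | set(rhs)) / support_count(lhs)


def frequent_itemsets(min_sup):
    """Enumerate every candidate itemset and keep those meeting `min_sup`.

    Brute force (exponential in the number of items); an assumption made
    for clarity, not one of the scalable methods the chapter covers.
    """
    all_items = sorted(set().union(*transactions.values()))
    result = {}
    for k in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, k):
            count = support_count(candidate)
            if count / len(transactions) >= min_sup:
                result["".join(candidate)] = count
    return result


print(frequent_itemsets(0.5))   # {'A': 3, 'B': 3, 'D': 4, 'E': 3, 'AD': 3}
print(support({"A", "D"}), confidence({"A"}, {"D"}))  # 0.6 1.0
print(support({"A", "D"}), confidence({"D"}, {"A"}))  # 0.6 0.75
```

Both rules share the same support because support is defined on X ∪ Y; only the confidence depends on the direction of the rule.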
Closed Patterns and Max-Patterns
A long pattern contains a combinatorial number of sub-patterns, e.g.,
{a_1, …, a_100} contains C(100, 1) + C(100, 2) + … + C(100, 100)
= 2^100 - 1 ≈ 1.27 × 10^30 sub-patterns!
Solution:
Mine closed patterns and max-patterns instead
An itemset X is closed if X is frequent and there exists no
super-pattern Y ⊃ X with the same support as X
(proposed by Pasquier, et al. @ ICDT’99)
An itemset X is a max-pattern if X is frequent and there
exists no frequent super-pattern Y ⊃ X (proposed by
Bayardo @ SIGMOD’98)
Closed patterns are a lossless compression of the frequent patterns
Reduces the number of patterns and rules
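The two definitions can be compared on a small example. The sketch below reuses the five-transaction database from the earlier slide with a minimum support count of 3 (50% of 5 transactions, rounded up); the function names are illustrative assumptions, and the frequent itemsets are found by exhaustive enumeration rather than by any of the mining algorithms cited here.

```python
from itertools import combinations

# Same toy database as the earlier slide; minimum support count 3 of 5.
transactions = [
    {"A", "B", "D"},
    {"A", "C", "D"},
    {"A", "D", "E"},
    {"B", "E", "F"},
    {"B", "C", "D", "E", "F"},
]
MIN_COUNT = 3


def frequent_itemsets(db, min_count):
    """Brute-force enumeration of frequent itemsets with their counts."""
    items = sorted(set().union(*db))
    freq = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            count = sum(1 for t in db if itemset <= t)
            if count >= min_count:
                freq[itemset] = count
    return freq


def closed_patterns(freq):
    """Frequent X with no proper superset of equal support.

    A superset with the same support as a frequent X is itself frequent,
    so it is enough to search among the frequent itemsets.
    """
    return {x: c for x, c in freq.items()
            if not any(x < y and c == freq[y] for y in freq)}


def max_patterns(freq):
    """Frequent X with no frequent proper superset."""
    return {x: c for x, c in freq.items()
            if not any(x < y for y in freq)}


def label(itemset):
    """Render an itemset such as frozenset({'A', 'D'}) as 'AD'."""
    return "".join(sorted(itemset))


freq = frequent_itemsets(transactions, MIN_COUNT)
print(sorted(label(x) for x in closed_patterns(freq)))  # ['AD', 'B', 'D', 'E']
print(sorted(label(x) for x in max_patterns(freq)))     # ['AD', 'B', 'E']
```

Here {A} is not closed because {A, D} has the same support (3), while {D} is closed because its support (4) differs from that of {A, D}. The max-patterns drop {D}, so its support of 4 can no longer be recovered from them, whereas the closed patterns retain it.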