Part I: Written Assignment
1. (Data Preprocessing) Data transformation
Normalization: scaled to fall within a small, specified range
1) Problems
Suppose that the data for analysis includes the attribute age. The age values for
the data tuples are (in increasing order): 13, 15, 16, 16, 19, 20, 20, 21,
22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
2) Answers
v = 35
(a) min-max normalization: to [new_min(A), new_max(A)]
$$v' = \frac{v - \min(A)}{\max(A) - \min(A)}\,\bigl(\text{new\_max}(A) - \text{new\_min}(A)\bigr) + \text{new\_min}(A)$$
min(A) = 13, max(A) = 70
new_min(A) = 0.0, new_max(A) = 1.0
$$v' = \frac{35 - 13}{70 - 13}\,(1.0 - 0.0) + 0.0 = 0.386$$
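The min-max calculation above can be checked with a short script. This is a minimal sketch; the function name `min_max_normalize` and the constant `AGES` are illustrative names, not part of the assignment.

```python
# Age values copied from the problem statement.
AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def min_max_normalize(v, values, new_min=0.0, new_max=1.0):
    """Map v from [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min


print(round(min_max_normalize(35, AGES), 3))  # 0.386
```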
(b) z-score normalization (μ: mean, σ: standard deviation)
$$v' = \frac{v - \mu(A)}{\sigma(A)}$$
$$\mu(A) = \frac{809}{27} = 29.96, \qquad \sigma(A) = 12.94$$
$$v' = \frac{35 - 29.96}{12.94} = 0.389$$
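The z-score figures can be reproduced with Python's standard `statistics` module. This is a sketch with an illustrative function name (`z_score`). Note that σ = 12.94 matches the sample standard deviation (n − 1 denominator, `statistics.stdev`); the population form (`statistics.pstdev`) gives about 12.70 for the same data.

```python
import statistics

# Age values copied from the problem statement.
AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def z_score(v, values):
    """Standardize v using the mean and sample standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample form, n - 1 denominator
    return (v - mu) / sigma


mu = statistics.mean(AGES)
sigma = statistics.stdev(AGES)
print(round(mu, 2), round(sigma, 2), round(z_score(35, AGES), 3))
# 29.96 12.94 0.389
```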
(c) normalization by decimal scaling
$$v' = \frac{v}{10^{j}}$$
where j is the smallest integer such that max(|v'|) < 1.
$$v' = \frac{35}{10^{2}} = 0.35, \qquad j = 2$$
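A small sketch of how j could be found programmatically, assuming the helper names `decimal_scaling_exponent` and `decimal_scale` (both illustrative). The loop compares integers so that boundary values such as 100 are handled without floating-point rounding.

```python
# Age values copied from the problem statement.
AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def decimal_scaling_exponent(values):
    """Smallest non-negative integer j with max(|v|) / 10**j < 1."""
    max_abs = max(abs(x) for x in values)
    j = 0
    while max_abs >= 10 ** j:
        j += 1
    return j


def decimal_scale(v, values):
    """Divide v by 10**j, with j derived from the whole attribute."""
    return v / 10 ** decimal_scaling_exponent(values)


print(decimal_scaling_exponent(AGES), decimal_scale(35, AGES))  # 2 0.35
```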
(d) Comment on which method you would prefer to use for the given data, giving
reasons as to why.
I prefer normalization by decimal scaling.
With min-max normalization, if newly inserted data fall outside the original range
of the attribute age, an "out of bounds" error will occur.
With z-score normalization, two parameters (the mean and the standard deviation)
must be computed and stored, which costs additional calculation and storage.
With normalization by decimal scaling, the values of the attribute age have at
most two digits, so j can be fixed at 2.
2. Mining association rules
1) Problems
A database has four transactions. Let min_sup=60% and min_conf=80%.
(a)
At the granularity of item_category (e.g., item_i could be "Milk"), for the following
rule template:
$$\forall X \in \text{transaction},\ \text{buys}(X, \text{item}_1) \wedge \text{buys}(X, \text{item}_2) \Rightarrow \text{buys}(X, \text{item}_3) \quad [s, c]$$
List the frequent k-itemset for the largest k and all of the strong association
rules (with their support s and confidence c) containing the frequent k-itemset for the
largest k.
(b)
At the granularity of brand-item_category (e.g., item_i could be "Sunset-Milk"), for
the following rule template:
$$\forall X \in \text{customer},\ \text{buys}(X, \text{item}_1) \wedge \text{buys}(X, \text{item}_2) \Rightarrow \text{buys}(X, \text{item}_3)$$
List the frequent k-itemset for the largest k. Note: do not print any rules.
2) Answers
(a)
Item collection:
$$X = \{x_1, \ldots, x_m\} = \{\text{Crab}, \text{Milk}, \text{Cheese}, \text{Bread}, \text{Apple}, \text{Pie}\}, \qquad m = 6$$
Frequent itemsets (with the Apriori algorithm):
L = { {Milk}: 4, {Cheese}: 3, {Bread}: 4, {Milk, Cheese}: 3, {Milk, Bread}: 4,
{Cheese, Bread}: 3, {Milk, Cheese, Bread}: 3 }
support(Milk)=4/4=100%,support(Bread)=4/4=100%,
support(Cheese)=3/4=75%,
support(Milk∪Cheese)=3/4=75%,
support(Milk∪Bread)=4/4=100%,
support(Cheese∪Bread)=3/4=75%,
support(Milk∪Cheese∪Bread)=3/4=75%
Frequent k-itemset for the largest k (min_sup = 60%):
k = 3
L = { {Milk, Cheese, Bread}: 3 } [s = 3/4 = 75%]
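The level-wise search can be sketched in a few lines of Python. The transaction table is not reproduced in this text, so `TRANSACTIONS` below is a hypothetical set of four transactions chosen only to be consistent with the item counts in the answer (Milk: 4, Cheese: 3, Bread: 4); the function name `apriori` is also illustrative.

```python
from itertools import combinations

# ASSUMPTION: hypothetical category-level transactions, picked to match the
# counts reported in the answer; the original table is not shown here.
TRANSACTIONS = [
    {"Crab", "Milk", "Cheese", "Bread"},
    {"Cheese", "Milk", "Apple", "Pie", "Bread"},
    {"Apple", "Milk", "Bread", "Pie"},
    {"Bread", "Milk", "Cheese"},
]


def apriori(transactions, min_sup):
    """Return {frequent itemset: count}, built level by level."""
    n = len(transactions)
    frequent = {}
    level = [frozenset([item]) for item in sorted(set().union(*transactions))]
    while level:
        # Count each candidate and keep those meeting the minimum support.
        counts = {c: sum(c <= t for t in transactions) for c in level}
        kept = {c: cnt for c, cnt in counts.items() if cnt / n >= min_sup}
        frequent.update(kept)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k = len(level[0]) + 1
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every k-item subset of a candidate must be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


frequent = apriori(TRANSACTIONS, min_sup=0.6)
largest = max(frequent, key=len)
print(sorted(largest), frequent[largest])  # ['Bread', 'Cheese', 'Milk'] 3
```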
Association rules:
$$\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{P(A \cup B)}{P(A)} = \frac{\text{count}(A \cup B)}{\text{count}(A)}$$
Milk ∧ Cheese ⇒ Bread: 3/3 = 100% confidence
Cheese ∧ Bread ⇒ Milk: 3/3 = 100% confidence
Milk ∧ Bread ⇒ Cheese: 3/4 = 75% confidence
Milk ⇒ Bread ∧ Cheese: 3/4 = 75% confidence
Cheese ⇒ Milk ∧ Bread: 3/3 = 100% confidence
Bread ⇒ Milk ∧ Cheese: 3/4 = 75% confidence
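The confidence values above follow directly from the itemset counts. The sketch below uses only the counts listed in the answer; the names `COUNTS`, `rules_from_itemset`, and `strong_rules` are illustrative.

```python
from itertools import combinations

# Itemset counts taken from the list of frequent itemsets in the answer.
COUNTS = {
    frozenset({"Milk"}): 4,
    frozenset({"Cheese"}): 3,
    frozenset({"Bread"}): 4,
    frozenset({"Milk", "Cheese"}): 3,
    frozenset({"Milk", "Bread"}): 4,
    frozenset({"Cheese", "Bread"}): 3,
    frozenset({"Milk", "Cheese", "Bread"}): 3,
}


def rules_from_itemset(itemset, counts):
    """Yield (antecedent, consequent, confidence) for every split of itemset."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            a = frozenset(antecedent)
            # confidence(A => B) = count(A u B) / count(A)
            yield a, itemset - a, counts[itemset] / counts[a]


def strong_rules(itemset, counts, min_conf):
    """Keep only the rules whose confidence reaches min_conf."""
    return [r for r in rules_from_itemset(itemset, counts) if r[2] >= min_conf]


rules = list(rules_from_itemset({"Milk", "Cheese", "Bread"}, COUNTS))
for a, b, conf in rules:
    print(f"{sorted(a)} => {sorted(b)}: {conf:.0%}")
```

Calling `strong_rules({"Milk", "Cheese", "Bread"}, COUNTS, 0.8)` applies the min_conf threshold to the same six rules.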
Strong association rules (min_conf = 80%):