Data Mining Course Project Answers

Uploaded 2013-11-24 15:03:41 (PDF, 585KB, 15 pages)
Part I: Written Assignment

1. (Data Preprocessing) Data transformation

Normalization: attribute values are scaled to fall within a small, specified range

1) Problems

Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order): 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.

2) Answers

v = 35

(a) min-max normalization: to [new_min(A), new_max(A)]

$$v' = \frac{v - \min(A)}{\max(A) - \min(A)}\,\bigl(\mathrm{new\_max}(A) - \mathrm{new\_min}(A)\bigr) + \mathrm{new\_min}(A)$$

$$\min(A) = 13, \quad \max(A) = 70$$

new_min(A) = 0.0, new_max(A) = 1.0

$$v' = \frac{35 - 13}{70 - 13}\,(1.0 - 0.0) + 0.0 = 0.386$$
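The min-max computation above can be checked with a minimal Python sketch. The function name `min_max_normalize` and its parameter names are illustrative choices, not part of the original answer.

```python
# Age values as listed in the problem statement (27 tuples, increasing order).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def min_max_normalize(v, values, new_min=0.0, new_max=1.0):
    """Map v from [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min


v_prime = min_max_normalize(35, ages)
print(round(v_prime, 3))  # 0.386
```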

(b) z-score normalization (μ: mean, σ: standard deviation)

$$v' = \frac{v - \mu(A)}{\sigma(A)}$$

$$\mu(A) = \frac{809}{27} = 29.96, \quad \sigma(A) = 12.94$$

$$v' = \frac{35 - 29.96}{12.94} = 0.389$$
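As a quick check of the z-score figures, the sketch below recomputes the mean and standard deviation with Python's standard `statistics` module. Note that 12.94 corresponds to the sample standard deviation (divisor n - 1, `statistics.stdev`); the population form (`statistics.pstdev`) gives a slightly smaller value of about 12.70. The variable names are illustrative.

```python
import statistics

# Age values as listed in the problem statement (27 tuples).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(ages)        # 809 / 27
sample_sd = statistics.stdev(ages)  # sample standard deviation (divisor n - 1)

# z-score of v = 35 using the sample standard deviation.
z = (35 - mean) / sample_sd
print(round(mean, 2), round(sample_sd, 2), round(z, 3))  # 29.96 12.94 0.389
```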

(c) normalization by decimal scaling

$$v' = \frac{v}{10^{j}}$$

where j is the smallest integer such that max(|v'|) < 1.

$$v' = \frac{35}{10^{2}} = 0.35, \quad j = 2$$
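The choice of j can be sketched as a small search in Python. The helper `decimal_scaling_exponent` is a hypothetical name, and the sketch assumes j is restricted to non-negative integers, which is sufficient for the age data.

```python
# Age values as listed in the problem statement (27 tuples).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def decimal_scaling_exponent(values):
    """Smallest non-negative integer j such that max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return j


j = decimal_scaling_exponent(ages)
v_prime = 35 / 10 ** j
print(j, v_prime)  # 2 0.35
```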

(d) Comment on which method you would prefer to use for the given data, giving reasons as to why.

I prefer normalization by decimal scaling.

For min-max normalization, if newly inserted data fall outside the original range of the attribute age, an "out of bounds" error will occur.

For z-score normalization, two parameters (the mean and the standard deviation) must be computed and stored, which costs additional calculation and storage.

For normalization by decimal scaling, the values of the attribute age have essentially no more than two digits, so j can be fixed at 2.
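The "out of bounds" argument can be illustrated with a short Python sketch. The new age value 80 is a hypothetical example, not taken from the data set; the constants are the parameters derived above.

```python
# Parameters fitted on the original 27 age values.
OLD_MIN, OLD_MAX = 13, 70  # range used by min-max normalization
J = 2                      # decimal-scaling exponent chosen in the answer


def min_max(v):
    """Min-max normalization onto [0.0, 1.0] with the stored range."""
    return (v - OLD_MIN) / (OLD_MAX - OLD_MIN)


def decimal_scaling(v):
    """Decimal scaling with the fixed exponent J."""
    return v / 10 ** J


new_age = 80  # hypothetical new tuple, outside the original range [13, 70]
print(min_max(new_age) > 1.0)          # True: falls outside [0.0, 1.0]
print(decimal_scaling(new_age) < 1.0)  # True: still inside (-1, 1)
```

Under the same reasoning, j = 2 holds only while ages stay below 100; an age of 100 or more would require j = 3.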

2. Mining association rules

1) Problems

A database has four transactions. Let min_sup=60% and min_conf=80%.

(a) At the granularity of item_category (e.g., item_i could be "milk"), for the following rule template:

$$\forall X \in \text{transaction},\ \text{buys}(X, \text{item}_1) \wedge \text{buys}(X, \text{item}_2) \Rightarrow \text{buys}(X, \text{item}_3) \quad [s, c]$$

list the frequent k-itemset for the largest k and all of the strong association rules (with their support s and confidence c) containing the frequent k-itemset for the largest k.

(b) At the granularity of brand-item_category (e.g., item_i could be "sunset-milk"), for the following rule template:

$$\forall X \in \text{customer},\ \text{buys}(X, \text{item}_1) \wedge \text{buys}(X, \text{item}_2) \Rightarrow \text{buys}(X, \text{item}_3)$$

list the frequent k-itemset for the largest k. Note: do not print any rules.

2) Answers

(a)

Item collection:

$$X = \{x_1, \ldots, x_m\} = \{\text{Crab}, \text{Milk}, \text{Cheese}, \text{Bread}, \text{Apple}, \text{Pie}\}, \quad m = 6$$

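Frequent itemsets (with the Apriori algorithm):

$$L = \{\text{Milk}: 4,\ \text{Cheese}: 3,\ \text{Bread}: 4,\ \text{Milk} \cup \text{Cheese}: 3,\ \text{Milk} \cup \text{Bread}: 4,\ \text{Cheese} \cup \text{Bread}: 3,\ \text{Milk} \cup \text{Cheese} \cup \text{Bread}: 3\}$$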

support(Milk) = 4/4 = 100%

support(Bread) = 4/4 = 100%

support(Cheese) = 3/4 = 75%

support(Milk ∪ Cheese) = 3/4 = 75%

support(Milk ∪ Bread) = 4/4 = 100%

support(Cheese ∪ Bread) = 3/4 = 75%

support(Milk ∪ Cheese ∪ Bread) = 3/4 = 75%
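The level-wise search behind these counts can be sketched in Python as follows. The four transactions are hypothetical: they are chosen only so that the item counts match those reported in the answer, since the original transaction table is not reproduced here. The function names are illustrative as well.

```python
from itertools import combinations

# Hypothetical transactions, chosen only to reproduce the reported counts
# (Milk: 4, Bread: 4, Cheese: 3); not the original transaction table.
transactions = [
    {"Crab", "Milk", "Cheese", "Bread"},
    {"Cheese", "Milk", "Apple", "Pie", "Bread"},
    {"Apple", "Milk", "Bread", "Pie"},
    {"Bread", "Milk", "Cheese"},
]
MIN_SUP = 0.6


def support_count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(min_sup):
    """Level-wise search: frequent k-itemsets are joined into (k+1)-candidates."""
    min_count = min_sup * len(transactions)
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support_count(frozenset([i])) >= min_count]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support_count(s)
        # Join step: unions of two frequent k-itemsets that have k + 1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, k))
                 and support_count(c) >= min_count]
        k += 1
    return frequent


freq = apriori(MIN_SUP)
for itemset, count in sorted(freq.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count, f"{count / len(transactions):.0%}")
```

With these assumed transactions the search returns the same seven frequent itemsets listed above, and Apple, Pie, and Crab are dropped at the first level because their support is below 60%.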

Frequent k-itemset for the largest k (min_sup = 60%):

k = 3

$$L = \{\text{Milk} \cup \text{Cheese} \cup \text{Bread}: 3\} \quad [s = 3/4 = 75\%]$$

Association rules:

$$\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{P(A \cup B)}{P(A)} = \frac{\text{count}(A \cup B)}{\text{count}(A)}$$

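$$\text{milk} \wedge \text{cheese} \Rightarrow \text{bread}, \quad \text{confidence} = 3/3 = 100\%$$

$$\text{cheese} \wedge \text{bread} \Rightarrow \text{milk}, \quad \text{confidence} = 3/3 = 100\%$$

$$\text{milk} \wedge \text{bread} \Rightarrow \text{cheese}, \quad \text{confidence} = 3/4 = 75\%$$

$$\text{milk} \Rightarrow \text{bread} \wedge \text{cheese}, \quad \text{confidence} = 3/4 = 75\%$$

$$\text{cheese} \Rightarrow \text{milk} \wedge \text{bread}, \quad \text{confidence} = 3/3 = 100\%$$

$$\text{bread} \Rightarrow \text{milk} \wedge \text{cheese}, \quad \text{confidence} = 3/4 = 75\%$$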
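The confidence values can be recomputed from the support counts with a short Python sketch. It uses only the counts reported in the answer; the variable names and the output format are illustrative assumptions.

```python
from itertools import combinations

# Support counts reported in the answer (4 transactions in total).
count = {
    frozenset({"milk"}): 4,
    frozenset({"cheese"}): 3,
    frozenset({"bread"}): 4,
    frozenset({"milk", "cheese"}): 3,
    frozenset({"milk", "bread"}): 4,
    frozenset({"cheese", "bread"}): 3,
    frozenset({"milk", "cheese", "bread"}): 3,
}
MIN_CONF = 0.8
full = frozenset({"milk", "cheese", "bread"})

# Enumerate every rule A => (full - A) with a non-empty proper antecedent A.
rules = []
for size in (1, 2):
    for antecedent in combinations(sorted(full), size):
        a = frozenset(antecedent)
        conf = count[full] / count[a]  # count(A ∪ B) / count(A)
        rules.append((sorted(a), sorted(full - a), conf))

for lhs, rhs, conf in rules:
    flag = "strong" if conf >= MIN_CONF else "below min_conf"
    print(f"{' AND '.join(lhs)} => {' AND '.join(rhs)}: {conf:.0%} ({flag})")
```

Comparing each confidence against min_conf = 80% is the filter that separates strong rules from the rest.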

Strong association rules (min_conf = 80%):
