DATA7703, Assignment 2
2022 Semester 2, due 5pm 7 Oct
Instructions.
(a) Submit your solutions as a single PDF file on Blackboard. Go to Assessment, Assignment 2
to submit. If you don’t know how to convert your file to a PDF, please search
for a guide online. You can submit as many times as you want before the deadline.
The last submission will be graded.
(b) Write down your name and student number on the first page of your solution
report, and write down the question numbers for your solutions. For programming
questions, you are welcome to submit your code files or output files in a separate zip
file, but you must include both your code and relevant output in your submitted PDF
file. Excessive code output may be penalised.
(c) Follow integrity rules, and provide citations as needed. You can discuss with your
classmates, but you are required to write your solutions independently, and specify
who you have discussed with in your solution. If you do not know how to solve a
problem, you can get 15% of the mark by writing down “I don’t know”.
You are encouraged to keep your solutions concise — these questions require thought, not
long answers.
1. (20 marks) This question concerns some theoretical aspects of ensemble methods.
(a) (5 marks) Consider a problem with a single real-valued feature 𝑥. For any 𝑎 < 𝑏,
consider the threshold classifiers, or decision stumps, 𝑐₁(𝑥) = 𝐼(𝑥 > 𝑎), 𝑐₂(𝑥) = 𝐼(𝑥 < 𝑏),
and 𝑐₃(𝑥) = 𝐼(𝑥 < +∞), where the indicator function 𝐼(·) takes value +1 if its
argument is true, and −1 otherwise.
What is the set of real numbers classified as positive by
𝑓(𝑥) = 𝐼(0.1𝑐₃(𝑥) − 𝑐₁(𝑥) − 𝑐₂(𝑥) > 0)? Is 𝑓(𝑥) a threshold classifier? Justify your answer.
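As a sanity check for your case analysis, the definitions above can be coded directly. This is a minimal Python sketch, not part of the question; the function names are illustrative, and 𝑎 and 𝑏 are passed in as parameters.

    def c1(x, a):
        # I(x > a): +1 if x > a, else -1
        return 1 if x > a else -1

    def c2(x, b):
        # I(x < b): +1 if x < b, else -1
        return 1 if x < b else -1

    def c3(x):
        # I(x < +inf): true for every real x, so always +1
        return 1

    def f(x, a, b):
        # Positive iff 0.1*c3(x) - c1(x, a) - c2(x, b) > 0
        return 1 if 0.1 * c3(x) - c1(x, a) - c2(x, b) > 0 else -1

Evaluating f at a few points on either side of 𝑎 and 𝑏 lets you check which regions come out positive.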
(b) (5 marks) Explain why the OOB error is a preferred estimate of generalization
performance for bagging, compared to the estimates obtained from the validation set
method and from cross-validation.
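For context, here is a minimal sketch of how an OOB error estimate is computed in practice, using scikit-learn’s BaggingRegressor on a synthetic dataset (both purely illustrative): the attribute oob_prediction_ averages, for each training point, only the models whose bootstrap samples exclude that point.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import BaggingRegressor

    # Synthetic regression data, purely for illustration
    X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

    # With oob_score=True, each training point is predicted only by the
    # bootstrap models that did not see it, so no data is held out.
    bag = BaggingRegressor(n_estimators=100, oob_score=True, random_state=0)
    bag.fit(X, y)

    oob_mse = np.mean((y - bag.oob_prediction_) ** 2)
    print(f"OOB mean squared error: {oob_mse:.3f}")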
(c) (10 marks) Bob is a very creative data scientist. He proposes a variant of the standard
bagging algorithm, called Wagging (Weighted Aggregating), and claims that it works
better than standard bagging.
Wagging is used for regression. As in bagging, Wagging first trains 𝑚 models
𝑓₁, . . . , 𝑓ₘ on 𝑚 bootstrap samples. Unlike bagging, Wagging assigns weights
𝑤₁ = 1/2, 𝑤₂ = 1/2², . . . , 𝑤ₘ₋₁ = 1/2ᵐ⁻¹, 𝑤ₘ = 1/2ᵐ⁻¹ to the models. If 𝑌ᵢ is the
prediction of 𝑓ᵢ, then Wagging predicts 𝑌̄ = ∑ᵢ 𝑤ᵢ𝑌ᵢ.
We assume that 𝑌₁, . . . , 𝑌ₘ are identically distributed with Var(𝑌ᵢ) = 𝜎² for all 𝑖,
and cov(𝑌ᵢ, 𝑌ⱼ) = 𝜌𝜎² for all 1 ≤ 𝑖 ≠ 𝑗 ≤ 𝑚.
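A minimal Python sketch of the Wagging combination rule as defined above (the function names are hypothetical). Note that because the last weight 1/2ᵐ⁻¹ appears twice, the geometric weights sum to exactly 1.

    import numpy as np

    def wagging_weights(m):
        # w_1 = 1/2, w_2 = 1/2^2, ..., w_{m-1} = 1/2^(m-1), w_m = 1/2^(m-1);
        # repeating the last weight makes the weights sum to 1.
        return np.array([1.0 / 2**i for i in range(1, m)] + [1.0 / 2**(m - 1)])

    def wagging_predict(predictions):
        # predictions: array (Y_1, ..., Y_m) from the m bootstrap models
        w = wagging_weights(len(predictions))
        return float(np.dot(w, predictions))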