DATA7703, Assignment 2
2022 Semester 2, due 5pm 7 Oct
Instructions.
(a) Submit your solutions as a single PDF file on Blackboard. Go to Assessment, Assignment 2
to submit. If you don’t know how to convert your file to a PDF, please search
for a guide online. You can submit as many times as you want before the deadline.
The last submission will be graded.
(b) Write down your name and student number on the first page of your solution
report, and write down the question numbers for your solutions. For programming
questions, you are welcome to submit your code files or output files in a separate zip
file, but you must include both your code and relevant output in your submitted PDF
file. Excessive code output may be penalised.
(c) Follow integrity rules, and provide citations as needed. You can discuss with your
classmates, but you are required to write your solutions independently, and specify
who you have discussed with in your solution. If you do not know how to solve a
problem, you can get 15% of the mark by writing down “I don’t know”.
You are encouraged to keep your solutions concise — these questions require thought, not
long answers.
1. (20 marks) This question concerns some theoretical aspects of ensemble methods.
(a) (5 marks) Consider a problem with a single real-valued feature 𝑥. For any 𝑎 < 𝑏,
consider the threshold classifiers, or decision stumps, 𝑐₁(𝑥) = 𝐼(𝑥 > 𝑎), 𝑐₂(𝑥) = 𝐼(𝑥 < 𝑏),
and 𝑐₃(𝑥) = 𝐼(𝑥 < +∞), where the indicator function 𝐼(·) takes value +1 if its
argument is true, and −1 otherwise.
What is the set of real numbers classified as positive by
𝑓(𝑥) = 𝐼(0.1𝑐₃(𝑥) − 𝑐₁(𝑥) − 𝑐₂(𝑥) > 0)? Is 𝑓(𝑥) a threshold classifier? Justify your answer.
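As a sanity check for your case analysis, the definitions above can be coded directly. This is a minimal Python sketch, not part of the question; the function names are illustrative, and 𝑎 and 𝑏 are passed in as parameters.

    def c1(x, a):
        # I(x > a): +1 if x > a, else -1
        return 1 if x > a else -1

    def c2(x, b):
        # I(x < b): +1 if x < b, else -1
        return 1 if x < b else -1

    def c3(x):
        # I(x < +inf): true for every real x, so always +1
        return 1

    def f(x, a, b):
        # Positive iff 0.1*c3(x) - c1(x, a) - c2(x, b) > 0
        return 1 if 0.1 * c3(x) - c1(x, a) - c2(x, b) > 0 else -1

Evaluating f at a few points on either side of 𝑎 and 𝑏 lets you check which regions come out positive.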
(b) (5 marks) Explain why the OOB error is a preferred estimate of generalization
performance for bagging, compared to the estimates obtained from the validation set
method and from cross-validation.
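For context, here is a minimal sketch of how an OOB error estimate is computed in practice, using scikit-learn’s BaggingRegressor on a synthetic dataset (both purely illustrative): the attribute oob_prediction_ averages, for each training point, only the models whose bootstrap samples exclude that point.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import BaggingRegressor

    # Synthetic regression data, purely for illustration
    X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

    # With oob_score=True, each training point is predicted only by the
    # bootstrap models that did not see it, so no data is held out.
    bag = BaggingRegressor(n_estimators=100, oob_score=True, random_state=0)
    bag.fit(X, y)

    oob_mse = np.mean((y - bag.oob_prediction_) ** 2)
    print(f"OOB mean squared error: {oob_mse:.3f}")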
(c) (10 marks) Bob is a very creative data scientist. He proposes a variant of the standard
bagging algorithm, called Wagging (Weighted Aggregating), and claims that it works
better than standard bagging.
Wagging is used for regression. As in bagging, Wagging first trains 𝑚 models
𝑓₁, . . . , 𝑓ₘ on 𝑚 bootstrap samples. Unlike bagging, Wagging assigns weights
𝑤₁ = 1/2, 𝑤₂ = 1/2², . . . , 𝑤ₘ₋₁ = 1/2ᵐ⁻¹, 𝑤ₘ = 1/2ᵐ⁻¹ to the models. If 𝑌ᵢ is the
prediction of 𝑓ᵢ, then Wagging predicts 𝑌̄ = ∑ᵢ 𝑤ᵢ𝑌ᵢ.
We assume that 𝑌₁, . . . , 𝑌ₘ are identically distributed with Var(𝑌ᵢ) = 𝜎² for all 𝑖,
and cov(𝑌ᵢ, 𝑌ⱼ) = 𝜌𝜎² for all 1 ≤ 𝑖 ≠ 𝑗 ≤ 𝑚.
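A minimal Python sketch of the Wagging combination rule as defined above (the function names are hypothetical). Note that because the last weight 1/2ᵐ⁻¹ appears twice, the geometric weights sum to exactly 1.

    import numpy as np

    def wagging_weights(m):
        # w_1 = 1/2, w_2 = 1/2^2, ..., w_{m-1} = 1/2^(m-1), w_m = 1/2^(m-1);
        # repeating the last weight makes the weights sum to 1.
        return np.array([1.0 / 2**i for i in range(1, m)] + [1.0 / 2**(m - 1)])

    def wagging_predict(predictions):
        # predictions: array (Y_1, ..., Y_m) from the m bootstrap models
        w = wagging_weights(len(predictions))
        return float(np.dot(w, predictions))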