CS229 Problem Set #3 Solutions 1
CS 229, Public Course
Problem Set #3 Solutions: Learning Theory and
Unsupervised Learning
1. Uniform convergence and Model Selection
In this problem, we will prove a bound on the error of a simple model selection procedure.
Let there be a binary classification problem with labels y ∈ {0, 1}, and let H
1
⊆ H
2
⊆
. . . ⊆ H
k
be k different finite hypothesis classes (|H
i
| < ∞). Given a dataset S of m iid
training examples, we will divide it into a training set S
train
consisting of the first (1− β)m
examples, and a hold-out cross validation set S
cv
consisting of the remaining βm examples.
Here, β ∈ (0, 1).
Let
ˆ
h
i
= arg min
h∈H
i
ˆε
S
train
(h) be the hypothesis in H
i
with the lowest training error
(on S
train
). Thus,
ˆ
h
i
would be the hypothesis returned by training (with empirical risk
minimization) using hypothesis class H
i
and dataset S
train
. Also let h
⋆
i
= arg min
h∈H
i
ε(h)
be the hypothesis in H
i
with the lowest generalization error.
Suppose that our algorithm first finds all the
ˆ
h
i
’s using empirical risk minimization then
uses the hold-out cross validation set to select a hypothesis from this the {
ˆ
h
1
, . . . ,
ˆ
h
k
} with
minimum training error. That is, the algorithm will output
ˆ
h = arg min
h∈{
ˆ
h
1
,...,
ˆ
h
k
}
ˆε
S
cv
(h).
For this question you will prove the following bound. Let any δ > 0 be fixed. Then with
probability at least 1 − δ, we have that
ε(
ˆ
h) ≤ min
i=1,...,k
ε(h
∗
i
) +
s
2
(1 − β)m
log
4|H
i
|
δ
!
+
s
2
2βm
log
4k
δ
(a) Prove that with probability at least 1 −
δ
2
, for all
ˆ
h
i
,
|ε(
ˆ
h
i
) − ˆε
S
cv
(
ˆ
h
i
)| ≤
s
1
2βm
log
4k
δ
.
Answer: For each
ˆ
h
i
, the empirical error on the cross-validation set, ˆε(
ˆ
h
i
) represents
the average of βm random variables with mean ε(
ˆ
h
i
), so by the Hoeffding inequality for
any
ˆ
h
i
,
P (|ε(
ˆ
h
i
) − ˆε
S
cv
(
ˆ
h
i
)| ≥ γ) ≤ 2 exp(−2γ
2
βm).
As in the class notes, to insure that this holds for all
ˆ
h
i
, we need to take the union over
all k of the
ˆ
h
i
’s.
P (∃i, s.t.|ε(
ˆ
h
i
) − ˆε
S
cv
(
ˆ
h
i
)| ≥ γ) ≤ 2k exp(−2γ
2
βm).
- 1
- 2
前往页