et al., 2015], additive functions were considered, i.e., the function value $f(x)$ is the sum of several sub-functions of smaller dimensions, and no variables overlap between any two sub-functions. In [Kandasamy et al., 2015], it was shown that exploiting the additive structure in a Bayesian optimization method can effectively accelerate the optimization. In [Friesen and Domingos, 2015], a recursive decomposition method was proposed for approximately locally decomposable problems. These methods, however, rely on (mostly axis-parallel) decomposability, which may restrict their applications.
Embedding methods assume that, in the high-dimensional space, only a small subspace affects the function value. Therefore, optimizing only in the effective subspace can save a great deal of effort. In [Carpentier and Munos, 2012], compressed sensing was employed to deal with linear bandit problems with low-dimensional effective subspaces. In [Chen et al., 2012], a variable selection method was proposed to identify the effective axis-parallel subspace. In [Djolonga et al., 2013], a low-rank matrix recovery technique was employed to learn the effective subspace. In [Wang et al., 2013; Qian and Yu, 2016], random embedding based on random matrix theory was employed to identify the underlying linear effective subspace. However, real-world problems may not have a clear effective subspace, and it is hard to verify whether such a subspace exists.
Our Contributions. In this paper, we study high-dimensional problems with low optimal $\epsilon$-effective dimensions (see Definition 1). In these problems, every (linearly transformed) variable is allowed to affect the function value; however, only a small linear subspace has a large impact on the function value, while the orthogonal complement subspace makes only a small, bounded contribution.
Firstly, we characterize the property of random embedding for this kind of problem. We find that, given optimal $\epsilon$-effective dimension, a single random embedding incurs an embedding gap of $2\epsilon$. Note that this embedding gap cannot be compensated for by the optimization algorithm.
We then propose sequential random embeddings (SRE) to overcome the embedding gap. SRE applies the random embedding several times sequentially, and in each subspace, SRE employs an optimization algorithm to reduce the residue of the previous solution. Therefore, SRE can also be viewed as a combination of decomposition and embedding, as each random embedding defines a sub-problem. We also disclose the condition under which SRE can improve the optimization quality for a large class of problems.
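To make the procedure concrete, the following is a minimal, self-contained sketch (our own illustration, not the exact algorithm analyzed later): it uses a trivial random-search routine as a stand-in for the derivative-free base optimizers, and the particular coupling $f(\alpha x + Ay)$, in which the sub-problem also searches over a scaling $\alpha$ of the previous solution, is one plausible way to reduce the residue of the previous solution and should be read as an assumption.

```python
import numpy as np

def random_search(obj, dim, budget=200, scale=1.0, rng=None):
    """Trivial derivative-free base optimizer (a placeholder for the
    state-of-the-art methods used in the experiments)."""
    rng = rng or np.random.default_rng()
    best_z = np.zeros(dim)
    best_v = obj(best_z)
    for _ in range(budget):
        z = rng.normal(0.0, scale, size=dim)
        v = obj(z)
        if v < best_v:
            best_z, best_v = z, v
    return best_z

def sre_minimize(f, D, d, rounds=5, sigma=1.0, rng=None):
    """Sequential random embeddings: each round draws a fresh Gaussian
    embedding matrix A and lets the base optimizer search for a scaling
    alpha of the previous solution plus a correction A @ y."""
    rng = rng or np.random.default_rng()
    x = np.zeros(D)  # current high-dimensional solution
    for _ in range(rounds):
        A = rng.normal(0.0, sigma, size=(D, d))
        def sub(z):
            # z = (alpha, y): a (d+1)-dimensional sub-problem
            return f(z[0] * x + A @ z[1:])
        z_best = random_search(sub, d + 1, rng=rng)
        x = z_best[0] * x + A @ z_best[1:]
    return x

# Toy usage: a 10,000-dimensional sphere-like function.
f = lambda x: float(np.sum(x ** 2))
x_best = sre_minimize(f, D=10_000, d=10)
print(f(x_best))
```

Each round thus defines a $(d+1)$-dimensional sub-problem, matching the view of SRE as a combination of decomposition and embedding.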
In experiments, we apply SRE to several state-of-the-art derivative-free optimization methods and conduct experiments on synthetic functions as well as on classification tasks using the non-convex Ramp loss. Experimental results show that SRE can significantly improve the performance of these optimization methods on high-dimensional problems. Moreover, whereas the test functions in previous related studies mostly have up to 1,000 variables, the derivative-free methods with SRE are tested with up to 100,000 variables on real-world data sets.
The subsequent sections respectively introduce the optimal $\epsilon$-effective dimension problems and present the property of random embedding, describe the proposed SRE technique as well as its theoretical property, present the empirical results, and finally conclude the paper.
2 Optimal $\epsilon$-Effective Dimension and Random Embedding
Optimal $\epsilon$-Effective Dimension
The effective dimension defined in [Wang et al., 2013] requires the existence of a non-effective linear subspace, which has exactly zero effect on the function value. It is often unrealistic to make such an assumption. We thus relax this assumption to the optimal $\epsilon$-effective dimension in Definition 1.
Note that a function with optimal $\epsilon$-effective dimension may have no low-dimensional effective subspace according to the definition given in [Wang et al., 2013], i.e., no linear subspace that has exactly zero effect on the function value. Instead, it has a linear subspace that contributes at most $\epsilon$ to the function value. Therefore, this kind of problem may still be efficiently tackled when $\epsilon$ is not too large.
DEFINITION 1 (Optimal $\epsilon$-Effective Dimension)
For any $\epsilon > 0$, a function $f : \mathbb{R}^D \to \mathbb{R}$ is said to have an $\epsilon$-effective subspace $\mathcal{V}_\epsilon$, if there exists a linear subspace $\mathcal{V}_\epsilon \subseteq \mathbb{R}^D$ s.t. for all $x \in \mathbb{R}^D$, we have $|f(x) - f(x_\epsilon)| \le \epsilon$, where $x_\epsilon \in \mathcal{V}_\epsilon$ is the orthogonal projection of $x$ onto $\mathcal{V}_\epsilon$.
Let $\mathbb{V}_\epsilon$ denote the collection of all the $\epsilon$-effective subspaces of $f$, and let $\dim(\mathcal{V})$ denote the dimension of a linear subspace $\mathcal{V}$. We define the optimal $\epsilon$-effective dimension of $f$ as
$$d_\epsilon = \min_{\mathcal{V}_\epsilon \in \mathbb{V}_\epsilon} \dim(\mathcal{V}_\epsilon).$$
In the definition above, it should be noted that $\epsilon$ and $d_\epsilon$ are related quantities: commonly, a small $d_\epsilon$ implies a large $\epsilon$, while a small $\epsilon$ implies a large $d_\epsilon$.
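As a concrete illustration of Definition 1 (a toy construction of our own; the axis-parallel choice of $\mathcal{V}_\epsilon$ is only for readability, since the definition permits any linear subspace), the function below has an $\epsilon$-effective subspace spanned by the first $k$ coordinates, so $d_\epsilon \le k$:

```python
import numpy as np

D, k, eps = 1000, 5, 0.1   # ambient dimension, effective dimension, epsilon

def f(x):
    # The dominant term depends only on the first k coordinates;
    # the remaining D - k coordinates shift f by at most eps/2.
    return np.sum(x[:k] ** 2) + 0.5 * eps * np.tanh(np.sum(x[k:]))

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=D)
    x_eps = x.copy()
    x_eps[k:] = 0.0   # orthogonal projection onto V_eps = span(e_1, ..., e_k)
    assert abs(f(x) - f(x_eps)) <= eps
```

Every remaining coordinate still affects $f$, yet the bound $|f(x) - f(x_\epsilon)| \le \epsilon$ holds for every $x$.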
Random Embedding
Given the definition of optimal $\epsilon$-effective dimension, Lemma 1 below shows the effect of random embedding for such functions. For simplicity, let $\mathcal{N}$ denote the Gaussian distribution with zero mean and variance $\sigma^2$.
LEMMA 1
Given a function $f : \mathbb{R}^D \to \mathbb{R}$ with optimal $\epsilon$-effective dimension $d_\epsilon$, and any random matrix $A \in \mathbb{R}^{D \times d}$ ($d \ge d_\epsilon$) with independent entries sampled from $\mathcal{N}$, then, with probability 1, for any $x \in \mathbb{R}^D$, there exists $y \in \mathbb{R}^d$ s.t. $|f(x) - f(Ay)| \le 2\epsilon$.
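Before turning to the formal proof, the lemma can be sanity-checked numerically on the toy function from the previous illustration (again our own example; the least-squares construction of $y$ mirrors the proof idea of matching the projection of $Ay$ onto $\mathcal{V}_\epsilon$ with $x_\epsilon$):

```python
import numpy as np

D, k, d, eps = 1000, 5, 10, 0.1   # ambient dim, d_eps, embedding dim, eps
rng = np.random.default_rng(1)

def f(x):
    return np.sum(x[:k] ** 2) + 0.5 * eps * np.tanh(np.sum(x[k:]))

A = rng.normal(0.0, 1.0, size=(D, d))   # entries i.i.d. Gaussian, d >= d_eps
x = rng.normal(size=D)

# Here Phi = [e_1, ..., e_k], so Phi^T A = A[:k], and the coordinates of
# x_eps in this basis are c = x[:k]. Solving (Phi^T A) y = c makes the
# projection of A @ y onto V_eps coincide with x_eps.
y, *_ = np.linalg.lstsq(A[:k], x[:k], rcond=None)

assert abs(f(x) - f(A @ y)) <= 2 * eps   # the 2*eps embedding gap of Lemma 1
```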
Proof. We borrow the idea of constructing such $y$ as in [Wang et al., 2013]. Since $f$ has the optimal $\epsilon$-effective dimension $d_\epsilon$, there exists an $\epsilon$-effective subspace $\mathcal{V}_\epsilon \subseteq \mathbb{R}^D$ s.t. $\dim(\mathcal{V}_\epsilon) = d_\epsilon$. Besides, any $x \in \mathbb{R}^D$ can be decomposed as $x = x_\epsilon + x_\epsilon^\perp$, where $x_\epsilon \in \mathcal{V}_\epsilon$, $x_\epsilon^\perp \in \mathcal{V}_\epsilon^\perp$, and $\mathcal{V}_\epsilon^\perp$ is the orthogonal complement of $\mathcal{V}_\epsilon$. By the definition of $\epsilon$-effective subspace, we have $|f(x) - f(x_\epsilon)| \le \epsilon$. Hence, it suffices to show that, for any $x_\epsilon \in \mathcal{V}_\epsilon$, there exists $y \in \mathbb{R}^d$ s.t. $|f(x_\epsilon) - f(Ay)| \le \epsilon$.
Let $\Phi \in \mathbb{R}^{D \times d_\epsilon}$ be a matrix whose columns form a standard orthonormal basis for $\mathcal{V}_\epsilon$. For any $x_\epsilon \in \mathcal{V}_\epsilon$, there exists $c \in \mathbb{R}^{d_\epsilon}$ s.t. $x_\epsilon = \Phi c$. Let us for now assume that