et al., 2015], additive functions were considered, i.e., the function value $f(x)$ is the sum of several sub-functions of smaller dimensions, and no variables overlap between any two sub-functions. In [Kandasamy et al., 2015], it was shown that exploiting the additive structure in a Bayesian optimization method can effectively accelerate the optimization. In [Friesen and Domingos, 2015], a recursive decomposition method was proposed for approximately locally decomposable problems. These methods, however, rely on (mostly axis-parallel) decomposability, which may restrict their applications.
Embedding methods assume that, in the high-dimensional space, only a small subspace affects the function value. Therefore, optimizing only in the effective subspace can save a great deal of effort. In [Carpentier and Munos, 2012], compressed sensing was employed to deal with linear bandit problems with low-dimensional effective subspaces. In [Chen et al., 2012], a variable selection method was proposed to identify the effective axis-parallel subspace. In [Djolonga et al., 2013], a low-rank matrix recovery technique was employed to learn the effective subspace. In [Wang et al., 2013; Qian and Yu, 2016], random embedding based on random matrix theory was employed to identify the underlying linear effective subspace. However, real-world problems may not have a clear effective subspace, and it is hard to verify whether such a subspace exists.
Our Contributions. In this paper, we study high-dimensional problems with low optimal $\epsilon$-effective dimensions (see Definition 1). In these problems, every (linearly transformed) variable is allowed to affect the function value; however, only a small linear subspace has a large impact on the function value, while the orthogonal complement subspace makes only a small, bounded contribution.
Firstly, we characterize the property of random embedding for this kind of problem. We find that, given optimal $\epsilon$-effective dimension, a single random embedding incurs an embedding gap of $2\epsilon$. Note that this embedding gap cannot be compensated for by the optimization algorithm.
We then propose sequential random embeddings (SRE) to overcome the embedding gap. SRE applies the random embedding several times sequentially, and in each subspace, SRE employs an optimization algorithm to reduce the residue of the previous solution. Therefore, SRE can also be viewed as a combination of decomposition and embedding, as each random embedding defines a sub-problem. We also disclose the condition under which SRE can improve the optimization quality for a large class of problems.
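To make the procedure concrete, the following is a minimal, self-contained sketch (our own illustration, not the exact algorithm analyzed later): it uses a trivial random-search routine as a stand-in for the derivative-free base optimizers, and the particular coupling $f(\alpha x + Ay)$, in which the sub-problem also searches over a scaling $\alpha$ of the previous solution, is one plausible way to reduce the residue of the previous solution and should be read as an assumption.

```python
import numpy as np

def random_search(obj, dim, budget=200, scale=1.0, rng=None):
    """Trivial derivative-free base optimizer (a placeholder for the
    state-of-the-art methods used in the experiments)."""
    rng = rng or np.random.default_rng()
    best_z = np.zeros(dim)
    best_v = obj(best_z)
    for _ in range(budget):
        z = rng.normal(0.0, scale, size=dim)
        v = obj(z)
        if v < best_v:
            best_z, best_v = z, v
    return best_z

def sre_minimize(f, D, d, rounds=5, sigma=1.0, rng=None):
    """Sequential random embeddings: each round draws a fresh Gaussian
    embedding matrix A and lets the base optimizer search for a scaling
    alpha of the previous solution plus a correction A @ y."""
    rng = rng or np.random.default_rng()
    x = np.zeros(D)  # current high-dimensional solution
    for _ in range(rounds):
        A = rng.normal(0.0, sigma, size=(D, d))
        def sub(z):
            # z = (alpha, y): a (d+1)-dimensional sub-problem
            return f(z[0] * x + A @ z[1:])
        z_best = random_search(sub, d + 1, rng=rng)
        x = z_best[0] * x + A @ z_best[1:]
    return x

# Toy usage: a 10,000-dimensional sphere-like function.
f = lambda x: float(np.sum(x ** 2))
x_best = sre_minimize(f, D=10_000, d=10)
print(f(x_best))
```

Each round thus defines a $(d+1)$-dimensional sub-problem, matching the view of SRE as a combination of decomposition and embedding.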
In experiments, we apply SRE to several state-of-the-art derivative-free optimization methods and conduct experiments on synthetic functions as well as on classification tasks using the non-convex Ramp loss. Experimental results show that SRE can significantly improve the performance of these optimization methods on high-dimensional problems. Moreover, whereas the test functions in previous related studies mostly have up to 1,000 variables, the derivative-free methods with SRE are tested with up to 100,000 variables on real-world data sets.
The subsequent sections respectively introduce the optimal $\epsilon$-effective dimension problems and present the property of random embedding, describe the proposed SRE technique as well as its theoretical property, present the empirical results, and finally conclude the paper.
2 Optimal $\epsilon$-Effective Dimension and Random Embedding
Optimal $\epsilon$-Effective Dimension
The effective dimension defined in [Wang et al., 2013] requires the existence of a non-effective linear subspace, which has exactly zero effect on the function value. It is often unrealistic to make such an assumption. We thus relax this assumption to the optimal $\epsilon$-effective dimension in Definition 1.
Note that a function with optimal $\epsilon$-effective dimension may have no low-dimensional effective subspace according to the definition given in [Wang et al., 2013], i.e., no linear subspace that has exactly zero effect on the function value. Instead, it has a linear subspace that contributes at most $\epsilon$ to the function value. Therefore, this kind of problem may still be efficiently tackled when $\epsilon$ is not too large.
DEFINITION 1 (Optimal $\epsilon$-Effective Dimension)
For any $\epsilon > 0$, a function $f : \mathbb{R}^D \to \mathbb{R}$ is said to have an $\epsilon$-effective subspace $\mathcal{V}_\epsilon$, if there exists a linear subspace $\mathcal{V}_\epsilon \subseteq \mathbb{R}^D$ s.t. for all $x \in \mathbb{R}^D$, we have $|f(x) - f(x_\epsilon)| \le \epsilon$, where $x_\epsilon \in \mathcal{V}_\epsilon$ is the orthogonal projection of $x$ onto $\mathcal{V}_\epsilon$.
Let $\mathbb{V}_\epsilon$ denote the collection of all the $\epsilon$-effective subspaces of $f$, and let $\dim(\mathcal{V})$ denote the dimension of a linear subspace $\mathcal{V}$. We define the optimal $\epsilon$-effective dimension of $f$ as
$$d_\epsilon = \min_{\mathcal{V}_\epsilon \in \mathbb{V}_\epsilon} \dim(\mathcal{V}_\epsilon).$$
In the definition above, it should be noted that $\epsilon$ and $d_\epsilon$ are related quantities: commonly, a small $d_\epsilon$ implies a large $\epsilon$, while a small $\epsilon$ implies a large $d_\epsilon$.
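As a concrete illustration of Definition 1 (a toy construction of our own; the axis-parallel choice of $\mathcal{V}_\epsilon$ is only for readability, since the definition permits any linear subspace), the function below has an $\epsilon$-effective subspace spanned by the first $k$ coordinates, so $d_\epsilon \le k$:

```python
import numpy as np

D, k, eps = 1000, 5, 0.1   # ambient dimension, effective dimension, epsilon

def f(x):
    # The dominant term depends only on the first k coordinates;
    # the remaining D - k coordinates shift f by at most eps/2.
    return np.sum(x[:k] ** 2) + 0.5 * eps * np.tanh(np.sum(x[k:]))

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=D)
    x_eps = x.copy()
    x_eps[k:] = 0.0   # orthogonal projection onto V_eps = span(e_1, ..., e_k)
    assert abs(f(x) - f(x_eps)) <= eps
```

Every remaining coordinate still affects $f$, yet the bound $|f(x) - f(x_\epsilon)| \le \epsilon$ holds for every $x$.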
Random Embedding
Given the definition of optimal $\epsilon$-effective dimension, Lemma 1 below shows the effect of random embedding for such functions. For simplicity, let $\mathcal{N}$ denote the Gaussian distribution with zero mean and variance $\sigma^2$.
LEMMA 1
Given a function $f : \mathbb{R}^D \to \mathbb{R}$ with optimal $\epsilon$-effective dimension $d_\epsilon$, and any random matrix $A \in \mathbb{R}^{D \times d}$ ($d \ge d_\epsilon$) with independent entries sampled from $\mathcal{N}$, then, with probability 1, for any $x \in \mathbb{R}^D$, there exists $y \in \mathbb{R}^d$ s.t. $|f(x) - f(Ay)| \le 2\epsilon$.
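Before turning to the formal proof, the lemma can be sanity-checked numerically on the toy function from the previous illustration (again our own example; the least-squares construction of $y$ mirrors the proof idea of matching the projection of $Ay$ onto $\mathcal{V}_\epsilon$ with $x_\epsilon$):

```python
import numpy as np

D, k, d, eps = 1000, 5, 10, 0.1   # ambient dim, d_eps, embedding dim, eps
rng = np.random.default_rng(1)

def f(x):
    return np.sum(x[:k] ** 2) + 0.5 * eps * np.tanh(np.sum(x[k:]))

A = rng.normal(0.0, 1.0, size=(D, d))   # entries i.i.d. Gaussian, d >= d_eps
x = rng.normal(size=D)

# Here Phi = [e_1, ..., e_k], so Phi^T A = A[:k], and the coordinates of
# x_eps in this basis are c = x[:k]. Solving (Phi^T A) y = c makes the
# projection of A @ y onto V_eps coincide with x_eps.
y, *_ = np.linalg.lstsq(A[:k], x[:k], rcond=None)

assert abs(f(x) - f(A @ y)) <= 2 * eps   # the 2*eps embedding gap of Lemma 1
```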
Proof. We borrow the idea of constructing such $y$ as in [Wang et al., 2013]. Since $f$ has the optimal $\epsilon$-effective dimension $d_\epsilon$, there exists an $\epsilon$-effective subspace $\mathcal{V}_\epsilon \subseteq \mathbb{R}^D$ s.t. $\dim(\mathcal{V}_\epsilon) = d_\epsilon$. Besides, any $x \in \mathbb{R}^D$ can be decomposed as $x = x_\epsilon + x_\epsilon^\perp$, where $x_\epsilon \in \mathcal{V}_\epsilon$, $x_\epsilon^\perp \in \mathcal{V}_\epsilon^\perp$, and $\mathcal{V}_\epsilon^\perp$ is the orthogonal complement of $\mathcal{V}_\epsilon$. By the definition of $\epsilon$-effective subspace, we have $|f(x) - f(x_\epsilon)| \le \epsilon$. Hence, it suffices to show that, for any $x_\epsilon \in \mathcal{V}_\epsilon$, there exists $y \in \mathbb{R}^d$ s.t. $|f(x_\epsilon) - f(Ay)| \le \epsilon$.
Let $\Phi \in \mathbb{R}^{D \times d_\epsilon}$ be a matrix whose columns form a standard orthonormal basis for $\mathcal{V}_\epsilon$. For any $x_\epsilon \in \mathcal{V}_\epsilon$, there exists $c \in \mathbb{R}^{d_\epsilon}$ s.t. $x_\epsilon = \Phi c$. Let us for now assume that