Meta-Learning in Neural Networks: A Survey
Timothy Hospedales, Antreas Antoniou, Paul Micaelli, Amos Storkey
Abstract—The field of meta-learning, or learning-to-learn, has seen a dramatic rise in interest in recent years. Contrary to
conventional approaches to AI where tasks are solved from scratch using a fixed learning algorithm, meta-learning aims to improve the
learning algorithm itself, given the experience of multiple learning episodes. This paradigm provides an opportunity to tackle many
conventional challenges of deep learning, including data and computation bottlenecks, as well as generalization. This survey describes
the contemporary meta-learning landscape. We first discuss definitions of meta-learning and position it with respect to related fields,
such as transfer learning and hyperparameter optimization. We then propose a new taxonomy that provides a more comprehensive
breakdown of the space of meta-learning methods today. We survey promising applications and successes of meta-learning such as
few-shot learning and reinforcement learning. Finally, we discuss outstanding challenges and promising areas for future research.
Index Terms—Meta-Learning, Learning-to-Learn, Few-Shot Learning, Transfer Learning, Neural Architecture Search
1 INTRODUCTION
Contemporary machine learning models are typically
trained from scratch for a specific task using a fixed learn-
ing algorithm designed by hand. Deep learning-based ap-
proaches specifically have seen great successes in a variety
of fields [1]–[3]. However there are clear limitations [4]. For
example, successes have largely been in areas where vast
quantities of data can be collected or simulated, and where
huge compute resources are available. This excludes many
applications where data is intrinsically rare or expensive [5],
or compute resources are unavailable [6].
Meta-learning provides an alternative paradigm where
a machine learning model gains experience over multiple
learning episodes – often covering a distribution of related
tasks – and uses this experience to improve its future
learning performance. This ‘learning-to-learn’ [7] can lead
to a variety of benefits such as data and compute efficiency,
and it is better aligned with human and animal learning [8],
where learning strategies improve both on a lifetime and
evolutionary timescales [8]–[10].
Historically, the success of machine learning was driven
by the choice of hand-engineered features [11], [12]. Deep
learning realised the promise of joint feature and model
learning [13], providing a huge improvement in perfor-
mance for many tasks [1], [3]. Meta-learning in neural
networks can be seen as aiming to provide the next step
of integrating joint feature, model, and algorithm learning.
Neural network meta-learning has a long history [7],
[14], [15]. However, its potential as a driver to advance the
frontier of the contemporary deep learning industry has
led to an explosion of recent research. In particular meta-
learning has the potential to alleviate many of the main
criticisms of contemporary deep learning [4], for instance
by improving data efficiency, knowledge transfer and un-
supervised learning. Meta-learning has proven useful both
in multi-task scenarios where task-agnostic knowledge is
T. Hospedales is with Samsung AI Centre, Cambridge and University of Edin-
burgh. A. Antoniou, P. Micaelli and Storkey are with University of Edinburgh.
Email: {t.hospedales,a.antoniou,paul.micaelli,a.storkey}@ed.ac.uk.
extracted from a family of tasks and used to improve learn-
ing of new tasks from that family [7], [16]; and single-task
scenarios where a single problem is solved repeatedly and
improved over multiple episodes [17]–[19]. Successful appli-
cations have been demonstrated in areas spanning few-shot
image recognition [16], [20], unsupervised learning [21],
data efficient [22], [23] and self-directed [24] reinforcement
learning (RL), hyperparameter optimization [17], and neural
architecture search (NAS) [18], [25], [26].
Many perspectives on meta-learning can be found in
the literature, in part because different communities use the
term differently. Thrun [7] operationally defines learning-to-
learn as occurring when a learner’s performance at solving
tasks drawn from a given task family improves with respect
to the number of tasks seen (cf. conventional machine
learning, where performance improves as more data from a single
task is seen). This perspective [27]–[29] views meta-learning
as a tool to manage the ‘no free lunch’ theorem [30] and im-
prove generalization by searching for the algorithm (induc-
tive bias) that is best suited to a given problem, or problem
family. However, this definition can include transfer, multi-
task, feature-selection, and model-ensemble learning, which
are not typically considered as meta-learning today. Another
usage of meta-learning [31] deals with algorithm selection
based on dataset features, and becomes hard to distinguish
from automated machine learning (AutoML) [32], [33].
In this paper, we focus on contemporary neural-network
meta-learning. We take this to mean algorithm learning as
per [27], [28], but focus specifically on where this is achieved
by end-to-end learning of an explicitly defined objective func-
tion (such as cross-entropy loss). Additionally we consider
single-task meta-learning, and discuss a wider variety of
(meta) objectives such as robustness and compute efficiency.
This paper thus provides a unique, timely, and up-to-
date survey of the rapidly growing area of neural network
meta-learning. In contrast, previous surveys are rather out
of date and/or focus on algorithm selection for data mining
[27], [31], [34], [35], AutoML [32], [33], or particular appli-
cations of meta-learning such as few-shot learning [36] or
neural architecture search [37].
arXiv:2004.05439v2 [cs.LG] 7 Nov 2020
We address both meta-learning methods and applica-
tions. We first introduce meta-learning through a high-level
problem formalization that can be used to understand and
position work in this area. We then provide a new taxonomy
in terms of meta-representation, meta-objective and meta-
optimizer. This framework provides a design-space for de-
veloping new meta learning methods and customizing them
for different applications. We survey several popular and
emerging application areas including few-shot, reinforce-
ment learning, and architecture search; and position meta-
learning with respect to related topics such as transfer and
multi-task learning. We conclude by discussing outstanding
challenges and areas for future research.
2 BACKGROUND
Meta-learning is difficult to define, having been used in var-
ious inconsistent ways, even within contemporary neural-
network literature. In this section, we introduce our defini-
tion and key terminology, and then position meta-learning
with respect to related topics.
Meta-learning is most commonly understood as learn-
ing to learn, which refers to the process of improving a
learning algorithm over multiple learning episodes. In con-
trast, conventional ML improves model predictions over
multiple data instances. During base learning, an inner
(or lower/base) learning algorithm solves a task such as
image classification [13], defined by a dataset and objective.
During meta-learning, an outer (or upper/meta) algorithm
updates the inner learning algorithm such that the model
it learns improves an outer objective. For instance this
objective could be generalization performance or learning
speed of the inner algorithm. Learning episodes of the base
task, namely (base algorithm, trained model, performance)
tuples, can be seen as providing the instances needed by the
outer algorithm to learn the base learning algorithm.
As defined above, many conventional algorithms such
as random search of hyper-parameters by cross-validation
could fall within the definition of meta-learning. The
salient characteristic of contemporary neural-network meta-
learning is an explicitly defined meta-level objective, and end-
to-end optimization of the inner algorithm with respect to
this objective. Often, meta-learning is conducted on learning
episodes sampled from a task family, leading to a base
learning algorithm that performs well on new tasks sampled
from this family. However, in a limiting case all training
episodes can be sampled from a single task. In the following
section, we introduce these notions more formally.
2.1 Formalizing Meta-Learning
Conventional Machine Learning In conventional supervised machine learning, we are given a training dataset D = {(x_1, y_1), . . . , (x_N, y_N)}, such as (input image, output label) pairs. We can train a predictive model ŷ = f_θ(x), parameterized by θ, by solving:

θ* = arg min_θ L(D; θ, ω)    (1)

where L is a loss function that measures the error between
true labels and those predicted by f_θ(·). The conditioning on
ω denotes the dependence of this solution on assumptions
about ‘how to learn’, such as the choice of optimizer for θ
or function class for f . Generalization is then measured by
evaluating a number of test points with known labels.
The conventional assumption is that this optimization is
performed from scratch for every problem D; and that ω is
pre-specified. However, the specification of ω can drastically
affect performance measures like accuracy or data efficiency.
Meta-learning seeks to improve these measures by learning
the learning algorithm itself, rather than assuming it is pre-
specified and fixed. This is often achieved by revisiting the
first assumption above, and learning from a distribution of
tasks rather than from scratch.
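As a deliberately minimal illustration of Eq. 1, the sketch below fits θ for a single task by gradient descent, with ω (the model class, loss, and optimizer) fixed by hand. The linear model, learning rate, and data are illustrative assumptions, not from the survey.

```python
# Eq. 1 sketch: conventional supervised learning fits theta for one task,
# with omega (model class, optimizer, learning rate) pre-specified by hand.

def fit(dataset, lr=0.1, steps=200):
    """Gradient descent on the squared loss L(D; theta) for y_hat = theta * x."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(2 * (theta * x - y) * x for x, y in dataset) / len(dataset)
        theta -= lr * grad
    return theta

# A single task: y = 3x (noise-free for clarity).
D = [(x, 3.0 * x) for x in [-2.0, -1.0, 1.0, 2.0]]
theta_star = fit(D)  # theta converges toward 3.0
```

Every choice hard-coded above (the linear function class, the squared loss, the step size) is part of ω; meta-learning asks how such choices can themselves be learned.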
Meta-Learning: Task-Distribution View A common view
of meta-learning is to learn a general purpose learning algo-
rithm that can generalize across tasks, and ideally enable
each new task to be learned better than the last. We can
evaluate the performance of ω over a distribution of tasks
p(T ). Here we loosely define a task to be a dataset and loss
function T = {D, L}. Learning how to learn thus becomes
min_ω E_{T∼p(T)} L(D; ω)    (2)
where L(D; ω) measures the performance of a model
trained using ω on dataset D. ‘How to learn’, i.e. ω, is often
referred to as across-task knowledge or meta-knowledge.
To solve this problem in practice, we often assume access
to a set of source tasks sampled from p(T ). Formally, we
denote the set of M source tasks used in the meta-training stage as D_source = {(D^train_source, D^val_source)^(i)}_{i=1}^{M}, where each task has both training and validation data. Often, the source train and validation datasets are respectively called support and query sets. The meta-training step of ‘learning how to learn’ can be written as:

ω* = arg max_ω log p(ω | D_source)    (3)
Now we denote the set of Q target tasks used in the meta-testing stage as D_target = {(D^train_target, D^test_target)^(i)}_{i=1}^{Q}, where each task has both training and test data. In the meta-testing stage we use the learned meta-knowledge ω* to train the base model on each previously unseen target task i:

θ*(i) = arg max_θ log p(θ | ω*, D^train(i)_target)    (4)
In contrast to conventional learning in Eq. 1, learning on the training set of a target task i now benefits from meta-knowledge ω* about the algorithm to use. This could be an estimate of the initial parameters [16], or an entire learning model [38] or optimization strategy [39]. We can evaluate the accuracy of our meta-learner by the performance of θ*(i) on the test split of each target task, D^test(i)_target.
This setup leads to analogies of conventional underfit-
ting and overfitting: meta-underfitting and meta-overfitting. In
particular, meta-overfitting is an issue whereby the meta-
knowledge learned on the source tasks does not generalize
to the target tasks. It is relatively common, especially in
the case where only a small number of source tasks are
available. It can be seen as learning an inductive bias ω
that constrains the hypothesis space of θ too tightly around
solutions to the source tasks.
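The meta-training and meta-testing stages of Eqs. 3–4 can be made concrete with a toy sketch. Everything task-specific below — the offset-regression tasks, the shrinkage learner, and the grid search standing in for the maximization in Eq. 3 — is an illustrative assumption, not from the survey.

```python
# Toy meta-train / meta-test protocol (Eqs. 3-4). Tasks are 1-D offset
# regression y = x + c; the meta-knowledge omega is a prior mean for the
# offset, and the base learner shrinks its estimate toward omega.

def base_learn(support, omega, strength=1.0):
    """Meta-test inner step (Eq. 4): MAP-style estimate of the task offset."""
    residuals = [y - x for x, y in support]
    return (sum(residuals) + strength * omega) / (len(residuals) + strength)

def task_loss(theta, data):
    return sum((x + theta - y) ** 2 for x, y in data) / len(data)

# Source tasks: (support, query) pairs with offsets clustered around 5.0.
source = [([(0.0, c)], [(1.0, 1.0 + c), (2.0, 2.0 + c)])
          for c in (4.5, 5.0, 5.5)]

# Meta-training (Eq. 3): choose omega minimizing total query loss.
omega_star = min(
    (i * 0.1 for i in range(101)),
    key=lambda w: sum(task_loss(base_learn(s, w), q) for s, q in source),
)

# Meta-testing: one-shot adaptation to an unseen task with offset 5.2.
theta_new = base_learn([(0.0, 5.2)], omega_star)
```

Meta-overfitting in this picture corresponds to ω fitting the three source offsets so tightly that the shrinkage hurts, rather than helps, on target tasks whose offsets fall outside the source cluster.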
Meta-Learning: Bilevel Optimization View The previous
discussion outlines the common flow of meta-learning in a
multiple task scenario, but does not specify how to solve
the meta-training step in Eq. 3. This is commonly done
by casting the meta-training step as a bilevel optimization
problem. While this picture is arguably only accurate for
the optimizer-based methods (see section 3.1), it is helpful
to visualize the mechanics of meta-learning more generally.
Bilevel optimization [40] refers to a hierarchical optimiza-
tion problem, where one optimization contains another
optimization as a constraint [17], [41]. Using this notation,
meta-training can be formalised as follows:
ω* = arg min_ω Σ_{i=1}^{M} L^meta(θ*(i)(ω), ω, D^val(i)_source)    (5)

s.t. θ*(i)(ω) = arg min_θ L^task(θ, ω, D^train(i)_source)    (6)
where L^meta and L^task refer to the outer and inner objectives respectively, such as cross entropy in the case of few-shot classification. Note the leader-follower asymmetry between the outer and inner levels: the inner-level optimization in Eq. 6 is conditional on the learning strategy ω defined by the outer level, but it cannot change ω during its training. Here ω could indicate an initial condition in non-convex optimization [16], a hyper-parameter such as regularization strength [17], or even a parameterization of the loss function to optimize L^task [42]. Section 4.1 discusses the space of choices for ω in detail. The outer-level optimization learns ω such that it produces models θ*(i)(ω) that perform well on their validation sets after training. Section 4.2 discusses how to optimize ω in detail. In Section 4.3 we consider what L^meta can measure, such as validation performance, learning speed or model robustness.
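A minimal runnable sketch of this bilevel structure, in the style of MAML [16], is given below. The quadratic per-task losses and the coincidence of train and validation sets are simplifying assumptions; the point is that the outer gradient differentiates through the inner update.

```python
# Bilevel sketch of Eqs. 5-6, MAML-style. Toy tasks have inner loss
# L_task_i(theta) = (theta - c_i)^2; the inner loop takes one gradient
# step from the shared initialization omega, and the outer loop
# differentiates through that step to update omega.

ALPHA, BETA = 0.1, 0.05        # inner / outer step sizes (assumptions)
task_optima = [1.0, 2.0, 6.0]  # the c_i, one per source task

def inner_step(omega, c):
    """Eq. 6: one gradient step on L_task from the initialization omega."""
    return omega - ALPHA * 2 * (omega - c)

omega = 0.0
for _ in range(500):
    # Eq. 5: d L_meta / d omega, using d theta_i / d omega = 1 - 2*ALPHA,
    # obtained by differentiating through the inner update.
    meta_grad = sum(2 * (inner_step(omega, c) - c) * (1 - 2 * ALPHA)
                    for c in task_optima)
    omega -= BETA * meta_grad

# omega converges to the mean of the task optima (3.0): the initialization
# from which a single inner step best serves the whole task family.
```

With one quadratic inner step the Jacobian of the inner update is the constant 1 − 2α, so differentiating through the optimizer reduces to a scalar factor; in deep networks the same chain rule runs through every inner step, which is the source of the compute and memory challenges discussed in Section 6.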
Finally, we note that the above formalization of meta-
training uses the notion of a distribution over tasks. While
common in the meta-learning literature, it is not a necessary
condition for meta-learning. More formally, if we are given
a single train and test dataset (M = Q = 1), we can split
the training set to get validation data, such that D_source = (D^train_source, D^val_source) for meta-training, and for meta-testing we can use D_target = (D^train_source ∪ D^val_source, D^test_target). We still learn ω over several episodes, and different train-val splits are usually used during meta-training.
Meta-Learning: Feed-Forward Model View As we will
see, there are a number of meta-learning approaches that
synthesize models in a feed-forward manner, rather than via
an explicit iterative optimization as in Eqs. 5-6 above. While
they vary in their degree of complexity, it can be instructive
to understand this family of approaches by instantiating the
abstract objective in Eq. 2 to define a toy example for meta-
training linear regression [43].
min_ω E_{T∼p(T)} E_{(D^tr, D^val)∈T} Σ_{(x,y)∈D^val} [ (x^T g_ω(D^tr) − y)^2 ]    (7)
Here we meta-train by optimizing over a distribution of tasks. For each task a train and validation set is drawn. The train set D^tr is embedded [44] into a vector g_ω(D^tr) which defines the linear regression weights used to predict examples x from the validation set. Optimizing Eq. 7 ‘learns to learn’ by training the function g_ω to map a training set to a weight vector. Thus g_ω should provide a good solution for novel meta-test tasks T^te drawn from p(T). Methods in this family vary in the complexity of the predictive model g used and in how the support set is embedded [44] (e.g., by pooling, CNN or RNN). These models are also known as amortized [45], because the cost of learning a new task is reduced to a feed-forward operation through g_ω(·), with the iterative optimization already paid for during meta-training of ω.
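The following sketch instantiates Eq. 7 with the simplest possible embedding — a pooled statistic scaled by a learned scalar. Both the embedding and the toy tasks are assumptions for illustration only.

```python
# Amortized / feed-forward sketch of Eq. 7. g_omega pools the statistic
# x*y over the support set and scales it by a learned scalar omega; the
# result is used directly as the linear regression weight.

def g(omega, d_tr):
    """Feed-forward 'learner': map a support set to a regression weight."""
    return omega * sum(x * y for x, y in d_tr) / len(d_tr)

def val_loss(omega, d_tr, d_val):
    w = g(omega, d_tr)
    return sum((w * x - y) ** 2 for x, y in d_val) / len(d_val)

# Tasks y = a*x; for simplicity support and query share the same inputs.
xs = [1.0, 2.0]
tasks = [[(x, a * x) for x in xs] for a in (1.0, -2.0, 3.0)]

# Meta-train omega by gradient descent (finite differences for brevity).
omega, lr, eps = 0.0, 0.002, 1e-6
for _ in range(200):
    grad = sum((val_loss(omega + eps, d, d) - val_loss(omega - eps, d, d))
               / (2 * eps) for d in tasks)
    omega -= lr * grad

# Meta-test: a new task (slope 5.0) is 'learned' in one forward pass.
w_new = g(omega, [(x, 5.0 * x) for x in xs])
```

At meta-test time no optimization is run at all: the support set is mapped to regression weights in a single forward pass, which is what makes this family ‘amortized’.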
2.2 Historical Context of Meta-Learning
Meta-learning and learning-to-learn first appear in the lit-
erature in 1987 [14]. J. Schmidhuber introduced a family of
methods that can learn how to learn, using self-referential
learning. Self-referential learning involves training neural
networks that can receive as inputs their own weights and
predict updates for said weights. Schmidhuber proposed to
learn the model itself using evolutionary algorithms.
Meta-learning was subsequently extended to multiple
areas. Bengio et al. [46], [47] proposed to meta-learn biologically plausible learning rules. Schmidhuber et al. continued
to explore self-referential systems and meta-learning [48],
[49]. S. Thrun et al. took care to more clearly define the
term learning to learn in [7] and introduced initial theoretical
justifications and practical implementations. Proposals for
training meta-learning systems using gradient descent and
backpropagation were first made in 1991 [50] followed by
more extensions in 2001 [51], [52], with [27] giving an
overview of the literature at that time. Meta-learning was
used in the context of reinforcement learning in 1995 [53],
followed by various extensions [54], [55].
2.3 Related Fields
Here we position meta-learning against related areas whose
relation to meta-learning is often a source of confusion.
Transfer Learning (TL) TL [34], [56] uses past experi-
ence from a source task to improve learning (speed, data
efficiency, accuracy) on a target task. TL refers both to
this problem area and family of solutions, most commonly
parameter transfer plus optional fine tuning [57] (although
there are numerous other approaches [34]).
In contrast, meta-learning refers to a paradigm that can
be used to improve TL as well as other problems. In TL
the prior is extracted by vanilla learning on the source task
without the use of a meta-objective. In meta-learning, the
corresponding prior would be defined by an outer optimization that evaluates the benefit of the prior when learning a new task, as illustrated by MAML [16]. More generally,
meta-learning deals with a much wider range of meta-
representations than solely model parameters (Section 4.1).
Domain Adaptation (DA) and Domain Generalization
(DG) Domain-shift refers to the situation where source
and target problems share the same objective, but the input
distribution of the target task is shifted with respect to the
source task [34], [58], reducing model performance. DA is
a variant of transfer learning that attempts to alleviate this
issue by adapting the source-trained model using sparse or
unlabeled data from the target. DG refers to methods to train
a source model to be robust to such domain-shift without
further adaptation. Many knowledge transfer methods have
been studied [34], [58] to boost target domain performance.
However, as for TL, vanilla DA and DG don’t use a meta-
objective to optimize ‘how to learn’ across domains. Mean-
while, meta-learning methods can be used to perform both
DA [59] and DG [42] (see Sec. 5.8).
Continual learning (CL) Continual or lifelong learning
[60]–[62] refers to the ability to learn on a sequence of tasks
drawn from a potentially non-stationary distribution, and
in particular seeks to do so while accelerating the learning of new
tasks and without forgetting old tasks. Similarly to meta-
learning, a task distribution is considered, and the goal is
partly to accelerate learning of a target task. However most
continual learning methodologies are not meta-learning
methodologies since this meta objective is not solved for
explicitly. Nevertheless, meta-learning provides a potential
framework to advance continual learning, and a few recent
studies have begun to do so by developing meta-objectives
that encode continual learning performance [63]–[65].
Multi-Task Learning (MTL) aims to jointly learn sev-
eral related tasks, to benefit from regularization due to
parameter sharing and the diversity of the resulting shared
representation [66]–[68], as well as compute/memory sav-
ings. Like TL, DA, and CL, conventional MTL is a single-
level optimization without a meta-objective. Furthermore,
the goal of MTL is to solve a fixed number of known tasks,
whereas the point of meta-learning is often to solve unseen
future tasks. Nonetheless, meta-learning can be brought in
to benefit MTL, e.g. by learning the relatedness between
tasks [69], or how to prioritise among multiple tasks [70].
Hyperparameter Optimization (HO) is within the remit
of meta-learning, in that hyperparameters like learning rate
or regularization strength describe ‘how to learn’. Here we
include HO tasks that define a meta objective that is trained
end-to-end with neural networks, such as gradient-based
hyperparameter learning [69], [71] and neural architecture
search [18]. But we exclude other approaches like random
search [72] and Bayesian Hyperparameter Optimization
[73], which are rarely considered to be meta-learning.
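As a toy example of gradient-based hyperparameter learning in the spirit of [17], [71]: 1-D ridge regression, where the inner problem has a closed form and the hypergradient of the validation loss with respect to the regularization strength is approximated by finite differences. The data and step sizes below are assumptions for illustration.

```python
# Gradient-based hyperparameter optimization sketch: learn the ridge
# regularization strength lam end-to-end against a validation objective.

train = [(1.0, 2.0)]             # one noisy training point
val = [(1.0, 1.0), (2.0, 2.0)]   # clean validation data (true slope 1)

def inner_solve(lam):
    """Closed-form inner loop: argmin_theta sum (theta*x - y)^2 + lam*theta^2."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, y in train)
    return sxy / (sxx + lam)

def val_loss(lam):
    theta = inner_solve(lam)
    return sum((theta * x - y) ** 2 for x, y in val)

lam, lr, eps = 0.0, 0.05, 1e-6
for _ in range(500):
    # Hypergradient of the outer (validation) objective w.r.t. lam.
    hypergrad = (val_loss(lam + eps) - val_loss(lam - eps)) / (2 * eps)
    lam = max(0.0, lam - lr * hypergrad)

# lam converges to 1.0: the shrinkage that corrects the noisy training point.
```

Here the inner solution is available in closed form, so the hypergradient is cheap; the methods cited above instead differentiate through (or approximate) an iterative inner optimization.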
Hierarchical Bayesian Models (HBM) involve Bayesian
learning of parameters θ under a prior p(θ|ω). The prior
is written as a conditional density on some other variable
ω which has its own prior p(ω). Hierarchical Bayesian
models feature strongly as models for grouped data D = {D_i | i = 1, 2, . . . , M}, where each group i has its own θ_i. The full model is [∏_{i=1}^{M} p(D_i | θ_i) p(θ_i | ω)] p(ω). The levels of hierarchy can be increased further; in particular ω can itself be parameterized, and hence p(ω) can be learnt. Learning is usually full-pipeline, but using some form of Bayesian marginalisation to compute the posterior over ω: P(ω | D) ∝ p(ω) ∏_{i=1}^{M} ∫ dθ_i p(D_i | θ_i) p(θ_i | ω). The ease of
doing the marginalisation depends on the model: in some
(e.g. Latent Dirichlet Allocation [74]) the marginalisation is
exact due to the choice of conjugate exponential models,
in others (see e.g. [75]), a stochastic variational approach is
used to calculate an approximate posterior, from which a
lower bound to the marginal likelihood is computed.
Bayesian hierarchical models provide a valuable view-
point for meta-learning, by providing a modeling rather
than an algorithmic framework for understanding the meta-
learning process. In practice, prior work in HBMs has typi-
cally focused on learning simple tractable models θ while
most meta-learning work considers complex inner-loop
learning processes, involving many iterations. Nonetheless,
some meta-learning methods like MAML [16] can be under-
stood through the lens of HBMs [76].
AutoML: AutoML [31]–[33] is a rather broad umbrella
for approaches aiming to automate parts of the machine
learning process that are typically manual, such as data
preparation, algorithm selection, hyper-parameter tuning,
and architecture search. AutoML often makes use of numer-
ous heuristics outside the scope of meta-learning as defined
here, and focuses on tasks such as data cleaning that are
less central to meta-learning. However, AutoML sometimes
makes use of end-to-end optimization of a meta-objective,
so meta-learning can be seen as a specialization of AutoML.
3 TAXONOMY
3.1 Previous Taxonomies
Previous [77], [78] categorizations of meta-learning meth-
ods have tended to produce a three-way taxonomy across
optimization-based methods, model-based (or black box)
methods, and metric-based (or non-parametric) methods.
Optimization Optimization-based methods include those where the inner-level task (Eq. 6) is literally solved as an optimization problem, focusing on extracting the meta-knowledge ω required to improve optimization performance. A famous example is MAML [16], which aims to learn the initialization ω = θ_0, such that a small number of inner steps produces a classifier that performs well on validation data. The outer optimization is itself performed by gradient descent, differentiating through the updates of the base model. More
elaborate alternatives also learn step sizes [79], [80] or
train recurrent networks to predict steps from gradients
[19], [39], [81]. Meta-optimization by gradient over long
inner optimizations leads to several compute and memory
challenges which are discussed in Section 6. A unified view
of gradient-based meta learning expressing many existing
methods as special cases of a generalized inner loop meta-
learning framework has been proposed [82].
Black Box / Model-based In model-based (or black-box)
methods the inner learning step (Eq. 6, Eq. 4) is wrapped up
in the feed-forward pass of a single model, as illustrated
in Eq. 7. The model embeds the current dataset D into
activation state, with predictions for test data being made
based on this state. Typical architectures include recurrent
networks [39], [51], convolutional networks [38] or hyper-
networks [83], [84] that embed training instances and labels
of a given task to define a predictor for test samples. In this
case all the inner-level learning is contained in the activation
states of the model and is entirely feed-forward. Outer-
level learning is performed with ω containing the CNN,
RNN or hypernetwork parameters. The outer and inner-
level optimizations are tightly coupled as ω and D directly
specify θ. Memory-augmented neural networks [85] use an
explicit storage buffer and can be seen as a model-based