A simple neural network module
for relational reasoning
Adam Santoro∗, David Raposo∗, David G.T. Barrett, Mateusz Malinowski,
Razvan Pascanu, Peter Battaglia, Timothy Lillicrap
adamsantoro@, draposo@, barrettdavid@, mateuszm@,
razp@, peterbattaglia@, countzero@google.com
DeepMind
London, United Kingdom
Abstract
Relational reasoning is a central component of generally intelligent behavior, but has proven
difficult for neural networks to learn. In this paper we describe how to use Relation Networks
(RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational
reasoning. We tested RN-augmented networks on three tasks: visual question answering
using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human
performance; text-based question answering using the bAbI suite of tasks; and complex reasoning
about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR, we show
that powerful convolutional networks do not have a general capacity to solve relational questions,
but can gain this capacity when augmented with RNs. Our work shows how a deep learning
architecture equipped with an RN module can implicitly discover and learn to reason about
entities and their relations.
1 Introduction
The ability to reason about the relations between entities and their properties is central to generally
intelligent behavior (Figure 1) [18, 15]. Consider a child proposing a race between the two trees
in the park that are furthest apart: the pairwise distances between every tree in the park must be
inferred and compared to know where to run. Or, consider a reader piecing together evidence to
predict the culprit in a murder-mystery novel: each clue must be considered in its broader context to
build a plausible narrative and solve the mystery.
Symbolic approaches to artificial intelligence are inherently relational [32, 11]. Practitioners define
the relations between symbols using the language of logic and mathematics, and then reason about
these relations using a multitude of powerful methods, including deduction, arithmetic, and algebra.
But symbolic approaches suffer from the symbol grounding problem and are not robust to small
task and input variations [11]. Other approaches, such as those based on statistical learning, build
representations from raw data and often generalize across diverse and noisy conditions [25]. However,
a number of these approaches, such as deep learning, often struggle in data-poor problems where the
underlying structure is characterized by sparse but complex relations [7, 23]. Our results corroborate
these claims, and further demonstrate that seemingly simple relational inferences are remarkably
difficult for powerful neural network architectures such as convolutional neural networks (CNNs) and
multi-layer perceptrons (MLPs).

∗Equal contribution.

arXiv:1706.01427v1 [cs.CL] 5 Jun 2017

Figure 1: An illustrative example from the CLEVR dataset of relational reasoning. An image
containing four objects is shown alongside a non-relational question ("What is the size of the brown
sphere?") and a relational question ("Are there any rubber things that have the same size as the
yellow metallic cylinder?"). The relational question requires explicit reasoning about the relations
between the four objects in the image, whereas the non-relational question requires reasoning about
the attributes of a particular object.
Here, we explore “Relation Networks” (RN) as a general solution to relational reasoning in neural
networks. RNs are architectures whose computations focus explicitly on relational reasoning [35].
Although several other models supporting relation-centric computation have been proposed, such
as Graph Neural Networks, Gated Graph Sequence Neural Networks, and Interaction Networks
[37, 26, 2], RNs are simple, plug-and-play, and are exclusively focused on flexible relational reasoning.
Moreover, through joint training RNs can influence and shape upstream representations in CNNs
and LSTMs to produce implicit object-like representations that they can exploit for relational reasoning.
We applied an RN-augmented architecture to CLEVR [15], a recent visual question answering
(QA) dataset on which state-of-the-art approaches have struggled due to the demand for rich
relational reasoning. Our networks vastly outperformed the best generally-applicable visual QA
architectures, achieving state-of-the-art, super-human performance. RNs also solve CLEVR from
state descriptions, highlighting their versatility with regard to the form of their input. We also applied
an RN-based architecture to the bAbI text-based QA suite [41] and solved 18/20 of the subtasks.
Finally, we trained an RN to make challenging relational inferences about complex physical systems
and motion-capture data. The success of RNs across this set of substantially dissimilar task domains
is testament to the general utility of RNs for solving problems that require relational reasoning.
2 Relation Networks
An RN is a neural network module with a structure primed for relational reasoning. The design
philosophy behind RNs is to constrain the functional form of a neural network so that it captures the
core common properties of relational reasoning. In other words, the capacity to compute relations
is baked into the RN architecture without needing to be learned, just as the capacity to reason
about spatial, translation-invariant properties is built into CNNs, and the capacity to reason about
sequential dependencies is built into recurrent neural networks.
In its simplest form the RN is a composite function:
    RN(O) = f_φ( ∑_{i,j} g_θ(o_i, o_j) ),    (1)
where the input is a set of “objects” O = {o_1, o_2, ..., o_n}, o_i ∈ ℝ^m is the i-th object, and
f_φ and g_θ are functions with parameters φ and θ, respectively. For our purposes, f_φ and g_θ
are MLPs, and the
parameters are learnable synaptic weights, making RNs end-to-end differentiable. We call the output
of g_θ a “relation”; therefore, the role of g_θ is to infer the ways in which two objects are related, or if
they are even related at all.
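As a concrete sketch of Equation 1, the RN's forward pass can be written in a few lines of NumPy. This is a minimal, untrained illustration under assumed layer sizes and a ReLU nonlinearity, not the implementation used in the paper's experiments:

```python
import numpy as np

def mlp(sizes, rng):
    """Random (untrained) weights for a simple ReLU MLP; sizes are assumptions."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x

def relation_network(objects, g_params, f_params):
    """RN(O) = f_phi( sum_{i,j} g_theta(o_i, o_j) ), as in Equation 1."""
    n = objects.shape[0]
    # Build every ordered object pair (o_i, o_j) by concatenating their features.
    pairs = np.concatenate(
        [np.repeat(objects, n, axis=0), np.tile(objects, (n, 1))], axis=1)
    relations = forward(g_params, pairs)  # one g_theta pass per pair
    pooled = relations.sum(axis=0)        # order-invariant aggregation
    return forward(f_params, pooled)      # f_phi on the pooled relations

rng = np.random.default_rng(0)
objects = rng.standard_normal((6, 8))     # n=6 objects, each with m=8 features
g_params = mlp([16, 32, 32], rng)         # g_theta: a pair (2m=16) -> a relation
f_params = mlp([32, 32, 10], rng)         # f_phi: pooled relations -> output
out = relation_network(objects, g_params, f_params)
print(out.shape)  # (10,)
```

Note that g_θ's weights are shared across all n² ordered pairs, and the sum over relations is what makes the output invariant to the ordering of the input objects.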
RNs have three notable strengths: they learn to infer relations, they are data efficient, and they
operate on a set of objects – a particularly general and versatile input format – in a manner that is
order invariant.
RNs learn to infer relations
The functional form in Equation 1 dictates that an RN should
consider the potential relations between all object pairs. This implies that an RN is not necessarily
privy to which object relations actually exist, nor to the actual meaning of any particular relation.
Thus, RNs must learn to infer the existence and implications of object relations.
In graph theory parlance, the input can be thought of as a complete and directed graph whose
nodes are objects and whose edges denote the object pairs whose relations should be considered.
Although we focus on this “all-to-all” version of the RN throughout this paper, this RN definition
can be adjusted to consider only some object pairs. Similar to Interaction Networks [2], to which
RNs are related, RNs can take as input a list of only those pairs that should be considered, if this
information is available. This information could be explicit in the input data, or could perhaps be
extracted by some upstream mechanism.
RNs are data efficient
RNs use a single function g_θ to compute each relation. This can be
thought of as a single function operating on a batch of object pairs, where each member of the
batch is a particular object-object pair from the same object set. This mode of operation encourages
greater generalization for computing relations, since g_θ is encouraged not to over-fit to the features
of any particular object pair. Consider how an MLP would learn the same function. An MLP would
receive all objects from the object set simultaneously as its input. It must then learn and embed n²
(where n is the number of objects) identical functions within its weight parameters to account for all
possible object pairings. This quickly becomes intractable as the number of objects grows. Therefore,
the cost of learning a relation function n² times using a single feedforward pass per sample, as in an
MLP, is replaced by the cost of n² feedforward passes per object set (i.e., one for each possible object
pair in the set) and learning a relation function just once, as in an RN.
RNs operate on a set of objects
The summation in Equation 1 ensures that the RN is invariant
to the order of objects in the input. This invariance ensures that the RN’s input respects the property
that sets are order invariant, and it ensures that the output is order invariant. Ultimately, this
invariance ensures that the RN’s output contains information that is generally representative of the
relations that exist in the object set.
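A quick numeric check makes this invariance explicit. In the sketch below, g_θ and f_φ are stand-in fixed random linear maps (illustrative assumptions, not trained models): the summed RN output is unchanged when the objects are reordered, while an MLP acting on a flattened object list is sensitive to order.

```python
import numpy as np

rng = np.random.default_rng(1)
W_g = rng.standard_normal((8, 5))     # stand-in g_theta: fixed linear map + tanh
W_f = rng.standard_normal((5, 3))     # stand-in f_phi: fixed linear readout
W_mlp = rng.standard_normal((16, 3))  # stand-in MLP acting on flattened objects

def rn(objects):
    n = objects.shape[0]
    pairs = np.concatenate(
        [np.repeat(objects, n, axis=0), np.tile(objects, (n, 1))], axis=1)
    return np.tanh(pairs @ W_g).sum(axis=0) @ W_f  # sum makes this order invariant

def flat_mlp(objects):
    return objects.reshape(-1) @ W_mlp  # flattening bakes object order into the input

objects = rng.standard_normal((4, 4))  # four objects, four features each
shuffled = objects[::-1]               # same set, different order
print(np.allclose(rn(objects), rn(shuffled)))            # True
print(np.allclose(flat_mlp(objects), flat_mlp(shuffled)))  # False
```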
3 Tasks
We applied RN-augmented networks to a variety of tasks that hinge on relational reasoning. To
demonstrate the versatility of these networks we chose tasks from a number of different domains,
including visual QA, text-based QA, and dynamic physical systems.
3.1 CLEVR
In visual QA a model must learn to answer questions about an image (Figure 1). This is a challenging
problem domain because it requires high-level scene understanding [1, 29]. Architectures must perform
complex relational reasoning – spatial and otherwise – over the features in the visual inputs, language
inputs, and their conjunction. However, the majority of visual QA datasets require reasoning in the
absence of fully specified word vocabularies, and perhaps more perniciously, a vast and complicated
knowledge of the world that is not available in the training data. They also contain ambiguities and
exhibit strong linguistic biases that allow a model to learn answering strategies that exploit those
biases, without reasoning about the visual input [1, 31, 36].
To control for these issues, and to distill the core challenges of visual QA, the CLEVR visual QA
dataset was developed [15]. CLEVR contains images of 3D-rendered objects, such as spheres and
cylinders (Figure 2). Each image is associated with a number of questions that fall into different
categories. For example, query attribute questions may ask “What is the color of the sphere?”,
while compare attribute questions may ask “Is the cube the same material as the cylinder?”.
For our purposes, an important feature of CLEVR is that many questions are explicitly relational
in nature. Remarkably, powerful QA architectures [46] are unable to solve CLEVR, presumably
because they cannot handle core relational aspects of the task. For example, as reported in the
original paper, a model comprised of ResNet-101 image embeddings with LSTM question processing
and augmented with stacked attention modules vastly outperformed other models, at an overall
performance of 68.5% (compared to 52.3% for the next best, and 92.6% human performance) [15].
However, for compare attribute and count questions (i.e., questions heavily involving relations
across objects), the model performed little better than the simplest baseline, which answered questions
solely based on the probability of answers in the training set for a given question category (Q-type
baseline).
We used two versions of the CLEVR dataset: (i) the pixel version, in which images were
represented in standard 2D pixel form, and (ii) a state description version, in which images were
explicitly represented by state description matrices containing factored object descriptions. Each
row in the matrix contained the features of a single object – 3D coordinates (x, y, z); color (r, g,
b); shape (cube, cylinder, etc.); material (rubber, metal, etc.); size (small, large, etc.). When we
trained our models, we used either the pixel version or the state description version, depending on
the experiment, but not both together.
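A state-description matrix of this kind can be assembled as below. The attribute vocabularies and the one-hot layout are illustrative assumptions; the paper specifies only that each row factors an object into coordinates, color, shape, material, and size.

```python
import numpy as np

# Hypothetical vocabularies for the categorical attributes; the paper gives only
# partial lists ("cube, cylinder, etc."), so these are assumptions.
SHAPES = ["cube", "sphere", "cylinder"]
MATERIALS = ["rubber", "metal"]
SIZES = ["small", "large"]

def object_row(xyz, rgb, shape, material, size):
    """One row of the state-description matrix: the factored features of one object."""
    def one_hot(value, vocab):
        v = np.zeros(len(vocab))
        v[vocab.index(value)] = 1.0
        return v
    return np.concatenate([np.asarray(xyz, float),   # 3D coordinates (x, y, z)
                           np.asarray(rgb, float),   # color (r, g, b)
                           one_hot(shape, SHAPES),
                           one_hot(material, MATERIALS),
                           one_hot(size, SIZES)])

state = np.stack([
    object_row((0.1, 0.4, 0.0), (0.6, 0.3, 0.1), "sphere", "rubber", "small"),
    object_row((0.7, 0.2, 0.0), (0.9, 0.9, 0.1), "cylinder", "metal", "large"),
])
print(state.shape)  # (2, 13): one row per object
```

Rows of such a matrix can be fed to an RN directly as the objects o_i, with no convolutional front-end.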
3.2 Sort-of-CLEVR
To explore our hypothesis that the RN architecture is better suited to general relational reasoning as
compared to more standard neural architectures, we constructed a dataset similar to CLEVR that
we call “Sort-of-CLEVR”¹. This dataset separates relational and non-relational questions.
Sort-of-CLEVR consists of images of 2D colored shapes along with questions and answers about
the images. Each image has a total of 6 objects, where each object is a randomly chosen shape
(square or circle). We used 6 colors (red, blue, green, orange, yellow, gray) to unambiguously identify
each object. Questions are hard-coded as fixed-length binary strings to reduce the difficulty involved
with natural language question-word processing, and thereby remove any confounding difficulty
with language parsing. For each image we generated 10 relational questions and 10 non-relational
questions. Examples of relational questions are: “What is the shape of the object that is farthest from
the gray object? ”; and “How many objects have the same shape as the green object? ”. Examples of
non-relational questions are: “What is the shape of the gray object?”; and “Is the blue object on the
top or bottom of the scene? ”. The dataset is also visually simple, reducing complexities involved in
image processing.
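The text above specifies only that questions are fixed-length binary strings; one plausible encoding, with hypothetical subtype names, might look like:

```python
import numpy as np

COLORS = ["red", "blue", "green", "orange", "yellow", "gray"]  # from the dataset
# The question subtypes below are assumptions for illustration only.
REL_SUBTYPES = ["closest_shape", "farthest_shape", "count_same_shape"]
NONREL_SUBTYPES = ["shape_of", "horizontal_pos", "vertical_pos"]

def encode_question(color, relational, subtype):
    """Fixed-length binary question code: color one-hot + type bit + subtype one-hot."""
    q = np.zeros(6 + 1 + 3)
    q[COLORS.index(color)] = 1.0
    q[6] = 1.0 if relational else 0.0
    vocab = REL_SUBTYPES if relational else NONREL_SUBTYPES
    q[7 + vocab.index(subtype)] = 1.0
    return q

# "What is the shape of the object that is farthest from the gray object?"
q = encode_question("gray", relational=True, subtype="farthest_shape")
print(q.astype(int))  # [0 0 0 0 0 1 1 0 1 0]
```

Because every question is the same length, it can simply be concatenated with each object pair's features before being passed to g_θ.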
3.3 bAbI
bAbI is a pure text-based QA dataset [41]. There are 20 tasks, each corresponding to a particular
type of reasoning, such as deduction, induction, or counting. Each question is associated with a set
of supporting facts. For example, the facts “Sandra picked up the football” and “Sandra went to
the office” support the question “Where is the football? ” (answer: “office”). A model succeeds on
a task if its performance surpasses 95%. Many memory-augmented neural networks have reported
impressive results on bAbI. When training jointly on all tasks using 10K examples per task, Memory
Networks pass 14/20, DNC 18/20, Sparse DNC 19/20, and EntNet 16/20 (the authors of EntNets
report state-of-the-art at 20/20; however, unlike previously reported results this was not done with
joint training on all tasks, where they instead achieve 16/20) [42, 9, 34, 13].
¹The “Sort-of-CLEVR” dataset will be made publicly available online.