TensorFlow Tutorial (English) resource - CSDN Library


TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

(Preliminary White Paper, November 9, 2015)

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng

Google Research ∗

Abstract

TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

1 Introduction

The Google Brain project started in 2011 to explore the use of very-large-scale deep neural networks, both for research and for use in Google’s products. As part of the early work in this project, we built DistBelief, our first-generation scalable distributed training and inference system [14], and this system has served us well. We and others at Google have performed a wide variety of research using DistBelief including work on unsupervised learning [31], language representation [35, 52], models for image classification and object detection [16, 48], video classification [27], speech recognition [56, 21, 20], sequence prediction [47], move selection for Go [34], pedestrian detection [2], reinforcement learning [38], and other areas [17, 5]. In addition, often in close collaboration with the Google Brain team, more than 50 teams at Google and other Alphabet companies have deployed deep neural networks using DistBelief in a wide variety of products, including Google Search [11], our advertising products, our speech recognition systems [50, 6, 46], Google Photos [43], Google Maps and StreetView [19], Google Translate [18], YouTube, and many others.

∗ Corresponding authors: Jeffrey Dean and Rajat Monga: {jeff,rajatmonga}@google.com

Based on our experience with DistBelief and a more complete understanding of the desirable system properties and requirements for training and using neural networks, we have built TensorFlow, our second-generation system for the implementation and deployment of large-scale machine learning models. TensorFlow takes computations described using a dataflow-like model and maps them onto a wide variety of different hardware platforms, ranging from running inference on mobile device platforms such as Android and iOS to modest-sized training and inference systems using single machines containing one or many GPU cards to large-scale training systems running on hundreds of specialized machines with thousands of GPUs. Having a single system that can span such a broad range of platforms significantly simplifies the real-world use of machine learning systems, as we have found that having separate systems for large-scale training and small-scale deployment leads to significant maintenance burdens and leaky abstractions. TensorFlow computations are expressed as stateful dataflow graphs (described in more detail in Section 2), and we have focused on making the system both flexible enough for quickly experimenting with new models for research purposes and sufficiently high performance and robust for production training and deployment of machine learning models. For scaling neural network training to larger deployments, TensorFlow allows clients to easily express various kinds of parallelism through replication and parallel execution of a core model dataflow graph, with many different computational devices all collaborating to update a set of shared parameters or other state. Modest changes in the description of the computation allow a wide variety of different approaches to parallelism to be achieved and tried with low effort [14, 29, 42]. Some TensorFlow uses allow some flexibility in terms of the consistency of parameter updates, and we can easily express and take advantage of these relaxed synchronization requirements in some of our larger deployments. Compared to DistBelief, TensorFlow’s programming model is more flexible, its performance is significantly better, and it supports training and using a broader range of models on a wider variety of heterogeneous hardware platforms.

Dozens of our internal clients of DistBelief have already switched to TensorFlow. These clients rely on TensorFlow for research and production, with tasks as diverse as running inference for computer vision models on mobile phones to large-scale training of deep neural networks with hundreds of billions of parameters on hundreds of billions of example records using many hundreds of machines [11, 47, 48, 18, 53, 41]. Although these applications have concentrated on machine learning and deep neural networks in particular, we expect that TensorFlow’s abstractions will be useful in a variety of other domains, including other kinds of machine learning algorithms, and possibly other kinds of numerical computations. We have open-sourced the TensorFlow API and a reference implementation under the Apache 2.0 license in November, 2015, available at www.tensorflow.org.

The rest of this paper describes TensorFlow in more detail. Section 2 describes the programming model and basic concepts of the TensorFlow interface, and Section 3 describes both our single machine and distributed implementations. Section 4 describes several extensions to the basic programming model, and Section 5 describes several optimizations to the basic implementations. Section 6 describes some of our experiences in using TensorFlow, Section 7 describes several programming idioms we have found helpful when using TensorFlow, and Section 9 describes several auxiliary tools we have built around the core TensorFlow system. Sections 10 and 11 discuss future and related work, respectively, and Section 12 offers concluding thoughts.

2 Programming Model and Basic Concepts

A TensorFlow computation is described by a directed graph, which is composed of a set of nodes. The graph represents a dataflow computation, with extensions for allowing some kinds of nodes to maintain and update persistent state and for branching and looping control structures within the graph in a manner similar to Naiad [36]. Clients typically construct a computational graph using one of the supported frontend languages (C++ or Python). An example fragment to construct and then execute a TensorFlow graph using the Python front end is shown in Figure 1, and the resulting computation graph in Figure 2.

In a TensorFlow graph, each node has zero or more inputs and zero or more outputs, and represents the instantiation of an operation. Values that flow along normal edges in the graph (from outputs to inputs) are tensors, arbitrary dimensionality arrays where the underlying element type is specified or inferred at graph-construction time. Special edges, called control dependencies, can also exist in the graph: no data flows along such edges, but they indicate that the source node for the control dependence must finish executing before the destination node for the control dependence starts executing. Since our model includes mutable state, control dependencies can be used directly by clients to enforce happens-before relationships. Our implementation also sometimes inserts control dependencies to enforce orderings between otherwise independent operations as a way of, for example, controlling the peak memory usage.

Operations and Kernels

An operation has a name and represents an abstract computation (e.g., “matrix multiply”, or “add”). An operation can have attributes, and all attributes must be provided or inferred at graph-construction time in order to instantiate a node to perform the operation. One common use of attributes is to make operations polymorphic over different tensor element types (e.g., add of two tensors of type float versus add of two tensors of type int32). A kernel is a particular implementation of an operation that can be run on a particular type of device (e.g., CPU or GPU). A TensorFlow binary defines the sets of operations and kernels available via a registration mechanism, and this set can be extended by linking in additional operation and/or kernel definitions/registrations. Table 1 shows some of the kinds of operations built into the core TensorFlow library.
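The relationship between operations, kernels, and registration can be illustrated with a small pure-Python sketch. It is not TensorFlow's registration mechanism: the `KERNELS` table, `register_kernel`, and `lookup_kernel` are hypothetical names, and tensors are modelled as plain lists.

```python
KERNELS = {}  # (operation name, device type, element type) -> implementation


def register_kernel(op, device_type, dtype):
    """Decorator that records a kernel for one (op, device, dtype) combination."""
    def decorator(fn):
        KERNELS[(op, device_type, dtype)] = fn
        return fn
    return decorator


@register_kernel("Add", "CPU", "float")
def add_float_cpu(a, b):
    return [x + y for x, y in zip(a, b)]


@register_kernel("Add", "CPU", "int32")
def add_int32_cpu(a, b):
    # Wrap to 32 bits to mimic fixed-width integer addition.
    return [((x + y + 2**31) % 2**32) - 2**31 for x, y in zip(a, b)]


def lookup_kernel(op, device_type, dtype):
    """Find the kernel for an op on a device, or fail if none is registered."""
    try:
        return KERNELS[(op, device_type, dtype)]
    except KeyError:
        raise LookupError("no %s kernel for %s on %s" % (op, dtype, device_type))


float_sum = lookup_kernel("Add", "CPU", "float")([1.0], [2.5])
int_sum = lookup_kernel("Add", "CPU", "int32")([2**31 - 1], [1])
```

The same abstract operation ("Add") resolves to different kernels depending on the element type, and a lookup for a device with no registered kernel fails. That failure is the situation in which a device is not feasible for a node.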

    import tensorflow as tf

    b = tf.Variable(tf.zeros([100]))                    # 100-d vector, init to zeroes
    W = tf.Variable(tf.random_uniform([784,100],-1,1))  # 784x100 matrix w/rnd vals
    x = tf.placeholder(name="x")                        # Placeholder for input
    relu = tf.nn.relu(tf.matmul(W, x) + b)              # Relu(Wx+b)
    C = [...]                                           # Cost computed as a function
                                                        # of Relu

    s = tf.Session()
    for step in xrange(0, 10):
      input = ...construct 100-D input array ...        # Create 100-d vector for input
      result = s.run(C, feed_dict={x: input})           # Fetch cost, feeding x=input
      print step, result

Figure 1: Example TensorFlow code fragment

Figure 2: Corresponding computation graph for Figure 1 (nodes shown include MatMul, Add, ReLU, ...)

Category | Examples
Element-wise mathematical operations | Add, Sub, Mul, Div, Exp, Log, Greater, Less, Equal, ...
Array operations | Concat, Slice, Split, Constant, Rank, Shape, Shuffle, ...
Matrix operations | MatMul, MatrixInverse, MatrixDeterminant, ...
Stateful operations | Variable, Assign, AssignAdd, ...
Neural-net building blocks | SoftMax, Sigmoid, ReLU, Convolution2D, MaxPool, ...
Checkpointing operations | Save, Restore
Queue and synchronization operations | Enqueue, Dequeue, MutexAcquire, MutexRelease, ...
Control flow operations | Merge, Switch, Enter, Leave, NextIteration

Table 1: Example TensorFlow operation types

Sessions

Client programs interact with the TensorFlow system by creating a Session. To create a computation graph, the Session interface supports an Extend method to augment the current graph managed by the session with additional nodes and edges (the initial graph when a session is created is empty). The other primary operation supported by the session interface is Run, which takes a set of output names that need to be computed, as well as an optional set of tensors to be fed into the graph in place of certain outputs of nodes. Using the arguments to Run, the TensorFlow implementation can compute the transitive closure of all nodes that must be executed in order to compute the outputs that were requested, and can then arrange to execute the appropriate nodes in an order that respects their dependencies (as described in more detail in Section 3.1). Most of our uses of TensorFlow set up a Session with a graph once, and then execute the full graph or a few distinct subgraphs thousands or millions of times via Run calls.
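The pruning that Run performs can be sketched in plain Python. This is a minimal illustration under stated assumptions, not TensorFlow's implementation: the graph is a hypothetical dict mapping each node name to the names of its inputs, and `nodes_to_run` is an invented helper name.

```python
def nodes_to_run(graph, fetches, feeds=()):
    """Return the set of nodes that must execute to compute `fetches`.

    graph   -- hypothetical representation: {node_name: [input_node_names]}
    fetches -- names of the outputs requested by the caller
    feeds   -- names of nodes whose outputs are supplied by the caller;
               they are not executed and their inputs are not followed
    """
    fed = set(feeds)
    needed = set()
    stack = list(fetches)
    while stack:
        node = stack.pop()
        if node in needed or node in fed:
            continue
        needed.add(node)
        stack.extend(graph[node])  # walk backwards along input edges
    return needed


# Toy graph loosely mirroring Figure 1: relu = ReLU(MatMul(W, x) + b)
graph = {
    "W": [], "x": [], "b": [],
    "matmul": ["W", "x"],
    "add": ["matmul", "b"],
    "relu": ["add"],
    "unused": ["b"],
}
to_run = nodes_to_run(graph, fetches=["relu"], feeds=["x"])
print(sorted(to_run))
```

Because `x` is fed, it is excluded from the set, and `unused` is never visited because nothing requested depends on it.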

Variables

In most computations a graph is executed multiple times. Most tensors do not survive past a single execution of the graph. However, a Variable is a special kind of operation that returns a handle to a persistent mutable tensor that survives across executions of a graph. Handles to these persistent mutable tensors can be passed to a handful of special operations, such as Assign and AssignAdd (equivalent to +=) that mutate the referenced tensor. For machine learning applications of TensorFlow, the parameters of the model are typically stored in tensors held in variables, and are updated as part of the Run of the training graph for the model.
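The difference between ordinary tensors and variables can be mimicked in a few lines of plain Python. This is only an analogy: the `Variable` class and `train_step` function below are hypothetical stand-ins, with lists in place of tensors and a made-up gradient.

```python
class Variable:
    """Holds a mutable value that persists across calls to train_step()."""

    def __init__(self, initial):
        self.value = list(initial)

    def assign_add(self, delta):
        # In-place update, analogous to += on the referenced tensor.
        self.value = [v + d for v, d in zip(self.value, delta)]
        return self.value


def train_step(w, gradient, learning_rate=0.1):
    """One 'graph execution': `update` is temporary, `w` survives the call."""
    update = [-learning_rate * g for g in gradient]
    return w.assign_add(update)


w = Variable([0.0, 0.0])
for _ in range(3):
    train_step(w, gradient=[1.0, -2.0])
```

After three steps `w.value` is approximately `[-0.3, 0.6]`: the parameter state accumulated across executions, while each `update` list was discarded when its step returned.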

3 Implementation

The main components in a TensorFlow system are the client, which uses the Session interface to communicate with the master, and one or more worker processes, with each worker process responsible for arbitrating access to one or more computational devices (such as CPU cores or GPU cards) and for executing graph nodes on those devices as instructed by the master. We have both local and distributed implementations of the TensorFlow interface. The local implementation is used when the client, the master, and the worker all run on a single machine in the context of a single operating system process (possibly with multiple devices, if, for example, the machine has many GPU cards installed). The distributed implementation shares most of the code with the local implementation, but extends it with support for an environment where the client, the master, and the workers can all be in different processes on different machines. In our distributed environment, these different tasks are containers in jobs managed by a cluster scheduling system [51]. These two different modes are illustrated in Figure 3. Most of the rest of this section discusses issues that are common to both implementations, while Section 3.3 discusses some issues that are particular to the distributed implementation.

Devices

Devices are the computational heart of TensorFlow. Each worker is responsible for one or more devices, and each device has a device type and a name. Device names are composed of pieces that identify the device’s type, the device’s index within the worker, and, in our distributed setting, an identification of the job and task of the worker (or localhost for the case where the devices are local to the process). Example device names are "/job:localhost/device:cpu:0" or "/job:worker/task:17/device:gpu:3". We have implementations of our Device interface for CPUs and GPUs, and new device implementations for other device types can be provided via a registration mechanism. Each device object is responsible for managing allocation and deallocation of device memory, and for arranging for the execution of any kernels that are requested by higher levels in the TensorFlow implementation.
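The structure of these names can be made concrete with a small parser. The grammar here is an assumption inferred only from the two example names above, and `parse_device_name` is a hypothetical helper rather than a TensorFlow function.

```python
def parse_device_name(name):
    """Split a name such as "/job:worker/task:17/device:gpu:3" into fields."""
    fields = {}
    for piece in name.strip("/").split("/"):
        key, _, value = piece.partition(":")
        if key == "device":
            # The device piece carries both the type and the index.
            dev_type, _, index = value.partition(":")
            fields["device_type"] = dev_type
            fields["device_index"] = int(index)
        elif key == "task":
            fields["task"] = int(value)
        else:
            fields[key] = value
    return fields


local = parse_device_name("/job:localhost/device:cpu:0")
remote = parse_device_name("/job:worker/task:17/device:gpu:3")
```

The local name carries no task piece, so `local` has only the job, device type, and device index, while `remote` additionally records task 17.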

Tensors

A tensor in our implementation is a typed, multi-dimensional array. We support a variety of tensor element types, including signed and unsigned integers ranging in size from 8 bits to 64 bits, IEEE float and double types, a complex number type, and a string type (an arbitrary byte array). Backing store of the appropriate size is managed by an allocator that is specific to the device on which the tensor resides. Tensor backing store buffers are reference counted and are deallocated when no references remain.

3.1 Single-Device Execution

Let’s first consider the simplest execution scenario: a single worker process with a single device. The nodes of the graph are executed in an order that respects the dependencies between nodes. In particular, we keep track of a count per node of the number of dependencies of that node that have not yet been executed. Once this count drops to zero, the node is eligible for execution and is added to a ready queue. The ready queue is processed in some unspecified order, delegating execution of the kernel for a node to the device object. When a node has finished executing, the counts of all nodes that depend on the completed node are decremented.
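The dependency-counting scheme described above can be sketched as follows. This is a minimal single-threaded illustration, assuming a hypothetical dict-based graph and a `run_kernel` callback standing in for the device object. The FIFO queue is one arbitrary choice for the order that the text leaves unspecified.

```python
from collections import deque


def execute(graph, run_kernel):
    """Run every node of `graph` in an order that respects dependencies.

    graph      -- hypothetical representation: {node_name: [input_node_names]}
    run_kernel -- callable invoked once per node, standing in for the device
    """
    # Count, per node, the dependencies that have not yet executed.
    pending = {node: len(inputs) for node, inputs in graph.items()}
    # Reverse edges: which nodes consume each node's output.
    consumers = {node: [] for node in graph}
    for node, inputs in graph.items():
        for src in inputs:
            consumers[src].append(node)

    ready = deque(node for node, count in pending.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        run_kernel(node)
        order.append(node)
        # Completing a node releases one dependency of each consumer.
        for consumer in consumers[node]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    return order


graph = {
    "W": [], "x": [], "b": [],
    "matmul": ["W", "x"],
    "add": ["matmul", "b"],
    "relu": ["add"],
}
order = execute(graph, run_kernel=lambda node: None)
```

Any order produced this way is a valid topological order of the graph, so `matmul` always runs after `W` and `x`, and `relu` always runs last.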

3.2 Multi-Device Execution

Once a system has multiple devices, there are two main complications: deciding which device to place the computation for each node in the graph, and then managing the required communication of data across device boundaries implied by these placement decisions. This subsection discusses these two issues.

3.2.1 Node Placement

Given a computation graph, one of the main responsibilities of the TensorFlow implementation is to map the computation onto the set of available devices. A simplified version of this algorithm is presented here. See Section 4.3 for extensions supported by this algorithm.

One input to the placement algorithm is a cost model, which contains estimates of the sizes (in bytes) of the input and output tensors for each graph node, along with estimates of the computation time required for each node when presented with its input tensors. This cost model is either statically estimated based on heuristics associated with different operation types, or is measured based on an actual set of placement decisions for earlier executions of the graph.

Figure 3: Single machine and distributed system structure (single process: client, master, and worker with devices GPU0, GPU1, ..., CPU0; distributed: a client process, a master process, and worker processes 1 to 3, each with devices GPU0, GPU1, ..., CPU0)

The placement algorithm first runs a simulated execution of the graph. The simulation is described below and ends up picking a device for each node in the graph using greedy heuristics. The node to device placement generated by this simulation is also used as the placement for the real execution.

The placement algorithm starts with the sources of the computation graph, and simulates the activity on each device in the system as it progresses. For each node that is reached in this traversal, the set of feasible devices is considered (a device may not be feasible if the device does not provide a kernel that implements the particular operation). For nodes with multiple feasible devices, the placement algorithm uses a greedy heuristic that examines the effects on the completion time of the node of placing the node on each possible device. This heuristic takes into account the estimated or measured execution time of the operation on that kind of device from the cost model, and also includes the costs of any communication that would be introduced in order to transmit inputs to this node from other devices to the considered device. The device where the node’s operation would finish the soonest is selected as the device for that operation, and the placement process then continues onwards to make placement decisions for other nodes in the graph, including downstream nodes that are now ready for their own simulated execution. Section 4.3 describes some extensions that allow users to provide hints and partial constraints to guide the placement algorithm. The placement algorithm is an area of ongoing development within the system.
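The greedy heuristic can be sketched as a small simulation. This is one possible reading of the description, not the actual placer. The cost model is a pair of hypothetical dicts, cross-device bandwidth is assumed uniform, each device is assumed to run one node at a time, and all numbers in the example are made up.

```python
def place(graph, topo_order, compute_time, output_bytes, devices, bytes_per_sec):
    """Greedy placement by simulated earliest finish time.

    graph         -- {node: [input nodes]}
    topo_order    -- nodes listed so that inputs precede their consumers
    compute_time  -- {(node, device): seconds}; a missing key means the
                     device has no kernel for that node (infeasible)
    output_bytes  -- {node: estimated size of the node's output in bytes}
    devices       -- list of device names
    bytes_per_sec -- assumed uniform cross-device bandwidth
    """
    device_free = {d: 0.0 for d in devices}  # when each device becomes idle
    placement, finish = {}, {}
    for node in topo_order:
        best = None
        for dev in devices:
            if (node, dev) not in compute_time:
                continue  # no kernel on this device
            ready = device_free[dev]
            for src in graph[node]:
                arrival = finish[src]
                if placement[src] != dev:
                    # Input must be transmitted from another device.
                    arrival += output_bytes[src] / bytes_per_sec
                ready = max(ready, arrival)
            done = ready + compute_time[(node, dev)]
            if best is None or done < best[0]:
                best = (done, dev)
        if best is None:
            raise ValueError("no feasible device for node %r" % node)
        finish[node], placement[node] = best
        device_free[best[1]] = best[0]
    return placement


graph = {"read": [], "matmul": ["read"], "relu": ["matmul"]}
compute_time = {
    ("read", "cpu:0"): 1.0,  # only the CPU has a kernel for "read"
    ("matmul", "cpu:0"): 10.0, ("matmul", "gpu:0"): 1.0,
    ("relu", "cpu:0"): 2.0, ("relu", "gpu:0"): 0.5,
}
output_bytes = {"read": 4e6, "matmul": 4e5, "relu": 4e5}
placement = place(graph, ["read", "matmul", "relu"], compute_time,
                  output_bytes, ["cpu:0", "gpu:0"], bytes_per_sec=4e6)
```

In this toy cost model `matmul` moves to the GPU even though its input must be copied there, because one second of transfer plus one second of compute beats ten seconds on the CPU. `relu` then stays on the GPU to avoid a copy back.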

3.2.2 Cross-Device Communication

Once the node placement has been computed, the graph is partitioned into a set of subgraphs, one per device. Any cross-device edge from x to y is removed and replaced by an edge from x to a new Send node in x’s subgraph and an edge from a corresponding Receive node to y in y’s subgraph. See Figure 4 for an example of this graph transformation.

Figure 4: Before & after insertion of Send/Receive nodes

At runtime, the implementations of the Send and Receive nodes coordinate to transfer data across devices. This allows us to isolate all communication inside Send and Receive implementations, which simplifies the rest of the runtime.

When we insert Send and Receive nodes, we canonicalize all users of a particular tensor on a particular device to use a single Receive node, rather than one Receive node per downstream user on a particular device. This ensures that the data for the needed tensor is only transmitted once between a source device → destination device pair, and that memory for the tensor on the destination device is only allocated once, rather than multiple times (e.g., see nodes b and c in Figure 4).
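The partitioning step and the canonicalization of Receive nodes can be sketched together. This is a minimal illustration under assumptions: edges are hypothetical (source, destination) pairs, the `send:`/`recv:` node names are invented, and `partition` is not a TensorFlow function.

```python
def partition(edges, placement):
    """Split a placed graph into per-device subgraphs with Send/Recv nodes.

    edges     -- list of (src, dst) data edges
    placement -- {node: device}
    Returns {device: list of (src, dst) edges local to that device}.
    """
    subgraphs = {dev: [] for dev in set(placement.values())}
    recv_for = {}  # (source node, destination device) -> shared Recv node
    for src, dst in edges:
        src_dev, dst_dev = placement[src], placement[dst]
        if src_dev == dst_dev:
            subgraphs[src_dev].append((src, dst))
            continue
        key = (src, dst_dev)
        if key not in recv_for:
            # First consumer of `src` on this device: one Send/Recv pair.
            send = "send:%s->%s" % (src, dst_dev)
            recv_for[key] = "recv:%s@%s" % (src, dst_dev)
            subgraphs[src_dev].append((src, send))
        # Every consumer on the device reads from the same Recv node.
        subgraphs[dst_dev].append((recv_for[key], dst))
    return subgraphs


# Node a on device A feeds both b and c on device B.
edges = [("a", "b"), ("a", "c")]
placement = {"a": "A", "b": "B", "c": "B"}
subgraphs = partition(edges, placement)
```

Although two edges cross from device A to device B, only one Send node is created on A and one Receive node on B, so the tensor produced by `a` is transmitted once and shared by `b` and `c`.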

By handling communication in this manner, we also allow the scheduling of individual nodes of the graph on different devices to be decentralized into the workers: the Send and Receive nodes impart the necessary
