谷歌关于大数据的三篇经典论文（高清）_google三篇论文大数据资源-CSDN文库

共3个文件

pdf：3个

mapreduce

hdfs

bigtable

5星 · 超过95%的资源需积分: 50 19 浏览量 2018-06-07 16:13:16 上传评论收藏 590KB ZIP 举报

资源推荐

资源详情

资源评论

收起资源包目录

谷歌研究论文.zip （3个子文件）

gfs-sosp2003.pdf 269KB

mapreduce-osdi04.pdf 186KB

bigtable-osdi06.pdf 216KB

The Google File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Google

∗

ABSTRACT

We have designed and implemented the Google File Sys-

tem, a scalable distributed ﬁle system for large distributed

data-intensive applications. It provides fault tolerance while

running on inexpensive commodity hardware, and it delivers

high aggregate performance to a large number of clients.

While sharing many of the same goals as previous dis-

tributed ﬁle systems, our design has been driven by obser-

vations of our application workloads and technological envi-

ronment, both current and anticipated, that reﬂect a marked

departure from some earlier ﬁle system assumptions. This

has led us to reexamine traditional choices and explore rad-

ically diﬀerent design points.

The ﬁle system has successfully met our storage needs.

It is widely deployed within Google as the storage platform

for the generation and processing of data used by our ser-

vice as well as research and development eﬀorts that require

large data sets. The largest cluster to date provides hun-

dreds of terabytes of storage across thousands of disks on

over a thousand machines, and it is concurrently accessed

by hundreds of clients.

In this paper, we present ﬁle system interface extensions

designed to support distributed applications, discuss many

aspects of our design, and report measurements from both

micro-benchmarks and real world use.

Categories and Subject Descriptors

D[4]: 3—Distributed ﬁle systems

General Terms

Design, reliability, performance, measurement

Keywords

Fault tolerance, scalability, data storage, clustered storage

∗

The authors can be reached at the following addresses:

{sanjay,hgobioﬀ,shuntak}@google.com.

Permission to make digital or hard copies of all or part of this work for

personal or classroom use is granted without fee provided that copies are

not made or distributed for proﬁt or commercial advantage and that copies

bear this notice and the full citation on the ﬁrst page. To copy otherwise, to

republish, to post on servers or to redistribute to lists, requires prior speciﬁc

permission and/or a fee.

SOSP’03, October 19–22, 2003, Bolton Landing, New York, USA.

$5.00.

1. INTRODUCTION

We have designed and implemented the Google File Sys-

tem (GFS) to meet the rapidly growing demands of Google’s

data processing needs. GFS shares many of the same goals

as previous distributed ﬁle systems such as performance,

scalability, reliability, and availability. However, its design

has been driven by key observations of our application work-

loads and technological environment, both current and an-

ticipated, that reﬂect a marked departure from some earlier

ﬁle system design assumptions. We have reexamined tradi-

tional choices and explored radically diﬀerent points in the

design space.

First, component failures are the norm rather than the

exception. The ﬁle system consists of hundreds or even

thousands of storage machines built from inexpensive com-

modity parts and is accessed by a comparable number of

client machines. The quantity and quality of the compo-

nents virtually guarantee that some are not functional at

any given time and some will not recover from their cur-

rent failures. We have seen problems caused by application

bugs, operating system bugs, human errors, and the failures

of disks, memory, connectors, networking, and power sup-

plies. Therefore, constant monitoring, error detection, fault

tolerance, and automatic recovery must be integral to the

system.

Second, ﬁles are huge by traditional standards. Multi-GB

ﬁles are common. Each ﬁle typically contains many applica-

tion objects such as web documents. When we are regularly

working with fast growing data sets of many TBs comprising

billions of objects, it is unwieldy to manage billions of ap-

proximately KB-sized ﬁles even when the ﬁle system could

support it. As a result, design assumptions and parameters

such as I/O operation and block sizes have to be revisited.

Third, most ﬁles are mutated by appending new data

rather than overwriting existing data. Random writes within

a ﬁle are practically non-existent. Once written, the ﬁles

are only read, and often only sequentially. A variety of

data share these characteristics. Some may constitute large

repositories that data analysis programs scan through. Some

may be data streams continuously generated by running ap-

plications. Some may be archival data. Some may be in-

termediate results produced on one machine and processed

on another, whether simultaneously or later in time. Given

this access pattern on huge ﬁles, appending becomes the fo-

cus of performance optimization and atomicity guarantees,

while caching data blocks in the client loses its appeal.

Fourth, co-designing the applications and the ﬁle system

API beneﬁts the overall system by increasing our ﬂexibility.

For example, we have relaxed GFS’s consistency model to

vastly simplify the ﬁle system without imposing an onerous

burden on the applications. We have also introduced an

atomic append operation so that multiple clients can append

concurrently to a ﬁle without extra synchronization between

them. These will be discussed in more details later in the

paper.

Multiple GFS clusters are currently deployed for diﬀerent

purposes. The largest ones have over 1000 storage nodes,

over 300 TB of disk storage, and are heavily accessed by

hundreds of clients on distinct machines on a continuous

basis.

2. DESIGN OVERVIEW

2.1 Assumptions

In designing a ﬁle system for our needs, we have been

guided by assumptions that oﬀer both challenges and op-

portunities. We alluded to some key observations earlier

and now lay out our assumptions in more details.

• The system is built from many inexpensive commodity

components that often fail. It must constantly monitor

itself and detect, tolerate, and recover promptly from

component failures on a routine basis.

• The system stores a modest number of large ﬁles. We

expect a few million ﬁles, each typically 100 MB or

larger in size. Multi-GB ﬁles are the common case

and should be managed eﬃciently. Small ﬁles must be

supported, but we need not optimize for them.

• The workloads primarily consist of two kinds of reads:

large streaming reads and small random reads. In

large streaming reads, individual operations typically

read hundreds of KBs, more commonly 1 MB or more.

Successive operations from the same client often read

through a contiguous region of a ﬁle. A small ran-

dom read typically reads a few KBs at some arbitrary

oﬀset. Performance-conscious applications often batch

and sort their small reads to advance steadily through

the ﬁle rather than go back and forth.

• The workloads also have many large, sequential writes

that append data to ﬁles. Typical operation sizes are

similar to those for reads. Once written, ﬁles are sel-

dom modiﬁed again. Small writes at arbitrary posi-

tions in a ﬁle are supported but do not have to be

eﬃcient.

• The system must eﬃciently implement well-deﬁned se-

mantics for multiple clients that concurrently append

to the same ﬁle. Our ﬁles are often used as producer-

consumer queues or for many-way merging. Hundreds

of producers, running one per machine, will concur-

rently append to a ﬁle. Atomicity with minimal syn-

chronization overhead is essential. The ﬁle may be

read later, or a consumer may be reading through the

ﬁle simultaneously.

• High sustained bandwidth is more important than low

latency. Most of our target applications place a pre-

mium on processing data in bulk at a high rate, while

few have stringent response time requirements for an

individual read or write.

2.2 Interface

GFS provides a familiar ﬁle system interface, though it

does not implement a standard API such as POSIX. Files are

organized hierarchically in directories and identiﬁed by path-

names. We support the usual operations to create, delete,

open, close, read,andwrite ﬁles.

Moreover, GFS has snapshot and record append opera-

tions. Snapshot creates a copy of a ﬁle or a directory tree

at low cost. Record append allows multiple clients to ap-

pend data to the same ﬁle concurrently while guaranteeing

the atomicity of each individual client’s append. It is use-

ful for implementing multi-way merge results and producer-

consumer queues that many clients can simultaneously ap-

pend to without additional locking. We have found these

types of ﬁles to be invaluable in building large distributed

applications. Snapshot and record append are discussed fur-

ther in Sections 3.4 and 3.3 respectively.

2.3 Architecture

A GFS cluster consists of a single master and multiple

chunkservers and is accessed by multiple clients,asshown

in Figure 1. Each of these is typically a commodity Linux

machine running a user-level server process. It is easy to run

both a chunkserver and a client on the same machine, as long

as machine resources permit and the lower reliability caused

by running possibly ﬂaky application code is acceptable.

Files are divided into ﬁxed-size chunks. Each chunk is

identiﬁed by an immutable and globally unique 64 bit chunk

handle assigned by the master at the time of chunk creation.

Chunkservers store chunks on local disks as Linux ﬁles and

read or write chunk data speciﬁed by a chunk handle and

byte range. For reliability, each chunk is replicated on multi-

ple chunkservers. By default, we store three replicas, though

users can designate diﬀerent replication levels for diﬀerent

regions of the ﬁle namespace.

The master maintains all ﬁle system metadata. This in-

cludes the namespace, access control information, the map-

ping from ﬁles to chunks, and the current locations of chunks.

It also controls system-wide activities such as chunk lease

management, garbage collection of orphaned chunks, and

chunk migration between chunkservers. The master peri-

odically communicates with each chunkserver in HeartBeat

messages to give it instructions and collect its state.

GFS client code linked into each application implements

the ﬁle system API and communicates with the master and

chunkservers to read or write data on behalf of the applica-

tion. Clients interact with the master for metadata opera-

tions, but all data-bearing communication goes directly to

the chunkservers. We do not provide the POSIX API and

therefore need not hook into the Linux vnode layer.

Neither the client nor the chunkserver caches ﬁle data.

Client caches oﬀer little beneﬁt because most applications

stream through huge ﬁles or have working sets too large

to be cached. Not having them simpliﬁes the client and

the overall system by eliminating cache coherence issues.

(Clients do cache metadata, however.) Chunkservers need

not cache ﬁle data because chunks are stored as local ﬁles

and so Linux’s buﬀer cache already keeps frequently accessed

data in memory.

2.4 Single Master

Having a single master vastly simpliﬁes our design and

enables the master to make sophisticated chunk placement

Legend:

Data messages

Control messages

Application

(file name, chunk index)

(chunk handle,

chunk locations)

GFS master

File namespace

/foo/bar

Instructions to chunkserver

Chunkserver state

GFS chunkserverGFS chunkserver

(chunk handle, byte range)

chunk data

chunk 2ef0

Linux file system Linux file system

GFS client

Figure 1: GFS Architecture

and replication decisions using global knowledge. However,

we must minimize its involvement in reads and writes so

that it does not become a bottleneck. Clients never read

and write ﬁle data through the master. Instead, a client asks

the master which chunkservers it should contact. It caches

this information for a limited time and interacts with the

chunkservers directly for many subsequent operations.

Let us explain the interactions for a simple read with refer-

ence to Figure 1. First, using the ﬁxed chunk size, the client

translates the ﬁle name and byte oﬀset speciﬁed by the ap-

plication into a chunk index within the ﬁle. Then, it sends

the master a request containing the ﬁle name and chunk

index. The master replies with the corresponding chunk

handle and locations of the replicas. The client caches this

information using the ﬁle name and chunk index as the key.

The client then sends a request to one of the replicas,

most likely the closest one. The request speciﬁes the chunk

handle and a byte range within that chunk. Further reads

of the same chunk require no more client-master interaction

until the cached information expires or the ﬁle is reopened.

In fact, the client typically asks for multiple chunks in the

same request and the master can also include the informa-

tion for chunks immediately following those requested. This

extra information sidesteps several future client-master in-

teractions at practically no extra cost.

2.5 Chunk Size

Chunk size is one of the key design parameters. We have

chosen 64 MB, which is much larger than typical ﬁle sys-

tem block sizes. Each chunk replica is stored as a plain

Linux ﬁle on a chunkserver and is extended only as needed.

Lazy space allocation avoids wasting space due to internal

fragmentation, perhaps the greatest objection against such

a large chunk size.

A large chunk size oﬀers several important advantages.

First, it reduces clients’ need to interact with the master

because reads and writes on the same chunk require only

one initial request to the master for chunk location informa-

tion. The reduction is especially signiﬁcant for our work-

loads because applications mostly read and write large ﬁles

sequentially. Even for small random reads, the client can

comfortably cache all the chunk location information for a

multi-TB working set. Second, since on a large chunk, a

client is more likely to perform many operations on a given

chunk, it can reduce network overhead by keeping a persis-

tent TCP connection to the chunkserver over an extended

period of time. Third, it reduces the size of the metadata

stored on the master. This allows us to keep the metadata

in memory, which in turn brings other advantages that we

will discuss in Section 2.6.1.

On the other hand, a large chunk size, even with lazy space

allocation, has its disadvantages. A small ﬁle consists of a

small number of chunks, perhaps just one. The chunkservers

storing those chunks may become hot spots if many clients

are accessing the same ﬁle. In practice, hot spots have not

been a major issue because our applications mostly read

large multi-chunk ﬁles sequentially.

However, hot spots did develop when GFS was ﬁrst used

by a batch-queue system: an executable was written to GFS

as a single-chunk ﬁle and then started on hundreds of ma-

chines at the same time. The few chunkservers storing this

executable were overloaded by hundreds of simultaneous re-

quests. We ﬁxed this problem by storing such executables

with a higher replication factor and by making the batch-

queue system stagger application start times. A potential

long-term solution is to allow clients to read data from other

clients in such situations.

2.6 Metadata

The master stores three major types of metadata: the ﬁle

and chunk namespaces, the mapping from ﬁles to chunks,

and the locations of each chunk’s replicas. All metadata is

kept in the master’s memory. The ﬁrst two types (names-

paces and ﬁle-to-chunk mapping) are also kept persistent by

logging mutations to an operation log stored on the mas-

ter’s local disk and replicated on remote machines. Using

a log allows us to update the master state simply, reliably,

and without risking inconsistencies in the event of a master

crash. The master does not store chunk location informa-

tion persistently. Instead, it asks each chunkserver about its

chunks at master startup and whenever a chunkserver joins

the cluster.

2.6.1 In-Memory Data Structures

Since metadata is stored in memory, master operations are

fast. Furthermore, it is easy and eﬃcient for the master to

periodically scan through its entire state in the background.

This periodic scanning is used to implement chunk garbage

collection, re-replication in the presence of chunkserver fail-

ures, and chunk migration to balance load and disk space

usage across chunkservers. Sections 4.3 and 4.4 will discuss

these activities further.

One potential concern for this memory-only approach is

that the number of chunks and hence the capacity of the

whole system is limited by how much memory the master

has. This is not a serious limitation in practice. The mas-

ter maintains less than 64 bytes of metadata for each 64 MB

chunk. Most chunks are full because most ﬁles contain many

chunks, only the last of which may be partially ﬁlled. Sim-

ilarly, the ﬁle namespace data typically requires less then

64 bytes per ﬁle because it stores ﬁle names compactly us-

ing preﬁx compression.

If necessary to support even larger ﬁle systems, the cost

of adding extra memory to the master is a small price to pay

for the simplicity, reliability, performance, and ﬂexibility we

gain by storing the metadata in memory.

2.6.2 Chunk Locations

The master does not keep a persistent record of which

chunkservers have a replica of a given chunk. It simply polls

chunkservers for that information at startup. The master

can keep itself up-to-date thereafter because it controls all

chunk placement and monitors chunkserver status with reg-

ular HeartBeat messages.

We initially attempted to keep chunk location information

persistently at the master, but we decided that it was much

simpler to request the data from chunkservers at startup,

and periodically thereafter. This eliminated the problem of

keeping the master and chunkservers in sync as chunkservers

join and leave the cluster, change names, fail, restart, and

so on. In a cluster with hundreds of servers, these events

happen all too often.

Another way to understand this design decision is to real-

ize that a chunkserver has the ﬁnal word over what chunks

it does or does not have on its own disks. There is no point

in trying to maintain a consistent view of this information

on the master because errors on a chunkserver may cause

chunks to vanish spontaneously (e.g., a disk may go bad

and be disabled) or an operator may rename a chunkserver.

2.6.3 Operation Log

The operation log contains a historical record of critical

metadata changes. It is central to GFS. Not only is it the

only persistent record of metadata, but it also serves as a

logical time line that deﬁnes the order of concurrent op-

erations. Files and chunks, as well as their versions (see

Section 4.5), are all uniquely and eternally identiﬁed by the

logical times at which they were created.

Since the operation log is critical, we must store it reli-

ably and not make changes visible to clients until metadata

changes are made persistent. Otherwise, we eﬀectively lose

the whole ﬁle system or recent client operations even if the

chunks themselves survive. Therefore, we replicate it on

multiple remote machines and respond to a client opera-

tion only after ﬂushing the corresponding log record to disk

both locally and remotely. The master batches several log

records together before ﬂushing thereby reducing the impact

of ﬂushing and replication on overall system throughput.

The master recovers its ﬁle system state by replaying the

operation log. To minimize startup time, we must keep the

log small. The master checkpoints its state whenever the log

grows beyond a certain size so that it can recover by loading

the latest checkpoint from local disk and replaying only the

Wri te Record Append

Serial deﬁned deﬁned

success interspersed with

Concurrent consistent inconsistent

successes but undeﬁned

Failure inconsistent

Table 1: File Region State After Mutation

limited number of log records after that. The checkpoint is

in a compact B-tree like form that can be directly mapped

into memory and used for namespace lookup without ex-

tra parsing. This further speeds up recovery and improves

availability.

Because building a checkpoint can take a while, the mas-

ter’s internal state is structured in such a way that a new

checkpoint can be created without delaying incoming muta-

tions. The master switches to a new log ﬁle and creates the

new checkpoint in a separate thread. The new checkpoint

includes all mutations before the switch. It can be created

in a minute or so for a cluster with a few million ﬁles. When

completed, it is written to disk both locally and remotely.

Recovery needs only the latest complete checkpoint and

subsequent log ﬁles. Older checkpoints and log ﬁles can

be freely deleted, though we keep a few around to guard

against catastrophes. A failure during checkpointing does

not aﬀect correctness because the recovery code detects and

skips incomplete checkpoints.

2.7 Consistency Model

GFS has a relaxed consistency model that supports our

highly distributed applications well but remains relatively

simple and eﬃcient to implement. We now discuss GFS’s

guarantees and what they mean to applications. We also

highlight how GFS maintains these guarantees but leave the

details to other parts of the paper.

2.7.1 Guarantees by GFS

File namespace mutations (e.g., ﬁle creation) are atomic.

They are handled exclusively by the master: namespace

locking guarantees atomicity and correctness (Section 4.1);

the master’s operation log deﬁnes a global total order of

these operations (Section 2.6.3).

The state of a ﬁle region after a data mutation depends

on the type of mutation, whether it succeeds or fails, and

whether there are concurrent mutations. Table 1 summa-

rizes the result. A ﬁle region is consistent if all clients will

always see the same data, regardless of which replicas they

read from. A region is deﬁned after a ﬁle data mutation if it

is consistent and clients will see what the mutation writes in

its entirety. When a mutation succeeds without interference

from concurrent writers, the aﬀected region is deﬁned (and

by implication consistent): all clients will always see what

the mutation has written. Concurrent successful mutations

leave the region undeﬁned but consistent: all clients see the

same data, but it may not reﬂect what any one mutation

has written. Typically, it consists of mingled fragments from

multiple mutations. A failed mutation makes the region in-

consistent (hence also undeﬁned): diﬀerent clients may see

diﬀerent data at diﬀerent times. We describe below how our

applications can distinguish deﬁned regions from undeﬁned

regions. The applications do not need to further distinguish

between diﬀerent kinds of undeﬁned regions.

Data mutations may be writes or record appends.Awrite

causes data to be written at an application-speciﬁed ﬁle

oﬀset. A record append causes data (the “record”) to be

appended atomically at least once even in the presence of

concurrent mutations, but at an oﬀset of GFS’s choosing

(Section 3.3). (In contrast, a “regular” append is merely a

write at an oﬀset that the client believes to be the current

end of ﬁle.) The oﬀset is returned to the client and marks

the beginning of a deﬁned region that contains the record.

In addition, GFS may insert padding or record duplicates in

between. They occupy regions considered to be inconsistent

and are typically dwarfed by the amount of user data.

After a sequence of successful mutations, the mutated ﬁle

region is guaranteed to be deﬁned and contain the data writ-

ten by the last mutation. GFS achieves this by (a) applying

mutations to a chunk in the same order on all its replicas

(Section 3.1), and (b) using chunk version numbers to detect

any replica that has become stale because it has missed mu-

tations while its chunkserver was down (Section 4.5). Stale

replicas will never be involved in a mutation or given to

clients asking the master for chunk locations. They are

garbage collected at the earliest opportunity.

Since clients cache chunk locations, they may read from a

stale replica before that information is refreshed. This win-

dow is limited by the cache entry’s timeout and the next

open of the ﬁle, which purges from the cache all chunk in-

formation for that ﬁle. Moreover, as most of our ﬁles are

append-only, a stale replica usually returns a premature

end of chunk rather than outdated data. When a reader

retries and contacts the master, it will immediately get cur-

rent chunk locations.

Long after a successful mutation, component failures can

of course still corrupt or destroy data. GFS identiﬁes failed

chunkservers by regular handshakes between master and all

chunkservers and detects data corruption by checksumming

(Section 5.2). Once a problem surfaces, the data is restored

from valid replicas as soon as possible (Section 4.3). A chunk

is lost irreversibly only if all its replicas are lost before GFS

can react, typically within minutes. Even in this case, it be-

comes unavailable, not corrupted: applications receive clear

errors rather than corrupt data.

2.7.2 Implications for Applications

GFS applications can accommodate the relaxed consis-

tency model with a few simple techniques already needed for

other purposes: relying on appends rather than overwrites,

checkpointing, and writing self-validating, self-identifying

records.

Practically all our applications mutate ﬁles by appending

rather than overwriting. In one typical use, a writer gener-

ates a ﬁle from beginning to end. It atomically renames the

ﬁle to a permanent name after writing all the data, or pe-

riodically checkpoints how much has been successfully writ-

ten. Checkpoints may also include application-level check-

sums. Readers verify and process only the ﬁle region up

to the last checkpoint, which is known to be in the deﬁned

state. Regardless of consistency and concurrency issues, this

approach has served us well. Appending is far more eﬃ-

cient and more resilient to application failures than random

writes. Checkpointing allows writers to restart incremen-

tally and keeps readers from processing successfully written

ﬁle data that is still incomplete from the application’s per-

spective.

In the other typical use, many writers concurrently ap-

pend to a ﬁle for merged results or as a producer-consumer

queue. Record append’s append-at-least-once semantics pre-

serves each writer’s output. Readers deal with the occa-

sional padding and duplicates as follows. Each record pre-

pared by the writer contains extra information like check-

sums so that its validity can be veriﬁed. A reader can

identify and discard extra padding and record fragments

using the checksums. If it cannot tolerate the occasional

duplicates (e.g., if they would trigger non-idempotent op-

erations), it can ﬁlter them out using unique identiﬁers in

the records, which are often needed anyway to name corre-

sponding application entities such as web documents. These

functionalities for record I/O (except duplicate removal) are

in library code shared by our applications and applicable to

other ﬁle interface implementations at Google. With that,

the same sequence of records, plus rare duplicates, is always

delivered to the record reader.

3. SYSTEM INTERACTIONS

We designed the system to minimize the master’s involve-

ment in all operations. With that background, we now de-

scribe how the client, master, and chunkservers interact to

implement data mutations, atomic record append, and snap-

shot.

3.1 Leases and Mutation Order

A mutation is an operation that changes the contents or

metadata of a chunk such as a write or an append opera-

tion. Each mutation is performed at all the chunk’s replicas.

We use leases to maintain a consistent mutation order across

replicas. The master grants a chunk lease to one of the repli-

cas, which we call the primary. The primary picks a serial

order for all mutations to the chunk. All replicas follow this

order when applying mutations. Thus, the global mutation

order is deﬁned ﬁrst by the lease grant order chosen by the

master, and within a lease by the serial numbers assigned

by the primary.

The lease mechanism is designed to minimize manage-

ment overhead at the master. A lease has an initial timeout

of 60 seconds. However, as long as the chunk is being mu-

tated, the primary can request and typically receive exten-

sions from the master indeﬁnitely. These extension requests

and grants are piggybacked on the HeartBeat messages reg-

ularly exchanged between the master and all chunkservers.

The master may sometimes try to revoke a lease before it

expires (e.g., when the master wants to disable mutations

on a ﬁle that is being renamed). Even if the master loses

communication with a primary, it can safely grant a new

lease to another replica after the old lease expires.

In Figure 2, we illustrate this process by following the

control ﬂow of a write through these numbered steps.

1. The client asks the master which chunkserver holds

the current lease for the chunk and the locations of

the other replicas. If no one has a lease, the master

grants one to a replica it chooses (not shown).

2. The master replies with the identity of the primary and

the locations of the other (secondary) replicas. The

client caches this data for future mutations. It needs

to contact the master again only when the primary

评论收藏

内容反馈

wpkissyan

2021-03-06

谢谢分享，学习了

大灰狼爱技术

粉丝: 2
资源: 6

谷歌关于大数据的三篇经典论文（高清）

Google大数据三大经典论文

谷歌关于大数据的三大论文

google大数据三篇论文

Google大数据经典论文（GFS/BigTable/MapReduce）

Google大数据三篇著名论文1

谷歌三篇大数据的著名论文（译文）

Google大数据必看三篇论文中文版

Google大数据三大经典论文(包含英文版和中文版)

【免费】Google大数据三篇著名论文中文版

Google大数据三篇著名论文中文版

大数据的三篇经典英文论文

谷歌发表的关于Goods的论文，高清英文原版

Google大数据三大论文（GFS/BigTable/MapReduce）

Google-大数据相关的3篇论文中文翻译.zip

Google大数据起源中文版三篇论文.zip

谷歌大数据三论文（中英合集）

大数据经典论文

Google引爆大数据时代的三篇论文

大数据必看论文

hadoop的三驾马车谷歌论文中文版(全)

Google大数据三大论文中文版下载 Google论文MapReduce、GFS、Bigtable论文下载

google大数据三大论文中英版合集

Google三驾马车大数据论文

Google大数据三篇论文.zip

谷歌大数据三大论文英文版

Google云计算大数据三大论文中文版英文版合集

第十五届蓝桥杯大赛软件赛省赛-C/C++ 研究生组

VRPTW 的 Solomon 标准测试数据集

数学建模国赛：无人机遂行编队飞行中的纯方位无源定位分析

最新资源