Learning Spark (resource on CSDN Library)

Rated 5 stars · 48 views · uploaded 2014-12-17 23:41:10 · PDF, 1.19 MB

Table of Contents

Preface
    Audience
    How This Book is Organized
    Supporting Books
    Code Examples
    Early Release Status and Feedback

Chapter 1. Introduction to Data Analysis with Spark
    What is Apache Spark?
    A Unified Stack
    Who Uses Spark, and For What?
    A Brief History of Spark
    Spark Versions and Releases
    Spark and Hadoop

Chapter 2. Downloading and Getting Started
    Downloading Spark
    Introduction to Spark’s Python and Scala Shells
    Introduction to Core Spark Concepts
    Standalone Applications
    Conclusion

Chapter 3. Programming with RDDs
    RDD Basics
    Creating RDDs
    RDD Operations
    Passing Functions to Spark
    Common Transformations and Actions
    Persistence (Caching)
    Conclusion

Chapter 4. Working with Key-Value Pairs
    Motivation
    Creating Pair RDDs
    Transformations on Pair RDDs
    Actions Available on Pair RDDs
    Data Partitioning
    Conclusion

Chapter 5. Loading and Saving Your Data
    Motivation
    Choosing a Format
    Formats
    File Systems
    Compression
    Databases
    Conclusion

About the Authors

Preface

As parallel data analysis has become increasingly common, practitioners in many fields have sought easier tools for this task. Apache Spark has quickly emerged as one of the most popular tools for this purpose, extending and generalizing MapReduce. Spark offers three main benefits. First, it is easy to use: you can develop applications on your laptop, using a high-level API that lets you focus on the content of your computation. Second, Spark is fast, enabling interactive use and complex algorithms. And third, Spark is a general engine, allowing you to combine multiple types of computations (e.g., SQL queries, text processing and machine learning) that might previously have required learning different engines. These features make Spark an excellent starting point to learn about big data in general.

This introductory book is meant to get you up and running with Spark quickly. You’ll learn how to download and run Spark on your laptop and use it interactively to learn the API. From there, we’ll cover the details of available operations and distributed execution. Finally, you’ll get a tour of the higher-level libraries built into Spark, including libraries for machine learning, stream processing, graph analytics and SQL. We hope that this book gives you the tools to quickly tackle data analysis problems, whether you do so on one machine or hundreds.

Audience

This book targets data scientists and engineers. We chose these two groups because they have the most to gain from using Spark to expand the scope of problems they can solve. Spark’s rich collection of data-focused libraries (like MLlib) makes it easy for data scientists to go beyond problems that fit on a single machine while making use of their statistical background. Engineers, meanwhile, will learn how to write general-purpose distributed programs in Spark and operate production applications. Engineers and data scientists will take different details from this book, but both will be able to apply Spark to solve large distributed problems in their respective fields.

Data scientists focus on answering questions or building models from data. They often have a statistical or math background and some familiarity with tools like Python, R and SQL. We have made sure to include Python, and wherever possible SQL, examples for all our material, as well as an overview of the machine learning and advanced analytics libraries in Spark. If you are a data scientist, we hope that after reading this book you will be able to use the same mathematical approaches to solving problems, except much faster and on a much larger scale.

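Comments

zmengcsdn, 2018-09-25: Not bad, fairly useful.

hellobigbigworld, 2016-12-01: Good; it has a table of contents and is clear.

zaffy, 2015-04-28: A very good book.

yufeng22222, 2015-06-10: Pretty good. I'll study this introductory version first while waiting for Spark in Action to come out.

mmd_007, 2017-03-14: The book is incomplete; it only has the first few chapters...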

Uploader: fredfudan
