Learning Spark: Lightning-Fast Big Data Analysis

7.28 MB PDF, uploaded 2017-10-17 20:51:09


PROGRAMMING LANGUAGESSPARK

Learning Spark

ISBN: 978-1-449-35862-4

US $39.99 CAN $45.99

“Learning Spark is at the top of my list for anyone needing a gentle guide to the most popular framework for building big data applications.”

—Ben Lorica

Chief Data Scientist, O’Reilly Media

Twitter: @oreillymedia

facebook.com/oreilly

Data in all domains is getting bigger. How can you work with it efficiently?

This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala.

Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

■ Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
■ Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
■ Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
■ Learn how to deploy interactive, batch, and streaming applications
■ Connect to data sources including HDFS, Hive, JSON, and S3
■ Master advanced topics like data partitioning and shared variables

Holden Karau, a software development engineer at Databricks, is active in open source and the author of Fast Data Processing with Spark (Packt Publishing).

Andy Konwinski, co-founder of Databricks, is a committer on Apache Spark and co-creator of the Apache Mesos project.

Patrick Wendell is a co-founder of Databricks and a committer on Apache Spark. He also maintains several subsystems of Spark’s core engine.

Matei Zaharia, CTO at Databricks, is the creator of Apache Spark and serves as its Vice President at Apache.


Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia

Learning Spark

LIGHTNING-FAST DATA ANALYSIS

PROGRAMMING LANGUAGESSPARK

Learning Spark

ISBN: 978-1-449-35862-4

US $39.99 CAN $45.99

“

Learning Spark is at the

top of my list for anyone

needing a gentle guide

to the most popular

framework for building

big data applications.

”

—Ben Lorica

Chief Data Scientist, O’Reilly Media

Twitter: @oreillymedia

facebook.com/oreilly

Data in all domains is getting bigger. How can you work with it efficiently?

This book introduces Apache Spark, the open source cluster computing

system that makes data analytics fast to write and fast to run. With Spark,

you can tackle big datasets quickly through simple APIs in Python, Java,

and Scala.

Written by the developers of Spark, this book will have data scientists and

engineers up and running in no time. You’ll learn how to express parallel

jobs with just a few lines of code, and cover applications from simple batch

jobs to stream processing and machine learning.

■ Quickly dive into Spark capabilities such as distributed

datasets, in-memory caching, and the interactive shell

■ Leverage Spark’s powerful built-in libraries, including Spark

SQL, Spark Streaming, and MLlib

■ Use one programming paradigm instead of mixing and

matching tools like Hive, Hadoop, Mahout, and Storm

■ Learn how to deploy interactive, batch, and streaming

applications

■ Connect to data sources including HDFS, Hive, JSON, and S3

■ Master advanced topics like data partitioning and shared

variables

Holden Karau, a software development engineer at Databricks, is active in open

source and the author of Fast Data Processing with Spark (Packt Publishing).

Andy Konwinski, co-founder of Databricks, is a committer on Apache Spark and

co-creator of the Apache Mesos project.

Patrick Wendell is a co-founder of Databricks and a committer on Apache Spark.

He also maintains several subsystems of Spark’s core engine.

Matei Zaharia, CTO at Databricks, is the creator of Apache Spark and serves as

its Vice President at Apache.

Learning Spark

Karau, Konwinski,

Wendell & Zaharia

Holden Karau, Andy Konwinski,

Patrick Wendell & Matei Zaharia

Learning

Spark

LIGHTNING-FAST DATA ANALYSIS

Table of Contents

Foreword
Preface
Introduction to Data Analysis with Spark
  What Is Apache Spark?
  A Unified Stack
  Spark Core
  Spark SQL
  Spark Streaming
  MLlib
  GraphX
  Cluster Managers
  Who Uses Spark, and for What?
  Data Science Tasks
  Data Processing Applications
  A Brief History of Spark
  Spark Versions and Releases
  Storage Layers for Spark
Downloading Spark and Getting Started
  Downloading Spark
  Introduction to Spark’s Python and Scala Shells
  Introduction to Core Spark Concepts
  Standalone Applications
  Initializing a SparkContext
  Building Standalone Applications
  Conclusion


Comments

weixin_38245345 (2018-03-25): Excellent introductory material, thanks!

树哥 (2017-12-13): Not bad; downloading it to take a look.

要努力啊要努力 (2018-08-12): Very good, thanks!

Uploader: bfodxwt (followers: 0; resources: 3)


Latest resources

Learning Spark: Lightning-Fast Big Data Analysis

Learning-Spark-Lightning-Fast-Data-Analysis

Learning.Spark.Lightning-Fast.Big.Data.Analysis.pdf

FastSparkStreaming-2.0.jar

Learning.Spark.Lightning-Fast.Big.Data.Analysis

Learning Spark - Lighting Fast Data Analysis.pdf

Big-Data-Analysis-on-International-Health-and-Population-Metrics: I used big data tools such as Hadoop, Hive, and Spark to analyze a dataset I obtained from Kaggle

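Big_Data_Analysis_of_US_Accidents_data: This code is the final project for IST 716 at Syracuse University. Details of the methods and results are in the report file. The dataset contains about 3 million rows, so Apache Spark was used for the analysis through the online Databricks platform.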

learning-spark-lightning-fast-big-data-analysis: Learning Spark

Spark: Big Data Cluster Computing in Production

Python library | pytorch_lightning-1.1.2-py3-none-any.whl

Python library | pytorch-lightning-0.8.1.tar.gz

Big Data Analysis and Deep Learning Applications

lightning-ui: Lightning-UI, based on bootstrap3.3 and slds (salesforce)

sklearn_contrib_lightning-0.6.1-cp38-cp38-win_amd64

sklearn_contrib_lightning-0.6.1-cp39-cp39-win_amd64

sklearn_contrib_lightning-0.5.0-cp27-cp27m-win32

sklearn_contrib_lightning-0.5.0-cp27-cp27m-win_amd64

sklearn_contrib_lightning-0.6.0-cp36-cp36m-win_amd64

Python library | pytorch-lightning-bolts-0.1.1.tar.gz

sklearn_contrib_lightning-0.6.1-pp37-pypy37_pp73-win_amd64

Lightning-Community-Carousel: a simple implementation of Lightning

sklearn_contrib_lightning-0.6.1-cp39-cp39-win32

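Scala-升级版.docx (Scala, upgraded edition)

A Spark-based book recommendation system

Big data final course project: Spark-based meteorological data processing and analysis

National Vocational Skills Competition, big data event: ten sets of competition problems (shtd)

National air quality data, 2014-2018 (CSV dataset files)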
