Advanced Analytics with Spark 2nd Edition
Uploaded 2018-11-02 08:08:07. PDF, 5.6 MB, 275 pages.
Spark Intro; Basic Scala Algorithm; Advanced Spark Project
978-1-491-97295-3
[LSI]
Advanced Analytics with Spark
by Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills
Copyright © 2017 Sanford Ryza, Uri Laserson, Sean Owen, Joshua Wills. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com/safari). For more information, contact our
corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editor: Marie Beaugureau
Production Editor: Melanie Yarbrough
Copyeditor: Gillian McGarvey
Proofreader: Christina Edwards
Indexer: WordCo Indexing Services
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
June 2017: Second Edition
Revision History for the Second Edition
2017-06-09: First Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Advanced Analytics with Spark, the
cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
www.allitebooks.com
Table of Contents
Foreword
Preface
1. Analyzing Big Data
    The Challenges of Data Science
    Introducing Apache Spark
    About This Book
    The Second Edition
2. Introduction to Data Analysis with Scala and Spark
    Scala for Data Scientists
    The Spark Programming Model
    Record Linkage
    Getting Started: The Spark Shell and SparkContext
    Bringing Data from the Cluster to the Client
    Shipping Code from the Client to the Cluster
    From RDDs to Data Frames
    Analyzing Data with the DataFrame API
    Fast Summary Statistics for DataFrames
    Pivoting and Reshaping DataFrames
    Joining DataFrames and Selecting Features
    Preparing Models for Production Environments
    Model Evaluation
    Where to Go from Here
3. Recommending Music and the Audioscrobbler Data Set
    Data Set