Spark: The Definitive Guide: Big Data Processing Made Simple
Spark: The Definitive Guide: Big Data Processing Made Simple (true PDF). Apache Spark is a unified computing engine and a set of libraries for parallel data processing on computer clusters. As of this writing, Spark is the most actively developed open source engine for this task, making it a standard tool for any developer or data scientist interested in big data. Spark supports multiple widely used programming languages (Python, Java, Scala, and R), includes libraries for diverse tasks ranging from SQL to streaming and machine learning, and runs anywhere from a laptop to a cluster of thousands of servers. This makes it an easy system to start with and to scale up to big data processing at an incredibly large scale.
Spark: The Definitive Guide
Big Data Processing Made Simple
Bill Chambers and Matei Zaharia
Spark: The Definitive Guide
by Bill Chambers and Matei Zaharia
Copyright © 2018 Databricks. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online
editions are also available for most titles (http://oreilly.com/safari). For more information,
contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editor: Nicole Tache
Production Editor: Justin Billing
Copyeditor: Octal Publishing, Inc., Chris Edwards, and Amanda Kersey
Proofreader: Jasmine Kwityn
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
February 2018: First Edition
Revision History for the First Edition
2018-02-08: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491912218 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Spark: The Definitive Guide,
the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. Apache, Spark
and Apache Spark are trademarks of the Apache Software Foundation.
While the publisher and the authors have used good faith efforts to ensure that the information
and instructions contained in this work are accurate, the publisher and the authors disclaim all
responsibility for errors or omissions, including without limitation responsibility for damages
resulting from the use of or reliance on this work. Use of the information and instructions
contained in this work is at your own risk. If any code samples or other technology this work
contains or describes is subject to open source licenses or the intellectual property rights of
others, it is your responsibility to ensure that your use thereof complies with such licenses and/or
rights.
978-1-491-91221-8
Preface
Welcome to this first edition of Spark: The Definitive Guide! We are excited to bring you the
most complete resource on Apache Spark today, focusing especially on the new generation of
Spark APIs introduced in Spark 2.0.
Apache Spark is currently one of the most popular systems for large-scale data processing, with
APIs in multiple programming languages and a wealth of built-in and third-party libraries.
Although the project has existed for multiple years—first as a research project started at UC
Berkeley in 2009, then at the Apache Software Foundation since 2013—the open source
community is continuing to build more powerful APIs and high-level libraries over Spark, so
there is still a lot to write about the project. We decided to write this book for two reasons. First,
we wanted to present the most comprehensive book on Apache Spark, covering all of the
fundamental use cases with easy-to-run examples. Second, we especially wanted to explore the
higher-level “structured” APIs that were finalized in Apache Spark 2.0—namely DataFrames,
Datasets, Spark SQL, and Structured Streaming—which older books on Spark don’t always
include. We hope this book gives you a solid foundation to write modern Apache Spark
applications using all the available tools in the project.
In this preface, we’ll tell you a little bit about our background, and explain who this book is for
and how we have organized the material. We also want to thank the numerous people who
helped edit and review this book, without whom it would not have been possible.
About the Authors
Both of the book’s authors have been involved in Apache Spark for a long time, so we are very
excited to be able to bring you this book.
Bill Chambers started using Spark in 2014 on several research projects. Currently, Bill is a
Product Manager at Databricks where he focuses on enabling users to write various types of
Apache Spark applications. Bill also regularly blogs about Spark and presents at conferences and
meetups on the topic. Bill holds a Master’s in Information Management and Systems from the
UC Berkeley School of Information.
Matei Zaharia started the Spark project in 2009, during his time as a PhD student at UC
Berkeley. Matei worked with other Berkeley researchers and external collaborators to design the
core Spark APIs and grow the Spark community, and has continued to be involved in new
initiatives such as the structured APIs and Structured Streaming. In 2013, Matei and other
members of the Berkeley Spark team co-founded Databricks to further grow the open source
project and provide commercial offerings around it. Today, Matei continues to work as Chief
Technologist at Databricks, and also holds a position as an Assistant Professor of Computer
Science at Stanford University, where he does research on large-scale systems and AI. Matei
received his PhD in Computer Science from UC Berkeley in 2013.