Mastering Apache Spark 2.x
Second Edition
Scalable analytics faster than ever
Romeo Kienzler
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, without the prior written permission of the
publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Packt Publishing, nor its
dealers and distributors will be held liable for any damages caused or alleged to be caused
directly or indirectly by this book. Packt Publishing has endeavored to provide trademark
information about all of the companies and products mentioned in this book by the
appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of
this information.
First published: September 2015
Second Edition: July 2017
Production reference: 1190717
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK
ISBN 978-1-78646-274-9
www.packtpub.com
Credits
Author: Romeo Kienzler
Reviewer: Md. Rezaul Karim
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Malaika Monteiro
Content Development Editor: Tejas Limkar
Technical Editor: Dinesh Chaudhary
Copy Editor: Tasneem Fatehi
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Tania Dutta
Production Coordinator: Deepika Naik
About the Author
Romeo Kienzler works as the chief data scientist in the IBM Watson IoT worldwide team,
helping clients to apply advanced machine learning at scale on their IoT sensor data. He
holds a Master's degree in computer science from the Swiss Federal Institute of Technology,
Zurich, with a specialization in information systems, bioinformatics, and applied statistics.
His current research focus is on scalable machine learning on Apache Spark. He is a
contributor to various open source projects and works as an associate professor of artificial
intelligence at the Swiss University of Applied Sciences, Berne. He is a member of the IBM
Technical Expert Council and the IBM Academy of Technology, IBM's leading brains trust.
Writing a book is quite time-consuming. I want to thank my family for their
understanding and my employer, IBM, for giving me the time and flexibility to finish this
work. Finally, I want to thank the entire team at Packt Publishing, and especially Tejas
Limkar, my editor, for all their support, patience, and constructive feedback.