
Spark GraphX in Action
Michael S. Malak
Robin East
MANNING
SHELTER ISLAND

For online information and ordering of this and other Manning books, please visit
www.manning.com. The publisher offers discounts on this book when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com
©2016 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by means electronic, mechanical, photocopying, or otherwise, without prior written
permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in the book, and Manning
Publications was aware of a trademark claim, the designations have been printed in initial caps
or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have
the books we publish printed on acid-free paper, and we exert our best efforts to that end.
Recognizing also our responsibility to conserve the resources of our planet, Manning books
are printed on paper that is at least 15 percent recycled and processed without the use of
elemental chlorine.
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Marina Michaels
Technical development editors: Michael Roberts and John Guthrie
Copyeditor: Corbin Collins
Proofreader: Melody Dolab
Technical proofreader: Antonio Magnaghi
Typesetter: Dottie Marsico
Cover designer: Marija Tudor
ISBN 9781617292521
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – EBM – 21 20 19 18 17 16

brief contents

PART 1 SPARK AND GRAPHS
1 ■ Two important technologies: Spark and graphs
2 ■ GraphX quick start
3 ■ Some fundamentals

PART 2 CONNECTING VERTICES
4 ■ GraphX Basics
5 ■ Built-in algorithms
6 ■ Other useful graph algorithms
7 ■ Machine learning

PART 3 OVER THE ARC
8 ■ The missing algorithms
9 ■ Performance and monitoring
10 ■ Other languages and tools
