Ellen Friedman & Kostas Tzoumas
Introduction to Apache Flink
Stream Processing for Real Time and Beyond
https://www.iteblog.com
Beijing Boston Farnham Sebastopol Tokyo
978-1-491-97393-6
[LSI]
Introduction to Apache Flink
by Ellen Friedman and Kostas Tzoumas
Copyright © 2016 Ellen Friedman and Kostas Tzoumas. All rights reserved.
All images copyright Ellen Friedman unless otherwise noted. Figure 1-3 courtesy
Michael Vasilyev / Alamy Stock Photo.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA
95472.
O’Reilly books may be purchased for educational, business, or sales promotional use.
Online editions are also available for most titles (http://safaribooksonline.com). For
more information, contact our corporate/institutional sales department:
800-998-9938 or corporate@oreilly.com.
Editor: Shannon Cutt
Production Editor: Holly Bauer Forsyth
Copyeditor: Holly Bauer Forsyth
Proofreader: Octal Publishing, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
September 2016: First Edition
Revision History for the First Edition
2016-09-01: First Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Introduction to
Apache Flink, the cover image, and related trade dress are trademarks of O’Reilly
Media, Inc.
While the publisher and the authors have used good faith efforts to ensure that the
information and instructions contained in this work are accurate, the publisher and
the authors disclaim all responsibility for errors or omissions, including without
limitation responsibility for damages resulting from the use of or reliance on this
work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject
to open source licenses or the intellectual property rights of others, it is your
responsibility to ensure that your use thereof complies with such licenses and/or
rights.
Table of Contents
Preface
1. Why Apache Flink?
    Consequences of Not Doing Streaming Well
    Goals for Processing Continuous Event Data
    Evolution of Stream Processing Technologies
    First Look at Apache Flink
    Flink in Production
    Where Flink Fits
2. Stream-First Architecture
    Traditional Architecture versus Streaming Architecture
    Message Transport and Message Processing
    The Transport Layer: Ideal Capabilities
    Streaming Data for a Microservices Architecture
    Beyond Real-Time Applications
    Geo-Distributed Replication of Streams
3. What Flink Does
    Different Types of Correctness
    Hierarchical Use Cases: Adopting Flink in Stages
4. Handling Time
    Counting with Batch and Lambda Architectures
    Counting with Streaming Architecture
    Notions of Time
    Windows