1. Introducing Spark Streaming
1. Large-scale data analytics and Apache Spark
2. More than MapReduce: how the model came about and how Spark extends it
1. A Fault-tolerant MapReduce cluster
2. A distributed file system
3. Two higher-order functions
3. Optimizations in a reduce operation
1. Associativity: a necessary condition
2. Shuffling
3. Map-side combiner
4. To learn more about MapReduce
1. The Spark ecosystem, approach and polyglot APIs
2. Multiple frameworks, and a framework scheduler
3. A Data Processing engine
4. A polyglot API
5. A MapReduce extension
6. A SQL interface, expanding into a DataFrame interface
7. A Real Time processing engine
8. In-memory computing, with impact on processing speed and latency
9. MapReduce and memory legacy
10. Spark’s Memory Usage
11. A customizable cache
12. Operation Latency
5. How Spark Streaming fits in the Big Picture
1. Micro-batching
2. A strong Streaming characteristic
3. A minimal delay
4. Throughput-oriented tasks
6. Why you would want to use Spark Streaming
1. Building a pipeline
2. Productive deployment of pipelines
3. Productive implementation of data analysis
7. To learn more about Spark
8. Conclusion
9. Bibliography
2. Core Spark Streaming concepts
1. Apache Spark RDDs
1. Resilient Distributed Datasets
2. Transformations and Actions
3. The Shuffle
4. Partitions
5. Debugging RDDs
6. Witnessing caching
2. Spark Streaming Clusters
1. The Standalone Spark cluster
2. Yet Another Resource Negotiator (YARN)
3. Apache Mesos
4. Spark Streaming: a delicate deployment
3. To learn more about running Spark on a cluster
4. Fundamentals of a DStream
1. A Bulk-synchronous model
2. The Spark Streaming Context