DEVOPS ADVANCED CLASS
June 2015: Spark Summit West 2015
http://training.databricks.com/devops.pdf
www.linkedin.com/in/blueplastic
making big data simple
Databricks Cloud:
“A unified platform for building Big Data pipelines
– from ETL to Exploration and Dashboards, to
Advanced Analytics and Data Products.”
• Founded in late 2013
• by the creators of Apache Spark
• Original team from UC Berkeley AMPLab
• Raised $47 Million in 2 rounds
• ~55 employees
• We’re hiring! (http://databricks.workable.com)
• Level 2/3 support partnerships with
  • Hortonworks
  • MapR
  • DataStax
The Databricks team contributed more than 75% of the code added to Spark in the past year
AGENDA
Before Lunch
• History of Spark
• RDD fundamentals
• Spark Runtime Architecture
  Integration with Resource Managers (Standalone, YARN)
• GUIs
• Lab: DevOps 101
After Lunch
• Memory and Persistence
• Jobs -> Stages -> Tasks
• Broadcast Variables and Accumulators
• PySpark
• DevOps 102
• Shuffle
• Spark Streaming
Some slides will be skipped
Please keep questions to a minimum during class
(5pm – 5:30pm for Q&A with instructor)
2 anonymous surveys: Pre and Post class
Lunch: noon – 1pm
2 breaks (before lunch and after lunch)