Lessons Learned From Dockerizing Spark Workloads
Thomas Phelan, Chief Architect, BlueData (@tapbluedata)
Nanda Vijaydev, Data Scientist, BlueData (@nandavijaydev)
February 8, 2017
Outline
• Docker Containers and Big Data
• Spark on Docker: Challenges
• How We Did It: Lessons Learned
• Key Takeaways
• Q & A
Distributed Spark Environments
• Data scientists want flexibility:
– New tools, latest versions of Spark, Kafka, H2O, etc.
– Multiple options – e.g. Zeppelin, RStudio, JupyterHub
– Fast, iterative prototyping
• IT wants control:
– Multi-tenancy
– Data security
– Network isolation
Why “Dockerize”?
Infrastructure
• Agility and elasticity
• Standardized environments (dev, test, prod)
• Portability (on-premises and public cloud)
• Efficient (higher resource utilization)
Applications
• Fool-proof packaging (configs, libraries, driver versions, etc.)
• Repeatable builds and orchestration
• Faster app dev cycles
• Lightweight (virtually no performance or startup penalty)
The Journey to Spark on Docker
• Start with a clear goal in sight
• Begin with your Docker toolbox of a single container and basic networking and storage
• So you want to run Spark on Docker in a multi-tenant enterprise deployment?
• Warning: there are some pitfalls & challenges