Apache Spark on Docker
==========
[![DockerPulls](https://img.shields.io/docker/pulls/sequenceiq/spark.svg)](https://registry.hub.docker.com/u/sequenceiq/spark/)
[![DockerStars](https://img.shields.io/docker/stars/sequenceiq/spark.svg)](https://registry.hub.docker.com/u/sequenceiq/spark/)
This repository contains a Dockerfile to build a Docker image with Apache Spark. This Docker image depends on our previous [Hadoop Docker](https://github.com/sequenceiq/hadoop-docker) image, available at the SequenceIQ [GitHub](https://github.com/sequenceiq) page.
The base Hadoop Docker image is also available as a prebuilt [Docker image](https://registry.hub.docker.com/u/sequenceiq/hadoop-docker/) on Docker Hub.
## Pull the image from the Docker repository
```
docker pull sequenceiq/spark:1.6.0
```
## Building the image
```
docker build --rm -t sequenceiq/spark:1.6.0 .
```
## Running the image
* If you are using boot2docker, make sure your VM has more than 2GB of memory.
* In your `/etc/hosts` file, add the address printed by `boot2docker ip` as host `sandbox` to make the sandbox UIs easier to reach.
* Publish the YARN and Spark UI ports (8088, 8042, 4040) when running the container:
```
docker run -it -p 8088:8088 -p 8042:8042 -p 4040:4040 -h sandbox sequenceiq/spark:1.6.0 bash
```
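or run the container in the background (the first `-d` is the `docker run` detach flag; the trailing `-d` comes after the image name, so it is passed to the image's entrypoint):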
```
docker run -d -h sandbox sequenceiq/spark:1.6.0 -d
```
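The `/etc/hosts` step above can be scripted. This is a minimal sketch assuming bash and `grep`; the function name `add_sandbox_host` is hypothetical, and the IP you pass is whatever `boot2docker ip` prints on your machine. It appends the mapping only when no `sandbox` entry exists yet, so running it twice is harmless.

```shell
# Hypothetical helper: map the given IP to the host name 'sandbox'.
# Usage: add_sandbox_host IP [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
add_sandbox_host() {
  local ip="$1"
  local hosts_file="${2:-/etc/hosts}"
  if [ -z "$ip" ]; then
    echo "usage: add_sandbox_host IP [HOSTS_FILE]" >&2
    return 1
  fi
  # Do nothing if a non-comment line already maps a host named exactly 'sandbox'.
  if grep -Eq '^[^#]*[[:space:]]sandbox([[:space:]]|$)' "$hosts_file" 2>/dev/null; then
    return 0
  fi
  # Append "<ip> sandbox" as a new hosts entry.
  printf '%s sandbox\n' "$ip" >> "$hosts_file"
}
```

Writing to `/etc/hosts` normally requires root, so call it from a root shell, e.g. `add_sandbox_host "$(boot2docker ip)"`.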
## Versions
```
Hadoop 2.6.0 and Apache Spark v1.6.0 on CentOS
```
## Testing
There are two deploy modes that can be used to launch Spark applications on YARN.
### YARN-client mode
In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
```
# run the spark shell
spark-shell \
--master yarn-client \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1
# execute the following command, which should return 1000
scala> sc.parallelize(1 to 1000).count()
```
### YARN-cluster mode
In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.
Estimating Pi (yarn-cluster mode):
```
# execute the following command, which should write a line like "Pi is roughly 3.1418" to the application logs
# note: in cluster mode you must pass the --files argument to enable metrics
spark-submit \
--class org.apache.spark.examples.SparkPi \
--files $SPARK_HOME/conf/metrics.properties \
--master yarn-cluster \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
$SPARK_HOME/lib/spark-examples-1.6.0-hadoop2.6.0.jar
```
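In yarn-cluster mode the result lands in the driver's container log rather than on your terminal. One way to fetch it is `yarn logs -applicationId <application id>` once the application has finished; this assumes YARN log aggregation is enabled in the container's configuration. The application ID (of the form `application_<timestamp>_<n>`) appears in the `spark-submit` output and in the YARN UI on port 8088. The hypothetical `extract_pi` filter below only pulls the estimate out of whatever log text it receives on stdin.

```shell
# Hypothetical filter: read Spark log text on stdin and print the
# "Pi is roughly <number>" line(s) produced by SparkPi.
# Exits non-zero (like grep) when no estimate is found.
extract_pi() {
  grep -oE 'Pi is roughly [0-9]+(\.[0-9]+)?'
}
```

For example: `yarn logs -applicationId <application id> | extract_pi`.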
Estimating Pi (yarn-client mode):
```
# execute the following command, which should print a line like "Pi is roughly 3.1418" to the screen
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-client \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
$SPARK_HOME/lib/spark-examples-1.6.0-hadoop2.6.0.jar
```
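The two SparkPi invocations differ only in `--master` and in the `--files` argument that cluster mode needs for metrics. A hypothetical wrapper, `run_spark_pi`, could pick the right flags from a single mode argument. It is a sketch assuming bash and that `SPARK_HOME` is set as it is inside the container, and it reuses the examples JAR path from the commands above. Setting `DRY_RUN=1` (a variable introduced only for this sketch) prints the command instead of running it.

```shell
# Hypothetical wrapper around the SparkPi commands above.
# Usage: run_spark_pi client|cluster
run_spark_pi() {
  local mode="$1"
  local jar="$SPARK_HOME/lib/spark-examples-1.6.0-hadoop2.6.0.jar"
  local -a args
  args=(--class org.apache.spark.examples.SparkPi)

  case "$mode" in
    client)
      args+=(--master yarn-client) ;;
    cluster)
      # Cluster mode ships metrics.properties with --files to enable metrics.
      args+=(--files "$SPARK_HOME/conf/metrics.properties" --master yarn-cluster) ;;
    *)
      echo "usage: run_spark_pi client|cluster" >&2
      return 1 ;;
  esac

  # Resource settings shared by both modes, then the application JAR.
  args+=(--driver-memory 1g --executor-memory 1g --executor-cores 1 "$jar")

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command line instead of executing it.
    echo spark-submit "${args[@]}"
  else
    spark-submit "${args[@]}"
  fi
}
```

Keeping the flags in a bash array preserves arguments that contain spaces, which a plain string would split.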
### Submitting from outside the container
To use Spark from outside the container, set the YARN_CONF_DIR environment variable to a directory containing a configuration appropriate for the Docker container. The repository provides such a configuration in the `yarn-remote-client` directory.
```
# run from the root of this repository
export YARN_CONF_DIR="$(pwd)/yarn-remote-client"
```
HDFS inside the container can be accessed only by root. When submitting Spark applications from outside the container as a user other than root, set the HADOOP_USER_NAME variable so that the root user is used.
```
export HADOOP_USER_NAME=root
```
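The two exports can be wrapped in one function that also checks the configuration directory exists before pointing YARN_CONF_DIR at it. The name `use_sandbox_yarn` is hypothetical; the sketch assumes bash and that you pass the path of a checkout of this repository (or run it from that checkout's root), and it sets exactly the two variables described above.

```shell
# Hypothetical helper: configure the current shell to submit to the
# dockerized YARN from outside the container.
# Usage: use_sandbox_yarn [REPO_DIR]   (REPO_DIR defaults to the current directory)
use_sandbox_yarn() {
  local repo_dir="${1:-$PWD}"
  local conf_dir="$repo_dir/yarn-remote-client"
  if [ ! -d "$conf_dir" ]; then
    echo "use_sandbox_yarn: $conf_dir not found" >&2
    return 1
  fi
  # Resolve to an absolute path so the setting works from any directory.
  conf_dir="$(cd "$conf_dir" && pwd)"
  # Point Spark at the YARN client configuration shipped with this repository.
  export YARN_CONF_DIR="$conf_dir"
  # HDFS in the container is only accessible by root, so act as root.
  export HADOOP_USER_NAME=root
}
```

Define and call it in the same shell you run `spark-submit` from; variables exported inside a subshell do not propagate back to the parent.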