# sparkctl
`sparkctl` is a command-line tool of the Spark Operator for creating, listing, checking status of, getting logs of, and deleting `SparkApplication`s. It can also do port forwarding from a local port to the Spark web UI port for accessing the Spark web UI on the driver. Each function is implemented as a sub-command of `sparkctl`.
To build `sparkctl`, make sure you have followed the build steps [here](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/developer-guide.md#build-the-operator) and have all the dependencies installed, then run the following command from within `sparkctl/`:
```bash
$ go build -o sparkctl
```
## Flags
The following global flags are available for all the sub commands:
* `--namespace`: the Kubernetes namespace of the `SparkApplication`(s). Defaults to `default`.
* `--kubeconfig`: the path to the file storing configuration for accessing the Kubernetes API server. Defaults to
`$HOME/.kube/config`
## Available Commands
### Create
`create` is a sub command of `sparkctl` for creating a `SparkApplication` object. There are two ways to create a `SparkApplication` object. One is parsing a given YAML file and creating the `SparkApplication` object in the namespace specified by `--namespace`. In this mode, `create` parses the YAML file and sends the parsed `SparkApplication` object to the Kubernetes API server.
Usage:
```bash
$ sparkctl create <path to YAML file>
```
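As a sketch, a minimal `SparkApplication` YAML that could be passed to `create` might look like the following. The image, jar path, and names here are illustrative, not taken from this guide; adjust them to match your cluster and Spark distribution:
```yaml
# Hypothetical minimal SparkApplication manifest.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: gcr.io/spark-operator/spark:v2.4.0
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
  sparkVersion: "2.4.0"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
  executor:
    instances: 1
    cores: 1
    memory: 512m
```
With this saved as, say, `spark-pi.yaml`, running `sparkctl create spark-pi.yaml` would create the object in the `default` namespace.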
The other way is creating a `SparkApplication` object from a named `ScheduledSparkApplication` to manually force a run of the `ScheduledSparkApplication`.
Usage:
```bash
$ sparkctl create <name of the SparkApplication> --from <name of the ScheduledSparkApplication>
```
The `create` command also supports shipping local Hadoop configuration files into the driver and executor pods. Specifically, it detects local Hadoop configuration files located at the path specified by the environment variable `HADOOP_CONF_DIR`, creates a Kubernetes `ConfigMap` from the files, and adds the `ConfigMap` to the `SparkApplication` object so it gets mounted into the driver and executor pods by the operator. The environment variable `HADOOP_CONF_DIR` is also set in the driver and executor containers.
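For illustration only, the effect is roughly equivalent to the operator adding a volume and mount like the sketch below to the pods. The ConfigMap name and mount path here are hypothetical; the operator chooses the actual values:
```yaml
# Hypothetical rendering of what the operator wires up for HADOOP_CONF_DIR:
# files from the local directory land in a ConfigMap mounted into both the
# driver and executor containers.
volumes:
  - name: hadoop-conf
    configMap:
      name: spark-pi-hadoop-config   # illustrative name
containers:
  - name: spark-kubernetes-driver
    volumeMounts:
      - name: hadoop-conf
        mountPath: /etc/hadoop/conf  # illustrative path
    env:
      - name: HADOOP_CONF_DIR
        value: /etc/hadoop/conf
```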
#### Staging local dependencies
The `create` command also supports staging local application dependencies, though currently only uploading to a Google Cloud Storage (GCS) bucket is supported. The way it works is as follows. It checks whether there are any local dependencies in `spec.mainApplicationFile`, `spec.deps.jars`, `spec.deps.files`, etc. in the parsed `SparkApplication` object. If so, it tries to upload the local dependencies to the remote location specified by `--upload-to`. The command fails if local dependencies are used but `--upload-to` is not specified. By default, a local file that already exists remotely, i.e., a file with the same name already exists under the upload path, is skipped. To overwrite the remote file instead, specify the `--override` flag.
##### Uploading to GCS
For uploading to GCS, the value should be in the form `gs://<bucket>`. The bucket must already exist; uploading fails otherwise. The local dependencies are uploaded to the path
`spark-app-dependencies/<SparkApplication namespace>/<SparkApplication name>` in the given bucket. If uploading succeeds, the file path of each local dependency in the parsed `SparkApplication` object is replaced with the URI of its remote copy.
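The rewriting step can be sketched as follows. This is a minimal illustration of the documented `spark-app-dependencies/<namespace>/<name>` layout; the function name and the pass-through handling of already-remote URIs are assumptions, not sparkctl's actual code:

```python
import os

def rewrite_dependency(local_path, bucket, namespace, app_name):
    """Map a local dependency path to its uploaded GCS URI, following the
    spark-app-dependencies/<namespace>/<name> layout described above.
    Paths that already carry a scheme are assumed remote and left untouched."""
    if "://" in local_path:
        return local_path  # already remote, nothing to upload
    filename = os.path.basename(local_path)
    return "gs://%s/spark-app-dependencies/%s/%s/%s" % (
        bucket, namespace, app_name, filename)
```

For example, `rewrite_dependency("./deps/my-udf.jar", "my-bucket", "default", "spark-pi")` yields `gs://my-bucket/spark-app-dependencies/default/spark-pi/my-udf.jar`.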
Note that uploading to GCS requires a GCP service account with the necessary IAM permissions: permission to use the GCP project specified in the service account JSON key file (`serviceusage.services.use`) and permission to create GCS objects (`storage.objects.create`).
The service account JSON key file must be locally available and be pointed to by the environment variable
`GOOGLE_APPLICATION_CREDENTIALS`. For more information on IAM authentication, please check
[Getting Started with Authentication](https://cloud.google.com/docs/authentication/getting-started).
Usage:
```bash
$ export GOOGLE_APPLICATION_CREDENTIALS="[PATH]/[FILE_NAME].json"
$ sparkctl create <path to YAML file> --upload-to gs://<bucket>
```
By default, the uploaded dependencies are not made publicly accessible and are referenced through URIs of the form `gs://bucket/path/to/file`. To download such dependencies from GCS, a custom-built Spark init-container with the [GCS connector](https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage) installed and the necessary Hadoop configuration properties set is needed. An example Dockerfile for such an init-container can be found [here](https://gist.github.com/liyinan926/f9e81f7b54d94c05171a663345eb58bf).
If you want to make the uploaded dependencies publicly accessible so they can be downloaded by the built-in init-container, simply add the `--public` flag to the `create` command, as the following example shows:
```bash
$ sparkctl create <path to YAML file> --upload-to gs://<bucket> --public
```
Publicly available files are referenced through URIs of the form `https://storage.googleapis.com/bucket/path/to/file`.
##### Uploading to S3
For uploading to S3, the value should be in the form `s3://<bucket>`. The bucket must already exist; uploading fails otherwise. The local dependencies are uploaded to the path
`spark-app-dependencies/<SparkApplication namespace>/<SparkApplication name>` in the given bucket. If uploading succeeds, the file path of each local dependency in the parsed `SparkApplication` object is replaced with the URI of its remote copy.
Note that uploading to S3 with [AWS SDK](https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html) requires credentials to be specified. For GCP, the S3 Interoperability credentials can be retrieved as described [here](https://cloud.google.com/storage/docs/migrating#keys).
The SDK uses the default credential provider chain to find AWS credentials, picking the first provider in the chain that returns credentials without an error. The default provider chain looks for credentials in the following order:
- Environment variables
```
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
```
- Shared credentials file (`~/.aws/credentials`)
For more information about AWS SDK authentication, please check [Specifying Credentials](https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html#specifying-credentials).
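As an alternative to environment variables, the shared credentials file (a standard AWS SDK convention; the key pair below is a placeholder) looks like:
```
# ~/.aws/credentials
[default]
aws_access_key_id = [KEY]
aws_secret_access_key = [SECRET]
```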
Usage:
```bash
$ export AWS_ACCESS_KEY_ID=[KEY]
$ export AWS_SECRET_ACCESS_KEY=[SECRET]
$ sparkctl create <path to YAML file> --upload-to s3://<bucket>
```
By default, the uploaded dependencies are not made publicly accessible and are referenced through URIs of the form `s3a://bucket/path/to/file`. To download the dependencies from S3, a custom-built Spark Docker image is needed: the jars required by the S3A connector (`hadoop-aws-2.7.6.jar` and `aws-java-sdk-1.7.6.jar` for a Spark build with the Hadoop 2.7 profile, or `hadoop-aws-3.1.0.jar` and `aws-java-sdk-bundle-1.11.271.jar` for Hadoop 3.1) must be available on the classpath, and `spark-defaults.conf` must set the AWS keys and the S3A filesystem class (you can also use `spec.hadoopConf` in the `SparkApplication` YAML):
```
spark.hadoop.fs.s3a.endpoint https://storage.googleapis.com
spark.hadoop.fs.s3a.access.key [KEY]
spark.hadoop.fs.s3a.secret.key [SECRET]
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
```
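The same properties expressed via `spec.hadoopConf` would look like the following sketch. The keys drop the `spark.hadoop.` prefix, since that prefix is how Spark maps its own conf keys onto Hadoop configuration; the bracketed values are placeholders:
```yaml
spec:
  hadoopConf:
    "fs.s3a.endpoint": https://storage.googleapis.com
    "fs.s3a.access.key": "[KEY]"
    "fs.s3a.secret.key": "[SECRET]"
    "fs.s3a.impl": org.apache.hadoop.fs.s3a.S3AFileSystem
```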
NOTE: In Spark 2.3, init-containers are used for downloading remote application dependencies. In later versions, init-containers have been removed. It is recommended to use Apache Spark 2.4 for staging local dependencies with `s3`, which currently requires building a custom Docker image from the Spark master branch.