# Apache Spark
Spark is a unified analytics engine for large-scale data processing. It provides
high-level APIs in Scala, Java, Python, and R, and an optimized engine that
supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
MLlib for machine learning, GraphX for graph processing,
and Structured Streaming for stream processing.
<https://spark.apache.org/>
[![Jenkins Build](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/badge/icon)](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3)
[![AppVeyor Build](https://img.shields.io/appveyor/ci/ApacheSoftwareFoundation/spark/master.svg?style=plastic&logo=appveyor)](https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark)
[![PySpark Coverage](https://img.shields.io/badge/dynamic/xml.svg?label=pyspark%20coverage&url=https%3A%2F%2Fspark-test.github.io%2Fpyspark-coverage-site&query=%2Fhtml%2Fbody%2Fdiv%5B1%5D%2Fdiv%2Fh1%2Fspan&colorB=brightgreen&style=plastic)](https://spark-test.github.io/pyspark-coverage-site)
## Online Documentation
You can find the latest Spark documentation, including a programming
guide, on the [project web page](https://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.
## Building Spark
Spark is built using [Apache Maven](https://maven.apache.org/).
To build Spark and its example programs, run:

```
./build/mvn -DskipTests clean package
```

(You do not need to do this if you downloaded a pre-built package.)
More detailed documentation is available from the project site, at
["Building Spark"](https://spark.apache.org/docs/latest/building-spark.html).
For general development tips, including info on developing Spark using an IDE, see ["Useful Developer Tools"](https://spark.apache.org/developer-tools.html).
## Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:

```
./bin/spark-shell
```

Try the following command, which should return 1,000,000,000:

```
scala> spark.range(1000 * 1000 * 1000).count()
```

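For a slightly richer taste of the shell, the same `spark` session exposes the full Dataset/DataFrame API; the following snippet is an illustration, not part of the official quick start:

```
scala> // illustrative: derive a squared column from range()'s built-in "id" column
scala> spark.range(5).selectExpr("id", "id * id AS squared").show()
```
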
## Interactive Python Shell
Alternatively, if you prefer Python, you can use the Python shell:

```
./bin/pyspark
```

And run the following command, which should also return 1,000,000,000:

```
>>> spark.range(1000 * 1000 * 1000).count()
```

## Example Programs
Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:

```
./bin/run-example SparkPi
```

will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a mesos:// or spark:// URL,
"yarn" to run on YARN, "local" to run locally with one thread, or
"local[N]" to run locally with N threads. You can also use an
abbreviated class name if the class is in the `examples` package. For instance:

```
MASTER=spark://host:7077 ./bin/run-example SparkPi
```

Many of the example programs print usage help if no params are given.
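Equivalently, a local run on four threads might look like the following; the trailing `100` is an optional partition count that the SparkPi example accepts:

```
MASTER=local[4] ./bin/run-example SparkPi 100
```
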
## Running Tests
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:

```
./dev/run-tests
```

Please see the guidance on how to
[run tests for a module, or individual tests](https://spark.apache.org/developer-tools.html#individual-tests).
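As one example of the kind of invocation that page describes, a single Scala suite can be selected through the scalatest Maven plugin's wildcard option (the suite name below is just an illustration):

```
./build/mvn test -DwildcardSuites=org.apache.spark.rdd.RDDSuite -Dtest=none
```
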
There is also a Kubernetes integration test; see
`resource-managers/kubernetes/integration-tests/README.md` for details.
## A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
Please refer to the build documentation at
["Specifying the Hadoop Version and Enabling YARN"](https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions.
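As a sketch, a build against a specific Hadoop version with YARN enabled might look like the following; the version number here is illustrative, and the supported profiles and properties are listed on the page above:

```
./build/mvn -Pyarn -Dhadoop.version=3.2.0 -DskipTests clean package
```
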
## Configuration
Please refer to the [Configuration Guide](https://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
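As a minimal sketch, properties can be set globally in `conf/spark-defaults.conf`; the values below are placeholders, not tuning advice:

```
# conf/spark-defaults.conf -- illustrative values only
spark.master            spark://host:7077
spark.executor.memory   4g
```

The same properties can also be set per application, e.g. by passing `--conf spark.executor.memory=4g` to `./bin/spark-submit`.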
## Contributing
Please review the [Contribution to Spark guide](https://spark.apache.org/contributing.html)
for information on how to get started contributing to the project.