# Apache Spark
Spark is a unified analytics engine for large-scale data processing. It provides
high-level APIs in Scala, Java, Python, and R, and an optimized engine that
supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
MLlib for machine learning, GraphX for graph processing,
and Structured Streaming for stream processing.
<https://spark.apache.org/>
[![Jenkins Build](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/badge/icon)](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3)
[![AppVeyor Build](https://img.shields.io/appveyor/ci/ApacheSoftwareFoundation/spark/master.svg?style=plastic&logo=appveyor)](https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark)
[![PySpark Coverage](https://img.shields.io/badge/dynamic/xml.svg?label=pyspark%20coverage&url=https%3A%2F%2Fspark-test.github.io%2Fpyspark-coverage-site&query=%2Fhtml%2Fbody%2Fdiv%5B1%5D%2Fdiv%2Fh1%2Fspan&colorB=brightgreen&style=plastic)](https://spark-test.github.io/pyspark-coverage-site)
## Online Documentation
You can find the latest Spark documentation, including a programming
guide, on the [project web page](https://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.
## Building Spark
Spark is built using [Apache Maven](https://maven.apache.org/).
To build Spark and its example programs, run:
```shell
./build/mvn -DskipTests clean package
```
(You do not need to do this if you downloaded a pre-built package.)
More detailed documentation is available from the project site, at
["Building Spark"](https://spark.apache.org/docs/latest/building-spark.html).
For general development tips, including info on developing Spark using an IDE, see ["Useful Developer Tools"](https://spark.apache.org/developer-tools.html).
## Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:
```shell
./bin/spark-shell
```
Try the following command, which should return 1,000,000,000:
```scala
scala> spark.range(1000 * 1000 * 1000).count()
```
## Interactive Python Shell
Alternatively, if you prefer Python, you can use the Python shell:
```shell
./bin/pyspark
```
And run the following command, which should also return 1,000,000,000:
```python
>>> spark.range(1000 * 1000 * 1000).count()
```
## Example Programs
Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:
```shell
./bin/run-example SparkPi
```
will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a `mesos://` or `spark://` URL,
"yarn" to run on YARN, "local" to run
locally with one thread, or "local[N]" to run locally with N threads. You
can also use an abbreviated class name if the class is in the `examples`
package. For instance:
```shell
MASTER=spark://host:7077 ./bin/run-example SparkPi
```
Many of the example programs print usage help if no params are given.
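Examples can also be launched directly with `spark-submit`, which is the general mechanism `run-example` wraps. A minimal sketch, assuming you are in the root of a Spark distribution (the example script path and thread count are illustrative):

```shell
# Run the bundled Python Pi example on four local threads.
# "local[4]" means a local master with 4 worker threads;
# the trailing 100 is the number of partitions the example uses.
./bin/spark-submit \
  --master "local[4]" \
  examples/src/main/python/pi.py 100
```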
## Running Tests
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:
```shell
./dev/run-tests
```
Please see the guidance on how to
[run tests for a module, or individual tests](https://spark.apache.org/developer-tools.html#individual-tests).
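As a rough sketch of what the linked guide describes, a single Scala suite in one module can be run with Maven; the module and suite name below are illustrative examples, not a required invocation:

```shell
# Run one ScalaTest suite in the "core" module only.
# -DwildcardSuites selects the Scala suite; -Dtest=none skips Java tests.
./build/mvn test -pl core \
  -DwildcardSuites=org.apache.spark.rdd.RDDSuite -Dtest=none
```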
There is also a Kubernetes integration test; see `resource-managers/kubernetes/integration-tests/README.md`.
## A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
Please refer to the build documentation at
["Specifying the Hadoop Version and Enabling YARN"](https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions.
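For instance, following the linked build documentation, a build against a specific Hadoop version with YARN support looks roughly like this (the version number here is only an example; match it to your cluster):

```shell
# Build Spark with the YARN profile against a chosen Hadoop version.
# -Dhadoop.version must match the Hadoop release your cluster runs.
./build/mvn -Pyarn -Dhadoop.version=2.7.4 -DskipTests clean package
```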
## Configuration
Please refer to the [Configuration Guide](https://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
## Contributing
Please review the [Contribution to Spark guide](https://spark.apache.org/contributing.html)
for information on how to get started contributing to the project.