# Apache Spark
Spark is a unified analytics engine for large-scale data processing. It provides
high-level APIs in Scala, Java, Python, and R, and an optimized engine that
supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
MLlib for machine learning, GraphX for graph processing,
and Structured Streaming for stream processing.
<https://spark.apache.org/>
[![Jenkins Build](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/badge/icon)](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3)
[![AppVeyor Build](https://img.shields.io/appveyor/ci/ApacheSoftwareFoundation/spark/master.svg?style=plastic&logo=appveyor)](https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark)
[![PySpark Coverage](https://img.shields.io/badge/dynamic/xml.svg?label=pyspark%20coverage&url=https%3A%2F%2Fspark-test.github.io%2Fpyspark-coverage-site&query=%2Fhtml%2Fbody%2Fdiv%5B1%5D%2Fdiv%2Fh1%2Fspan&colorB=brightgreen&style=plastic)](https://spark-test.github.io/pyspark-coverage-site)
## Online Documentation
You can find the latest Spark documentation, including a programming
guide, on the [project web page](https://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.
## Building Spark
Spark is built using [Apache Maven](https://maven.apache.org/).
To build Spark and its example programs, run:
    ./build/mvn -DskipTests clean package
(You do not need to do this if you downloaded a pre-built package.)
More detailed documentation is available from the project site, at
["Building Spark"](https://spark.apache.org/docs/latest/building-spark.html).
For general development tips, including info on developing Spark using an IDE, see ["Useful Developer Tools"](https://spark.apache.org/developer-tools.html).
## Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:
    ./bin/spark-shell
Try the following command, which should return 1,000,000,000:
    scala> spark.range(1000 * 1000 * 1000).count()
## Interactive Python Shell
Alternatively, if you prefer Python, you can use the Python shell:
    ./bin/pyspark
And run the following command, which should also return 1,000,000,000:
    >>> spark.range(1000 * 1000 * 1000).count()
## Example Programs
Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:
    ./bin/run-example SparkPi
will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a `mesos://` or `spark://` URL,
`"yarn"` to run on YARN, `"local"` to run locally with one thread, or
`"local[N]"` to run locally with N threads. You can also use an abbreviated
class name if the class is in the `examples` package. For instance:
    MASTER=spark://host:7077 ./bin/run-example SparkPi
Many of the example programs print usage help if no params are given.
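Examples can also be launched directly through `spark-submit`, which is the general entry point for running applications. As a sketch, the following submits the bundled SparkPi example locally; the jar filename under `examples/jars/` varies by release, so a glob is used here:

```shell
# Submit the SparkPi example with 4 local threads.
# "100" is the number of partitions SparkPi splits its work into.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master "local[4]" \
  examples/jars/spark-examples_*.jar 100
```

`run-example` is a thin convenience wrapper around this same `spark-submit` invocation.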
## Running Tests
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:
    ./dev/run-tests
Please see the guidance on how to
[run tests for a module, or individual tests](https://spark.apache.org/developer-tools.html#individual-tests).
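As a sketch of what the developer-tools page describes, individual Scala suites can be run through sbt, and the Python tests support module selection (the suite name below is illustrative):

```shell
# Run a single Scala test suite in the core module via sbt.
./build/sbt "core/testOnly *SparkContextSuite"

# Run only the Python tests for a selected module.
./python/run-tests --modules pyspark-sql
```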
There is also a Kubernetes integration test; see
`resource-managers/kubernetes/integration-tests/README.md` for details.
## A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
Please refer to the build documentation at
["Specifying the Hadoop Version and Enabling YARN"](https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions.
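As a minimal illustration, a build targeting a specific Hadoop line selects the corresponding Maven profile; profile names differ between Spark versions, so check the build documentation linked above before relying on this:

```shell
# Example: build with YARN support against the Hadoop 3.2 profile.
./build/mvn -Pyarn -Phadoop-3.2 -DskipTests clean package
```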
## Configuration
Please refer to the [Configuration Guide](https://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
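As a minimal illustration, properties can be set in `conf/spark-defaults.conf` (the values below are placeholders for a hypothetical cluster, not recommendations):

    spark.master            spark://master:7077
    spark.executor.memory   4g
    spark.serializer        org.apache.spark.serializer.KryoSerializer

The same properties can also be passed per-application via `--conf` flags to `spark-submit` or set programmatically on the session builder.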
## Contributing
Please review the [Contribution to Spark guide](https://spark.apache.org/contributing.html)
for information on how to get started contributing to the project.