# Apache Spark
Spark is a fast and general cluster computing system for Big Data. It provides
high-level APIs in Scala, Java, Python, and R, and an optimized engine that
supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
MLlib for machine learning, GraphX for graph processing,
and Spark Streaming for stream processing.
<http://spark.apache.org/>
## Online Documentation
You can find the latest Spark documentation, including a programming
guide, on the [project web page](http://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.
## Building Spark
Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:

    build/mvn -DskipTests clean package

(You do not need to do this if you downloaded a pre-built package.)
You can build Spark using more than one thread by using the `-T` option with Maven; see ["Parallel builds in Maven 3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3).
More detailed documentation is available from the project site, at
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
For general development tips, including info on developing Spark using an IDE, see ["Useful Developer Tools"](http://spark.apache.org/developer-tools.html).
## Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:

    ./bin/spark-shell

Try the following command, which should return 1000:

    scala> sc.parallelize(1 to 1000).count()

## Interactive Python Shell
Alternatively, if you prefer Python, you can use the Python shell:

    ./bin/pyspark

And run the following command, which should also return 1000:

    >>> sc.parallelize(range(1000)).count()

## Example Programs
Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:

    ./bin/run-example SparkPi

will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a `mesos://` or `spark://` URL,
`"yarn"` to run on YARN, `"local"` to run locally with one thread,
or `"local[N]"` to run locally with N threads. You can also use an
abbreviated class name if the class is in the `examples`
package. For instance:

    MASTER=spark://host:7077 ./bin/run-example SparkPi

Many of the example programs print usage help if no params are given.
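
Beyond `run-example`, standalone applications are typically launched with `bin/spark-submit`. A minimal sketch, where the master URL, jar path, and argument are illustrative and depend on your build and cluster:

```shell
# Hypothetical spark-submit invocation; the master URL, jar path, and
# argument (number of partitions) are examples only.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master "local[4]" \
  examples/jars/spark-examples_2.11-2.4.0.jar \
  100
```

The same `--master` values described above (`spark://`, `mesos://`, `yarn`, `local[N]`) apply here as well.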
## Running Tests
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:

    ./dev/run-tests

Please see the guidance on how to
[run tests for a module, or individual tests](http://spark.apache.org/developer-tools.html#individual-tests).
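
As a sketch of what the module-level workflow looks like (the module and suite names below are illustrative; the linked developer-tools page is authoritative for the exact flags):

```shell
# Hypothetical: run only the core module's Scala tests, restricted to a
# single ScalaTest suite. Module and suite names are examples.
./build/mvn test -pl core -Dtest=none \
  -DwildcardSuites=org.apache.spark.SparkContextSuite
```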
There is also a Kubernetes integration test; see `resource-managers/kubernetes/integration-tests/README.md`.
## A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
Please refer to the build documentation at
["Specifying the Hadoop Version and Enabling YARN"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions.
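
For example, a build against a specific Hadoop version with YARN support might look like the following (the profile name and version number are illustrative; the build documentation linked above lists the supported combinations):

```shell
# Hypothetical build command; -Phadoop-2.7 and hadoop.version=2.7.3 are
# examples of a profile/version pair, not a recommendation.
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
```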
## Configuration
Please refer to the [Configuration Guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
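
Most deploy-time settings go in `conf/spark-defaults.conf`. A minimal sketch, assuming a standalone master at `host:7077` (the host and memory size are illustrative):

```properties
# Hypothetical conf/spark-defaults.conf fragment; values are examples.
spark.master            spark://host:7077
spark.executor.memory   2g
spark.serializer        org.apache.spark.serializer.KryoSerializer
```

Settings passed on the command line (e.g. via `spark-submit --conf`) take precedence over this file.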
## Contributing
Please review the [Contribution to Spark guide](http://spark.apache.org/contributing.html)
for information on how to get started contributing to the project.