# Apache Spark
Spark is a fast and general cluster computing system for Big Data. It provides
high-level APIs in Scala, Java, Python, and R, and an optimized engine that
supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
MLlib for machine learning, GraphX for graph processing,
and Spark Streaming for stream processing.
<http://spark.apache.org/>
## Online Documentation
You can find the latest Spark documentation, including a programming
guide, on the [project web page](http://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.
## Building Spark
Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:

```
build/mvn -DskipTests clean package
```

(You do not need to do this if you downloaded a pre-built package.)
You can build Spark using more than one thread by using the `-T` option with Maven; see ["Parallel builds in Maven 3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3).
More detailed documentation is available from the project site, at
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
For general development tips, including info on developing Spark using an IDE, see ["Useful Developer Tools"](http://spark.apache.org/developer-tools.html).
## Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:

```
./bin/spark-shell
```

Try the following command, which should return 1000:

```
scala> sc.parallelize(1 to 1000).count()
```

## Interactive Python Shell
Alternatively, if you prefer Python, you can use the Python shell:

```
./bin/pyspark
```

And run the following command, which should also return 1000:

```
>>> sc.parallelize(range(1000)).count()
```

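
As with the Scala shell, transformations can be chained before an action. A minimal sketch, assuming the `sc` SparkContext that the shell creates for you (the filter below is purely illustrative):

```
>>> # count the even numbers in 0..999 with a filter transformation
>>> sc.parallelize(range(1000)).filter(lambda x: x % 2 == 0).count()
```

Counting the even numbers in 0..999 this way should return 500.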
## Example Programs
Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:

```
./bin/run-example SparkPi
```

will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a mesos:// or spark:// URL, "yarn" to run
on YARN, "local" to run locally with one thread, or "local[N]" to run locally
with N threads. You can also use an abbreviated class name if the class is in
the `examples` package. For instance:

```
MASTER=spark://host:7077 ./bin/run-example SparkPi
```

Many of the example programs print usage help if no params are given.
## Running Tests
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:

```
./dev/run-tests
```

Please see the guidance on how to
[run tests for a module, or individual tests](http://spark.apache.org/developer-tools.html#individual-tests).
## A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
Please refer to the build documentation at
["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions.
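
As a hedged sketch, a build against a specific Hadoop version might look like the following; the `-Pyarn` profile and the version value here are illustrative assumptions, and the linked page lists the profiles actually supported:

```
build/mvn -Pyarn -Dhadoop.version=2.6.0 -DskipTests clean package
```
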
## Configuration
Please refer to the [Configuration Guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
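
For instance, properties can be set once in `conf/spark-defaults.conf` or per application via `spark-submit --conf`. A minimal sketch with illustrative values (assumptions for this example, not tuning recommendations):

```
# conf/spark-defaults.conf (illustrative values)
spark.master             spark://host:7077
spark.executor.memory    2g
spark.serializer         org.apache.spark.serializer.KryoSerializer
```

The same properties can also be passed on the command line, e.g. `--conf spark.executor.memory=2g`; see the Configuration Guide for precedence rules.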
## Contributing
Please review the [Contribution to Spark guide](http://spark.apache.org/contributing.html)
for information on how to get started contributing to the project.