# Apache Spark
Spark is a unified analytics engine for large-scale data processing. It provides
high-level APIs in Scala, Java, Python, and R, and an optimized engine that
supports general computation graphs for data analysis. It also supports a
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
MLlib for machine learning, GraphX for graph processing,
and Structured Streaming for stream processing.
<https://spark.apache.org/>
[![Jenkins Build](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/badge/icon)](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3)
[![AppVeyor Build](https://img.shields.io/appveyor/ci/ApacheSoftwareFoundation/spark/master.svg?style=plastic&logo=appveyor)](https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark)
[![PySpark Coverage](https://img.shields.io/badge/dynamic/xml.svg?label=pyspark%20coverage&url=https%3A%2F%2Fspark-test.github.io%2Fpyspark-coverage-site&query=%2Fhtml%2Fbody%2Fdiv%5B1%5D%2Fdiv%2Fh1%2Fspan&colorB=brightgreen&style=plastic)](https://spark-test.github.io/pyspark-coverage-site)
## Online Documentation
You can find the latest Spark documentation, including a programming
guide, on the [project web page](https://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.
## Building Spark
Spark is built using [Apache Maven](https://maven.apache.org/).
To build Spark and its example programs, run:

    ./build/mvn -DskipTests clean package

(You do not need to do this if you downloaded a pre-built package.)
More detailed documentation is available from the project site, at
["Building Spark"](https://spark.apache.org/docs/latest/building-spark.html).
For general development tips, including info on developing Spark using an IDE, see ["Useful Developer Tools"](https://spark.apache.org/developer-tools.html).
## Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:

    ./bin/spark-shell

Try the following command, which should return 1,000,000,000:

    scala> spark.range(1000 * 1000 * 1000).count()

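For a slightly bigger taste of the API, the same shell session can build a small DataFrame and aggregate it. This is just an illustrative snippet using the `spark` session the shell creates for you:

    scala> val df = spark.range(10).selectExpr("id", "id % 2 AS parity")
    scala> df.groupBy("parity").count().show()
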
## Interactive Python Shell
Alternatively, if you prefer Python, you can use the Python shell:

    ./bin/pyspark

And run the following command, which should also return 1,000,000,000:

    >>> spark.range(1000 * 1000 * 1000).count()

## Example Programs
Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:

    ./bin/run-example SparkPi

will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a mesos:// or spark:// URL,
"yarn" to run on YARN, "local" to run locally with one thread,
or "local[N]" to run locally with N threads. You
can also use an abbreviated class name if the class is in the `examples`
package. For instance:

    MASTER=spark://host:7077 ./bin/run-example SparkPi

Many of the example programs print usage help if no params are given.
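The master URL can also be set programmatically when you create a `SparkSession` in your own application. Below is a minimal sketch; the app name and thread count are arbitrary placeholders:

    import org.apache.spark.sql.SparkSession

    // "local[4]" runs locally with 4 threads; swap in "spark://host:7077",
    // "yarn", or a mesos:// URL to target a cluster instead.
    val spark = SparkSession.builder()
      .appName("MasterUrlExample")   // hypothetical application name
      .master("local[4]")
      .getOrCreate()

    println(spark.range(1000).count())  // prints 1000
    spark.stop()
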
## Running Tests
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:

    ./dev/run-tests

Please see the guidance on how to
[run tests for a module, or individual tests](https://spark.apache.org/developer-tools.html#individual-tests).
There is also a Kubernetes integration test; see `resource-managers/kubernetes/integration-tests/README.md`.
## A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
Please refer to the build documentation at
["Specifying the Hadoop Version and Enabling YARN"](https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions.
## Configuration
Please refer to the [Configuration Guide](https://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.
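As an illustration (not a substitute for the guide), common settings can be passed when building a `SparkSession`, and runtime-adjustable SQL options can be changed on a live session. The property names below are standard Spark configuration keys, while the values are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("ConfigExample")                       // hypothetical application name
      .master("local[*]")
      .config("spark.executor.memory", "4g")          // example value only
      .config("spark.sql.shuffle.partitions", "200")  // matches the default
      .getOrCreate()

    // SQL options marked as runtime-configurable can be changed later.
    spark.conf.set("spark.sql.shuffle.partitions", "64")
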
## Contributing
Please review the [Contribution to Spark guide](https://spark.apache.org/contributing.html)
for information on how to get started contributing to the project.
## spark-3.0.0-bin-hadoop3.2
Spark is an open-source big data processing framework from the Apache Software Foundation, known for its efficiency, ease of use, and scalability. The topic here is Spark 3.0.0 in its binary distribution built against Hadoop 3.2, "spark-3.0.0-bin-hadoop3.2". This archive can also be used to run Spark on the Windows platform.
Spark 3.0.0 was a significant milestone in Spark's development, introducing many new features and performance optimizations. Some key points:
1. **Adaptive Query Execution and Dynamic Partition Pruning**: Spark 3.0.0 introduces adaptive query execution (AQE), which re-optimizes query plans at runtime using shuffle statistics, and dynamic partition pruning, both aimed at speeding up large-scale data processing (see the sketch after this list).
2. **SQL enhancements**: Spark SQL received major improvements, including better Hive metastore support, refinements to the DataFrame API, and broader coverage of standard SQL syntax, which makes data analysis more convenient.
3. **Performance**: the shuffle path was optimized to reduce data transfer and disk I/O, and execution continues to be accelerated by Tungsten and whole-stage code generation.
4. **PySpark improvements**: the Python API (PySpark) was enhanced in this release, with better support for Python data types and usability improvements that raise productivity for Python users.
5. **Memory management**: Spark's unified memory management model aims to use memory more effectively and to reduce serialization and deserialization overhead.
6. **Native Kubernetes support**: Spark 3.0.0 strengthens native Kubernetes support, making it easier to deploy and manage Spark jobs on Kubernetes clusters.
7. **Security**: stronger security features such as encrypted communication, authentication, and authorization help keep data safe while it is being processed.
8. **Hadoop 3.2 compatibility**: this build is compiled against Hadoop 3.2, so it can take advantage of newer Hadoop features such as YARN resource-scheduling improvements and HDFS enhancements.
9. **MLlib**: the machine learning library was also updated in 3.0.0, with additional algorithms and better model interpretability and reproducibility.
10. **GraphX**: for graph computation, GraphX provides a set of APIs for processing and analyzing graph data; the 3.0.0 release may include further optimizations and enhancements.
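For example, adaptive query execution is controlled by an ordinary configuration flag (`spark.sql.adaptive.enabled`, which is off by default in 3.0.0). A minimal, hypothetical Scala session that enables it could look like this:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("AqeExample")                          // hypothetical application name
      .master("local[*]")
      .config("spark.sql.adaptive.enabled", "true")   // off by default in 3.0.0
      .getOrCreate()

    // Joins and aggregations run through this session can now be
    // re-optimized at runtime based on observed shuffle statistics.
    val joined = spark.range(1000000).toDF("id")
      .join(spark.range(1000).toDF("id"), "id")
    println(joined.count())
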
After extracting "spark-3.0.0-bin-hadoop3.2" you will find everything needed to run Spark: the executable scripts under `bin`, the library JARs under `jars`, and the configuration files under `conf`. On Windows, you can edit the configuration files, set the required environment variables, and use the provided launch scripts to run tools such as the Spark shell and `spark-submit` to start processing data.
To get the most out of Spark, you need to understand how to configure its runtime environment, including setting up master and worker nodes and allocating memory and CPU resources, and how to write Spark programs. Familiarity with other components of the Hadoop ecosystem, such as HDFS and YARN, will also help you integrate and manage Spark jobs.
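To make the programming side concrete, here is a minimal, self-contained word-count application in Scala. It assumes a local master and an arbitrary app name, and it is a sketch rather than something shipped in the distribution:

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Local master for experimentation; point this at your cluster
        // (e.g. "spark://host:7077" or "yarn") for real workloads.
        val spark = SparkSession.builder()
          .appName("WordCount")        // hypothetical application name
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val lines = Seq("spark makes big data simple",
                        "big data with spark").toDS()
        val counts = lines
          .flatMap(_.split("\\s+"))    // Dataset[String]; its single column is named "value"
          .groupBy("value")
          .count()

        counts.show()
        spark.stop()
      }
    }
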
In short, spark-3.0.0-bin-hadoop3.2 is a powerful and flexible big data processing distribution that also runs on Windows, giving developers efficient data processing and analysis capabilities. With study and practice, you can use it to solve a wide range of big data problems and carry out complex analysis tasks.