SparkR_3.2.1.tar.gz resource - CSDN Library

Uploaded 2022-04-02 23:22:45 · 341KB · GZ

295 files in total

rd: 248

r: 41

rmd: 2

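SparkR is the R package that ships with Apache Spark. It gives R a frontend to Spark, so large-scale data analysis can be driven from an ordinary R session. Its central abstraction is a distributed data frame, the SparkDataFrame, and it exposes Spark SQL so that the same data can also be queried with SQL statements.

Version 3.2.1 is a maintenance release of the Spark 3.2 line, so it mainly carries stability and bug fixes rather than new features; the specific changes are listed in the Spark 3.2.1 release notes. The value of SparkR lies in Spark's distributed execution: data is split into partitions and processed in parallel across machines, which makes it practical to work on data sets that do not fit into a single R process.

The SparkDataFrame is the main data structure in SparkR. It corresponds to the DataFrame of Spark SQL and appears in R as a table-like object that supports column operations and relational queries such as selecting, filtering, grouping, and joining. Spark takes care of partitioning the data and parallelizing the operations.

Spark SQL is the part of Spark that SparkR uses for structured data. Data can be queried and transformed either with SQL statements or through the DataFrame API, and it can be read from and written to several kinds of data sources, including Hive, Parquet, and JSON. The column functions also cover complex types such as arrays and structs; in this archive they are documented in column_collection_functions.Rd.

The archive is an R source package. According to the file listing it contains the DESCRIPTION and NAMESPACE files, 41 R files (package sources such as DataFrame.R and functions.R, plus worker and test scripts such as worker.R and test_basic.R), 248 Rd help pages that serve as the API reference, 2 Rmd files, and the rendered vignette sparkr-vignettes.html, which works as a user guide with examples for getting started.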
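The two query styles mentioned above, the DataFrame API and SQL, can be sketched as follows. This is a minimal illustration and not taken from the archive: it assumes SparkR 3.2.1 is installed together with a local Spark runtime, and the master URL `local[2]`, the application name, and the view name `cars` are arbitrary choices made for the example.

```r
# Minimal SparkR sketch (assumes a working local Spark installation).
library(SparkR)

# Start or reuse a Spark session. "local[2]" and the app name are
# illustrative assumptions, not values prescribed by the package.
sparkR.session(master = "local[2]", appName = "sparkr-sketch")

# Convert a local R data.frame into a distributed SparkDataFrame.
cars <- cbind(model = rownames(mtcars), mtcars)
df <- createDataFrame(cars)

# DataFrame API: average mpg per cylinder count, sorted by cyl.
grouped <- agg(groupBy(df, "cyl"), avg_mpg = avg(df$mpg))
by_cyl_api <- collect(arrange(grouped, "cyl"))

# Spark SQL: the same query against a temporary view named "cars".
createOrReplaceTempView(df, "cars")
by_cyl_sql <- collect(sql(
  "SELECT cyl, avg(mpg) AS avg_mpg FROM cars GROUP BY cyl ORDER BY cyl"
))

# Both results are ordinary local data.frames after collect().
print(by_cyl_api)
print(by_cyl_sql)

sparkR.session.stop()
```

Both queries are planned and executed by Spark; only `collect()` brings the aggregated rows back into the R process, so it should be called on results that are small enough to fit in local memory.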

SparkR_3.2.1.tar.gz (295 files)

DESCRIPTION 2KB

sparkr-vignettes.html 158KB

NAMESPACE 15KB

functions.R 171KB

DataFrame.R 147KB

RDD.R 54KB

generics.R 52KB

mllib_regression.R 40KB

mllib_classification.R 40KB

mllib_tree.R 38KB

pairRDD.R 36KB

utils.R 34KB

mllib_clustering.R 31KB

SQLContext.R 25KB

sparkR.R 25KB

sparkr-vignettes.R 22KB

context.R 16KB

catalog.R 15KB

install.R 13KB

column.R 13KB

worker.R 10KB

mllib_fpm.R 10KB

stats.R 9KB

WindowSpec.R 9KB

group.R 8KB

schema.R 8KB

mllib_recommendation.R 8KB

deserialize.R 7KB

serialize.R 6KB

streaming.R 6KB

mllib_utils.R 6KB

mllib_stat.R 5KB

jvm.R 5KB

client.R 5KB

backend.R 4KB

types.R 4KB

daemon.R 4KB

test_basic.R 4KB

window.R 4KB

jobj.R 3KB

run-all.R 3KB

broadcast.R 3KB

shell.R 2KB

general.R 958B

column_collection_functions.Rd 24KB

column_string_functions.Rd 14KB

column_datetime_functions.Rd 11KB

column_math_functions.Rd 11KB

column_aggregate_functions.Rd 9KB

column_nonaggregate_functions.Rd 8KB

spark.logit.Rd 7KB

spark.randomForest.Rd 7KB

column_datetime_diff_functions.Rd 7KB

spark.gbt.Rd 7KB

spark.decisionTree.Rd 6KB

spark.glm.Rd 6KB

gapply.Rd 6KB

column_window_functions.Rd 6KB

spark.lda.Rd 6KB

write.stream.Rd 5KB

gapplyCollect.Rd 5KB

merge.Rd 5KB

spark.als.Rd 5KB

spark.mlp.Rd 4KB

write.df.Rd 4KB

subset.Rd 4KB

nafunctions.Rd 4KB

spark.fmClassifier.Rd 4KB

withWatermark.Rd 4KB

write.jdbc.Rd 4KB

spark.svmLinear.Rd 4KB

spark.fpGrowth.Rd 4KB

dapply.Rd 4KB

spark.bisectingKmeans.Rd 4KB

select.Rd 4KB

spark.lm.Rd 4KB

unionByName.Rd 4KB

repartitionByRange.Rd 4KB

spark.fmRegressor.Rd 4KB

saveAsTable.Rd 4KB

arrange.Rd 4KB

withColumn.Rd 4KB

first.Rd 4KB

coalesce.Rd 4KB

summary.Rd 4KB

join.Rd 4KB

repartition.Rd 4KB

spark.survreg.Rd 4KB

summarize.Rd 4KB

write.text.Rd 3KB

histogram.Rd 3KB

write.parquet.Rd 3KB

write.json.Rd 3KB

columns.Rd 3KB

dapplyCollect.Rd 3KB

alias.Rd 3KB

mutate.Rd 3KB

sample.Rd 3KB

write.orc.Rd 3KB

spark.kmeans.Rd 3KB

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#' @include generics.R column.R
NULL

#' Aggregate functions for Column operations
#'
#' Aggregate functions defined for \code{Column}.
#'
#' @param x Column to compute on.
#' @param y,na.rm,use currently not used.
#' @param ... additional argument(s). For example, it could be used to pass additional Columns.
#' @name column_aggregate_functions
#' @rdname column_aggregate_functions
#' @family aggregate functions
#' @examples
#' \dontrun{
#' # Dataframe used throughout this doc
#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))}
NULL

#' Date time functions for Column operations
#'
#' Date time functions defined for \code{Column}.
#'
#' @param x Column to compute on. In \code{window}, it must be a time Column of
#'          \code{TimestampType}. This is not used with \code{current_date} and
#'          \code{current_timestamp}
#' @param format The format for the given dates or timestamps in Column \code{x}. See the
#'               format used in the following methods:
#'               \itemize{
#'               \item \code{to_date} and \code{to_timestamp}: it is the string to use to parse
#'                     Column \code{x} to DateType or TimestampType.
#'               \item \code{trunc}: it is the string to use to specify the truncation method.
#'                     'year', 'yyyy', 'yy' to truncate by year,
#'                     or 'month', 'mon', 'mm' to truncate by month
#'                     Other options are: 'week', 'quarter'
#'               \item \code{date_trunc}: it is similar with \code{trunc}'s but additionally
#'                     supports
#'                     'day', 'dd' to truncate by day,
#'                     'microsecond', 'millisecond', 'second', 'minute' and 'hour'
#'               }
#' @param ... additional argument(s).
#' @name column_datetime_functions
#' @rdname column_datetime_functions
#' @family data time functions
#' @examples
#' \dontrun{
#' dts <- c("2005-01-02 18:47:22",
#'          "2005-12-24 16:30:58",
#'          "2005-10-28 07:30:05",
#'          "2005-12-28 07:01:05",
#'          "2006-01-24 00:01:10")
#' y <- c(2.0, 2.2, 3.4, 2.5, 1.8)
#' df <- createDataFrame(data.frame(time = as.POSIXct(dts), y = y))}
NULL

#' Date time arithmetic functions for Column operations
#'
#' Date time arithmetic functions defined for \code{Column}.
#'
#' @param y Column to compute on.
#' @param x For class \code{Column}, it is the column used to perform arithmetic operations
#'          with column \code{y}. For class \code{numeric}, it is the number of months or
#'          days to be added to or subtracted from \code{y}. For class \code{character}, it is
#'          \itemize{
#'          \item \code{date_format}: date format specification.
#'          \item \code{from_utc_timestamp}, \code{to_utc_timestamp}: A string detailing
#'                the time zone ID that the input should be adjusted to. It should be in the format
#'                of either region-based zone IDs or zone offsets. Region IDs must have the form
#'                'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format
#'                (+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported
#'                as aliases of '+00:00'. Other short names are not recommended to use
#'                because they can be ambiguous.
#'          \item \code{next_day}: day of the week string.
#'          }
#' @param ... additional argument(s).
#'          \itemize{
#'          \item \code{months_between}, this contains an optional parameter to specify the
#'                the result is rounded off to 8 digits.
#'          }
#'
#' @name column_datetime_diff_functions
#' @rdname column_datetime_diff_functions
#' @family data time functions
#' @examples
#' \dontrun{
#' dts <- c("2005-01-02 18:47:22",
#'          "2005-12-24 16:30:58",
#'          "2005-10-28 07:30:05",
#'          "2005-12-28 07:01:05",
#'          "2006-01-24 00:01:10")
#' y <- c(2.0, 2.2, 3.4, 2.5, 1.8)
#' df <- createDataFrame(data.frame(time = as.POSIXct(dts), y = y))}
NULL

#' Math functions for Column operations
#'
#' Math functions defined for \code{Column}.
#'
#' @param x Column to compute on. In \code{shiftLeft}, \code{shiftRight} and
#'          \code{shiftRightUnsigned}, this is the number of bits to shift.
#' @param y Column to compute on.
#' @param ... additional argument(s).
#' @name column_math_functions
#' @rdname column_math_functions
#' @family math functions
#' @examples
#' \dontrun{
#' # Dataframe used throughout this doc
#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
#' tmp <- mutate(df, v1 = log(df$mpg), v2 = cbrt(df$disp),
#'               v3 = bround(df$wt, 1), v4 = bin(df$cyl),
#'               v5 = hex(df$wt), v6 = degrees(df$gear),
#'               v7 = atan2(df$cyl, df$am), v8 = hypot(df$cyl, df$am),
#'               v9 = pmod(df$hp, df$cyl), v10 = shiftLeft(df$disp, 1),
#'               v11 = conv(df$hp, 10, 16), v12 = sign(df$vs - 0.5),
#'               v13 = sqrt(df$disp), v14 = ceil(df$wt))
#' head(tmp)}
NULL

#' String functions for Column operations
#'
#' String functions defined for \code{Column}.
#'
#' @param x Column to compute on except in the following methods:
#'          \itemize{
#'          \item \code{instr}: \code{character}, the substring to check. See 'Details'.
#'          \item \code{format_number}: \code{numeric}, the number of decimal place to
#'                format to. See 'Details'.
#'          }
#' @param y Column to compute on.
#' @param pos In \itemize{
#'            \item \code{locate}: a start position of search.
#'            \item \code{overlay}: a start position for replacement.
#'            }
#' @param len In \itemize{
#'            \item \code{lpad} the maximum length of each output result.
#'            \item \code{overlay} a number of bytes to replace.
#'            }
#' @param ... additional Columns.
#' @name column_string_functions
#' @rdname column_string_functions
#' @family string functions
#' @examples
#' \dontrun{
#' # Dataframe used throughout this doc
#' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))}
NULL

#' Non-aggregate functions for Column operations
#'
#' Non-aggregate functions defined for \code{Column}.
#'
#' @param x Column to compute on. In \code{lit}, it is a literal value or a Column.
#'          In \code{expr}, it contains an expression character object to be parsed.
#' @param y Column to compute on.
#' @param ... additional Columns.
#' @name column_nonaggregate_functions
#' @rdname column_nonaggregate_functions
#' @seealso coalesce,SparkDataFrame-method
#' @family non-aggregate functions
#' @examples
#' \dontrun{
#' # Dataframe used throughout this doc
#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))}
NULL

#' Miscellaneous functions for Column operations
#'
#' Miscellaneous functions defined for \code{Column}.
#'
#' @param x Column to compute on. In \code{sha2}, it is one of 224, 256, 384, or 512.
#' @param y Column to compute on.
#' @param ... additional Columns.
#' @name column_misc_functions
#' @rdname column_misc_functions
#' @family misc functions
#' @examples
#' \dontrun{
#' # Dataframe used throughout this doc
#' df <- c