caret Package
Cheat Sheet
CC BY SA Max Kuhn • max@rstudio.com • https://github.com/topepo/
Learn more at https://topepo.github.io/caret/ • Updated: 9/17
Specifying the Model

Possible syntaxes for specifying the variables in the model:

train(y ~ x1 + x2, data = dat, ...)
train(x = predictor_df, y = outcome_vector, ...)
train(recipe_object, data = dat, ...)

• rfe, sbf, gafs, and safs only have the x/y interface.
• The train formula method will always create dummy variables.
• The x/y interface to train will not create dummy variables (but the underlying model function might).

Remember to:
• Have column names in your data.
• Use factors for a classification outcome (not 0/1 or integers).
• Have valid R names for class levels (not "0"/"1").
• Set the random number seed prior to calling train repeatedly to get the same resamples across calls.
• Use the train option na.action = na.pass if you will be imputing missing data. Also, use this option when predicting new data containing missing values.

To pass options to the underlying model function, you can pass them to train via the ellipses:

train(y ~ ., data = dat, method = "rf",
      # options to `randomForest`:
      importance = TRUE)

Adding Options

Many train options can be specified using the trainControl function:

train(y ~ ., data = dat, method = "cubist",
      trControl = trainControl(<options>))

Parallel Processing

The foreach package is used to run models in parallel. The train code does not change but a "do" package must be registered first.

# on macOS or Linux
library(doMC)
registerDoMC(cores = 4)

# on Windows
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)

The function parallel::detectCores can help too.

Preprocessing

Transformations, filters, and other operations can be applied to the predictors with the preProc option:

train(, preProc = c("method1", "method2"), ...)

Methods include:
• "center", "scale", and "range" to normalize predictors.
• "BoxCox", "YeoJohnson", or "expoTrans" to transform predictors.
• "knnImpute", "bagImpute", or "medianImpute" to impute.
• "corr", "nzv", "zv", and "conditionalX" to filter.
• "pca", "ica", or "spatialSign" to transform groups.

train determines the order of operations; the order that the methods are declared does not matter.

The recipes package has a more extensive list of preprocessing operations.
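As a sketch, several preProc methods can be combined in one call; the data object names below are placeholders for your own data:

```r
# Hypothetical example: impute missing predictor values with k-nearest
# neighbors, then center and scale, before fitting a knn model.
# `predictor_df` and `outcome_vector` stand in for your own data.
fit <- train(x = predictor_df, y = outcome_vector,
             method = "knn",
             preProc = c("knnImpute", "center", "scale"),
             na.action = na.pass)  # let train handle the missing values
```

Note that train applies these steps in its own fixed order, not the order they are listed.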
Resampling Options
trainControl is used to choose a resampling method:
trainControl(method = <method>, <options>)
Methods and options are:
• "cv" for K-fold cross-validation (number sets the # folds).
• "repeatedcv" for repeated cross-validation (repeats for #
repeats).
• "boot" for bootstrap (number sets the iterations).
• "LGOCV" for leave-group-out (number and p are options).
• "LOO" for leave-one-out cross-validation.
• "oob" for out-of-bag resampling (only for some models).
• "timeslice" for time-series data (options are
initialWindow, horizon, fixedWindow, and skip).
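For instance, a sketch of repeated 10-fold cross-validation (the data names are placeholders), with the seed set beforehand so the resamples are reproducible across calls:

```r
set.seed(123)  # fix the resamples across repeated calls to train
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,   # 10 folds
                     repeats = 5)   # repeated 5 times
fit <- train(y ~ ., data = dat, method = "rf", trControl = ctrl)
```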
Performance Metrics

To choose how to summarize a model, the trainControl function is used again:
trainControl(summaryFunction = <R function>,
classProbs = <logical>)
Custom R functions can be used but caret includes several:
defaultSummary (for accuracy, RMSE, etc.), twoClassSummary
(for ROC curves), and prSummary (for information retrieval). For
the last two functions, the option classProbs must be set to
TRUE.
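A sketch for a two-class problem, selecting tuning values by the area under the ROC curve (the data and outcome names are placeholders):

```r
ctrl <- trainControl(method = "cv",
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE)  # required for ROC computation
fit <- train(Class ~ ., data = dat, method = "glmnet",
             metric = "ROC",  # choose the tuning values by AUC
             trControl = ctrl)
```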
Grid Search
To let train determine the values of the tuning parameter(s), the
tuneLength option controls how many values per tuning
parameter to evaluate.
Alternatively, specific values of the tuning parameters can be
declared using the tuneGrid argument:
grid <- expand.grid(alpha = c(0.1, 0.5, 0.9),
                    lambda = c(0.001, 0.01))
train(x = x, y = y, method = "glmnet",
      preProc = c("center", "scale"),
      tuneGrid = grid)
Random Search
For tuning, train can also generate random tuning parameter
combinations over a wide range. tuneLength controls the total
number of combinations to evaluate. To use random search:
trainControl(search = "random")
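Put together, a sketch (the x/y objects are placeholders):

```r
# Evaluate 20 randomly generated tuning parameter combinations
ctrl <- trainControl(method = "cv", search = "random")
fit <- train(x = x, y = y, method = "svmRadial",
             tuneLength = 20,  # total number of random combinations
             trControl = ctrl)
```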
Subsampling
With a large class imbalance, train can subsample the data to
balance the classes prior to model fitting.
trainControl(sampling = "down")
Other values are "up", "smote", or "rose". The latter two may
require additional package installs.
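A sketch of down-sampling inside resampling (the data name is a placeholder):

```r
# Down-sample the majority class within each resample so the
# classes are balanced before each model fit.
ctrl <- trainControl(method = "cv",
                     sampling = "down")
fit <- train(Class ~ ., data = imbalanced_dat,
             method = "rf", trControl = ctrl)
```

Applying the subsampling inside trainControl (rather than once, up front) keeps the resampled performance estimates honest.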