Model Function Consistency
Since there are many modeling packages in R written by different
people, there are inconsistencies in how models are specified and
predictions are created.
For example, many models have only one method of specifying the
model (e.g. formula method only), while others accept both a formula
and separate predictor and outcome objects:
> ## only one way here:
> rpart(y ~ ., data = dat)
>
> ## and both ways here:
> lda(y ~ ., data = dat)
>
> lda(x = predictors, grouping = outcome)
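As a minimal sketch of the two interfaces, the following uses the built-in iris data (an assumption; the slides do not say what `dat`, `predictors`, or `outcome` contain). Note that the non-formula method of `MASS::lda` names its outcome argument `grouping`, which is itself an example of the inconsistency described above.

```r
library(MASS)    # lda(): formula and non-formula interfaces
library(rpart)   # rpart(): formula interface only

dat <- iris      # illustrative data set, not from the slides

## formula interface only
tree_fit <- rpart(Species ~ ., data = dat)

## formula interface
lda_formula <- lda(Species ~ ., data = dat)

## non-formula interface: predictors as a matrix, outcome as a factor
predictors <- as.matrix(dat[, 1:4])
outcome <- dat$Species
lda_xy <- lda(x = predictors, grouping = outcome)

## both lda fits describe the same model, so their predicted classes agree
same_classes <- all(predict(lda_formula, dat)$class ==
                    predict(lda_xy, predictors)$class)
```

Both `lda` fits should give the same predicted classes, so the choice of interface here is a matter of convenience rather than modeling.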
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 2 / 26
Generating Class Probabilities Using Different Packages
Function               predict Function Syntax
MASS::lda              predict(obj) (no options needed)
stats::glm             predict(obj, type = "response")
gbm::gbm               predict(obj, type = "response", n.trees)
mda::mda               predict(obj, type = "posterior")
rpart::rpart           predict(obj, type = "prob")
RWeka::Weka            predict(obj, type = "probability")
caTools::LogitBoost    predict(obj, type = "raw", nIter)
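To make the differences concrete, here is a small sketch for three of the functions in the table, restricted to packages that ship with R. The two-class subset of iris and the choice of predictors are assumptions made for illustration only.

```r
library(MASS)
library(rpart)

## two-class problem built from iris (illustrative only)
dat <- droplevels(iris[iris$Species != "setosa", ])

lda_fit   <- lda(Species ~ Sepal.Length + Sepal.Width, data = dat)
glm_fit   <- glm(Species ~ Sepal.Length + Sepal.Width, data = dat,
                 family = binomial)
rpart_fit <- rpart(Species ~ Sepal.Length + Sepal.Width, data = dat)

## lda: no type argument; probabilities are in the $posterior element
lda_prob <- predict(lda_fit, dat)$posterior

## glm: type = "response" returns a vector of probabilities
## for the second factor level only
glm_prob <- predict(glm_fit, dat, type = "response")

## rpart: type = "prob" returns a matrix with one column per class
rpart_prob <- predict(rpart_fit, dat, type = "prob")
```

Beyond the differing `type` values, the return shapes also differ: a list element, a numeric vector, and a matrix, so downstream code has to be written per package.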
The caret Package
The caret package was developed to:
- create a unified interface for modeling and prediction (interfaces to 183 models)
- streamline model tuning using resampling
- provide a variety of “helper” functions and classes for day-to-day model building tasks
- increase computational efficiency using parallel processing
First commits within Pfizer: 6/2005, First version on CRAN: 10/2007
Website: http://topepo.github.io/caret/
JSS Paper: http://www.jstatsoft.org/v28/i05/paper
Model List: http://topepo.github.io/caret/bytag.html
Many computing sections in Applied Predictive Modeling
Easily Switching Between Models
> library(caret)
> library(doMC)
> registerDoMC(cores=10)
>
> ctlr <- trainControl(classProbs = TRUE, method = "repeatedcv")
> gbm_mod <- train(Class ~ ., data = training,
+ method = "gbm",
+ trControl = ctlr,
+ ## gbm argument:
+ verbose = FALSE)
>
> pls_mod <- train(Class ~ ., data = training,
+ method = "pls",
+ tuneLength = 10,
+ preProc = c("center", "scale", "spatialSign"),
+ trControl = ctlr)
>
> pls_search <- gafs(x = training[, -1], y = training$Class,
+ gafsControl = gafsControl(method = "cv", functions = caretGA),
+ ## train options:
+ method = "pls",
+ tuneLength = 10,
+ preProc = c("center", "scale", "spatialSign"),
+ trControl = ctlr)
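Once models are fit through `train()`, class probabilities can be requested with the same `predict()` call regardless of the underlying package. The sketch below assumes the iris data and the `"lda"` and `"rpart"` methods purely for illustration, since the `training` set used above is not shown; it also runs sequentially rather than registering a parallel backend.

```r
library(caret)

set.seed(1)
## illustrative split of iris; not the data used on the slide
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

## only the method string changes between models
lda_mod   <- train(Species ~ ., data = training,
                   method = "lda", trControl = ctrl)
rpart_mod <- train(Species ~ ., data = training,
                   method = "rpart", trControl = ctrl)

## same syntax for both: a data frame with one column per class
lda_prob   <- predict(lda_mod, newdata = testing, type = "prob")
rpart_prob <- predict(rpart_mod, newdata = testing, type = "prob")
```

Compared with the package-specific `predict()` calls, both results here have the same shape, so code that consumes the probabilities does not need to know which model produced them.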