multiROC <img src="multiROC_logo.jpeg" align="right" height="254" width="220"/>
======================================================
Calculating and Visualizing ROC and PR Curves Across Multi-Class Classifications
The receiver operating characteristic (ROC) curve and the precision-recall (PR) curve are extensively used to compare binary classifiers in many areas. However, many real-world problems involve more than two classes (e.g., the tumor, node, and metastasis (TNM) staging system for cancer), which calls for an evaluation strategy that can assess multiclass classifiers. This package aims to fill that gap by calculating multiclass ROC-AUC and PR-AUC with confidence intervals and by generating publication-quality figures of multiclass ROC and PR curves.
A user-friendly website is available at https://metabolomics.cc.hawaii.edu/software/multiROC/.
## 1 Citation
Please cite our paper once it is published: (Submitted).
## 2 Installation
Install `multiROC` from GitHub:
```r
install.packages('devtools')
require(devtools)
install_github("WandeRum/multiROC")
require(multiROC)
```
Install `multiROC` from CRAN:
```r
install.packages('multiROC')
require(multiROC)
```
## 3 A demo example
This demo compares a random forest classifier with multinomial logistic regression on the iris dataset.
### 3.1 Data preparation
```r
require(multiROC)
data(iris)
head(iris)
```
### 3.2 Split into 60% training data and 40% testing data
```r
set.seed(123456)
total_number <- nrow(iris)
train_idx <- sample(total_number, round(total_number*0.6))
train_df <- iris[train_idx, ]
test_df <- iris[-train_idx, ]
```
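A quick optional check confirms that all three species appear in both splits:
```r
# Optional sanity check: every species should be represented in both splits
table(train_df$Species)
table(test_df$Species)
```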
### 3.3 Random forest
```r
# Train a random forest with 100 trees and get class probabilities for the test set
rf_res <- randomForest::randomForest(Species ~ ., data = train_df, ntree = 100)
rf_pred <- predict(rf_res, test_df, type = 'prob')
rf_pred <- data.frame(rf_pred)
colnames(rf_pred) <- paste0(colnames(rf_pred), "_pred_RF")
```
### 3.4 Multinomial logistic regression
```r
# Fit a multinomial logistic regression and get class probabilities for the test set
mn_res <- nnet::multinom(Species ~ ., data = train_df)
mn_pred <- predict(mn_res, test_df, type = 'prob')
mn_pred <- data.frame(mn_pred)
colnames(mn_pred) <- paste0(colnames(mn_pred), "_pred_MN")
```
### 3.5 Merge true labels and predicted values
```r
# One-hot encode the true labels and name the columns <group>_true
true_label <- dummies::dummy(test_df$Species, sep = ".")
true_label <- data.frame(true_label)
colnames(true_label) <- gsub(".*?\\.", "", colnames(true_label))
colnames(true_label) <- paste0(colnames(true_label), "_true")
final_df <- cbind(true_label, rf_pred, mn_pred)
```
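If the `dummies` package is not available on your system (it has been archived on CRAN), the same one-hot encoding of the true labels can be built with base R; a minimal sketch equivalent to the step above:
```r
# Base-R alternative to dummies::dummy() for one-hot encoding the true labels
true_label <- data.frame(model.matrix(~ Species - 1, data = test_df))
colnames(true_label) <- gsub("^Species", "", colnames(true_label))
colnames(true_label) <- paste0(colnames(true_label), "_true")
final_df <- cbind(true_label, rf_pred, mn_pred)
```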
### 3.6 multiROC and multiPR
```r
roc_res <- multi_roc(final_df, force_diag = TRUE)
pr_res <- multi_pr(final_df, force_diag = TRUE)
```
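The returned objects can be inspected directly. Assuming the AUC values are stored as a nested list (by classifier and then by group, with additional macro- and micro-average entries), `unlist()` gives a quick overview:
```r
# Flatten the nested AUC lists to see all group-wise, macro, and micro AUCs at once
unlist(roc_res$AUC)
unlist(pr_res$AUC)
```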
### 3.7 Plot
```r
plot_roc_df <- plot_roc_data(roc_res)
plot_pr_df <- plot_pr_data(pr_res)

require(ggplot2)

# ROC curves: one curve per group and classifier, plus the chance diagonal
ggplot(plot_roc_df, aes(x = 1 - Specificity, y = Sensitivity)) +
  geom_path(aes(color = Group, linetype = Method), size = 1.5) +
  geom_segment(aes(x = 0, y = 0, xend = 1, yend = 1),
               colour = 'grey', linetype = 'dotdash') +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5),
        legend.justification = c(1, 0), legend.position = c(.95, .05),
        legend.title = element_blank(),
        legend.background = element_rect(fill = NULL, size = 0.5,
                                         linetype = "solid", colour = "black"))

# PR curves: one curve per group and classifier
ggplot(plot_pr_df, aes(x = Recall, y = Precision)) +
  geom_path(aes(color = Group, linetype = Method), size = 1.5) +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5),
        legend.justification = c(1, 0), legend.position = c(.95, .05),
        legend.title = element_blank(),
        legend.background = element_rect(fill = NULL, size = 0.5,
                                         linetype = "solid", colour = "black"))
```
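To write the figures to disk, `ggsave()` can be used (the file name below is a placeholder; by default it saves the most recently displayed plot):
```r
# Save the last displayed plot (here, the PR figure); assign each ggplot() call
# to an object and pass it via `plot =` to save both figures explicitly
ggsave("multiROC_pr_curves.png", width = 7, height = 6, dpi = 300)
```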


## 4 multiROC in a nutshell
```r
library(multiROC)
data(test_data)
head(test_data)
```
```
##   G1_true G2_true G3_true G1_pred_m1 G2_pred_m1 G3_pred_m1 G1_pred_m2 G2_pred_m2 G3_pred_m2
## 1       1       0       0  0.8566867  0.1169520 0.02636133  0.4371601  0.1443851 0.41845482
## 2       1       0       0  0.8011788  0.1505448 0.04827643  0.3075236  0.5930025 0.09947397
## 3       1       0       0  0.8473608  0.1229815 0.02965766  0.3046363  0.4101367 0.28522698
## 4       1       0       0  0.8157730  0.1422322 0.04199482  0.2378494  0.5566147 0.20553591
## 5       1       0       0  0.8069553  0.1472971 0.04574766  0.4067347  0.2355822 0.35768312
## 6       1       0       0  0.6894488  0.2033285 0.10722271  0.1063048  0.4800507 0.41364450
```
This example dataset contains two classifiers (m1 and m2) and three groups (G1, G2, and G3).
### 4.1 The multi_roc and multi_pr functions
```r
roc_res <- multi_roc(test_data, force_diag = TRUE)
pr_res <- multi_pr(test_data, force_diag = TRUE)
```
The functions **multi_roc** and **multi_pr** are the core functions for calculating multiclass ROC-AUC and PR-AUC.
Arguments of **multi_roc** and **multi_pr**:
* **data** is a data frame that contains both the true labels and the corresponding predicted scores. True-label columns (0 = negative, 1 = positive) should be named XX_true (e.g., S1_true, S2_true), and predicted-score columns (continuous) should be named XX_pred_YY (e.g., S1_pred_SVM, S2_pred_RF). Predicted scores can be probabilities in [0, 1] or any other continuous values. For each classifier, the number of prediction columns should equal the number of groups in the true labels (see the sketch after this list).
* If **force_diag** is TRUE, the true positive rate (TPR) and false positive rate (FPR) are forced to pass through (0, 0) and (1, 1).
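As an illustration of the naming convention only (the group and classifier names below are hypothetical), a correctly formatted input could look like this:
```r
# Hypothetical two-group, one-classifier input following the XX_true / XX_pred_YY convention
toy_df <- data.frame(
  S1_true     = c(1, 0, 0, 1),              # binary true labels for group S1
  S2_true     = c(0, 1, 1, 0),              # binary true labels for group S2
  S1_pred_SVM = c(0.90, 0.20, 0.35, 0.70),  # predicted scores for S1 from classifier "SVM"
  S2_pred_SVM = c(0.10, 0.80, 0.65, 0.30)   # predicted scores for S2 from classifier "SVM"
)
# toy_df can then be passed to multi_roc()/multi_pr() in the same way as test_data above
```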
Outputs of **multi_roc**:
* **Specificity** contains a list of specificities for each group of each classifier.
* **Sensitivity** contains a list of sensitivities for each group of each classifier.
* **AUC** contains a list of AUCs for each group of each classifier. The micro-average ROC-AUC is calculated by stacking all groups together, thus converting the multiclass classification into a single binary classification (a conceptual sketch follows after these output lists). The macro-average ROC-AUC is calculated by averaging the results of all groups (one vs. rest), with linear interpolation between points of the ROC curve.
* **Methods** shows the names of the different classifiers.
* **Groups** shows the names of the different groups.
Outputs of **multi_pr**:
* **Recall** contains a list of recalls for each group of each classifier.
* **Precision** contains a list of precisions for each group of each classifier.
* **AUC** contains a list of AUCs for each group of each classifier. The micro-average PR-AUC is calculated by stacking all groups together, thus converting the multiclass classification into a single binary classification. The macro-average PR-AUC is calculated by averaging the results of all groups (one vs. rest), with linear interpolation between points of the PR curve.
* **Methods** shows the names of the different classifiers.
* **Groups** shows the names of the different groups.
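The micro-averaging idea can be illustrated on `test_data` (a conceptual sketch only, not the package internals): all one-vs-rest problems are stacked into a single binary problem before the curve is computed, for example for classifier m1:
```r
# Stack the three one-vs-rest problems of classifier m1 into one binary problem
labels_stacked <- c(test_data$G1_true, test_data$G2_true, test_data$G3_true)
scores_stacked <- c(test_data$G1_pred_m1, test_data$G2_pred_m1, test_data$G3_pred_m1)
# A single binary ROC (or PR) curve on the stacked vectors corresponds to the
# micro-average; the macro-average instead averages the per-group results.
```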
### 4.2 Confidence Intervals
#### 4.2.1 List of