ROC curves and AUC values are common evaluation metrics for binary classification models. Although they have drawn some criticism, especially regarding their appropriateness for evaluating models built on imbalanced data, they remain the most popular evaluation metrics for binary classification. For highly imbalanced classification, the area under the precision-recall curve may be a more appropriate metric for comparing models and measuring performance. Several R packages can compute the area under the ROC curve. The pkgsearch package can be used to find packages that perform a particular task, along with information such as downloads last month, version, and bug reports.

library(tidyverse)  # for data manipulation
## Warning: package 'tidyverse' was built under R version 3.6.1
## Warning: package 'dplyr' was built under R version 3.6.1
library(dlstats)    # for package download stats
## Warning: package 'dlstats' was built under R version 3.6.1
library(pkgsearch)  # for searching packages
## Warning: package 'pkgsearch' was built under R version 3.6.1
pacman::p_load(ggthemes, scales, ggpubr, viridis, bench, microbenchmark, ggbeeswarm)

options(dplyr.width = Inf)
options(dplyr.length = Inf)
theme_set(theme_pubclean())


aucPkg <- pkg_search(query = "AUC", size = 200, format = "short")


glimpse(aucPkg)
## Observations: 66
## Variables: 14
## $ score                <dbl> 4536.9720, 2341.1100, 2113.6843, 1979.541...
## $ package              <chr> "caTools", "AUC", "pROC", "aucm", "maxadj...
## $ version              <pckg_vrs> 1.17.1.2, 0.3.0, 1.15.3, 2018.1.24, ...
## $ title                <chr> "Tools: moving window statistics, GIF, Ba...
## $ description          <chr> "Contains several basic utility functions...
## $ date                 <dttm> 2019-03-06 19:53:35, 2013-09-30 17:41:33...
## $ maintainer_name      <chr> "ORPHANED", "Michel Ballings", "Xavier Ro...
## $ maintainer_email     <chr> "ORPHANED", "Michel.Ballings@UGent.be", "...
## $ revdeps              <int> 31, 9, 79, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, ...
## $ downloads_last_month <int> 213594, 5788, 89924, 454, 259, 835, 361, ...
## $ license              <chr> "GPL-3", "GPL (>= 2)", "GPL (>= 3)", "GPL...
## $ url                  <chr> NA, NA, "http://expasy.org/tools/pROC/", ...
## $ bugreports           <chr> NA, NA, "https://github.com/xrobin/pROC/i...
## $ package_data         <I<list>> 2019-03-...., 2013-09-...., 2019-07-....
aucPkgShort <- aucPkg %>%
  as_tibble() %>%
  filter(maintainer_name != "ORPHANED", score > 20) %>%
  select(score, package, downloads_last_month) %>%
  arrange(desc(downloads_last_month))

head(aucPkgShort)
## # A tibble: 6 x 3
##   score package         downloads_last_month
##   <dbl> <chr>                          <int>
## 1 2114. pROC                           89924
## 2  115. rattle                         15064
## 3 2341. AUC                             5788
## 4  266. varImp                          2726
## 5  470. survAUC                         2071
## 6  391. PresenceAbsence                 1740

The dlstats package can be used to retrieve CRAN or Bioconductor package download statistics.

shortList <- c("mltools", "precrec", "plotROC", "pROC", "MLmetrics", "yardstick")


dlstats::cran_stats(shortList) %>%
  group_by(package) %>%
  summarise(n = sum(downloads)) %>%
  arrange(desc(n))
## # A tibble: 6 x 2
##   package         n
##   <fct>       <int>
## 1 pROC      1446323
## 2 MLmetrics  364132
## 3 plotROC     62338
## 4 yardstick   56507
## 5 mltools     31904
## 6 precrec     20470
downloads <- dlstats::cran_stats(shortList) %>%
  arrange(desc(downloads))

head(downloads)
##        start        end downloads package
## 1 2019-09-01 2019-09-30     83147    pROC
## 2 2017-02-01 2017-02-28     58789    pROC
## 3 2019-03-01 2019-03-31     52335    pROC
## 4 2019-06-01 2019-06-30     50982    pROC
## 5 2019-05-01 2019-05-31     49750    pROC
## 6 2019-04-01 2019-04-30     49243    pROC

The graph below illustrates the popularity of some of the common packages for computing AUC in R; pROC appears to be the most popular. Some of these packages have issues: they can break down when the input is a very large vector or the data are very imbalanced, and they can evaluate to different AUC values. The goal of this post is to investigate which of these packages are robust and evaluate consistent AUC values. The Stack Overflow link here discusses some of these differences between packages. For example, the PRROC package computes the AUC for the random forest model trained in this post as 0.5, whereas most of the other packages compute the AUC value as 0.984898. This snapshot shows an example where the MLmetrics package breaks down for a vector of size around 600,000.

ggplot(downloads, aes(end, downloads, group = package, color = package)) +
  geom_line() +
  geom_point(aes(shape = package)) +
  scale_y_continuous(trans = log10_trans(), labels = scales::comma) +
  scale_color_viridis_d(option = "B") +
  scale_shape_manual(values = 1:6) +
  scale_x_date(labels = date_format("%d-%m-%Y"), date_breaks = "1 year") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

shortList <- c("mltools", "precrec", "plotROC", "pROC", "MLmetrics", "yardstick")
downloads <- dlstats::bioc_stats(shortList)
head(downloads)
##        start        end downloads package
## 1 2016-11-01 2016-11-30       285 mltools
## 2 2016-12-01 2016-12-31       272 mltools
## 3 2017-01-01 2017-01-31       793 mltools
## 4 2017-02-01 2017-02-28       179 mltools
## 5 2017-03-01 2017-03-31       190 mltools
## 6 2017-04-01 2017-04-30       382 mltools
ggplot(downloads, aes(end, downloads, group = package, color = package)) +
  geom_line() +
  geom_point(aes(shape = package)) +
  scale_y_continuous(trans = log10_trans(), labels = scales::comma) +
  scale_color_viridis_d(option = "D") +
  scale_shape_manual(values = 1:9) +
  scale_x_date(labels = date_format("%d-%m-%Y"), date_breaks = "1 year") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Train Random Forest Model

library(randomForest) # basic implementation
library(ranger)
#install packages
#install.packages("ROSE")
library(ROSE)
## Warning: package 'ROSE' was built under R version 3.6.1
data(hacide)
glimpse(hacide.train)
## Observations: 1,000
## Variables: 3
## $ cls <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ x1  <dbl> 0.20079810, 0.01662009, 0.22872469, 0.12637877, 0.60082129...
## $ x2  <dbl> 0.67803761, 1.57655790, -0.55953375, -0.09381378, -0.29839...
table(hacide.train$cls)
## 
##   0   1 
## 980  20
#check classes distribution
prop.table(table(hacide.train$cls))
## 
##    0    1 
## 0.98 0.02
data.rose <- ROSE(cls ~ ., data = hacide.train, seed = 1)$data
table(data.rose$cls)
## 
##   0   1 
## 520 480
#build a random forest model

m1 <- randomForest(
  formula = cls ~ .,
  data    = data.rose
)

#make predictions on unseen data
pred.prob <- predict(m1, newdata = hacide.test, type = "prob")
pred.class <- predict(m1, newdata = hacide.test)

ROSE::roc.curve(hacide.test$cls, pred.prob[,2])

## Area under the curve (AUC): 0.985

AUC function

The hand-rolled function below computes the AUC directly from the ranks of the predicted probabilities, using the equivalence between the AUC and the Mann-Whitney U statistic: the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

ROC_AUC <- function(y_predprob, y_true) {
  # y_true is assumed to be coded 0/1
  # number of positive labels
  n_pos <- sum(y_true)
  # number of negative labels
  n_neg <- sum(y_true == 0)
  # sum of the ranks of the negative-class scores, minus the smallest possible rank sum
  A <- sum(rank(y_predprob)[!y_true]) - n_neg * (n_neg + 1) / 2
  # A / (n_neg * n_pos) estimates P(negative outranks positive); the AUC is its complement
  return(1 - A / (n_neg * n_pos))
}


ROC_AUC(pred.prob[,2], as.double(hacide.test$cls) - 1)
## [1] 0.984898
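
As a quick sanity check on this function, here is a tiny hand-worked example (hypothetical toy data, not from hacide): with two positives scoring 0.35 and 0.8 and two negatives scoring 0.1 and 0.4, three of the four positive-negative pairs are correctly ordered, so the AUC should be 3/4.

toy_scores <- c(0.1, 0.4, 0.35, 0.8)  # hypothetical predicted probabilities
toy_labels <- c(0, 0, 1, 1)           # two negatives, two positives
ROC_AUC(toy_scores, toy_labels)       # 3 of 4 pairs concordant, so expect 0.75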

MLmetrics

MLmetrics::AUC(pred.prob[,2], hacide.test$cls)
## [1] 0.984898

pROC

library(pROC)
## Warning: package 'pROC' was built under R version 3.6.1
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
pROC_obj <- roc(hacide.test$cls, pred.prob[,2],
                # arguments for ci
                ci = TRUE, ci.alpha = 0.9, stratified = FALSE,
                # arguments for plot
                plot = TRUE, auc.polygon = TRUE, max.auc.polygon = TRUE, grid = TRUE,
                print.auc = TRUE, show.thres = TRUE)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
sens.ci <- ci.se(pROC_obj)
plot(sens.ci, type="shape", col="lightblue")
## Warning in plot.ci.se(sens.ci, type = "shape", col = "lightblue"): Low
## definition shape.

#plot(sens.ci, type="bars")



pROC_obj <- roc(hacide.test$cls, pred.prob[,2])
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
pROC::auc(pROC_obj)
## Area under the curve: 0.9849

plotROC

library(plotROC)
## Warning: package 'plotROC' was built under R version 3.6.1
## 
## Attaching package: 'plotROC'
## The following object is masked from 'package:pROC':
## 
##     ggroc
df <- tibble(labels = as.double(hacide.test$cls) - 1, predictions = pred.prob[,2])


rocplot <- ggplot(df, aes(m = predictions, d = labels)) + geom_roc(n.cuts = 20, labels = FALSE)
rocplot + style_roc(theme = theme_grey) + geom_rocci(fill = "pink")

plotROC::calc_auc(rocplot)
##   PANEL group      AUC
## 1     1    -1 0.984898

precrec

library(precrec)
## Warning: package 'precrec' was built under R version 3.6.1
## 
## Attaching package: 'precrec'
## The following object is masked from 'package:pROC':
## 
##     auc
precrec_obj <- evalmod(scores = df$predictions, labels = df$labels)
autoplot(precrec_obj)

precrec::auc(precrec_obj)
##   modnames dsids curvetypes      aucs
## 1       m1     1        ROC 0.9848980
## 2       m1     1        PRC 0.6529288

mltools

library(mltools)
## Warning: package 'mltools' was built under R version 3.6.1
## 
## Attaching package: 'mltools'
## The following object is masked from 'package:tidyr':
## 
##     replace_na
auc_roc(df$predictions, df$labels)
## [1] 0.984898
auc_roc(df$predictions, df$labels, returnDT = TRUE) %>% head()
##     Pred CountFalse CountTrue CumulativeFPR CumulativeTPR AdditionalArea
## 1: 1.000          0         1   0.000000000           0.2    0.000000000
## 2: 0.998          0         1   0.000000000           0.4    0.000000000
## 3: 0.974          1         0   0.004081633           0.4    0.001632653
## 4: 0.970          1         1   0.008163265           0.6    0.002040816
## 5: 0.920          1         0   0.012244898           0.6    0.002448980
## 6: 0.914          1         0   0.016326531           0.6    0.002448980
##    CumulativeArea
## 1:    0.000000000
## 2:    0.000000000
## 3:    0.001632653
## 4:    0.003673469
## 5:    0.006122449
## 6:    0.008571429

yardstick

df2 <- df %>% mutate(labels = as_factor(labels))
yardstick::roc_auc(df2, labels, predictions)
## Setting direction: controls > cases
## # A tibble: 1 x 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 roc_auc binary         0.985

PRROC

library(PRROC)
## Warning: package 'PRROC' was built under R version 3.6.1
## 
## Attaching package: 'PRROC'
## The following object is masked from 'package:ROSE':
## 
##     roc.curve
preds_pos <-  pred.prob[hacide.test$cls==1] #preds for true positive class
preds_neg <-  pred.prob[hacide.test$cls==0]  #preds for true negative class

plot(PRROC::roc.curve(preds_pos, preds_neg, curve = TRUE))

PRROC::roc.curve(preds_pos, preds_neg, curve = FALSE)
## 
##   ROC curve
## 
##     Area under curve:
##      0.5 
## 
##     Curve not computed ( can be done by using curve=TRUE )
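
One plausible explanation for the 0.5 value here, rather than a quirk of PRROC itself: pred.prob is a two-column probability matrix, so subsetting it with a bare logical vector recycles the condition across both columns and mixes the class-0 and class-1 probabilities in each input. A corrected sketch that selects the positive-class column explicitly (worth verifying that it reproduces the 0.984898 reported by the other packages):

preds_pos <- pred.prob[hacide.test$cls == 1, 2]  # positive-class probs of true positives
preds_neg <- pred.prob[hacide.test$cls == 0, 2]  # positive-class probs of true negatives
PRROC::roc.curve(preds_pos, preds_neg, curve = FALSE)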

Comparing performance

bnch <- microbenchmark(
  yardstick = yardstick::roc_auc(df2, labels, predictions)$.estimate,
  mltools = mltools::auc_roc(df$predictions, df$labels),
  MLmetrics = MLmetrics::AUC(df$predictions, df$labels),
  ROC_AUC = ROC_AUC(df$predictions, df$labels)
)
bnch
## Unit: microseconds
##       expr      min        lq      mean    median        uq      max neval
##  yardstick 3631.784 4336.6365 5541.4402 5419.5160 6376.5280 13409.27   100
##    mltools 4301.650 5373.6500 6596.2318 6176.2085 7555.8345 12705.70   100
##  MLmetrics   48.640   69.7605  106.0827   91.9475  119.4670   823.04   100
##    ROC_AUC   39.254   65.7075  187.2773   79.3605   95.3605 10513.49   100
ggplot2::autoplot(bnch)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

results <- bench::mark(
  yardstick = yardstick::roc_auc(df2, labels, predictions)$.estimate,
  mltools = mltools::auc_roc(df$predictions, df$labels),
  MLmetrics = MLmetrics::AUC(df$predictions, df$labels),
  ROC_AUC = ROC_AUC(df$predictions, df$labels)
)
as_tibble(results)
## # A tibble: 4 x 13
##   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
## 1 yardstick    3.58ms   4.12ms      229.   140.6KB     2.03   113     1
## 2 mltools      5.07ms   6.45ms      154.   241.3KB     2.03    76     1
## 3 MLmetrics   39.25us  49.07us    18224.    14.3KB     4.16  8766     2
## 4 ROC_AUC     30.29us  36.69us    21905.    16.2KB     4.38  9998     2
##   total_time result    memory             time     gc                   
##     <bch:tm> <list>    <list>             <list>   <list>               
## 1      493ms <dbl [1]> <df[,3] [156 x 3]> <bch:tm> <tibble [114 x 3]>   
## 2      492ms <dbl [1]> <df[,3] [66 x 3]>  <bch:tm> <tibble [77 x 3]>    
## 3      481ms <dbl [1]> <df[,3] [12 x 3]>  <bch:tm> <tibble [8,768 x 3]> 
## 4      456ms <dbl [1]> <df[,3] [13 x 3]>  <bch:tm> <tibble [10,000 x 3]>
#summary(results, relative = TRUE)

ggplot2::autoplot(results)

library(tidyverse)
results %>%
  unnest() %>%
  filter(gc == "none") %>%
  ggplot(aes(x = mem_alloc, y = time, color = expression)) +
    geom_point() +
    scale_color_brewer(type = "qual", palette = 3) +
    geom_smooth(method = "lm", se = F, colour = "grey50")

The MLmetrics package and the hand-rolled ROC_AUC function appear to be much faster than the other packages. On the other hand, the mltools package did not break down at any point during its usage, unlike the MLmetrics package.
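
To probe the large-vector claim from earlier, a minimal stress-test sketch along these lines can be run against the fastest candidates (the vector size, class prevalence, and score distribution below are illustrative assumptions, not the data behind the snapshot):

set.seed(1)
n <- 6e5                                   # roughly the size at which MLmetrics reportedly broke
big_labels <- rbinom(n, 1, 0.02)           # ~2% positives, mimicking hacide's imbalance
big_scores <- runif(n) + 0.3 * big_labels  # weakly informative scores
ROC_AUC(big_scores, big_labels)
mltools::auc_roc(big_scores, big_labels)
# MLmetrics::AUC(big_scores, big_labels)   # uncomment to test the reported breakdown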