BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Weighted ROC and Murphy Curves in Cost Space and Applications - Ni
 kolai Kolev (Sao Paulo State University)
DTSTART:20250605T091000Z
DTEND:20250605T093000Z
UID:TALK230854@talks.cam.ac.uk
DESCRIPTION:This work\, joint with Yuri Verges\, is motivated by
  complementary methods developed by Hernández-Orallo et al. (2011\,
  2013)\, Dimitriadis et al. (2021)\, Shao et al. (2023) and Dimitriadis
  et al. (2024).\n\nDimitriadis et a
 l. (2021) introduced the CORP approach for assessing the calibration of
  probabilistic forecasts. CORP is based on nonparametric isotonic
  regression computed with the classical pool-adjacent-violators (PAV)
  algorithm. The reliability curves it generates are the graphs of the
  PAV-(re)calibrated forecast probabilities.\n\nDimitriadis et al
 . (2024) proposed a triptych of diagnostic tools\, each with different
  capabilities\, for evaluating the performance of binary classifiers:
  CORP reliability curves to diagnose calibration\, receiver operating
  characteristic (ROC) curves to evaluate discrimination ability\, and
  Murphy curves for an overall assessment of predictive performance. In
  their Theorem 3\, the authors show that if X and Z are calibrated
  probabilistic forecasts for the binary outcome Y\, then the following
  statements are equivalent: (i) X is sharper than Z\; (ii) X dominates Z
  in the ROC sense\; and (iii) X dominates Z in the Murphy sense.\n\nThe
  area under the ROC curve (abbreviated AUC) is a popular measure of the
  accuracy of quantitative diagnostic tests. However\, traditional machine
  learning models trained with AUC have not been well studied for
  cost-sensitive decision problems. A notable exception is the work of
  Hernández-Orallo et al. (2013)\, who demonstrate that ROC curves can be
  transformed into cost space. This transformation is equivalent to
  computing the area under the convex hull of the ROC curves. Thus\, the
  AUC can be seen as the performance of the model under a uniform cost
  distribution\, which is often unrealistic in practice.\n\nE
 xtending the idea of Hernández-Orallo et al. (2013)\, Shao et al. (2023)
  introduced the notion of a weighted ROC curve in cost space\, which
  combines robustness of the model to both the class distribution and the
  cost distribution. In other words\, they extend AUC to non-uniform
  cost-sensitive learning. The authors construct a setting in which costs
  are treated as a dataset\, so as to handle an arbitrary unknown cost
  distribution\, and introduce a weighted version of AUC (abbreviated WAUC)
  in which the cost distribution is incorporated into the calculation via
  the decision threshold.\n\nThus\, Shao et al. (2023) develop the f
 ollowing two-level algorithm to bridge WAUC and cost: the inner-level
  problem approximates the optimal threshold from sampled costs\, and the
  outer-level problem minimizes the WAUC loss over the optimal threshold
  distribution. This approach is better suited to real-world
  cost-sensitive scenarios.\n\nTaking into acco
 unt the equivalent statements of Theorem 3 in Dimitriadis et al. (2024)\,
  our goal is to apply the methodology suggested by Shao et al. (2023) in
  two directions:\n\n1. It is well known that the Gini concentra
 tion index (denoted G) is related to AUC by G = 2AUC - 1. Our proposal
  is to use the Leimkuhler curve\, which in economics and reliability
  contexts plots the cumulative proportion of total productivity against
  the cumulative proportion of sources arranged in decreasing order of
  productivity. The area under the Leimkuhler curve\, denoted AUL\,
  satisfies the same relation\, i.e.\, G = 2AUL - 1 (see Burrell\, 1991\,
  and Balakrishnan et al.\, 2010). Moreover\, the Leimkuhler curve has a
  shape similar to that of the ROC curve: it begins at (0\,0)\, ends at
  (1\,1)\, and is non-decreasing and concave.\n\nHence\, the Leimkuhler
  curve can be interpreted as the cumulative distribution function of a
  random variable having the Bradford distribution. We will apply a
  procedure analogous to that of Shao et al. (2023) to the Leimkuhler
  curve to obtain the corresponding cost space\, and compare the results
  with those obtained by the WAUC procedure.\n
 \n2. It is well known that the area below the Murphy curve equals half
  the mean Brier score\, and that the area below the Brier curve (see
  Hernández-Orallo et al.\, 2011) equals the mean Brier score. Using again
  the methodology of Shao et al. (2023)\, we will extend the results of
  Hernández-Orallo et al. (2011) to the case in which the cost
  distribution is not uniform. Finally\, we will present the corresponding
  weighted versions of Murphy curves in cost space.\n\nReferences:\n\n
 Balakrishnan\, N.\, Sarabi
 a\, J.M. and Kolev\, N. (2010). A simple relation between Leimkuhler
  curve and the mean residual life. Journal of Informetrics 4\, 602-607.\n
 \nBurrell\, Q. (1991). The Bradford distribution and the Gini index.
  Scientometrics 21\, 181-194.\n\nDimitriadis\, T.\, Gneiting\, T. and
  Jordan\, A. I. (2021). Stable reliability diagrams for probabilistic
  classifiers. Proceedings of the National Academy of Sciences 118\,
  Article e2016191118.\n\nDimitriadis\, T.\, Gneiting\, T.\, Jordan\, A. I.
  and Vogel\, P. (2024). Evaluating probabilistic classifiers: The
  triptych. International Journal of Forecasting 40\, 1101-1122.\n\n
 Hernández-Orallo\, J.\, Flach\, P. and Ferri\, C. (2011). Brier curves: A
  new cost-based visualization of classifier performance. In: Proceedings
  of the 28th International Conference on Machine Learning.\n\n
 Hernández-Orallo\, J.\, Flach\, P. and Ferri\, C. (2013). ROC curves in
  cost space. Machine Learning 93\, 71-91.\n\nShao\, H.\, Xu\, Q.\, Yang\,
  Z.\, Wen\, P.\, Gao\, P. and Huang\, Q. (2023). Weighted ROC curve in cost
  space: Extending AUC to cost-sensitive learning. In: Advances in Neural
  Information Processing Systems (NeurIPS 2023) 36\, 17257-17368.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR