Benchmarking and evaluation in contemporary machine learning
- Speaker: Austin Tripp and Shoaib Siddiqui, University of Cambridge
- Date & Time: Wednesday 26 October 2022, 11:00 - 12:30
- Venue: Cambridge University Engineering Department, CBL Seminar room BE4-38
Abstract
Machine learning is primarily considered an empirical field, relying on experiments to compare methods and measure progress. These experiments often take the form of “benchmarks” with a standardized setup and set of evaluation criteria. In this reading group we will discuss the advantages and disadvantages of this approach, drawing largely on material from 3 papers (see below). These papers all describe different undesirable aspects of the interplay between benchmarks and the machine learning community, particularly how benchmarks may not reward ideas according to their “true” underlying potential. This calls for more care and thought when evaluating or judging any work on the basis of benchmark results, especially during the peer-review process.
This reading group session will be a discussion (not a presentation) on benchmarking and evaluation in machine learning, drawing on content from 3 papers. While we encourage everybody to read all 3 papers (it should take under 2 hours), we have picked out the most important subsections of the different papers to make < 10 pages of light required reading (no math). Please do the reading before the reading group: the discussion will be much better if everybody is familiar with the key ideas of these papers. We’ve also shortlisted some “bonus” parts of the papers which are recommended but not required.
The discussion will be hybrid, but the audio quality in the CBL seminar room can sometimes be low, so be warned that if you join via Zoom it may be hard to participate fully in the discussion.
Reading:
1. Testing heuristics: We have it all wrong (https://link.springer.com/article/10.1007/BF02430364)
- Required: [beginning, section 2). 3 pages
- Bonus: Section 4
- Required: sections 1, [2, 2.1), [4, 4.1), 5, [6, 6.1). 5 pages
- Bonus: section 7
- Required: abstract
- Bonus: sections [1, 3.1)
Where [A, B) means read from A until start of B (i.e. excluding B)
Series: This talk is part of the Machine Learning Reading Group @ CUED series.