Benchmarking and evaluation in contemporary machine learning

Austin Tripp and Shoaib Siddiqui, University of Cambridge
Wednesday 26 October 2022, 11:00-12:30
Cambridge University Engineering Department, CBL Seminar room BE4-38.

If you have a question about this talk, please contact Elre Oldewage.

Abstract: Machine learning is primarily considered an empirical field, relying on experiments to compare methods and measure progress. These experiments often take the form of “benchmarks” with a standardized setup and set of evaluation criteria. In this reading group we will discuss the advantages and disadvantages of this approach, drawing largely on material from 3 papers (see below). These papers all describe different undesirable aspects of the interplay between benchmarks and the machine learning community, particularly how benchmarks may not reward ideas according to their “true” underlying potential. This calls for more care and thought when evaluating or judging any work on the basis of its benchmark results, especially during the peer-review process.

This reading group session will be a discussion (not a presentation) on benchmarking and evaluation in machine learning, drawing on content from 3 papers. While we encourage everybody to read all 3 papers (it should take under 2 hours), we have picked out the most important subsections of each paper, which together make up fewer than 10 pages of light required reading (no math). Please do the reading before the reading group: the discussion will be much better if everybody is familiar with the key ideas of these papers. We’ve also shortlisted some “bonus” parts of the papers which are recommended but not required.

The discussion will be hybrid, but the audio quality in the CBL seminar room can sometimes be low, so be warned that if you join via Zoom it may be hard to participate fully in the discussion.

Reading:

1. Testing heuristics: We have it all wrong (https://link.springer.com/article/10.1007/BF02430364)

Required: [beginning, section 2). 3 pages
Bonus: Section 4

2. The Benchmark Lottery (http://arxiv.org/abs/2107.07002)

Required: sections 1, [2, 2.1), [4, 4.1), 5, [6, 6.1). 5 pages
Bonus: section 7

3. The Hardware Lottery (http://arxiv.org/abs/2009.06489)

Required: abstract
Bonus: sections [1, 3.1)

Here [A, B) means read from A up to the start of B (i.e. excluding B).

This talk is part of the Machine Learning Reading Group @ CUED series.

