BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Benchmarking and evaluation in contemporary machine learning - Aus
 tin Tripp and Shoaib Siddiqui\, University of Cambridge
DTSTART:20221026T100000Z
DTEND:20221026T113000Z
UID:TALK189563@talks.cam.ac.uk
CONTACT:Elre Oldewage
DESCRIPTION:**Abstract**: Machine learning is primarily considered an empi
 rical field\, relying on experiments to compare methods and measure progr
 ess. These experiments often take the form of "benchmarks" with a standard
 ized setup and set of evaluation criteria. In this reading group we will d
 iscuss the advantages and disadvantages of this approach\, drawing largely
  from material from 3 papers (see below). These papers all describe differ
 ent undesirable aspects of the interplay between benchmarks and the machin
 e learning community\, particularly how benchmarks may not reward ideas ac
 cording to their "true" underlying potential. This calls for more care and
  thought when evaluating or judging any work based on the presented eviden
 ce in terms of benchmark results\, especially during the peer-review proce
 ss.\n\n\nThis reading group session will be a discussion (not a presentati
 on) on benchmarking and evaluation in machine learning\, drawing on conten
 t from 3 papers. While we encourage everybody to read all 3 papers (it sho
 uld take under 2 hours)\, we have picked out the most important subsection
 s of the different papers to make < 10 pages of light required reading (no
  math). Please do the reading before the reading group: the discussion wil
 l be much better if everybody is familiar with the key ideas of these pape
 rs. We've also shortlisted some "bonus" parts of the papers which are reco
 mmended but not required.\n\nThe discussion will be hybrid\, but the audio
  quality in the CBL seminar room can sometimes be low\, so be warned that
  if you join via Zoom it may be hard to participate fully in the discussio
 n.\n\n**Reading**:\n\n1. Testing heuristics: We have it all wrong (https:/
 /link.springer.com/article/10.1007/BF02430364)\n* Required: [beginning\, s
 ection 2). ~3 pages\n* Bonus: Section 4\n\n2. The Benchmark Lottery (http:
 //arxiv.org/abs/2107.07002)\n* Required: sections 1\, [2\, 2.1)\, [4\, 4.1
 )\, 5\, [6\, 6.1). ~5 pages\n* Bonus: section 7\n\n3. The hardware lottery
  (http://arxiv.org/abs/2009.06489)\n* Required: abstract\n* Bonus:
  sections [1\, 3.1)\n\nWhere [A\, B) means read from A until the start
  of B (i.e. excluding B).
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38
END:VEVENT
END:VCALENDAR