BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Making the Most of Massive Clusters - Fiodar Kazhamiaka\, Stanford
DTSTART:20211202T150000Z
DTEND:20211202T160000Z
UID:TALK163282@talks.cam.ac.uk
CONTACT:Srinivasan Keshav
DESCRIPTION:Resource management systems play an important role in today’
 s large clusters\, allocating jobs/containers to compute resources while b
 alancing metrics like fairness\, efficiency\, and fault tolerance. Existin
 g management policies in systems such as Kubernetes\, VMware’s DRS\, and
  Red Hat’s OpenShift rely on heuristic-based schedulers which often scal
 e well but are typically sub-optimal. This problem is made worse by the gr
 owing trend of heterogeneous clusters -- composed of a mix of several gene
 rations of CPUs\, GPUs\, etc. -- where existing heuristics perform poorly.
 \n\nThis talk will emphasize the environmental footprint of large resource
  clusters as a key motivation. I’ll first describe our work on allocatin
 g ML training jobs in heterogeneous clusters. A key insight is that many p
 opular scheduling objectives can be cast as mathematical optimization prob
 lems whose solutions can maximize cluster efficiency\; other systems take 
 a similar approach\, for example TetriSched and Facebook's RAS. However\, 
 optimization-based techniques are notorious for scaling poorly to massive 
 systems. To address this issue\, I will describe POP: a technique to parti
 tion the problem and quickly approximate the optimal allocation. POP reduc
 es solve times by several orders of magnitude with minimal performance los
 s across a wide range of problem domains\, including cluster scheduling an
 d network traffic engineering.\n\nBio:\nFiodar is currently a postdoc fell
 ow at the Stanford Future Data Systems lab\, working with Matei Zaharia an
 d Peter Bailis. His research interests span ML systems\, energy systems\, 
 and data science\, with a focus on finding practical solutions to fundamen
 tal problems. He obtained his PhD from the University of Waterloo\, where 
 his thesis on the optimization of solar panel and battery systems was reco
 gnized through the Cheriton Distinguished Dissertation award.
LOCATION:FW11 and https://cl-cam-ac-uk.zoom.us/j/97216272378?pwd=M2diTFhMT
 nppckJtNWhFVTBKK0REZz09
END:VEVENT
END:VCALENDAR
