BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Multi-fidelity machine learning models for improved high-throughpu
 t screening predictions - David Buterez
DTSTART:20220531T121500Z
DTEND:20220531T131500Z
UID:TALK173027@talks.cam.ac.uk
CONTACT:Mateja Jamnik
DESCRIPTION:"Join us on Zoom":https://zoom.us/j/99166955895?pwd=SzI0M3pMVE
 kvNmw3Q0dqNDVRalZvdz09\n\nHigh throughput screening (HTS) is one of the le
 ading techniques for hit identification in drug discovery\, being widely a
 dopted in academia and industry. However\, HTS is still regarded as a brut
 e-force approach\, with substantial costs and complexity involved in runni
 ng large-scale screening campaigns. Even as industry-leading laboratories 
 produce millions of measurements per HTS project\, the resulting data are 
 not fully understood and are usually not used as part of modern computatio
 nal pipelines. Thus\, these challenges require an interdisciplinary approa
 ch\, and in particular it is desirable to leverage modern machine learning
  techniques to optimise the current workflows.\n\nIn this talk\, I will di
 scuss how we studied real-world HTS data from the public domain as well as
  in-house AstraZeneca data\, aiming to answer questions regarding the bene
 fits of integrating HTS data exhibiting different levels of noise (an aspe
 ct called 'multi-fidelity')\, as well as relating the computational insigh
 ts with experimental details. As a first step\, we assembled and curated a
  diverse collection of 60 public multi-fidelity datasets from PubChem\, de
 signed as a benchmark for machine learning applications in HTS. With the h
 elp of previously unexplored data and graph neural networks\, we can now m
 odel a large and varied chemical space (up to 3 orders of magnitude larger
  than existing efforts) and integrate these signals into models of bioacti
 vity prediction. I will present results showing that this integration lead
 s to significant improvements in the majority of datasets\, as well as dis
 cuss several effects and unique aspects of the proposed workflow. Finally\
 , I will link conclusions made from modelling multi-million scale datasets
  with our recent work in graph representation learning.
LOCATION:Zoom
END:VEVENT
END:VCALENDAR
