Multi-fidelity machine learning models for improved high-throughput screening predictions
- Speaker: David Buterez
- Date & Time: Tuesday 31 May 2022, 13:15 - 14:15
- Venue: Zoom
Abstract
High-throughput screening (HTS) is one of the leading techniques for hit identification in drug discovery and is widely adopted in academia and industry. However, HTS is still regarded as a brute-force approach, with substantial costs and complexity involved in running large-scale screening campaigns. Even as industry-leading laboratories produce millions of measurements per HTS project, the resulting data are not fully understood and are usually not used as part of modern computational pipelines. These challenges require an interdisciplinary approach, and in particular it is desirable to leverage modern machine learning techniques to optimise the current workflows.
In this talk, I will discuss how we studied real-world HTS data from the public domain as well as in-house AstraZeneca data. We aimed to answer questions about the benefits of integrating HTS data exhibiting different levels of noise (an aspect called ‘multi-fidelity’) and to relate the computational insights to experimental details. As a first step, we assembled and curated a diverse collection of 60 public multi-fidelity datasets from PubChem, designed as a benchmark for machine learning applications in HTS. With the help of previously unexplored data and graph neural networks, we can now model a large and varied chemical space (up to three orders of magnitude larger than existing efforts) and integrate these signals into models of bioactivity prediction. I will present results showing that this integration leads to significant improvements in the majority of datasets, and discuss several effects and unique aspects of the proposed workflow. Finally, I will link conclusions drawn from modelling multi-million scale datasets with our recent work in graph representation learning.
Series
This talk is part of the Artificial Intelligence Research Group Talks (Computer Laboratory) series.