A Consortium for Biologics Property Prediction via Federated and Active Learning
AbbVie • Amgen • AstraZeneca • Johnson & Johnson • UCB
Motivation and Opportunity
Biologics are complex and varied, which makes their developability and immunogenicity hard to predict. Better predictive models require more extensive and more diverse training data. However, generating data for certain properties, such as high-concentration viscosity measurements, is expensive, so no single company holds enough data to build highly accurate models on its own. Federated learning (FL) offers a solution by enabling collaborative model training across multiple organizations without the need to share sensitive data. Active learning (AL), in turn, prioritizes the new measurements expected to improve the models most, allowing participants to share the cost of acquiring new data.
The FAITE consortium will leverage federated and active learning to train models for predicting properties of biologics. It addresses data privacy challenges and enhances model performance, thereby reducing discovery costs and minimizing development risks by avoiding designs prone to late-stage failures. The consortium’s collaborative nature fosters innovation and optimizes resources, leading to the development of safer, more effective biologic therapies. This consortium positions the participating companies at the forefront of AI-driven drug discovery, significantly benefiting patient care and public health.
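To make the federated idea concrete, the sketch below shows one common scheme, federated averaging, on a toy linear model. It is an illustration only and is not the consortium's actual method: the function names (`local_update`, `federated_average`, `train_federated`), the learning rate, the number of rounds, and the synthetic data are all assumptions introduced here. The point it illustrates is that each site trains on its own data and only model weights, never raw measurements, are sent for aggregation.

```python
from typing import List, Tuple

# One site's private data: (features, target) pairs that never leave the site.
Dataset = List[Tuple[List[float], float]]


def local_update(weights: List[float], data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Run a few epochs of gradient descent on mean squared error, locally."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(site_weights: List[List[float]],
                      site_sizes: List[int]) -> List[float]:
    """Combine site models, weighting each by its number of samples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def train_federated(sites: List[Dataset], dim: int,
                    rounds: int = 100) -> List[float]:
    """Alternate local training and central averaging for a fixed number of rounds."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        # Each site starts from the shared model and returns only updated weights.
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w


# Hypothetical example: two sites observe different input ranges of the
# same underlying relationship y = 2*x + 1 (second feature is a bias term).
site_a: Dataset = [([x, 1.0], 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0)]
site_b: Dataset = [([x, 1.0], 2.0 * x + 1.0) for x in (0.5, 1.0)]

weights = train_federated([site_a, site_b], dim=2)
print(weights)
```

In this toy setting neither site covers the full input range, yet the shared model is fitted using both. Real deployments would typically add safeguards such as secure aggregation, which this sketch omits.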
12-Month Pilot Project Underway
Biophysical Properties
- Viscosity
- Aggregation
- Thermal Stability
- Chemical Liability
Key Questions
- Is Federated Learning beneficial?
- Does Federated Learning outperform each company's baseline models?
- Can the effects of varying assay conditions be mitigated via multi-modality training?
- What is the relative return on investment (ROI) of a Federated Learning approach versus companies independently acquiring more training data?