Model Update — AstraROLE2 + AstraSUIT2
Aug 18, 2025
When you work with AI to design or modify proteins, it is easy to drift away from macroscopic target properties. Looking at a protein at the amino-acid level, you can miss the forest for the trees and end up with something that is well optimised at the residue level but has lost key protein-level functions. This is why we created AstraROLE and AstraSUIT: protein-level prediction models that provide key context and a sense check for a protein based on its sequence. Within the Astra family of models, they serve as a grounding force, providing feedback on the outputs of other models. And they have just gotten a boost! 🚀
Now with even more labels and improved performance, AstraROLE2 and AstraSUIT2 predict key protein-level properties, including:
Pathway membership and GO terms
Enzymatic activity and top-level EC class
Associated organism category
Subcellular localisation and membrane association type
Cofactor binding
Quaternary structure and stoichiometry
AstraROLE2 and AstraSUIT2 performed well, often better than state-of-the-art AI models with similar output categories. They also outperform the first AstraROLE and AstraSUIT versions and have a substantially expanded library of labels, including more granular cofactor identification and quaternary structure properties. 🚀 Notably, the models performed well even when tested on highly novel proteins that were not included in the training data.
These models are designed to provide a top-line overview of a protein from its amino-acid sequence. This is useful in fields such as environmental DNA analysis, dark genome exploration, and the checking of novel or modified amino-acid sequences, and it also provides context for more complex protein AI models. 💡
Watch this space for case studies and further updates! Check the comments for links to the bioRxiv preprint and to the AstraROLE2 and AstraSUIT2 demo.
bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2025.06.21.660734v2