Astra on Enzymes: A 6,304-Enzyme Benchmark

Enzymes are the catalysts of biology and among the most heavily exploited drug-target classes in medicine: the kinase inhibitors of oncology, the protease inhibitors of antiviral therapy, and the metabolic-enzyme blockers behind the statins and the gliptins all act on enzymes. They are defined not by a shared fold but by the chemistry they perform, organized into the seven top-level classes of the Enzyme Commission. Three properties make them a distinctive computational target: function is the defining question (which reaction, not which shape); catalysis localizes to a handful of active-site residues; and the class is predominantly soluble.

So the practical question for an enzyme-engineering team is concrete: can AI predict the things that actually move a campaign — function, the active site, and stability — well enough to act on?

To answer it, we ran the entire Astra AI Suite across 6,304 reviewed enzymes from UniProt Swiss-Prot and scored every output against the strongest publicly available experimental reference. This is the class where the suite is strongest. The full per-model breakdown is the Enzymes volume of the Orbion Model Performance Series, linked at the end.

Key Takeaways

6,304 catalytic proteins — the class where the suite performs best.
Thermostability is the flagship result: on a deep scan of bacteriophage T4 lysozyme — the canonical protein-stability reference — Spearman ρ 0.93 (n=315); across the full enzyme set, ρ 0.88 with 90% directional accuracy on strong-effect mutations.
Function prediction is led by the EC head: 95.3% of the cohort is recognized as enzymatic, spanning all seven EC classes, and a GO molecular-function term lands in the top 5 for 94.2% of enzymes.
Active-site pockets are carried by cofactor chemistry: 62% recall and 82% pocket-success on 2,318 co-crystal enzymes, with residue-level F1 reaching 1.00 on well-defined cofactor and nucleotide sites.
Phosphorylation — the dominant enzyme modification — is ranked at AUROC 0.95.

The Benchmark

Cohort. 6,304 reviewed (Swiss-Prot) enzyme-associated proteins.
References. Swiss-Prot annotation for sequence-level features; PDB co-crystal contacts at 4 Å for ligand binding; and, for stability, 644 mutations across deeply scanned enzymes with curated experimental thermal-shift data.
Reporting. Each capability is shown with its headline metric and its failure modes.
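
To make the 4 Å ligand-contact reference concrete, here is a minimal sketch. It assumes a residue counts as ligand-contacting when any of its atoms lies within 4 Å of any ligand atom; the function name, the input layout, and the toy coordinates are hypothetical, and the benchmark's exact structure-parsing rules are not described in this article.

```python
import math

def contact_residues(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return sorted residue IDs having any atom within `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples
    ligand_atoms:  iterable of (x, y, z) tuples
    """
    ligand_atoms = list(ligand_atoms)
    contacts = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in contacts:
            continue  # residue already counted via another atom
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            contacts.add(residue_id)
    return sorted(contacts)

# Toy coordinates in Å (hypothetical, not taken from any PDB entry)
protein = [
    (10, (0.0, 0.0, 0.0)),
    (10, (1.5, 0.0, 0.0)),   # 3.5 Å from the ligand atom: contact
    (11, (3.0, 0.0, 0.0)),   # 2.0 Å from the ligand atom: contact
    (42, (12.0, 0.0, 0.0)),  # 7.0 Å away: no contact
]
ligand = [(5.0, 0.0, 0.0)]

print(contact_residues(protein, ligand))  # [10, 11]
```

The resulting residue set is what a predicted pocket would be compared against under this assumed definition.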

The classification, topology, and PTM metrics describe production behavior on real enzyme targets, including proteins that appear in the models' training corpora. Thermostability is the exception: it is scored against independent experimental data.

Thermostability: ρ 0.93 — The Suite's Strongest Result

The single most valuable capability for enzyme engineering is also where the suite reaches its strongest correlation anywhere. On a deep mutational scan of bacteriophage T4 lysozyme — the canonical protein-stability reference — predicted ΔTm tracks measured ΔTm at Spearman ρ 0.93 (n=315). Across the full enzyme set, the correlation holds at ρ 0.88, with 90% directional accuracy on strong-effect mutations.
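
For readers who want to reproduce this kind of scoring on their own data, the two statistics can be sketched in plain Python. This is an illustration rather than the benchmark's scoring code: the 2.0 °C threshold for a "strong-effect" mutation is an assumption (the article does not state the cutoff), and the ΔTm values are toy numbers.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def directional_accuracy(predicted, measured, strong_cutoff=2.0):
    """Share of strong-effect mutations whose predicted sign matches the measured sign.

    strong_cutoff is an assumed threshold on |measured dTm| in degrees C.
    """
    strong = [(p, m) for p, m in zip(predicted, measured) if abs(m) >= strong_cutoff]
    if not strong:
        raise ValueError("no mutations pass the strong-effect cutoff")
    return sum((p > 0) == (m > 0) for p, m in strong) / len(strong)

# Toy dTm values in degrees C (hypothetical)
measured = [-6.0, -3.5, -1.0, 0.5, 2.5, 4.0]
predicted = [-5.0, -4.0, 0.2, -0.3, 1.0, 3.0]

print(round(spearman(predicted, measured), 3))
print(directional_accuracy(predicted, measured))
```

Spearman scores only the ordering of mutations, which is why it suits a pre-screen: a model can be off in absolute degrees and still rank candidates usefully.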

For a stability-engineering campaign, that is the difference between guessing and ranking: a reliable pre-screen that puts the mutations most likely to stabilize (or destabilize) at the top of the list before any thermal-shift assay is booked.

Function and EC Classification: 95.3%

Function is the defining question for an enzyme, and the suite answers it first with the EC head. That head recognizes 95.3% of the cohort as enzymatic, distributed across all seven EC classes in proportions that track the enzyme universe (transferases and hydrolases dominate). GO molecular-function accuracy is 94.2% at top-5 and 84.0% at top-1.
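
The top-1 and top-5 figures are top-k accuracies. A minimal sketch of that metric follows, assuming a protein counts as a hit when at least one of its reference GO terms appears among the k highest-ranked predictions; the article does not spell out the exact matching rule, and the term lists below are toy data.

```python
def top_k_accuracy(ranked_predictions, reference_terms, k):
    """Fraction of proteins with at least one reference term in the top-k predictions.

    ranked_predictions: one list per protein, best-scoring term first
    reference_terms:    one set of annotated terms per protein
    """
    if len(ranked_predictions) != len(reference_terms):
        raise ValueError("predictions and references must align one-to-one")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, reference_terms)
        if set(ranked[:k]) & set(truth)
    )
    return hits / len(reference_terms)

# Toy example: three proteins, three ranked predictions each (hypothetical)
preds = [
    ["GO:0004672", "GO:0005524", "GO:0016740"],
    ["GO:0016787", "GO:0003824", "GO:0005524"],
    ["GO:0016740", "GO:0016787", "GO:0004672"],
]
truth = [
    {"GO:0004672"},  # found at rank 1
    {"GO:0003824"},  # found at rank 2
    {"GO:0005524"},  # not in the list at all
]

print(top_k_accuracy(preds, truth, k=1))
print(top_k_accuracy(preds, truth, k=3))
```

The gap between top-1 and top-k tells a team how often the right term is present but not ranked first, which matters when predictions are reviewed by a person rather than consumed automatically.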

Active Sites and Pockets: 82%

Catalysis localizes to a handful of residues, and pocket prediction is carried by the chemically distinct sites that define catalytic centers. Across the 2,318 enzymes with co-crystal data, the model reaches 62% ligand-identity recall and 82% pocket-success, and is strongest on the well-defined cofactor and nucleotide pockets — iron/2-oxoglutarate dioxygenases, kinase ATP sites, FAD/NAD flavoenzymes — where residue F1 reaches 1.00. On the PTM side, phosphorylation, the dominant enzyme modification (over 11,000 sites across the cohort), is ranked at AUROC 0.95.
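
Residue-level F1 compares the set of predicted pocket residues with the set of reference contact residues. The sketch below shows the standard precision, recall, and F1 calculation on toy residue numbers; the function name is hypothetical, and the definition of pocket-success is not reproduced here because the article does not give it.

```python
def residue_f1(predicted, reference):
    """Return (precision, recall, F1) for predicted vs. reference residue sets."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy residue numbers (hypothetical, not from any benchmark enzyme)
reference_site = [10, 11, 12, 13]

# Three of four predictions are correct, one reference residue is missed
print(residue_f1([10, 11, 12, 40], reference_site))

# An exact match gives F1 = 1.0, the ceiling reported for well-defined sites
print(residue_f1([10, 11, 12, 13], reference_site))
```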

Topology

The class is predominantly soluble — ~78% carry no transmembrane segment — and the model routes them as such rather than imposing a membrane template. On the 1,345 membrane enzymes, per-residue topology agrees with UniProt annotation at AUROC 0.95 (median 0.99), uniform across all seven EC classes.
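
Per-residue AUROC can be read as the probability that a randomly chosen transmembrane residue receives a higher score than a randomly chosen non-membrane residue. A minimal sketch of that calculation for a single protein follows, assuming label 1 marks an annotated transmembrane residue and ties count as half; how the report aggregates per-protein values into its headline and median figures is not stated here, and the scores are toy data.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Quadratic in sequence length, which is fine for a single protein.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy protein of eight residues (hypothetical): 1 = annotated transmembrane
labels = [0, 0, 1, 1, 1, 1, 0, 0]
scores = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7, 0.4, 0.05]

print(auroc(scores, labels))  # 0.9375
```

The guard clause reflects a practical point: a fully soluble enzyme has no positive residues, so a per-residue AUROC is only defined for proteins that carry at least one annotated segment.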

Where It's Uneven

Enzymes are where the suite is strongest, but the report still marks the edges: rare PTM classes, the broad protein-category head, and extreme-effect ΔTm outliers are weaker, and we document each directly rather than averaging them in.

One Enzyme, End-to-End: FYN kinase

The whitepaper walks the whole suite through a single target — the tyrosine-protein kinase FYN (UniProt P06241), a Src-family signaling enzyme — to show what a program team does with the integrated output:

Classify the enzyme and its EC family from sequence.
Localize the ATP-binding active site for the inhibitor campaign.
Flag the phosphorylation and regulatory modification sites.
Pre-screen stabilizing mutations using the ΔTm model before booking assays.
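
The last step, the ΔTm pre-screen, amounts to sorting candidate mutations by predicted effect and keeping the top of the list. A minimal sketch, assuming predictions arrive as (mutation, predicted ΔTm) pairs; the class name, function name, mutation labels, and values are all hypothetical and are not actual FYN predictions or part of any Astra API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MutationScore:
    mutation: str          # e.g. "A100V" (placeholder label)
    predicted_dtm: float   # predicted change in melting temperature, degrees C

def shortlist(scores, top_n=3, min_dtm=0.0):
    """Keep mutations predicted to stabilize (dTm > min_dtm), best first."""
    stabilizing = [s for s in scores if s.predicted_dtm > min_dtm]
    ranked = sorted(stabilizing, key=lambda s: s.predicted_dtm, reverse=True)
    return ranked[:top_n]

# Hypothetical predictions for illustration only
candidates = [
    MutationScore("A100V", 1.8),
    MutationScore("G150P", -4.2),
    MutationScore("S200T", 0.4),
    MutationScore("K250R", 3.1),
    MutationScore("L300F", 2.2),
]

for hit in shortlist(candidates):
    print(hit.mutation, hit.predicted_dtm)
```

Raising `min_dtm` trades shortlist length for confidence: only mutations with a larger predicted gain make it to the assay queue.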

(FYN is a cytoplasmic, non-receptor kinase — a useful case for showing how the suite separates a signaling enzyme from a membrane receptor.)

Why This Matters for Enzyme Engineering

Stability is the bottleneck in most enzyme-engineering campaigns, whether for biocatalysis, therapeutics, or diagnostics. A ΔTm model that reaches ρ 0.93 on the canonical stability benchmark turns a months-long iterative scan into a ranked shortlist, delivered alongside the suite's function and active-site predictions for the same enzyme.

Read the Full Benchmark

The Enzymes volume reports every model's headline performance, its failure modes, and the per-EC-class breakdown behind the headline numbers.

→ Read the full Enzymes benchmark: https://www.orbion.life/research/enzyme-performance

Part of the Orbion Model Performance Series — alongside GPCRs, Transporters, and Ion Channels.

References & Sources

Matthews, B. W. Studies on protein stability with T4 lysozyme. Advances in Protein Chemistry 46:249–278 (1995). doi:10.1016/S0065-3233(08)60337-X — the canonical T4 lysozyme stability reference.
McDonald, A. G. & Tipton, K. F. Enzyme nomenclature and classification: the state of the art. FEBS Journal 290(9):2214–2231 (2023). doi:10.1111/febs.16274 — EC classification.
Stourac, J. et al. FireProtDB: database of manually curated protein stability data. Nucleic Acids Research 49(D1):D319–D324 (2021). doi:10.1093/nar/gkaa981 — curated stability data.
Nikam, R. et al. ProThermDB: thermodynamic database for proteins and mutants revisited. Nucleic Acids Research 49(D1):D420–D424 (2021). doi:10.1093/nar/gkaa1035 — thermodynamic data.
Niesen, F. H., Berglund, H. & Vedadi, M. The use of differential scanning fluorimetry to detect ligand interactions that promote protein stability. Nature Protocols 2(9):2212–2221 (2007). doi:10.1038/nprot.2007.321 — the thermal-shift (DSF) method.
The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research 51(D1):D523–D531 (2023). doi:10.1093/nar/gkac1052 — Swiss-Prot reference annotations.
Burley, S. K. et al. RCSB Protein Data Bank. Nucleic Acids Research 47(D1):D464–D474 (2019). doi:10.1093/nar/gky1004 — co-crystal ligand-contact references.
Full per-model methodology and metrics: Orbion, Astra AI on Enzymes — Model Performance Series (2026) — https://www.orbion.life/research/enzyme-performance