Astra on GPCRs: A 2,615-Receptor Benchmark

G protein-coupled receptors are the most-drugged protein family in biology and one of the hardest to characterize computationally. Roughly a third of marketed small-molecule drugs act through a GPCR — yet the family is riddled with deep, buried orthosteric pockets, conformationally plastic transmembrane bundles, dense post-translational regulation, and thermal-stability assays that don't scale to the mutational landscapes drug programs actually need to explore.

So the practical question for a GPCR team isn't "can AI say something about my receptor?" It's "can AI say something I can trust enough to spend reagents and assay time on?"

To answer that, we ran the entire Astra AI Suite across 2,615 reviewed GPCRs from UniProt Swiss-Prot and scored every model output against the strongest publicly available experimental reference. This post is the short version. The full per-model breakdown is the GPCR volume of the Orbion Model Performance Series, linked at the end.

Key Takeaways

2,615 reviewed GPCRs, with every model output scored against the strongest public experimental reference for its task.
Topology and function are reliable: transmembrane topology at AUROC 0.97 (F1 0.91); 99% of the cohort identified as receptors, with the correct GO molecular-function term in the top-5 for 99.2%.
The harder predictions hold up: PTM sites at F1 up to 0.94 across 39 modification classes; binding-pocket triage at 72% pocket-success on co-crystal receptors.
Thermostability is honest about its limit: 82% sign accuracy on strong destabilizers (ΔTm < −2 °C), with no claim made inside the ±2 °C assay-precision floor.
The weaknesses are documented openly — by receptor family and by prediction type.

The Benchmark

Cohort. 2,615 reviewed (Swiss-Prot) GPCRs.
References. Each output was compared to the strongest public experimental ground truth available for that task: Swiss-Prot annotation for sequence-level features (topology, disorder, PTM sites, function); PDB co-crystal contacts at 4 Å for ligand binding; and 154 mutations across four receptors with measured thermal-shift data for stability.
Reporting. Each capability is shown with its headline metric and its failure modes — we don't average the weaknesses away.
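
As an illustration of the 4 Å contact reference, here is a minimal sketch. It assumes the atoms of a co-crystal structure have already been parsed into plain tuples. The function name, the residue-ID format, and the inclusive cutoff applied to every supplied atom are illustrative assumptions, not the benchmark's actual pipeline.

```python
import math

def contact_residues(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return residue IDs with at least one atom within `cutoff` angstroms of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples
    ligand_atoms:  iterable of (x, y, z) tuples
    """
    ligand = list(ligand_atoms)
    contacts = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in contacts:
            continue  # residue already counted via another atom
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand):
            contacts.add(residue_id)
    return contacts

# Toy coordinates (hypothetical, not taken from any PDB entry).
protein = [
    ("A:85", (0.0, 0.0, 0.0)),
    ("A:85", (1.0, 0.0, 0.0)),
    ("A:168", (10.0, 0.0, 0.0)),
    ("A:250", (3.5, 0.0, 3.0)),
]
ligand = [(0.0, 0.0, 3.0)]
print(sorted(contact_residues(protein, ligand)))
```

The resulting residue set is the kind of reference a predicted pocket could then be compared against.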

One distinction matters for reading the numbers: the classification, topology, and PTM metrics describe how the deployed models behave on the GPCR class in production, including proteins drawn from their training corpora. Thermostability is the exception — it's scored against independent experimental data.

Structure and Function: AUROC 0.97

The structural and functional calls are the most reliable.

Transmembrane topology agrees with UniProt-annotated TM segments at AUROC 0.97, F1 0.91. The disorder model reaches AUROC 0.93 on receptors with annotated disordered regions.
Function and family: 99% of the cohort is correctly identified as receptors; the correct GO molecular-function term lands in the top-5 for 99.2% of receptors and top-1 for 89.3%.
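
For readers less familiar with the metrics above, the following sketch shows how AUROC and a top-k hit rate can be computed from scratch. The function names and the toy inputs are assumptions for illustration; the benchmark's own scoring code is not shown in this post and may differ (for example in how ties are handled).

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def top_k_hit_rate(ranked_predictions, true_terms, k):
    """Fraction of proteins whose true term appears among the first k ranked predictions."""
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, true_terms) if truth in ranked[:k]
    )
    return hits / len(true_terms)

# Toy per-residue TM scores and labels (1 = annotated transmembrane residue).
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))

# Toy ranked function predictions; the term labels are placeholders, not real GO IDs.
ranked = [
    ["term_a", "term_b", "term_c"],
    ["term_b", "term_a", "term_c"],
    ["term_c", "term_b", "term_a"],
]
truth = ["term_a", "term_a", "term_a"]
print(top_k_hit_rate(ranked, truth, k=1), top_k_hit_rate(ranked, truth, k=3))
```

The pairwise AUROC is quadratic in the number of residues, which is fine for a sketch; a rank-based implementation scales better on a full cohort.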

These are dependable triage calls on curated sequences — the backbone everything downstream relies on.

PTM Sites and Pockets: 0.94 and 72%

This is where computational characterization usually starts to fail.

PTM sites: across 39 modification classes, F1 peaks at 0.94 (N-linked glycosylation and disulfide bonds), with S-palmitoylation at 0.78; each class is reported at two operating points (high-precision and high-recall).
Binding pockets: 72% pocket-success (predicted pocket overlaps a known ligand-contact residue set) on the 223 receptors with co-crystal data, and 64% recall on PDB-observed ligand identities.
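
The two operating points for PTM sites can be pictured as two score thresholds applied to the same per-site predictions. The sketch below is an assumption about how such a report could be produced; the thresholds, scores, and function names are hypothetical and do not come from the Astra models.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every site scoring >= threshold is called modified."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # no calls made: nothing wrong yet
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Toy site scores and annotations (1 = annotated modification site).
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0]

for name, threshold in [("high-precision", 0.7), ("high-recall", 0.3)]:
    p, r = precision_recall(scores, labels, threshold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1(p, r):.2f}")
```

A stricter threshold suits mutagenesis planning where false positives are expensive; a looser one suits first-pass site mapping.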

This is pocket-level triage, not atomic-contact prediction — the output tells you which sites and pockets to spend reagents on first, not a docking pose.
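
The pocket-success criterion (a predicted pocket overlaps a known ligand-contact residue set) can be sketched as a set intersection. How much overlap the benchmark requires is not stated here, so the `min_shared` parameter and its default of one residue are assumptions, as are the function names and residue numbers.

```python
def pocket_success(predicted_pockets, known_contacts, min_shared=1):
    """True if any predicted pocket shares at least `min_shared` residues with the contact set."""
    return any(len(set(pocket) & set(known_contacts)) >= min_shared for pocket in predicted_pockets)

def pocket_success_rate(cases, min_shared=1):
    """Fraction of receptors with at least one successful pocket.

    cases: list of (predicted_pockets, known_contacts) pairs, one per receptor.
    """
    successes = sum(1 for pockets, contacts in cases if pocket_success(pockets, contacts, min_shared))
    return successes / len(cases)

# Toy cohort of three receptors with hypothetical residue numbers.
cases = [
    ([{1, 2, 3}, {10, 11}], {3, 4}),   # first pocket shares residue 3
    ([{20, 21}], {30, 31}),            # no overlap
    ([{5}, {6, 7}], {7}),              # second pocket shares residue 7
]
print(pocket_success_rate(cases))
```

Raising `min_shared` makes the criterion stricter and moves it closer to contact-level agreement, which is a different question from the triage described above.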

Thermostability: 82% Where It Counts

Thermostability is scored against independent experimental data, and the answer has a boundary in it.

On strong destabilizing mutations (experimental ΔTm < −2 °C, n = 50), the model predicts the sign of the effect with 82% accuracy — a pre-screen to drop risky variants before committing to CPM-style thermal-shift assays. Across all 154 mutations on four receptors, the aggregate is MAE 2.71 °C and 69% sign accuracy — on the order of the experimental uncertainty of the assay itself.

The boundary: within ±2 °C, the prediction isn't actionable, not because the model fails but because the measurement does. Thermal-shift assays on detergent-solubilized GPCRs have a precision floor around ±2 °C; inside that band there's no reliable ground truth. The model is accurate exactly where the assay can measure, which is the part that matters for ranking stabilizing mutations.
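
To make the thermostability scoring concrete, here is a minimal sketch of sign accuracy and MAE, with the strong-destabilizer subset selected by measured ΔTm. The data are invented toy values, the function names are hypothetical, and treating a ΔTm of exactly zero as non-negative is an assumption.

```python
def sign_accuracy(pairs):
    """Fraction of (predicted, measured) ΔTm pairs whose signs agree."""
    correct = sum(1 for predicted, measured in pairs if (predicted < 0) == (measured < 0))
    return correct / len(pairs)

def mean_absolute_error(pairs):
    """Mean of |predicted - measured| in degrees Celsius."""
    return sum(abs(predicted - measured) for predicted, measured in pairs) / len(pairs)

def strong_destabilizers(pairs, floor=2.0):
    """Keep only mutations whose measured ΔTm lies below -floor (outside the precision band)."""
    return [(predicted, measured) for predicted, measured in pairs if measured < -floor]

# Toy (predicted, measured) ΔTm values in °C; not data from the benchmark.
pairs = [(-3.1, -4.0), (-0.5, -2.5), (1.2, -3.0), (0.4, 0.8), (-1.0, 1.5)]

strong = strong_destabilizers(pairs)
print(len(strong), sign_accuracy(strong))
print(sign_accuracy(pairs), round(mean_absolute_error(pairs), 2))
```

Filtering on the measured value rather than the predicted one keeps the subset definition independent of the model being scored.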

Where It's Uneven

The story below the headlines isn't uniform:

Aminergic Class A receptors are the strongest family for binding-pocket prediction and the weakest for phosphorylation-site prediction.
Glycoprotein-hormone and adhesion receptors — where ligand binding happens in an extracellular domain, not the canonical TM pocket — are outside the current scope of the binding model.
For |ΔTm| < 2 °C, thermostability predictions aren't actionable: a limit of the assay, not the model.

One Receptor, End-to-End: Adenosine A2A

Aggregate metrics are abstract. The whitepaper walks the whole suite through a single receptor — the adenosine A2A receptor (P29274) — to show what a program team does with the integrated output:

Confirm the 7-transmembrane topology and map the regulation-prone regions.
Flag the glycosylation and palmitoylation sites that drive trafficking.
Triage the orthosteric pocket for the binding campaign.
Pre-screen stabilizing mutations before booking thermal-shift assays.

The value isn't replacing the experiment — it's compressing the read on a target from months of trial-and-error to a prioritized starting point, so reagents and assay time go where they're most likely to pay off.

Why This Matters for GPCR Programs

Much of the GPCR family stays underexploited for structural reasons, not biological ones. Over a hundred receptors remain orphans; many therapeutically interesting targets have wild-type sequences too unstable to express, purify, or crystallize — so no structure and no screening campaign ever begins. The difference between an intractable target and a pipeline asset often comes down to a handful of stabilizing mutations and a correct read of its pockets and modification sites.

Read the Full Benchmark

The GPCR volume reports every model's headline performance, its failure modes, and a decision framework mapping common GPCR program questions to the model combinations that answer them.

→ Read the full GPCR benchmark: https://www.orbion.life/research/gpcr-performance

First volume of the Orbion Model Performance Series. Transporters, Enzymes, and Ion Channels follow.

References & Sources

Sriram, K. & Insel, P. A. G Protein-Coupled Receptors as Targets for Approved Drugs: How Many Targets and How Many Drugs? Molecular Pharmacology 93(4):251–258 (2018). doi:10.1124/mol.117.111062 — GPCR share of approved drugs.
Alexandrov, A. I. et al. Microscale fluorescent thermal stability assay for membrane proteins. Structure 16(3):351–359 (2008). doi:10.1016/j.str.2008.02.004 — the CPM thermal-shift assay.
Magnani, F. et al. Co-evolving stability and conformational homogeneity of the human adenosine A2A receptor. PNAS 105(31):10744–10749 (2008). doi:10.1073/pnas.0804396105 — experimental thermostability reference (A2A).
Warne, T. et al. Structure of a β1-adrenergic G-protein-coupled receptor. Nature 454:486–491 (2008). doi:10.1038/nature07101 — experimental thermostability reference (β1-AR).
The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research 51(D1):D523–D531 (2023). doi:10.1093/nar/gkac1052 — Swiss-Prot reference annotations.
Burley, S. K. et al. RCSB Protein Data Bank. Nucleic Acids Research 47(D1):D464–D474 (2019). doi:10.1093/nar/gky1004 — co-crystal ligand-contact references.
Full per-model methodology and metrics: Orbion, Astra AI on GPCRs — Model Performance Series (2026) — https://www.orbion.life/research/gpcr-performance