From Structure to Experiment: What Computational Biologists Miss

Jan 30, 2026

Your collaborator ran AlphaFold. They sent you a beautiful PDB file and a confident email: "Here's the structure. Let me know when you have crystals." Three months later, you still don't have soluble protein. The structure prediction was perfect. Everything else was missing.


This is the translation gap—the space between computational predictions and experimental reality. It's where projects stall, collaborations fracture, and beautiful in silico work dies in the test tube.

Key Takeaways

  • A structure is not a protocol: Knowing the 3D fold doesn't tell you how to make the protein

  • Context is everything: Expression system, construct boundaries, purification strategy—none of this comes from structure prediction

  • Computational biologists optimize for prediction accuracy: Wet lab success requires optimizing for expressibility, stability, and yield

  • The handoff fails when assumptions go unstated: Both sides need shared understanding of requirements

  • Translation requires a translator: Someone has to convert computational output into experimental input

The Gap Nobody Talks About

The Computational View

From a computational perspective, the problem is solved:

  • Structure predicted with high confidence

  • Binding site identified

  • Mutations suggested for optimization

  • Paper-ready figures generated


Deliverables produced: PDB file, prediction confidence metrics, annotated images.

The Experimental View

From an experimental perspective, the work hasn't started:

  • What expression system do I use?

  • Where should I truncate the sequence?

  • What tag do I add, and where?

  • What's the purification strategy?

  • How do I know if the protein is functional?


Deliverables needed: Expressible construct, purification protocol, functional assay, milligrams of pure protein.

The Gap

The structure prediction provides the endpoint (what the protein looks like). The experiment requires the pathway (how to get there).


This gap is not technical—it's conceptual. Computational and experimental biologists often speak different languages about the same protein.

The Five Things Computational Predictions Don't Tell You

1. How to Express the Protein

What the prediction shows:

  • The folded structure of the full-length sequence

  • Confidence scores (pLDDT, PAE)

  • Domain architecture


What it doesn't show:

  • Whether E. coli, yeast, insect, or mammalian cells are required

  • Whether the full-length protein will express at all

  • What happens to disordered regions during expression

  • Which expression conditions (temperature, media, induction) to use


The consequence:


A collaborator receives a beautiful structure of a glycoprotein. They assume (because nothing says otherwise) that E. coli will work. They spend 8 weeks making expression constructs that will never produce functional protein because the glycans are essential for folding.


What should be communicated:

  • "This protein has N-glycosylation sites at positions X, Y, Z. It will require eukaryotic expression."

  • "Residues 1-50 are disordered and aggregation-prone. Consider truncation."

  • "This is a membrane protein. Standard expression will produce inclusion bodies."

2. Where to Set Construct Boundaries

What the prediction shows:

  • The full-length structure

  • Which regions are confident vs. uncertain (pLDDT)

  • Domain boundaries (sometimes)


What it doesn't show:

  • Whether disordered termini are functional or just aggregation liabilities

  • Where exactly to cut without hitting structured regions

  • Whether domain boundaries are cleavable or essential for folding


The consequence:

A structure shows a protein with a long N-terminal tail (pLDDT < 30). The experimentalist interprets this as "disordered, remove it." But the tail contains a regulatory phosphorylation site that the computational analysis didn't highlight. The truncated protein is stable but unregulated—and the data is uninterpretable.



What should be communicated:

  • "Residues 1-45 are disordered. Consider truncation, BUT residues 23-25 are a known regulatory motif."

  • "The interdomain linker (residues 180-210) is flexible but essential for domain communication."

  • "Construct boundaries should be residues 46-350 for stable, functional protein."

3. What Tags and Fusions to Use

What the prediction shows:

  • The protein structure in isolation

  • N and C-terminal accessibility


What it doesn't show:

  • Whether an N-terminal tag will interfere with function

  • Whether the C-terminus is buried (tag will disrupt fold)

  • Which fusion partners improve expression vs. cause problems

  • Whether tags can be cleaved (accessibility of protease site)


The consequence:


A collaborator says "add a His-tag for purification." The experimentalist adds an N-terminal His-tag. But the N-terminus is essential for membrane insertion—now the protein doesn't localize correctly. The purification "works" (protein is pure), but the functional assay fails.


What should be communicated:

  • "N-terminus inserts into membrane. Use internal tag or C-terminal tag."

  • "MBP fusion recommended for solubility. Place TEV site at residue 45."

  • "This protein tends to aggregate. Consider SUMO fusion with cleavable linker."

4. How to Purify and Store the Protein

What the prediction shows:

  • Protein structure

  • Surface properties (charge, hydrophobicity) if analyzed


What it doesn't show:

  • What buffers stabilize the protein

  • What detergents are required (for membrane proteins)

  • Whether the protein survives freeze-thaw

  • What concentration is achievable before aggregation

  • Whether cofactors need to be added


The consequence:


The structure is predicted. The experimentalist expresses the protein, runs affinity purification, elutes into standard buffer. The protein crashes out during concentration. Or it survives purification but is dead by the next morning. The computational collaborator asks "why is this taking so long?"


What should be communicated:

  • "This protein has a zinc-binding site. Include zinc in purification buffer."

  • "Predicted aggregation above 2 mg/mL. Keep dilute."

  • "Flash-freeze in 10% glycerol immediately after purification."

5. What Functional Assay Validates Success

What the prediction shows:

  • Structure (what it looks like)

  • Binding sites (where it might act)

  • Annotations (GO terms, EC numbers)


What it doesn't show:

  • How to measure that the protein is active

  • What substrate to use

  • What positive control validates the assay

  • Whether the predicted binding site is accessible in your construct


The consequence:


Protein is expressed, purified, concentrated, and handed back to the computational collaborator. "Great, now confirm the binding site with mutagenesis." But the experimentalist has no idea how to measure binding. There's no established assay. The protein sits in the freezer while both sides wait for the other to figure it out.


What should be communicated:

  • "This is a kinase. Activity can be measured with ADP-Glo assay using peptide substrate X."

  • "This is a GPCR. Functional reconstitution requires G-protein coupling or radioligand binding."

  • "Binding affinity can be measured by SPR with ligand Y."

The Translation Checklist

Before handing off a computational prediction to an experimentalist, provide:

Essential Context

  • [ ] Expression system recommendation (and why)

  • [ ] Construct boundaries (start/end residues, domain definitions)

  • [ ] PTM requirements (glycosylation, phosphorylation, disulfides)

  • [ ] Tag placement (N-term, C-term, internal, or none)

  • [ ] Known instability issues (aggregation risk, cofactor requirements)

Helpful Additions

  • [ ] Buffer recommendations (pH, salt, additives)

  • [ ] Purification strategy (affinity, ion exchange, gel filtration order)

  • [ ] Expected behavior (monomeric? oligomeric? membrane-associated?)

  • [ ] Functional assay (how to validate the protein works)

  • [ ] Positive controls (known binders, substrates, activities)

Reality Check

  • [ ] What could go wrong (top 3 failure modes)

  • [ ] Backup plans (if E. coli fails, try insect cells; if full-length fails, try domain)

  • [ ] Success criteria (how do we know it worked?)
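
One way to keep these items from going unstated is to make the handoff a structured artifact rather than a sentence in an email. A minimal sketch (the field names are hypothetical, not a community standard) of a note that travels alongside the PDB file:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class HandoffNote:
    """Structured translation checklist; one per prediction handed off."""
    target: str
    expression_system: str               # recommendation and why
    construct: tuple[int, int]           # start and end residues
    ptm_requirements: list[str] = field(default_factory=list)
    tag: str = ""                        # type and placement
    instability_risks: list[str] = field(default_factory=list)
    functional_assay: str = ""
    failure_modes: list[str] = field(default_factory=list)

note = HandoffNote(
    target="example membrane protein",
    expression_system="Sf9/baculovirus (N-glycosylation required)",
    construct=(46, 350),
    ptm_requirements=["N-glyc at N123, N245"],
    tag="C-terminal His8, TEV-cleavable",
    functional_assay="SPR with ligand Y",
)
print(json.dumps(asdict(note), indent=2))
```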

Common Miscommunications

"The structure looks great"

What the computational biologist means: AlphaFold confidence is high. The prediction is reliable.


What the experimentalist hears: The protein will be easy to produce.


Reality: Prediction confidence says nothing about expression difficulty. A confidently predicted GPCR is still a GPCR, and GPCRs are notoriously difficult to express.

"The protein should be stable"

What the computational biologist means: The fold is thermodynamically favorable.


What the experimentalist hears: The protein won't aggregate or lose activity.


Reality: Thermodynamic stability (the fold is favorable) is different from kinetic stability (the protein survives handling). A stably folded protein can still misfold during expression, aggregate during purification, or lose activity during storage.

"Just add a His-tag"

What the computational biologist means: Affinity purification is the standard approach.


What the experimentalist hears: Any His-tag placement will work.


Reality: Tag placement matters enormously. N-terminal, C-terminal, and internal tags all have different effects on expression, folding, and function. The "just" implies a simplicity that doesn't exist.

"The binding site is here"

What the computational biologist means: Based on prediction/annotation, residues X-Y form the binding site.


What the experimentalist hears: Mutagenesis of residues X-Y will confirm binding.


Reality: Predicted binding sites may not be accessible in all constructs/conformations. The experimentalist needs to know whether the site is surface-exposed, occluded, or only formed upon conformational change.

"Let me know when you have protein"

What the computational biologist means: Call me when you're ready for the next computational step.


What the experimentalist hears: Figure out expression yourself; it's not my problem.


Reality: Expression optimization may require iterative computational input (new constructs, fusion designs, mutation suggestions). A single handoff rarely works.

Bridging the Gap

For Computational Biologists

Before sending predictions:

  1. State your assumptions explicitly

    • "I'm assuming you have access to insect cell expression."

    • "This analysis assumes you want full-length protein."

    • "I'm not sure about the N-terminal region; it might need truncation."

  2. Provide context, not just coordinates

    • Include PTM predictions

    • Flag disordered regions

    • Note aggregation risks

    • Suggest expression system

  3. Offer iterative support

    • "If this construct doesn't express, let me know—I can suggest alternatives."

    • "If you need to truncate, I can analyze where the stable boundaries are."

  4. Learn basic expression constraints

    • Know when E. coli won't work

    • Understand why membrane proteins are hard

    • Appreciate that purification is non-trivial

For Experimentalists

Before starting work:

  1. Ask clarifying questions

    • "What expression system do you recommend?"

    • "Are there PTMs I need to consider?"

    • "Is the full-length protein necessary, or can I truncate?"

  2. Communicate constraints

    • "I don't have access to mammalian cell expression."

    • "Our lab only does E. coli."

    • "I need 10 mg for crystallization—is that realistic?"

  3. Report failures informatively

    • "The protein expressed but aggregated during concentration."

    • "I got soluble protein, but activity assay showed no function."

    • "Expression was low—I suspect the disordered N-terminus."

  4. Request specific computational help

    • "Can you predict where to truncate?"

    • "What mutations might improve solubility?"

    • "Is there an alternative construct that might express better?"

For Everyone

Establish shared understanding:

  • What does success look like?

  • Who is responsible for what?

  • What are the timeline and decision points?

  • How will we communicate problems?

Case Study: The Collaborative Translation

The Setup

Target: Human membrane protein implicated in neurodegeneration

Goal: Structural characterization for drug discovery

Team: Computational group + structural biology lab

The Failed Approach

Email from computational:

"Attached is the AlphaFold structure. The binding site is in the extracellular domain. Let me know when you have crystals."


What happened:

  • Structural lab tried E. coli expression (failed—membrane protein)

  • Tried insect cells (low yield, aggregation)

  • Tried mammalian cells (slight improvement)

  • 8 months of troubleshooting with no input from computational side

The Successful Approach

Revised communication:

"This is a single-pass membrane protein. Expression recommendations:

  • System: Sf9 insect cells with baculovirus (HEK293 as backup)

  • Construct: Residues 1-350 (full TM + extracellular domain)

  • Tag: C-terminal His8 (N-terminus has signal peptide)

  • Note: Extracellular domain alone (residues 51-350) is an alternative if full-length fails

  • Risks: Two N-glycosylation sites (N123, N245) are predicted—may need mutations to N123Q/N245Q for crystallization

  • Binding site: Residues 180-220 form the predicted pocket—keep these intact

  • Validation: Ligand X binds with ~100 nM Kd (use for functional assay)"


What happened:

  • First attempt (Sf9, full-length): Low yield, microcrystals only

  • Second attempt (HEK293, glycan mutants): Better yield, diffraction to 2.5 Å

  • Structure solved in 4 months

The Difference

The successful approach translated computational output into experimental parameters. It anticipated problems, provided alternatives, and enabled rapid iteration.

The Translation Protocol

Step 1: Generate Computational Analysis

  • Structure prediction

  • PTM prediction

  • Binding site identification

  • Disorder and aggregation assessment

  • Expression suitability analysis

Step 2: Translate to Experimental Parameters

| Computational Output | Experimental Translation |
| --- | --- |
| pLDDT < 50 regions | Candidate truncation points |
| PTM sites predicted | Expression system requirements |
| Membrane topology | Detergent and lipid needs |
| Aggregation hotspots | Solubility optimization priority |
| Binding site residues | Mutagenesis targets for validation |
| Cofactor requirements | Buffer additives |

Step 3: Generate Experimental Recommendations

  • Specific construct (residues X to Y)

  • Expression system (with justification)

  • Tag type and placement

  • Purification strategy outline

  • Storage recommendations

  • Functional assay suggestion

Step 4: Iterate

  • Experimentalist attempts production

  • Reports outcome (success/failure/partial)

  • Computational side analyzes failure mode

  • Generates revised recommendations

  • Repeat until success
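
Steps 1-3 are scriptable end to end. A sketch that wires together the earlier fragments from this post (sequon scan, pLDDT boundaries, buffer heuristic) into a first-pass recommendation list; a real pipeline would add topology, aggregation, and cofactor analyses in the same pattern:

```python
def translation_report(seq: str, pdb_path: str) -> list[str]:
    """First-pass Step 3 recommendations from Step 1 outputs, reusing
    find_nglyc_sequons, plddt_boundaries, and suggest_buffer_ph from
    the sketches above."""
    recs = []
    start, end = plddt_boundaries(pdb_path)
    recs.append(f"Construct: residues {start}-{end} (confidence-based)")
    sequons = find_nglyc_sequons(seq)
    if sequons:
        recs.append(f"N-glyc sequons at {sequons}: plan eukaryotic expression")
    else:
        recs.append("No N-glyc sequons: E. coli is a reasonable first attempt")
    recs.append(suggest_buffer_ph(seq))
    return recs
```

Step 4 is the part that stays human: interpreting the failure mode and deciding which rule to revise.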

The Emerging Standard

What Modern Platforms Should Provide

The gap between prediction and experiment isn't being solved by better predictions. It's being solved by better context.


Modern protein research platforms should deliver:

  1. Integrated analysis - Not just structure, but PTMs, topology, disorder, aggregation, and expression suitability

  2. Explicit recommendations - Not just "here's what the protein looks like," but "here's how to make it"

  3. Protocol generation - Bench-ready outputs that translate computational insights into experimental steps

  4. Iterative support - Tools to rapidly generate alternative constructs when the first attempt fails

The Goal

The goal isn't to replace experimental expertise. It's to give experimentalists the context they need to succeed on the first attempt—or at least the second, rather than the tenth.

The Bottom Line

Structure prediction was never the finish line. It's the starting line.

| What Prediction Gives You | What Experiment Requires |
| --- | --- |
| Coordinates | Expressible construct |
| Confidence scores | Purification protocol |
| Annotations | Functional validation |
| Pretty pictures | Milligrams of pure protein |

The gap between these two columns is where projects fail. Bridging that gap requires:

  • Explicit communication of context and assumptions

  • Translation of computational outputs into experimental parameters

  • Iterative collaboration, not one-time handoffs

  • Tools that provide context, not just predictions


The most common reason structural biology projects fail isn't bad predictions or bad experiments. It's bad translation between the two.

Context-Aware Experimental Planning

For researchers working across the computational-experimental divide, platforms like Orbion aim to close this gap by providing:

  • Integrated context analysis that combines structure, PTMs, binding sites, and stability predictions

  • Expression suitability assessment that recommends appropriate systems based on protein properties

  • Protocol generation that translates computational insights into bench-ready experimental plans

  • Iterative optimization through construct and mutation analysis when initial attempts fail


The goal is to make "here's the structure" into "here's how to make the protein"—before the experimentalist spends months discovering the hard way what the computational analysis could have predicted.
