Orbion Team

How to Choose Expression Systems for Your Protein's PTM Requirements

Dec 2, 2025

PTMs Occurring on Proteins Based on the System They're Expressed In

The fundamental question in protein engineering: Which expression system should you use? The answer isn't about convenience—it's about post-translational modifications (PTMs).

Post-translational modifications determine whether your protein will work. They're not decorative—they're functional. PTMs affect folding, stability, localization, and activity. Get them wrong, and your beautifully designed construct becomes an expensive pile of misfolded aggregates.

Key Takeaways

Start simple: Use E. coli unless you have a specific PTM requirement (glycosylation, complex disulfides)
Match biology to biology: Don't use mammalian cells for proteins that don't need eukaryotic PTMs
Know your protein first: Experimental PTM data (UniProt) has gaps; use AI prediction for broader coverage
The cost ladder: E. coli (1×) → Yeast (3-5×) → Insect (10-20×) → Mammalian (20-50×)
Critical decision point: Does your protein require glycosylation for folding? If yes, skip E. coli

But here's the problem: not all expression systems can produce all PTMs. Express a heavily glycosylated human protein in E. coli, and you'll get an unglycosylated mess that likely won't fold. Express a simple bacterial protein in mammalian cells, and you're wasting time and money on unnecessary complexity.

Choosing the right expression system means matching your protein's PTM requirements to what the host cell can actually deliver. Let me walk you through how to make that decision.

Diagram Showing Expression Systems' Effects on PTMs

Quick Comparison: Expression Systems at a Glance

System	Timeline	Cost	Yield	Best For	Deal Breaker
E. coli	1-2 days	$	1-10 g/L	Cytoplasmic proteins, no glycosylation	Needs N-glycosylation
Yeast	3-5 days	$$$	0.1-1 g/L	Secreted proteins, disulfides	Needs human glycans
Insect	5-7 days	$$$$	1-10 mg/L	Membrane proteins, GPCRs	Needs full sialylation
Mammalian	10-14 days	$$$$$	0.05-5 g/L	Therapeutics, human PTMs	High cost/low throughput
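To see what the cost ladder means for a screening campaign, here is a rough sketch that scales a per-construct baseline by the multipliers quoted in the takeaways (1×, 3-5×, 10-20×, 20-50×). The baseline of 1.0 cost unit per construct and the function name `campaign_cost_range` are illustrative assumptions, not real prices.

```python
# Relative cost multipliers from the cost ladder (E. coli = 1x baseline).
COST_MULTIPLIERS = {
    "E. coli": (1, 1),
    "Yeast": (3, 5),
    "Insect": (10, 20),
    "Mammalian": (20, 50),
}


def campaign_cost_range(system: str, n_constructs: int,
                        baseline: float = 1.0) -> tuple[float, float]:
    """Return the (low, high) relative cost of expressing n_constructs in a host.

    `baseline` is a hypothetical cost per construct in E. coli; real costs
    depend on scale, media, labor and equipment.
    """
    low, high = COST_MULTIPLIERS[system]
    return (n_constructs * baseline * low, n_constructs * baseline * high)


if __name__ == "__main__":
    # Example: the 50-variant screen mentioned under E. coli.
    for system in COST_MULTIPLIERS:
        low, high = campaign_cost_range(system, n_constructs=50)
        print(f"{system:10s} {low:7.0f} - {high:7.0f} cost units")
```

For 50 constructs this gives 50 units in E. coli versus 1,000 to 2,500 units in mammalian cells, which is why the ladder starts at the bottom.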

The Expression System Ladder

Think of expression systems as a ladder. You want to start at the bottom (fastest, cheapest) and only climb higher when you need to.

E. coli: Fast and Cheap, But Limited

Timeline: 24-48 hours from transformation to protein
Cost: $ (baseline)
Yield: 1-10 g/L for well-expressing proteins

Best for:

Cytoplasmic proteins without disulfide bonds
Proteins that don't require eukaryotic PTMs
High-throughput screening (need to test 50+ variants quickly)
Budget-constrained academic projects
Enzymes, structural domains, fluorescent proteins

What E. coli can do:

Phosphorylation (limited - mostly His/Asp on two-component systems; minimal Ser/Thr/Tyr)
Methylation (PrmA and PrmB methylate ribosomal proteins L11 and L3; PrmC methylates release factors)
Acetylation (enzymatic via the acetyltransferase Pat, and non-enzymatic via acetyl phosphate)
Some lipidation (N-terminal myristoylation possible with NMT expression; no eukaryotic palmitoylation)

What E. coli cannot do:

N-glycosylation (no machinery)
O-glycosylation (no machinery)
Complex disulfide bonds (the cytoplasm is a reducing environment)
Tyrosine sulfation
Most eukaryotic PTMs

Common failures:

Eukaryotic membrane proteins (often misfold and aggregate)
Antibodies (require disulfide bonds, need periplasmic expression or refolding)
Secreted human proteins (usually need glycosylation to fold)

When it works: You're making interleukins, simple enzymes, fluorescent proteins, structural domains without complex PTMs. E. coli is your workhorse. Fast, cheap, scalable.

When it fails: You need glycosylation, you're working with a GPCR, or you're making a therapeutic antibody. Don't waste weeks trying to make E. coli work for a protein it fundamentally can't handle.

Escherichia coli Expression System Details Diagram

Yeast: The Middle Ground

Timeline: 3-5 days from transformation to protein
Cost: $$$ (3-5× E. coli)
Yield: 0.1-1 g/L (can reach higher with optimization)

Two main options:

Pichia pastoris (methylotrophic yeast, high expression, methanol-inducible)
S. cerevisiae (baker's yeast, well-characterized, GRAS organism with a long regulatory track record)

Best for:

Secreted proteins (uses signal peptides efficiently)
Proteins requiring disulfide bonds
Moderate glycosylation requirements (function tolerates non-human glycans)
Stepping up from E. coli without going full mammalian

What yeast can do:

N-glycosylation (but hypermannose, not human-like)
O-glycosylation (limited)
Disulfide bond formation (ER pathway)
GPI anchors
Phosphorylation (mostly functional)
Farnesylation, palmitoylation

What yeast cannot do:

Complex human glycosylation patterns
Sialylation (no sialyltransferases)
Tyrosine sulfation
Gamma-carboxylation

The glycosylation caveat: Yeast glycosylates, but differently. It adds high-mannose N-glycans that can be immunogenic for therapeutics. For basic research, this often doesn't matter. For drug development, it's a dealbreaker.

When it works: You're expressing secreted enzymes, some antibody fragments, vaccine antigens. You need disulfide bonds but don't need human-identical glycosylation.

When it fails: You're developing a therapeutic that requires specific human glycan structures. Your protein's function depends on sialylation or complex branched glycans.

Insect Cells: Closer to Mammalian

Timeline: 5-7 days from transfection to protein
Cost: $$$$ (10-20× E. coli)
Yield: 1-10 mg/L (lower than E. coli, but higher quality for membrane proteins)

Common systems:

Sf9, Sf21 (Spodoptera frugiperda - fall armyworm, most common)
Hi5 (Trichoplusia ni - cabbage looper, higher expression)
Baculovirus expression vector system (BEVS) - the standard way to deliver the target gene into these cells and drive high-level expression

Best for:

Membrane proteins (especially GPCRs for structural biology)
Large multi-protein complexes (proper assembly machinery)
Proteins requiring authentic eukaryotic processing
Structural biology (cryo-EM, X-ray crystallography)

What insect cells can do:

N-glycosylation (simpler than mammalian, but closer than yeast)
O-glycosylation
Phosphorylation (full complement)
Most lipidation pathways
Disulfide bonds (robust ER)
Better membrane protein folding machinery

What insect cells cannot do (or do poorly):

Sialylation (lack α2,6-sialyltransferases; some Sf9 lines have minimal α2,3-sialylation)
Tyrosine sulfation (minimal TPST activity)
Complex terminal glycan processing (no complex branching enzymes)
Some mammalian-specific modifications

When it works: You're working with GPCRs for structural studies. You need a multi-subunit complex assembled correctly. You're doing cryo-EM and need native-like protein.

When it fails: You need fully sialylated glycoproteins. You're producing a therapeutic requiring human glycosylation. Cost is prohibitive for your throughput needs.

Mammalian Cells: Authentic, But Expensive

Timeline: 10-14 days for transient expression; months for stable cell lines
Cost: $$$$$ (20-50× E. coli)
Yield: 0.05-0.5 g/L transient; 1-5 g/L stable lines (CHO can reach higher)

Common systems:

HEK293 (human embryonic kidney - fast growth, excellent transfection, research gold standard)
CHO (Chinese hamster ovary - the dominant host for approved biologics, ~70% of therapeutic proteins)
Expi293 (high-yield suspension HEK293, optimized for transient expression)
HEK293-F, FreeStyle 293 (suspension-adapted for scalability)

Best for:

Therapeutic antibodies and proteins (regulatory approval pathway)
Proteins requiring human-identical glycosylation
Complex secreted proteins (growth factors, cytokines, enzymes)
Final-stage validation before IND filing

What mammalian cells can do:

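Nearly everything: the broadest PTM machinery of any common host (HEK293 is human; CHO is a hamster line whose glycans are human-like rather than strictly identical).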
N-glycosylation with complex branching and sialylation
O-glycosylation with appropriate terminal modifications
All phosphorylation, acetylation, methylation
Gamma-carboxylation
Proper proteolytic processing
Authentic membrane protein folding

What mammalian cells cost:

20-50× more expensive than E. coli
Slower growth (doubling times of roughly a day vs about 20-30 minutes for E. coli)
More complex culture requirements
Lower expression levels (usually)

When it works: You're making a therapeutic. You need regulatory-compliant production. Function absolutely requires human PTMs.

When it fails: You're doing high-throughput screening of 100 variants. Your budget won't support it. Your protein is simple and doesn't need the complexity.

Mammalian Cell Expression System Diagram

The PTM × Expression System Matrix

Here's the reference table. Bookmark this.

PTM Type	E. coli	Yeast	Insect	Mammalian
N-glycosylation	✗	✓ (hypermannose)	✓ (partial)	✓ (full)
O-glycosylation	✗	△ (limited)	✓	✓
Phosphorylation (S/T/Y)	△ (limited)	✓	✓	✓
Disulfide bonds	△ (periplasm only)	✓	✓	✓
Acetylation	✓	✓	✓	✓
Methylation	✓	✓	✓	✓
Ubiquitination	✗	✓	✓	✓
Sumoylation	✗	✓	✓	✓
Palmitoylation	✗	✓	✓	✓
Myristoylation	✗	✓	✓	✓
GPI anchor	✗	✓	✓	✓
Sialylation	✗	✗	△ (weak)	✓
Sulfation	✗	✗	△ (weak)	✓
Gamma-carboxylation	✗	✗	✗	✓

Legend:

✓ = Robustly supported
△ = Partially supported / limited capacity
✗ = Not supported / minimal activity
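The matrix can also be kept as a small lookup table. This is a minimal sketch that transcribes the ✓/△/✗ entries above into Python and returns, cheapest first, the hosts that cover a set of required PTMs. The function name `hosts_supporting` is illustrative, and the sketch ignores the qualifiers in parentheses (hypermannose, partial), so treat its output as a shortlist rather than a verdict.

```python
# Transcription of the PTM x expression system matrix.
# Qualifiers such as "hypermannose" or "partial" are not captured.
SUPPORTED, LIMITED, ABSENT = "✓", "△", "✗"

HOSTS = ["E. coli", "Yeast", "Insect", "Mammalian"]  # cheapest first

MATRIX = {
    "N-glycosylation":         [ABSENT, SUPPORTED, SUPPORTED, SUPPORTED],
    "O-glycosylation":         [ABSENT, LIMITED, SUPPORTED, SUPPORTED],
    "Phosphorylation (S/T/Y)": [LIMITED, SUPPORTED, SUPPORTED, SUPPORTED],
    "Disulfide bonds":         [LIMITED, SUPPORTED, SUPPORTED, SUPPORTED],
    "Acetylation":             [SUPPORTED, SUPPORTED, SUPPORTED, SUPPORTED],
    "Methylation":             [SUPPORTED, SUPPORTED, SUPPORTED, SUPPORTED],
    "Ubiquitination":          [ABSENT, SUPPORTED, SUPPORTED, SUPPORTED],
    "Sumoylation":             [ABSENT, SUPPORTED, SUPPORTED, SUPPORTED],
    "Palmitoylation":          [ABSENT, SUPPORTED, SUPPORTED, SUPPORTED],
    "Myristoylation":          [ABSENT, SUPPORTED, SUPPORTED, SUPPORTED],
    "GPI anchor":              [ABSENT, SUPPORTED, SUPPORTED, SUPPORTED],
    "Sialylation":             [ABSENT, ABSENT, LIMITED, SUPPORTED],
    "Sulfation":               [ABSENT, ABSENT, LIMITED, SUPPORTED],
    "Gamma-carboxylation":     [ABSENT, ABSENT, ABSENT, SUPPORTED],
}


def hosts_supporting(required: list[str], allow_limited: bool = False) -> list[str]:
    """Return hosts, cheapest first, that support every PTM in `required`.

    With allow_limited=True, partially supported entries also count.
    Unknown PTM names raise KeyError.
    """
    accepted = {SUPPORTED, LIMITED} if allow_limited else {SUPPORTED}
    return [
        host
        for i, host in enumerate(HOSTS)
        if all(MATRIX[ptm][i] in accepted for ptm in required)
    ]


if __name__ == "__main__":
    print(hosts_supporting(["N-glycosylation", "Disulfide bonds"]))
    print(hosts_supporting(["Sialylation"], allow_limited=True))
```

For a secreted protein with N-glycans and disulfides, this returns yeast, insect and mammalian cells. The caveats in the table (hypermannose in yeast, partial processing in insect cells) still decide which of those is acceptable.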

The Decision Framework: 3 Steps to Choose Your Expression System

Decision Framework Diagram for Expression System Selection Based on PTMs

Step 1: Map Your Protein's PTM Requirements

Before you choose a system, know what your protein actually needs.

The Manual Approach (Traditional Workflow)

Sequence analysis:

UniProt annotations - experimentally validated PTMs when available (often incomplete)
NetNGlyc 1.0 - predict N-glycosylation sites (Asn-X-Ser/Thr sequons, where X is any residue except Pro)
NetOGlyc 4.0 - predict O-glycosylation sites
DISULFIND / DiANNA - predict disulfide bonds from sequence
CSS-Palm 4.0 - predict palmitoylation sites
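The motif NetNGlyc starts from is simple enough to scan for directly. The sketch below is not NetNGlyc, which scores sites with neural networks. It only lists Asn-X-Ser/Thr sequons with X ≠ Pro, a necessary but not sufficient condition for N-glycosylation. The function name `find_sequons` is illustrative.

```python
import re

# N-X-S/T sequon with X != Pro. The zero-width lookahead lets overlapping
# sequons (e.g. "NNST") be reported at every qualifying Asn.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, motif) for every N-X-S/T sequon, X != Pro.

    A sequon alone does not guarantee glycosylation: the site must also
    reach the ER lumen and be accessible to oligosaccharyltransferase.
    """
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence: NGT and NVT qualify; NPS is skipped because X = Pro.
    print(find_sequons("MKNGTAPNPSLNVT"))
```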

Structural analysis:

Check PDB for homologous structures showing disulfide bonds
Look for membrane-spanning regions (transmembrane helices = expression challenge)
Identify disordered regions (may need tags for stability)
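A quick way to flag membrane-spanning regions before turning to dedicated topology predictors such as TMHMM is a Kyte-Doolittle hydropathy scan. This sketch uses the published Kyte-Doolittle scale with a 19-residue window and a 1.6 cutoff, the values suggested in the original 1982 paper. It is a crude screen. Signal peptides can also score as hydrophobic, and the function names are illustrative.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; raises KeyError on non-standard residues."""
    scores = [KD[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(sequence: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge above-threshold windows into 1-based (start, end) spans.

    Boundaries are window-level and therefore approximate.
    """
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(sequence, window)):
        if score <= threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # overlaps previous span: extend it
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    toy = "K" * 10 + "L" * 20 + "K" * 10  # hypothetical: one hydrophobic stretch
    print(candidate_tm_segments(toy))
```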

The problem with this approach:

UniProt has gaps: Most proteins lack comprehensive experimental PTM annotations
Fragmented tools: You need 5+ different prediction servers with different formats
Time-consuming: Checking databases + running predictions takes 1-2 hours per protein
Incomplete picture: Experimental data captures what's been studied, not what exists

The Modern AI Solution

Use machine learning models trained on structural and sequence data to predict the complete PTM landscape, even when experimental data is absent. AI models can infer PTMs from evolutionary patterns, structural context, and sequence motifs—often with higher coverage than databases alone.

How this works in practice (using Orbion's Characterize module as an example):

Input sequence → AI analyzes structural features and evolutionary patterns
Comprehensive PTM prediction:
- N- and O-glycosylation sites (beyond simple consensus motifs)
- Phosphorylation sites (Ser/Thr/Tyr) with confidence scores
- Lipidation sites (palmitoylation, myristoylation, prenylation)
- Disulfide bonds from structural context
- Membrane topology, signal peptides, domain boundaries
Automated expression system recommendation:
- Glycosylation required? → Rules out E. coli
- Complex disulfides? → Suggests yeast or higher
- Membrane protein? → Recommends insect or mammalian
- Therapeutic context? → Prioritizes CHO/HEK293

This gives you a complete, actionable PTM profile in minutes—not the hours required for manual database searches.

Step 2: Match Requirements to Capabilities

Ask these questions:

Does it require glycosylation?
- No → E. coli is on the table
- Yes, simple → Yeast might work
- Yes, complex → Insect or mammalian
Are there disulfide bonds?
- No → E. coli cytoplasm is fine
- Yes, simple (1-2 bonds) → E. coli periplasm or yeast
- Yes, complex → Yeast, insect, or mammalian
Is it a membrane protein?
- No → More flexibility
- Yes → Yeast (if simple), insect (if GPCR), mammalian (if complex)
Is this for therapeutic development?
- No → Use cheapest option that works
- Yes → Probably need mammalian (regulatory considerations)
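One way to read these four questions is as floors on the expression system ladder. Each answer rules out the rungs below some level, and the cheapest surviving rung is where you start. The sketch below encodes that reading. The answer labels ("simple", "gpcr", and so on) and the function name are hypothetical, and "probably mammalian" for therapeutics is treated as a hard floor for simplicity.

```python
LADDER = ["E. coli", "Yeast", "Insect", "Mammalian"]  # cheapest first

# Hypothetical encoding: each answer sets the lowest rung (index into LADDER)
# that can still satisfy it.
GLYCOSYLATION_FLOOR = {"none": 0, "simple": 1, "complex": 2}
DISULFIDE_FLOOR = {"none": 0, "simple": 0, "complex": 1}  # simple: E. coli periplasm
MEMBRANE_FLOOR = {"none": 0, "simple": 1, "gpcr": 2, "complex": 3}


def candidate_systems(glycosylation: str = "none", disulfides: str = "none",
                      membrane: str = "none", therapeutic: bool = False) -> list[str]:
    """Return hosts, cheapest first, consistent with the Step 2 answers."""
    floor = max(
        GLYCOSYLATION_FLOOR[glycosylation],
        DISULFIDE_FLOOR[disulfides],
        MEMBRANE_FLOOR[membrane],
    )
    if therapeutic:
        # Regulatory considerations push therapeutics to mammalian hosts.
        floor = max(floor, LADDER.index("Mammalian"))
    return LADDER[floor:]


if __name__ == "__main__":
    print(candidate_systems(disulfides="simple", membrane="gpcr"))
```

For a GPCR with a couple of disulfides this returns insect and mammalian cells, so insect cells are the starting point.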

Step 3: Start Low, Move Up as Needed

The default strategy: Start with E. coli. Only move up the ladder when you have a specific reason.

Don't skip steps based on assumptions. Yes, it's a mammalian protein. But maybe it'll express in E. coli anyway. Try before you commit to expensive systems.

Exception: If the protein requires glycosylation for folding (common for secreted proteins), don't waste time on E. coli. Start with yeast or higher.
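In practice, Step 3 is a loop over the ladder with one shortcut. This sketch assumes you supply a callback, here called `express_ok`, that stands in for a real small-scale expression trial (soluble yield, correct mass, activity). The callback name and its pass criterion are hypothetical and up to you.

```python
from typing import Callable, Optional

LADDER = ["E. coli", "Yeast", "Insect", "Mammalian"]  # cheapest first


def escalate(express_ok: Callable[[str], bool],
             glycosylation_needed_for_folding: bool = False) -> Optional[str]:
    """Try hosts cheapest first and return the first that passes the trial.

    `express_ok` is a hypothetical stand-in for a small-scale expression
    test. If glycosylation is needed for folding, E. coli is skipped.
    Returns None if no host passes.
    """
    start = 1 if glycosylation_needed_for_folding else 0
    for host in LADDER[start:]:
        if express_ok(host):
            return host
    return None


if __name__ == "__main__":
    # Hypothetical trial results for one construct.
    results = {"E. coli": False, "Yeast": False, "Insect": True, "Mammalian": True}
    print(escalate(results.__getitem__))
```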

Common Mistakes

Diagram on Mistakes on Choosing Expression Systems

Mistake 1: Using Mammalian Cells for Everything

"It's a human protein, so I'll use HEK293."

This wastes time and money. Many human proteins express perfectly in E. coli. Try the simple system first unless you have a specific PTM requirement.

Mistake 2: Forcing E. coli When It Won't Work

"We've always used E. coli, so we'll make it work."

If your protein needs glycosylation or has complex disulfide bonds, no amount of codon optimization or tag engineering will make E. coli work. Move up the ladder.

Mistake 3: Ignoring Topology

Membrane proteins are special. Transmembrane regions, lipid requirements, proper folding machinery - these matter. Don't treat a GPCR like a soluble enzyme.

Mistake 4: Assuming Yeast Glycosylation Is Good Enough

For basic research, it usually is. For therapeutics, it's not. Hypermannose glycans can be immunogenic. Know the difference.

Real-World Examples

Example 1: Green Fluorescent Protein (GFP)

PTMs needed: None from the host (the chromophore forms autocatalytically and needs only molecular oxygen)
System choice: E. coli
Why it works: Simple, cytoplasmic, self-folding
Expression level: Very high (grams per liter)

Example 2: β2-Adrenergic Receptor (GPCR)

PTMs needed: Palmitoylation (Cys341), phosphorylation (multiple Ser/Thr in C-terminus), N-glycosylation (Asn6, Asn15)
System choice: Insect cells (Sf9 with baculovirus)
Why E. coli fails: Seven-transmembrane protein, needs ER insertion machinery, lipid environment
Key modification: T4 lysozyme inserted in place of intracellular loop 3 (ICL3) is often used for crystallization
Expression level: 1-5 mg/L (lower yield, but functional and properly folded)

Beta-2 Adrenergic Receptor's High Confidence PTM Predictions on the Orbion Platform

Example 3: Monoclonal Antibody (IgG)

PTMs needed (IgG1): 16 disulfide bonds (12 intrachain, 4 interchain), N-glycosylation at Asn297 (Fc region)
System choice: CHO cells (industry standard for >70% of therapeutic mAbs)
Why yeast fails: Glycan structure affects FcγR binding, ADCC, CDC; hypermannose glycans are immunogenic
Critical factor: Glycosylation at Asn297 modulates effector function—must be human-compatible
Expression level: 3-5 g/L in fed-batch CHO; 10+ g/L in optimized perfusion systems
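The disulfide count follows directly from the antibody's domain layout. Assuming the canonical human IgG1 architecture, in which each immunoglobulin domain carries one intrachain disulfide, the bookkeeping is shown below. Other subclasses have different hinge counts, so the interchain number changes for IgG2 or IgG3.

```python
# Canonical human IgG1 layout: each Ig domain carries one intrachain disulfide.
HEAVY_CHAIN_DOMAINS = ["VH", "CH1", "CH2", "CH3"]
LIGHT_CHAIN_DOMAINS = ["VL", "CL"]
N_HEAVY, N_LIGHT = 2, 2

HINGE_DISULFIDES = 2        # heavy-heavy bonds in the IgG1 hinge
HEAVY_LIGHT_DISULFIDES = 2  # one per heavy-light pair

intrachain = N_HEAVY * len(HEAVY_CHAIN_DOMAINS) + N_LIGHT * len(LIGHT_CHAIN_DOMAINS)
interchain = HINGE_DISULFIDES + HEAVY_LIGHT_DISULFIDES
total = intrachain + interchain

print(intrachain, interchain, total)  # 12 4 16
```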

Example 4: Insulin

PTMs needed: Disulfide bonds, proteolytic processing
System choice: E. coli (surprisingly!)
How: Express as inclusion bodies, refold, process in vitro
Why it works: Small protein with a well-characterized refolding protocol; cost-effective at scale

The Bottom Line: Match Biology to Biology

Expression system choice isn't about prestige or default assumptions. It's about matching your protein's biology to the host cell's capabilities.

Quick Reference Decision Tree:

Does your protein need glycosylation?
├─ No → Try E. coli first
└─ Yes → Does it need human-identical glycans?
    ├─ No → Yeast or Insect cells
    └─ Yes → Mammalian cells (HEK293/CHO)

Does your protein have disulfide bonds?
├─ No → E. coli cytoplasm works
├─ 1-2 simple bonds → E. coli periplasm or yeast
└─ Complex/multiple bonds → Yeast, insect, or mammalian

Is it a membrane protein?
├─ No → More options available
└─ Yes → What type?
    ├─ Bacterial homolog → E. coli might work
    ├─ Single-pass → Yeast or mammalian
    └─ GPCR/multi-pass → Insect or mammalian

Key Principles:

Start simple (E. coli) unless you have a specific reason not to
Know your protein's PTM requirements before you start (use UniProt + prediction tools)
Don't waste time on systems that fundamentally can't deliver what you need
Move up the ladder only when necessary—each step increases cost and time
Cost and speed matter—use the simplest system that works

PTMs aren't optional decorations. They're functional requirements. Choose your expression system accordingly.

Let Orbion Predict PTMs and Choose Your Expression System

Manually checking UniProt, running prediction servers, cross-referencing literature—this takes hours per protein. Worse, experimental databases have significant gaps: most proteins lack comprehensive PTM annotations because they haven't been systematically studied.

Orbion solves both problems:

1. Higher-Quality PTM Prediction

Orbion's AI models don't just look up what's in databases—they predict the complete PTM landscape from sequence and structure, even when experimental data is absent.

✓ N- and O-glycosylation sites - comprehensive prediction beyond consensus motifs
✓ Phosphorylation sites - Ser/Thr/Tyr with confidence scores
✓ Lipidation - palmitoylation, myristoylation, prenylation
✓ Disulfide bonds - predicted from structural context and evolutionary conservation
✓ Topology - membrane regions, signal peptides, domain boundaries
✓ Binding sites - active sites, allosteric sites, protein-protein interfaces

Why this matters: For example, UniProt might show 2 experimentally validated glycosylation sites while Orbion predicts 5 additional candidate sites that haven't been experimentally characterized yet but could affect your expression strategy.

2. Automated Expression System Recommendation

Based on the complete PTM profile, Orbion automatically recommends the optimal expression system:

Detects glycosylation requirements → rules out E. coli
Identifies complex disulfides → suggests yeast or higher
Recognizes membrane topology → recommends insect or mammalian
Flags therapeutic context → prioritizes CHO/HEK293

The result: Get from sequence to informed construct design in minutes, not days—with higher confidence than manual database searches.

Expression Protocol Diagram for the Beta-2 Adrenergic Receptor on the Orbion Platform

Beta-2-adrenergic receptor protocol prediction on the Orbion platform: baculovirus, Sf9 suspension, a stabilizing ligand, and a carefully engineered construct. The expression system isn’t just ‘insect’—it’s an entire strategy built around the receptor’s PTMs, folding, and stability.

Ready to try it on your target?

Book a 20-Minute Demo

Sign up free for unlimited Overview runs — summary, sequence-based analysis, homology search. For the full Characterization — PTMs, binding sites, stability variants, construct design — book a demo and we'll run your target live.
