Orbion Team
Glycosylation Matters More Than You Think

You expressed your therapeutic target in E. coli. The protein folds, purifies, and runs as a monodisperse peak on SEC. You screen compounds against it and find a potent inhibitor. You move to cell-based validation, and nothing works. After three months of troubleshooting, someone runs your target through a glycosylation predictor: four N-glycosylation sites, all near the binding pocket. Your E. coli protein lacked the glycans that sterically occlude the binding site in vivo. Every compound you found binds a surface that is not accessible on the real protein.
Glycosylation is the most common post-translational modification in eukaryotes, and its impact on protein folding, stability, solubility, and drug accessibility is routinely underestimated.
Key Takeaways
Roughly half of human proteins are glycosylated, and glycans are often essential for folding, stability, and function, not just decorations
Standard E. coli strains do not glycosylate: proteins expressed in bacteria lack all glycans, which can fundamentally change structure and behavior
Different expression systems produce different glycans: insect cells make simpler glycans than mammalian cells, which matters for therapeutic proteins
Glycans sterically occlude surfaces: binding sites, epitopes, and protein-protein interaction interfaces can be masked by glycans that are invisible in E. coli-expressed protein
Predicting glycosylation sites from sequence is reliable for N-linked glycosylation (NxS/T motif) but harder for O-linked

The Scale of the Problem
How Common Is Glycosylation?
Glycosylation is everywhere in eukaryotic biology:
~50% of all human proteins are glycosylated (Apweiler et al., 1999)
~70% of the secretory proteome carries glycans
~90% of plasma proteins are glycosylated
All antibodies are glycosylated (in IgG, the N297 glycan in Fc is critical for effector function)
Most receptors have extracellular glycosylation
Types of Glycosylation
| Type | Linkage | Consensus Sequence | Where Found | Predictability |
|---|---|---|---|---|
| N-linked | Asn-GlcNAc | NxS/T (x ≠ Pro) | ER, secreted proteins | High (motif is reliable) |
| O-linked (mucin-type) | Ser/Thr-GalNAc | No strict consensus | Mucins, extracellular | Low (no clear motif) |
| O-GlcNAc | Ser/Thr-GlcNAc | No consensus | Cytoplasmic/nuclear | Very low |
| C-mannosylation | Trp-Man | WxxW | Secreted proteins | Moderate |
| GPI anchor | C-terminal | Complex signal | Membrane-associated | Moderate |
N-linked glycosylation is the most structurally important and most predictable from sequence.
The N-Linked Glycosylation Pathway
N-linked glycosylation occurs co-translationally in the endoplasmic reticulum:
1. A preassembled Glc₃Man₉GlcNAc₂ oligosaccharide is transferred to Asn in the NxS/T motif
2. Glucose residues are trimmed by glucosidases I and II
3. The glycoprotein interacts with calnexin/calreticulin (lectin chaperones that monitor folding)
4. If properly folded → exit ER → Golgi for further processing
5. If misfolded → re-glucosylation → another round of calnexin/calreticulin quality control
6. If persistently misfolded → ERAD (ER-associated degradation)
Critical insight: Steps 3–6 mean that glycosylation is part of the protein folding quality control system. Removing glycosylation doesn't just remove a "decoration"—it removes a folding checkpoint. Some proteins simply cannot fold without it.
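
The quality-control loop described above can be sketched as a toy state machine. This is only an illustration of the control flow, not a kinetic model: the function name `er_quality_control` and the `max_rounds` cutoff are hypothetical, and real cells do not count rounds this way.

```python
from typing import List


def er_quality_control(fold_attempts: List[bool], max_rounds: int = 3) -> str:
    """Toy model of the calnexin/calreticulin cycle.

    fold_attempts[i] is True if the protein reaches its native fold on
    chaperone round i. max_rounds is a hypothetical cutoff standing in for
    "persistently misfolded"; it is not a measured value.
    """
    for round_index in range(max_rounds):
        folded = round_index < len(fold_attempts) and fold_attempts[round_index]
        if folded:
            # Properly folded: leave the ER for further processing.
            return "exit ER to Golgi"
        # Misfolded: re-glucosylated and sent through another chaperone round.
    # Still misfolded after every allowed round: degraded.
    return "ERAD"


print(er_quality_control([False, True]))          # folds on the second round
print(er_quality_control([False, False, False]))  # never folds
```

A protein with no glycan never enters this loop at all, which is the point of the paragraph above: without the sugar there is no checkpoint to pass.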

Why Glycans Aren't Just Decorations
Role 1: Protein Folding
For many glycoproteins, N-glycans are essential for folding (Hebert et al., 2014):
The calnexin/calreticulin cycle is glycan-dependent
Glycans promote folding by increasing local hydrophilicity, preventing aggregation of exposed hydrophobic surfaces during folding
~30% of glycoproteins fail to fold when glycosylation is inhibited by tunicamycin treatment
Example: Influenza hemagglutinin
Hemagglutinin has 7 N-glycosylation sites. Remove them all, and the protein doesn't fold—it accumulates in the ER as aggregated, misfolded material. Remove individual sites, and folding efficiency drops proportionally (Daniels et al., 2003).
Role 2: Protein Stability
Glycans significantly enhance thermodynamic stability:
Tm increases of 5–15 °C are common when comparing glycosylated vs non-glycosylated forms
Glycans increase solubility by adding hydrophilic mass to the protein surface
Glycans protect against proteolysis by sterically blocking protease access
Glycosylated proteins have serum half-lives 2–10x longer than their non-glycosylated counterparts
Example: EPO (erythropoietin)
Native EPO has 3 N-glycosylation sites and 1 O-glycosylation site. The glycans constitute ~40% of the molecule's mass. Non-glycosylated EPO (from E. coli) has:
Dramatically reduced serum half-life (minutes vs hours)
Reduced biological activity in vivo
Increased aggregation propensity
Role 3: Steric Shielding
Glycans create a "sugar shield" on the protein surface:
Binding site occlusion: Glycans near binding pockets can restrict access to small molecules or antibodies
Immune evasion: HIV envelope protein gp120 has ~25 N-glycosylation sites creating a "glycan shield" that blocks antibody recognition (Stewart-Jones et al., 2016)
Receptor selectivity: Glycans can determine which receptors a protein interacts with (e.g., Fc glycosylation determines FcγR selectivity)
Drug discovery implication: If your target has glycans near the binding site, compounds identified against E. coli protein may not access the site on the native glycoprotein.
Role 4: Solubility and Aggregation Resistance
Glycans are nature's solubility enhancers:
Each N-glycan adds ~2–3 kDa of highly hydrophilic mass
This mass reduces surface hydrophobicity and prevents protein-protein aggregation
Removing glycosylation sites from antibodies increases aggregation propensity by 2–5x
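
To get a feel for how much of a glycoprotein is sugar, the per-glycan figure above can be turned into a rough mass fraction. This is a back-of-the-envelope sketch: the function name and the 2.5 kDa default (the midpoint of the ~2–3 kDa range quoted above) are assumptions, and the 50 kDa example protein is hypothetical.

```python
def glycan_mass_fraction(polypeptide_kda: float, n_glycans: int,
                         kda_per_glycan: float = 2.5) -> float:
    """Estimate the fraction of total glycoprotein mass contributed by N-glycans.

    kda_per_glycan defaults to 2.5, the midpoint of the ~2-3 kDa range;
    real glycans vary with branching and sialylation.
    """
    if polypeptide_kda <= 0:
        raise ValueError("polypeptide_kda must be positive")
    if n_glycans < 0:
        raise ValueError("n_glycans must not be negative")
    glycan_kda = n_glycans * kda_per_glycan
    return glycan_kda / (polypeptide_kda + glycan_kda)


# Hypothetical 50 kDa polypeptide carrying four N-glycans.
fraction = glycan_mass_fraction(50.0, 4)
print(f"{fraction:.1%} of the mass is glycan")
```

Heavily branched, sialylated glycans are larger than this default, so treat the result as a lower-end estimate for such proteins.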

The Expression System Problem
What Each System Produces
| Expression System | N-Glycan Type | Complexity | Impact |
|---|---|---|---|
| E. coli | None | N/A | No glycosylation at all |
| Yeast (Pichia, S. cerevisiae) | High-mannose (Man₈₋₁₄GlcNAc₂) | Hypermannosylated | Immunogenic in humans; rapid clearance |
| Insect cells (Sf9, Hi5) | Paucimannose (Man₃GlcNAc₂) | Simple, no sialic acid | Not human-like; shorter serum half-life |
| CHO cells | Complex (with sialylation) | Human-like | Industry standard for therapeutics |
| HEK293 | Complex (with sialylation) | Most human-like | Best for research; expensive |
| Plant cells | Complex (with xylose, fucose) | Plant-specific sugars | Immunogenic; requires glycoengineering |

The CHO Standard
For therapeutic proteins, Chinese Hamster Ovary (CHO) cells are the industry standard because they produce human-like complex glycans:
Core fucosylation
Bi-antennary complex glycans
Terminal sialylation (important for half-life)
When Glycosylation Type Matters
| Application | Glycosylation Requirement | Recommended System |
|---|---|---|
| Structural biology (crystallization) | Often better WITHOUT glycans (heterogeneity) | E. coli, or deglycosylate after mammalian expression |
| Structural biology (cryo-EM) | Glycans visible; can help or hurt | Mammalian if glycans are part of the biology |
| Drug discovery (screening) | Must match in vivo glycosylation | Mammalian cells for target protein |
| Therapeutic production | Human-like glycans essential | CHO or HEK293 |
| Activity assays (basic research) | Depends on whether glycans affect activity | Test both glycosylated and non-glycosylated |
| Antibody production | Fc glycosylation affects effector function | CHO (with engineered glycans for ADCC optimization) |
Predicting Glycosylation from Sequence
N-Linked Glycosylation: Predictable
The NxS/T motif (where x ≠ Pro) is necessary but not sufficient for N-glycosylation:
~90% of NxT sites are glycosylated
~60% of NxS sites are glycosylated
Sites must be in the ER lumen (after a signal peptide) to be modified
Not all sequons are accessible: buried sites or sites in disordered regions may not be glycosylated
Prediction tools:
| Tool | Method | Accuracy | Notes |
|---|---|---|---|
| NetNGlyc | Neural network | ~75–80% | Considers sequence context beyond the motif |
| Sequon scanning (NxS/T) | Pattern matching | ~70% (high false positive) | Simple but effective first pass |
| GlyGen | Database integration | High (known sites) | Curates experimentally validated sites |
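
A minimal sketch of the sequon-scanning first pass from the table above, assuming a plain one-letter amino acid string as input. The function name `find_sequons` is illustrative and not part of any tool named here. It reports every NxS/T match (x ≠ Pro) and knows nothing about signal peptides, topology, or accessibility, so it will over-call sites.

```python
import re
from typing import List, Tuple

# A lookahead is used so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based Asn position, tripeptide) for each N-x-S/T match, x != Pro."""
    seq = sequence.upper()
    return [(match.start() + 1, match.group(1)) for match in SEQUON.finditer(seq)]


# Toy sequence, not a real protein.
for position, tripeptide in find_sequons("MNGTANPSNNTS"):
    print(position, tripeptide)
```

For the toy sequence this reports sequons at positions 2, 9, and 10; the NPS at position 6 is skipped because of the proline. Hits from a scan like this are candidates to confirm with a context-aware predictor or experimental annotation.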
O-Linked Glycosylation: Hard to Predict
O-linked glycosylation has no strict consensus sequence, making prediction much harder:
Occurs in Ser/Thr-rich regions (mucin domains)
Often in disordered regions
Difficult to predict without experimental data
Prediction tools include NetOGlyc and ISOGlyP, but their accuracy (~60%) is significantly lower than that of N-linked prediction.

Practical Consequences for Protein Scientists
Consequence 1: Your E. coli Protein May Not Fold Like the Native Protein
If your target requires glycan-dependent folding:
The E. coli-expressed protein may be misfolded even if it appears soluble
It may be in a non-native conformation that binds different ligands
Activity assays may give misleading results
How to check: Compare activity/binding of E. coli protein vs mammalian-expressed protein. If they differ significantly, glycan-dependent folding is a likely explanation, though not the only possible one.
Consequence 2: Crystallization May Require Deglycosylation
Glycans are heterogeneous: each glycosylation site can carry different glycan structures. This heterogeneity often interferes with crystal packing.
Solutions:
Express in GnTI⁻ HEK293S cells (produce homogeneous Man₅GlcNAc₂ glycans)
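Treat with Endo H (cleaves high-mannose and hybrid glycans, leaving a single GlcNAc on the Asn) or PNGase F (removes the entire glycan and converts the Asn to Asp) after purification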
Mutate specific glycosylation sites (NxS/T → NxA or QxS/T)
Express in E. coli if the protein folds without glycans
Caution: Removing glycans may destabilize the protein or alter its conformation. Always verify that the deglycosylated protein retains the biologically relevant structure.
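
The site-mutation option above (NxS/T → QxS/T) can be sketched at the sequence level as follows. This is a minimal illustration, assuming a one-letter amino acid string and a 1-based Asn position; the function name `knock_out_sequon` is hypothetical. It covers only the Asn substitution route, not the NxA alternative.

```python
def knock_out_sequon(sequence: str, asn_position: int, replacement: str = "Q") -> str:
    """Return a copy of sequence with the sequon Asn at asn_position replaced.

    asn_position is 1-based. The default replacement (Gln) follows the
    QxS/T option listed above. Raises ValueError if the position is not
    the Asn of an N-x-S/T sequon (x != Pro).
    """
    index = asn_position - 1  # convert to 0-based indexing
    if index < 0 or index + 2 >= len(sequence):
        raise ValueError("position does not leave room for an N-x-S/T sequon")
    tripeptide = sequence[index:index + 3]
    if tripeptide[0] != "N" or tripeptide[1] == "P" or tripeptide[2] not in "ST":
        raise ValueError(f"{tripeptide!r} at position {asn_position} is not a sequon")
    return sequence[:index] + replacement + sequence[index + 1:]


# Toy sequence, not a real protein.
print(knock_out_sequon("MNGTA", 2))
```

Rejecting positions that are not sequons is a deliberate choice here: it catches off-by-one numbering errors before a construct is ordered.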
Consequence 3: Antibody Glycosylation Directly Affects Therapeutic Function
For therapeutic antibodies, the glycan at N297 in the Fc region is critical (Wang et al., 2017):
| Glycan Feature | Effect on Function |
|---|---|
| Core fucose | Reduces ADCC (afucosylated antibodies have 50–100x better ADCC) |
| Galactose | Affects CDC activity |
| Sialic acid | Anti-inflammatory effects; affects half-life |
| High mannose | Increased clearance rate; reduced half-life |
| Bisecting GlcNAc | Enhanced ADCC |
Engineering glycosylation is now a standard approach for optimizing therapeutic antibodies:
Afucosylated antibodies (mogamulizumab, obinutuzumab) for enhanced ADCC
Glycoengineered cell lines (CHO with FUT8 knockout) for consistent afucosylation
Consequence 4: Vaccine Antigen Design
Many viral surface proteins are heavily glycosylated:
HIV gp120: ~25 N-glycosylation sites (>50% of mass is glycan)
SARS-CoV-2 Spike: 22 N-glycosylation sites per monomer
Influenza HA: 5–11 sites (varies by strain)
For vaccine design, the expression system must recapitulate native glycosylation to present the correct epitopes. E. coli-expressed viral proteins lack glycans and present non-native surfaces that can elicit non-neutralizing antibodies.

When to Worry About Glycosylation
The Decision Framework
Step 1: Is your protein predicted to be glycosylated?
Scan for NxS/T motifs
Check UniProt annotations
Run NetNGlyc
Step 2: Are the glycosylation sites near functional regions?
Map sites onto the AlphaFold model
Check proximity to active sites, binding sites, or interaction interfaces
Sites >20 Å from functional regions may be less impactful
Step 3: Is the protein secreted or membrane-associated?
If yes → glycosylation likely occurs in vivo and matters
If cytoplasmic → only O-GlcNAc is possible (and often dispensable for in vitro studies)
Step 4: What is your downstream application?
Structure determination → may want to remove glycans for homogeneity
Drug screening → need glycosylated protein to avoid false hits
Therapeutic development → need human-like glycans from mammalian expression
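
Step 2 of the framework can be sketched as a simple distance filter. This assumes you have already extracted 3D coordinates (for example Cα positions from a structural model) for each predicted site and for the functional region; the function name and the coordinates below are hypothetical, and the 20 Å default is the rule of thumb from Step 2 rather than a validated threshold.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def sites_near_region(site_coords: Dict[int, Coord],
                      region_coords: List[Coord],
                      cutoff: float = 20.0) -> Dict[int, float]:
    """Return {site position: nearest distance in Å} for sites within cutoff."""
    if not region_coords:
        raise ValueError("region_coords must contain at least one coordinate")
    near = {}
    for site, coord in site_coords.items():
        # Distance from this site to the closest point of the functional region.
        nearest = min(math.dist(coord, region) for region in region_coords)
        if nearest <= cutoff:
            near[site] = nearest
    return near


# Hypothetical coordinates in Å, not taken from a real structure.
sites = {45: (0.0, 0.0, 0.0), 120: (30.0, 0.0, 0.0)}
binding_pocket = [(3.0, 4.0, 0.0)]
print(sites_near_region(sites, binding_pocket))
```

A point-to-point distance understates the reach of a glycan, which is flexible and extends well beyond the Asn it is attached to, so sites just outside the cutoff may still deserve a look.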

Quick Reference
| Situation | Glycosylation Concern | Action |
|---|---|---|
| Target has 0 NxS/T sites, is cytoplasmic | None | Express in E. coli |
| Target has 1–2 NxS/T sites, away from active site | Low | E. coli is probably fine for initial studies |
| Target has 3+ NxS/T sites near functional regions | High | Express in mammalian cells or at least insect cells |
| Secreted protein with multiple NxS/T | Very high | Mammalian expression essential |
| Therapeutic antibody | Critical | CHO cells with glycoengineering |
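
The table above can be encoded as a small triage function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, "multiple" is read as two or more sites, "cytoplasmic" is read as not secreted, and rows are checked from highest to lowest concern. Combinations the table does not list come back as "unlisted" rather than being guessed.

```python
def glycosylation_concern(n_sequons: int, near_functional_region: bool,
                          secreted: bool, therapeutic_antibody: bool = False) -> str:
    """Map a target onto the concern levels of the quick-reference table."""
    if therapeutic_antibody:
        return "critical"
    if secreted and n_sequons >= 2:  # assumption: "multiple" means >= 2
        return "very high"
    if n_sequons >= 3 and near_functional_region:
        return "high"
    if 1 <= n_sequons <= 2 and not near_functional_region:
        return "low"
    if n_sequons == 0 and not secreted:  # assumption: not secreted ~ cytoplasmic
        return "none"
    # The table has no row for this combination, so no level is invented.
    return "unlisted"


print(glycosylation_concern(4, near_functional_region=True, secreted=False))
print(glycosylation_concern(1, near_functional_region=True, secreted=False))
```

The second call lands in a gap in the table (one site close to a functional region); cases like that are better resolved by inspecting the site than by a rule.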
The Bottom Line
| Misconception | Reality |
|---|---|
| "Glycans are just decorations" | ~50% of human proteins are glycosylated, and for many of them glycans are essential for folding, stability, or function |
| "E. coli protein is fine for screening" | Only if glycans don't affect the binding site, so check first |
| "All glycosylated proteins need mammalian expression" | Not always: some fold and function without glycans |
| "Glycan heterogeneity can't be controlled" | GnTI⁻ cells, enzyme trimming, and glycoengineered cell lines provide homogeneous glycans |
| "Insect cells produce mammalian-like glycans" | No: insect cell glycans lack sialylation and are structurally different |
The core message: Before expressing any eukaryotic protein, check for predicted glycosylation sites. If they're present and near functional regions, your E. coli protein may be giving you the wrong answers.

Glycosylation-Aware Protein Science with Orbion
Orbion's AstraPTM predicts 39 post-translational modification types at residue resolution—including N-glycosylation and O-glycosylation sites—directly from sequence. This lets you see immediately whether your target protein has glycosylation sites and where they fall relative to predicted binding sites (AstraBIND) and functional regions. AstraSUIT complements this by predicting the optimal host organism and expression system, flagging when glycosylation requirements make E. coli expression inappropriate.
The integration matters: rather than discovering the glycosylation problem after three months of screening against the wrong protein, you can identify it before ordering the gene. If AstraPTM predicts N-glycosylation sites near AstraBIND-predicted binding pockets, that's a clear signal to express in mammalian cells from the start.
References
Apweiler R, et al. (1999). On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochimica et Biophysica Acta, 1473(1):4-8.
Hebert DN, et al. (2014). The intrinsic and extrinsic effects of N-linked glycans on glycoproteostasis. Nature Chemical Biology, 10:902-910.
Daniels R, et al. (2003). N-linked glycans direct the cotranslational folding pathway of influenza hemagglutinin. Molecular Cell, 11(1):79-90.
Wang LX, et al. (2017). Fc engineering for enhanced antibody function. Protein & Cell, 9:63-73.
Stewart-Jones GB, et al. (2016). Trimeric HIV-1-Env structures define glycan shields from clades A, B, and G. Cell, 165(4):813-826.
Walsh G & Jefferis R. (2006). Post-translational modifications in the context of therapeutic proteins. Nature Biotechnology, 24:1241-1252.
Kim JY, et al. (2012). CHO cells in biotechnology for production of recombinant proteins: current state and further potential. Biotechnology Advances, 30(5):1255-1267.
Solá RJ & Griebenow K. (2009). Effects of glycosylation on the stability of protein pharmaceuticals. Journal of Pharmaceutical Sciences, 98(4):1223-1245.
Varki A. (2017). Biological roles of glycans. Glycobiology, 27(1):3-49.
Shental-Bechor D & Levy Y. (2009). Folding of glycoproteins: toward understanding the biophysics of the glycosylation code. Current Opinion in Structural Biology, 19(5):524-533.
Watanabe Y, et al. (2020). Site-specific glycan analysis of the SARS-CoV-2 spike. Science, 369(6501):330-333.