
Orbion Team

Glycosylation Matters More Than You Think

You expressed your therapeutic target in E. coli. The protein folds, purifies, and runs as a monodisperse peak on SEC. You screen compounds against it. You find a potent inhibitor. You move to cell-based validation, and nothing works. After three months of troubleshooting, someone runs your target through a glycosylation predictor: four N-glycosylation sites, all near the binding pocket. In vivo, glycans at those sites sterically occlude the binding site, and your E. coli protein never had them. Every compound you found binds a surface that is masked on the real protein.


Glycosylation is one of the most common post-translational modifications in eukaryotes, and its impact on protein folding, stability, solubility, and drug accessibility is routinely underestimated.

Key Takeaways

  • Over 50% of human proteins are glycosylated, and glycans are often essential for folding, stability, and function—not just decorations

  • E. coli cannot glycosylate: proteins expressed in bacteria lack all glycans, which can fundamentally change structure and behavior

  • Different expression systems produce different glycans: insect cells make simpler glycans than mammalian cells, which matters for therapeutic proteins

  • Glycans sterically occlude surfaces: binding sites, epitopes, and protein-protein interaction interfaces can be masked by glycans that are invisible in E. coli-expressed protein

  • Predicting glycosylation sites from sequence is reliable for N-linked glycosylation (NxS/T motif) but harder for O-linked

The Scale of the Problem

How Common Is Glycosylation?

Glycosylation is everywhere in eukaryotic biology:

  • ~50% of all human proteins are glycosylated (Apweiler et al., 1999)

  • ~70% of the secretory proteome carries glycans

  • ~90% of plasma proteins are glycosylated

  • All IgG antibodies carry a conserved N-glycan at Asn297 in the Fc region, which is critical for effector function

  • Most receptors have extracellular glycosylation

Types of Glycosylation

| Type | Linkage | Consensus Sequence | Where Found | Predictability |
|---|---|---|---|---|
| N-linked | Asn-GlcNAc | NxS/T (x ≠ Pro) | ER, secreted proteins | High (motif is reliable) |
| O-linked (mucin-type) | Ser/Thr-GalNAc | No strict consensus | Mucins, extracellular | Low (no clear motif) |
| O-GlcNAc | Ser/Thr-GlcNAc | No consensus | Cytoplasmic/nuclear | Very low |
| C-mannosylation | Trp-Man | WxxW | Secreted proteins | Moderate |
| GPI anchor | C-terminal | Complex signal | Membrane-associated | Moderate |

N-linked glycosylation is the most structurally important and most predictable from sequence.

The N-Linked Glycosylation Pathway

N-linked glycosylation occurs co-translationally in the endoplasmic reticulum:

  1. A preassembled Glc₃Man₉GlcNAc₂ oligosaccharide is transferred to Asn in the NxS/T motif

  2. Glucose residues are trimmed by glucosidases I and II

  3. The glycoprotein interacts with calnexin/calreticulin (lectin chaperones that monitor folding)

  4. If properly folded → exit ER → Golgi for further processing

  5. If misfolded → re-glucosylation → another round of calnexin/calreticulin quality control

  6. If persistently misfolded → ERAD (ER-associated degradation)


Critical insight: Steps 3–6 mean that glycosylation is part of the protein folding quality control system. Removing glycosylation doesn't just remove a "decoration"—it removes a folding checkpoint. Some proteins simply cannot fold without it.
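The quality-control loop in steps 3–6 can be sketched as a small state machine. This is a toy model, not a kinetic simulation. Each round is reduced to a yes/no folding outcome. The cap on cycles before ERAD (`max_rounds`) is a hypothetical parameter chosen for illustration, not a measured biological constant.

```python
from typing import Iterable


def calnexin_cycle(fold_outcomes: Iterable[bool], max_rounds: int = 3) -> str:
    """Toy model of ER quality control for an N-glycoprotein.

    Each element of fold_outcomes says whether the protein is properly folded
    after one round of calnexin/calreticulin binding. max_rounds is a
    hypothetical cap standing in for the cell's decision to give up.
    """
    for round_number, folded in enumerate(fold_outcomes, start=1):
        # Steps 2-3: glucose trimmed, glycoprotein held by calnexin/calreticulin
        if folded:
            # Step 4: properly folded protein exits the ER toward the Golgi
            return "Golgi"
        if round_number >= max_rounds:
            # Step 6: persistently misfolded protein is sent to ERAD
            return "ERAD"
        # Step 5: still misfolded, so it is re-glucosylated and goes round again
    # Observed outcomes ran out before folding or reaching the cap
    return "ER (still cycling)"


print(calnexin_cycle([False, True]))          # Golgi: folded on the second round
print(calnexin_cycle([False, False, False]))  # ERAD: still misfolded after max_rounds
```

The point the model makes is structural. A protein with no glycans never enters this loop at all, so it loses the retry rounds as well as the final triage.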

Why Glycans Aren't Just Decorations

Role 1: Protein Folding

For many glycoproteins, N-glycans are essential for folding (Hebert et al., 2014):


Example: Influenza hemagglutinin
The hemagglutinin (HA) used in these folding studies has 7 N-glycosylation sites (the count varies by strain). Remove them all, and the protein doesn't fold; it accumulates in the ER as aggregated, misfolded material. Remove individual sites, and folding efficiency drops, with the size of the effect depending on which site is lost (Daniels et al., 2003).

Role 2: Protein Stability

Glycans contribute to stability in several ways:

  • Tm increases of +5–15°C have been reported when comparing glycosylated and non-glycosylated forms, although the size of the effect is protein-specific

  • Glycans increase solubility by adding hydrophilic mass to the protein surface

  • Glycans protect against proteolysis by sterically blocking protease access

  • Glycosylated proteins can have serum half-lives 2–10x longer than their non-glycosylated counterparts
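A 2–10x difference in half-life matters more than the ratio suggests. The sketch below assumes simple first-order (single-exponential) elimination, which real pharmacokinetics only approximates. Under that assumption, the fraction remaining decays as 0.5^(t/t½), and total exposure (AUC) scales linearly with t½. The two half-lives are hypothetical round numbers, not measured values for any specific protein.

```python
import math


def fraction_remaining(t_hours: float, half_life_hours: float) -> float:
    """Fraction of an IV bolus still in circulation after t_hours (first-order elimination)."""
    return 0.5 ** (t_hours / half_life_hours)


def auc_single_bolus(c0: float, half_life_hours: float) -> float:
    """AUC for C(t) = c0 * exp(-k t), where k = ln 2 / t_half; the integral is c0 / k."""
    k = math.log(2) / half_life_hours
    return c0 / k


# Hypothetical half-lives (5x apart), not measured values for any specific protein
for label, t_half in [("non-glycosylated", 2.0), ("glycosylated", 10.0)]:
    left = fraction_remaining(24.0, t_half)
    auc = auc_single_bolus(1.0, t_half)
    print(f"{label}: {left:.2%} left at 24 h, relative AUC {auc:.1f}")
```

With these assumed numbers, about 19% of the glycosylated form remains after 24 h, versus roughly 0.02% of the non-glycosylated form. Total exposure is 5x higher.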


Example: EPO (erythropoietin)
Native EPO has 3 N-glycosylation sites and 1 O-glycosylation site. The glycans constitute ~40% of the molecule's mass. Non-glycosylated EPO (from E. coli) has:

  • Dramatically reduced serum half-life (minutes vs hours)

  • Reduced biological activity in vivo

  • Increased aggregation propensity

Role 3: Steric Shielding

Glycans create a "sugar shield" on the protein surface:

  • Binding site occlusion: Glycans near binding pockets can restrict access to small molecules or antibodies

  • Immune evasion: HIV envelope protein gp120 has ~25 N-glycosylation sites creating a "glycan shield" that blocks antibody recognition (Stewart-Jones et al., 2016)

  • Receptor selectivity: Glycans can determine which receptors a protein interacts with (e.g., the Fc glycan modulates binding to Fcγ receptors)


Drug discovery implication: If your target has glycans near the binding site, compounds identified against E. coli protein may not access the site on the native glycoprotein.

Role 4: Solubility and Aggregation Resistance

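Glycans are nature's solubility enhancers. Their bulky, hydrophilic mass on the protein surface improves solubility and sterically hinders self-association. This is one reason non-glycosylated EPO shows increased aggregation propensity.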

The Expression System Problem

What Each System Produces

| Expression System | N-Glycan Type | Complexity | Impact |
|---|---|---|---|
| E. coli | None | n/a | No glycosylation at all |
| Yeast (Pichia, S. cerevisiae) | High-mannose (Man₈₋₁₄GlcNAc₂) | Hypermannosylated | Immunogenic in humans; rapid clearance |
| Insect cells (Sf9, Hi5) | Paucimannose (Man₃GlcNAc₂) | Simple, no sialic acid | Not human-like; shorter serum half-life |
| CHO cells | Complex (with sialylation) | Human-like | Industry standard for therapeutics |
| HEK293 | Complex (with sialylation) | Most human-like | Best for research; expensive |
| Plant cells | Complex (with xylose, fucose) | Plant-specific sugars | Immunogenic; requires glycoengineering |

The CHO Standard

For therapeutic proteins, Chinese Hamster Ovary (CHO) cells are the industry standard. They produce human-like, sialylated complex glycans and have a long track record in therapeutic manufacturing.

When Glycosylation Type Matters

| Application | Glycosylation Requirement | Recommended System |
|---|---|---|
| Structural biology (crystallization) | Often better WITHOUT glycans (heterogeneity) | E. coli, or deglycosylate after mammalian expression |
| Structural biology (cryo-EM) | Glycans visible; can help or hurt | Mammalian if glycans are part of the biology |
| Drug discovery (screening) | Must match in vivo glycosylation | Mammalian cells for target protein |
| Therapeutic production | Human-like glycans essential | CHO or HEK293 |
| Activity assays (basic research) | Depends on whether glycans affect activity | Test both glycosylated and non-glycosylated |
| Antibody production | Fc glycosylation affects effector function | CHO (with engineered glycans for ADCC optimization) |

Predicting Glycosylation from Sequence

N-Linked Glycosylation: Predictable

The NxS/T motif (where x ≠ Pro) is necessary but not sufficient for N-glycosylation:

  • ~90% of NxT sites are glycosylated

  • ~60% of NxS sites are glycosylated

  • Sites must be in the ER lumen (after a signal peptide) to be modified

  • Not all sequons are modified: sites that become buried as the protein folds, or sequons with a Pro immediately after the S/T, are often skipped
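A sequon scan is easy to run yourself as a first pass. The sketch below finds every N-x-S/T (x ≠ Pro) in a one-letter sequence. It uses a regex lookahead so that overlapping sequons (e.g., NNTT) are both reported. As an extra, hedged filter, it flags sites where Pro directly follows the S/T, a context generally associated with poor glycosylation. It knows nothing about topology, so you still need to confirm that sites are luminal (downstream of a signal peptide).

```python
import re
from dataclasses import dataclass

# Asn, then any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons (e.g. "NNTT") all match.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


@dataclass
class Sequon:
    position: int    # 1-based position of the Asn
    motif: str       # the three-residue sequon, e.g. "NGT"
    kind: str        # "NxT" or "NxS"
    pro_after: bool  # True if Pro directly follows the S/T (usually poorly glycosylated)


def scan_sequons(sequence: str) -> list[Sequon]:
    """Return every N-x-S/T (x != Pro) sequon in a one-letter protein sequence."""
    seq = "".join(sequence.split()).upper()  # drop whitespace/newlines, normalize case
    hits = []
    for match in SEQUON.finditer(seq):
        start = match.start()
        motif = match.group(1)
        pro_after = start + 3 < len(seq) and seq[start + 3] == "P"
        kind = "NxT" if motif[2] == "T" else "NxS"
        hits.append(Sequon(start + 1, motif, kind, pro_after))
    return hits


# Toy sequence for illustration, not a real protein
for site in scan_sequons("MKNGTLLNPSAQNASPWNNTT"):
    print(site)
```

On the toy sequence, this reports four sequons at Asn 3, 13, 18, and 19 (the last two overlap). It skips NPS because of the Pro and flags NAS as followed by Pro.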


Prediction tools:

| Tool | Method | Accuracy | Notes |
|---|---|---|---|
| NetNGlyc | Neural network | ~75–80% | Considers sequence context beyond the motif |
| Sequon scanning (NxS/T) | Pattern matching | ~70% (high false-positive rate) | Simple but effective first pass |
| GlyGen | Database integration | High (known sites) | Curates experimentally validated sites |
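A single accuracy figure hides the trade-off each method makes. Suppose you benchmark a sequon scan or NetNGlyc output for your own protein against experimentally validated sites, for instance those curated in GlyGen or annotated in UniProt. Precision and recall are more informative there. A sequon scan rarely misses a real N-site because the motif is necessary, but it over-calls. The positions below are made up for illustration.

```python
def precision_recall(predicted: set[int], validated: set[int]) -> tuple[float, float]:
    """Compare predicted glycosite positions against experimentally validated ones."""
    true_positives = len(predicted & validated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(validated) if validated else 0.0
    return precision, recall


# Hypothetical Asn positions for one protein
sequon_scan = {33, 88, 140, 152, 301}   # every NxS/T found
validated = {33, 140, 301}              # sites confirmed experimentally

p, r = precision_recall(sequon_scan, validated)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=1.00
```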

O-Linked Glycosylation: Hard to Predict

O-linked glycosylation has no strict consensus sequence, making prediction much harder:

  • Occurs in Ser/Thr-rich regions (mucin domains)

  • Often in disordered regions

  • Difficult to predict without experimental data


Prediction tools such as NetOGlyc and ISOGlyP exist, but their accuracy (~60%) is significantly lower than that of N-linked prediction.
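Because O-linked sites lack a motif, a crude complement to those tools is to locate Ser/Thr-rich stretches, where mucin-type glycosylation tends to cluster. This is a heuristic for spotting candidate regions, not a site predictor. The window size and threshold below are arbitrary assumptions rather than validated parameters.

```python
def st_rich_regions(sequence: str, window: int = 20, min_fraction: float = 0.4) -> list[tuple[int, int]]:
    """Return 1-based (start, end) spans of Ser/Thr-rich sequence.

    A residue is flagged if it lies in any window whose Ser+Thr fraction is at
    least min_fraction; flagged runs are then trimmed to start and end on S/T.
    window and min_fraction are illustrative defaults, not validated parameters.
    """
    seq = sequence.upper()
    flagged = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        if (chunk.count("S") + chunk.count("T")) / window >= min_fraction:
            for i in range(start, start + window):
                flagged[i] = True

    regions = []
    i = 0
    while i < len(seq):
        if not flagged[i]:
            i += 1
            continue
        j = i
        while j < len(seq) and flagged[j]:
            j += 1
        # Trim the flagged run so it starts and ends on a Ser/Thr residue
        st_positions = [k for k in range(i, j) if seq[k] in "ST"]
        if st_positions:
            regions.append((st_positions[0] + 1, st_positions[-1] + 1))
        i = j
    return regions


# Toy sequence: a Ser/Thr-rich repeat flanked by unrelated residues
print(st_rich_regions("A" * 30 + "PTTSTPSTAP" * 3 + "G" * 30))
```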

Practical Consequences for Protein Scientists

Consequence 1: Your E. coli Protein May Not Fold Like the Native Protein

If your target requires glycan-dependent folding:

  • The E. coli-expressed protein may be misfolded even if it appears soluble

  • It may be in a non-native conformation that binds different ligands

  • Activity assays may give misleading results


How to check: Compare activity/binding of E. coli protein vs mammalian-expressed protein. If they differ significantly, glycosylation-dependent folding is likely involved.

Consequence 2: Crystallization May Require Deglycosylation

Glycans are heterogeneous: each glycosylation site can carry different glycan structures. This heterogeneity often hinders crystal packing.


Solutions:

  • Express in GnTI⁻ HEK293S cells (produce homogeneous Man₅GlcNAc₂ glycans)

  • Treat with Endo H or PNGase F to remove glycans after purification

  • Mutate specific glycosylation sites (NxS/T → NxA or QxS/T)

  • Express in E. coli if the protein folds without glycans


Caution: Removing glycans may destabilize the protein or alter its conformation. Always verify that the deglycosylated protein retains the biologically relevant structure.
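Mutating sequons is easy to script when designing constructs. This sketch uses the same N-x-S/T (x ≠ Pro) rule described earlier and writes out a mutant sequence for either strategy listed above: Asn→Gln (QxS/T) or Ser/Thr→Ala (NxA). The function name and strategy labels are illustrative. Which substitution is better tolerated is protein-specific, so treat both as hypotheses to test.

```python
import re
from typing import Optional

# Asn, any residue except Pro, then Ser/Thr; lookahead allows overlapping sequons
SEQUON = re.compile(r"(?=N[^P][ST])")


def knock_out_sequons(sequence: str, positions: Optional[list[int]] = None,
                      strategy: str = "N>Q") -> str:
    """Return a mutant sequence with the selected sequons disrupted.

    positions: 1-based Asn positions to mutate (default: every sequon found).
    strategy:  "N>Q" turns NxS/T into QxS/T; "ST>A" turns it into NxA.
    """
    seq = sequence.upper()
    found = {m.start() + 1 for m in SEQUON.finditer(seq)}
    targets = sorted(found) if positions is None else positions
    mutant = list(seq)
    for pos in targets:
        if pos not in found:
            raise ValueError(f"no N-x-S/T sequon starts at position {pos}")
        if strategy == "N>Q":
            mutant[pos - 1] = "Q"          # the Asn itself
        elif strategy == "ST>A":
            mutant[pos + 1] = "A"          # the Ser/Thr, two residues after the Asn
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return "".join(mutant)


print(knock_out_sequons("MKNGTLLNAS"))                   # MKQGTLLQAS
print(knock_out_sequons("MKNGTLLNAS", strategy="ST>A"))  # MKNGALLNAA
```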

Consequence 3: Antibody Glycosylation Directly Affects Therapeutic Function

For therapeutic antibodies, the glycan at N297 in the Fc region is critical (Wang et al., 2017):

| Glycan Feature | Effect on Function |
|---|---|
| Core fucose | Reduces ADCC (afucosylated antibodies have 50–100x better ADCC) |
| Galactose | Affects CDC activity |
| Sialic acid | Anti-inflammatory effects; affects half-life |
| High mannose | Increased clearance rate; reduced half-life |
| Bisecting GlcNAc | Enhanced ADCC |

Engineering glycosylation is now a standard approach for optimizing therapeutic antibodies:

  • Afucosylated antibodies (mogamulizumab, obinutuzumab) for enhanced ADCC

  • Glycoengineered cell lines (CHO with FUT8 knockout) for consistent afucosylation

Consequence 4: Vaccine Antigen Design

Many viral surface proteins are heavily glycosylated:

  • HIV gp120: ~25 N-glycosylation sites (>50% of mass is glycan)

  • SARS-CoV-2 Spike: 22 N-glycosylation sites per monomer (Watanabe et al., 2020)

  • Influenza HA: 5–11 sites (varies by strain)


For vaccine design, the expression system must recapitulate native glycosylation to present the correct epitopes. E. coli-expressed viral proteins lack glycans, so they expose surfaces that are shielded on the native virus. Antibodies raised against those surfaces are often non-neutralizing.

When to Worry About Glycosylation

The Decision Framework

Step 1: Is your protein predicted to be glycosylated?

  • Scan for NxS/T motifs

  • Check UniProt annotations

  • Run NetNGlyc


Step 2: Are the glycosylation sites near functional regions?

  • Map sites onto the AlphaFold model

  • Check proximity to active sites, binding sites, or interaction interfaces

  • Sites >20 Å from functional regions may be less impactful
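Once you have a structural model, Step 2 takes only a few lines. This sketch works on plain coordinates. In practice, you would extract them from the AlphaFold or experimental model with a structure parser (Biopython's `Bio.PDB` is one option). Which atoms you pass in is up to you: the Asn ND2 atom alone, or all side-chain atoms. The 20 Å default mirrors the rule of thumb above and is not a hard boundary, since glycans are large and flexible. Residue numbers and coordinates in the example are invented.

```python
import math

Coord = tuple[float, float, float]


def min_distance(atoms_a: list[Coord], atoms_b: list[Coord]) -> float:
    """Smallest pairwise distance (Å) between two sets of atom coordinates."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def glycosite_proximity(glycosites: dict[int, list[Coord]],
                        functional: dict[int, list[Coord]],
                        cutoff: float = 20.0) -> dict[int, tuple[float, bool]]:
    """For each sequon Asn, return (closest distance to any functional residue, within cutoff?)."""
    report = {}
    for asn, asn_atoms in glycosites.items():
        closest = min(min_distance(asn_atoms, atoms) for atoms in functional.values())
        report[asn] = (closest, closest <= cutoff)
    return report


# Invented residue numbers and coordinates, for illustration only
sites = {57: [(0.0, 0.0, 0.0)], 210: [(30.0, 0.0, 0.0)]}
pocket = {60: [(5.0, 0.0, 0.0)], 62: [(0.0, 4.0, 3.0)]}
for asn, (dist, near) in glycosite_proximity(sites, pocket).items():
    print(f"Asn{asn}: {dist:.1f} Å from the pocket, within cutoff: {near}")
```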


Step 3: Is the protein secreted or membrane-associated?

  • If yes → glycosylation likely occurs in vivo and matters

  • If cytoplasmic → only O-GlcNAc is possible (and often dispensable for in vitro studies)


Step 4: What is your downstream application?

  • Structure determination → may want to remove glycans for homogeneity

  • Drug screening → need glycosylated protein to avoid false hits

  • Therapeutic development → need human-like glycans from mammalian expression

Quick Reference

| Situation | Glycosylation Concern | Action |
|---|---|---|
| Target has 0 NxS/T sites, is cytoplasmic | None | Express in E. coli |
| Target has 1–2 NxS/T sites, away from active site | Low | E. coli is probably fine for initial studies |
| Target has 3+ NxS/T sites near functional regions | High | Express in mammalian cells or at least insect cells |
| Secreted protein with multiple NxS/T | Very high | Mammalian expression essential |
| Therapeutic antibody | Critical | CHO cells with glycoengineering |
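The quick-reference table can also be encoded as a triage function, for example to run across a list of targets. The inputs, their labels, and the rule order are one reading of the table and the decision framework above, not a validated scoring scheme. Cytoplasmic proteins are rated "None" per Step 3, because they never enter the ER. Combinations the table does not cover fall through to a conservative default.

```python
def glycosylation_concern(n_sequons: int, near_functional_site: bool, localization: str,
                          therapeutic_antibody: bool = False) -> tuple[str, str]:
    """Return (concern level, suggested action), following the quick-reference table.

    localization: "cytoplasmic", "secreted", or "membrane" (hypothetical labels).
    """
    if therapeutic_antibody:
        return "Critical", "CHO cells with glycoengineering"
    if localization == "cytoplasmic":
        # Cytoplasmic proteins never enter the ER, so their sequons are not N-glycosylated
        return "None", "Express in E. coli"
    if localization == "secreted" and n_sequons >= 2:
        return "Very high", "Mammalian expression essential"
    if n_sequons >= 3 and near_functional_site:
        return "High", "Express in mammalian cells or at least insect cells"
    if 1 <= n_sequons <= 2 and not near_functional_site:
        return "Low", "E. coli is probably fine for initial studies"
    # Combinations the table does not cover: fall back to testing both forms
    return "Uncertain", "Test both glycosylated and non-glycosylated protein"


print(glycosylation_concern(4, True, "membrane"))
```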

The Bottom Line

| Misconception | Reality |
|---|---|
| "Glycans are just decorations" | ~50% of human proteins are glycosylated, and for many of them glycans are essential for folding, stability, and function |
| "E. coli protein is fine for screening" | Only if glycans don't affect the binding site; check first |
| "All glycosylated proteins need mammalian expression" | Not always; some fold and function without glycans |
| "Glycan heterogeneity can't be controlled" | GnTI⁻ cells, enzyme trimming, and glycoengineered cell lines provide homogeneous glycans |
| "Insect cells produce mammalian-like glycans" | No: insect cell glycans lack sialylation and are structurally different |

The core message: Before expressing any eukaryotic protein, check for predicted glycosylation sites. If they're present and near functional regions, your E. coli protein may be giving you the wrong answers.

Glycosylation-Aware Protein Science with Orbion

Orbion's AstraPTM predicts 39 post-translational modification types at residue resolution—including N-glycosylation and O-glycosylation sites—directly from sequence. This lets you see immediately whether your target protein has glycosylation sites and where they fall relative to predicted binding sites (AstraBIND) and functional regions. AstraSUIT complements this by predicting the optimal host organism and expression system, flagging when glycosylation requirements make E. coli expression inappropriate.


The integration matters: rather than discovering the glycosylation problem after three months of screening against the wrong protein, you can identify it before ordering the gene. If AstraPTM predicts N-glycosylation sites near AstraBIND-predicted binding pockets, that's a clear signal to express in mammalian cells from the start.

References

  1. Apweiler R, et al. (1999). On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochimica et Biophysica Acta, 1473(1):4-8.

  2. Hebert DN, et al. (2014). The intrinsic and extrinsic effects of N-linked glycans on glycoproteostasis. Nature Chemical Biology, 10:902-910.

  3. Daniels R, et al. (2003). N-linked glycans direct the cotranslational folding pathway of influenza hemagglutinin. Molecular Cell, 11(1):79-90.

  4. Wang LX, et al. (2017). Fc engineering for enhanced antibody function. Protein & Cell, 9:63-73.

  5. Stewart-Jones GB, et al. (2016). Trimeric HIV-1-Env structures define glycan shields from clades A, B, and G. Cell, 165(4):813-826.

  6. Walsh G & Jefferis R. (2006). Post-translational modifications in the context of therapeutic proteins. Nature Biotechnology, 24:1241-1252.

  7. Kim JY, et al. (2012). CHO cells in biotechnology for production of recombinant proteins: current state and further potential. Biotechnology Advances, 30(5):1255-1267.

  8. Solá RJ & Griebenow K. (2009). Effects of glycosylation on the stability of protein pharmaceuticals. Journal of Pharmaceutical Sciences, 98(4):1223-1245.

  9. Varki A. (2017). Biological roles of glycans. Glycobiology, 27(1):3-49.

  10. Shental-Bechor D & Levy Y. (2009). Folding of glycoproteins: toward understanding the biophysics of the glycosylation code. Current Opinion in Structural Biology, 19(5):524-533.

  11. Watanabe Y, et al. (2020). Site-specific glycan analysis of the SARS-CoV-2 spike. Science, 369(6501):330-333.