Orbion Team
Cryptic Proteolysis: Why TEV and PreScission Proteases Cut Your Protein Internally

After overnight TEV cleavage you see your protein has lost 5 kDa. The tag is gone — but so is a piece of your protein. The cut site is buried inside your construct, far from any ENLYFQ↓G sequence. You stare at the SDS-PAGE gel: two bands where there should be one, neither at the expected mass. Mass spec confirms the worst — your "highly specific" protease has bitten somewhere it was never supposed to.
Welcome to cryptic proteolysis, the silent productivity sink of every structural biology lab that uses TEV, 3C/PreScission, or thrombin. The textbooks call these proteases "highly specific." Your gel says otherwise.
Key Takeaways
TEV is highly specific but not infinitely specific: cryptic ENLYFQ-like motifs in your sequence can be cleaved, especially in disordered or exposed loops
PreScission/3C protease tolerates more variation at P1' than its consensus suggests: residues such as Ala, Ser, and Met can stand in for the preferred Gly
Accessibility matters more than sequence purity: a perfect site buried in the core won't cut; an imperfect site on a flexible loop will
Diagnose before redesigning: intact-mass LC-MS plus N-terminal sequencing of the unwanted fragment localizes the cryptic cut to within ~1 residue
Mitigation is usually a condition change, not a sequence change: cold-temperature, on-column cleavage, and reduced enzyme:substrate ratio rescue 70%+ of cryptic-cleavage problems before mutagenesis is needed

The Problem Most Labs Pretend Doesn't Exist
Open any protein production paper and you will read sentences like "the tag was removed by overnight digestion with TEV protease." What you won't read is "the protease also clipped an internal loop at residue 187, reducing yield by 40%, but we cropped the gel." The literature massively under-reports cryptic proteolysis because the failure modes are private — they live in lab notebooks, not papers.
But systematic studies have made the problem impossible to ignore. Kapust et al. (2002) showed that the TEV protease P1' position tolerates almost any amino acid except proline, with cleavage efficiency varying by less than two orders of magnitude across the natural amino acids. That tolerance is the source of the problem: any ENLYFQ-X sequence in your construct is a potential cut site, and once near-consensus variants are counted, candidate motifs turn up throughout natural proteomes, the human proteome included.
The same is true for PreScission. The LEVLFQ↓GP consensus was selected from a viral polyprotein context, not engineered for orthogonality. P1' is often described as strictly Gly, but the biochemistry says otherwise: Cordingley et al. (1990) demonstrated meaningful cleavage at LEVLFQ-A and LEVLFQ-S sites in synthetic substrates.

Specificity of the Common Tag-Removal Proteases
TEV Protease
Origin: Nuclear inclusion a (NIa) protease from Tobacco Etch Virus.
Consensus: ENLYFQ↓(G/S), where ↓ is the scissile bond. The S219V "superTEV" variant is the lab workhorse — autoproteolysis-resistant and produced cheaply in E. coli.
Specificity profile: The substrate envelope is asymmetric. P6–P1 positions are strict (E-N-L-Y-F-Q rarely tolerates substitution), but P1' through P5' are highly permissive (Kapust et al., 2002). A single conservative substitution at any of P6–P2 (e.g., E→D at P6, L→I at P4) reduces kcat/KM by 10–100× but does not eliminate cleavage.
Known off-target tendencies:
Sequences resembling DNLYFQ, ENLYLQ, and EN-LFQ are cleaved at 1–10% the rate of the canonical site, easily enough to clip a substantial fraction of molecules internally during a 16-hour overnight incubation
The N-terminal Met of E. coli-expressed substrates is not protected; if you happen to have an MENLY... near the start, the protease may strip it
Glutamine residues contribute more than their share of cryptic sites because P1 Gln is the most stringently recognized feature of the motif: every exposed internal Q is a candidate P1
Crystal structure: The TEV S1 pocket is deep and Gln-specific, but the S2–S6 pockets are shallow and water-mediated, explaining the tolerance (Phan et al., 2002).
PreScission / HRV-3C Protease
Origin: Human Rhinovirus 14, 3C cysteine protease. "PreScission" is the GST-fusion commercial variant.
Consensus: LEVLFQ↓GP (canonical) or LEALFQ↓GP (variant). The P1 Gln is essential, and P1' Gly is strongly preferred — but not absolutely required.
Specificity profile: Compared to TEV, 3C has narrower P1' specificity (Gly >> Ala > Ser >> rest) but broader P2–P4 specificity. Cleavage at LE(V/I/L)(L/M/I/F)FQ↓G sites proceeds at nearly equal rates (Cordingley et al., 1990).
Known off-target tendencies:
Internal LXXFQG and (V/L)XLFQG-like motifs in your target sequence
Slow but real cleavage at LEVLFQ↓A and LEVLFQ↓S — significant at overnight scale
In GST-PreScission preparations, residual contaminating proteases from production can confound diagnosis
Thrombin
Consensus: LVPR↓GS (commonly written as the synthetic LVPRGS).
Specificity profile: Much looser than TEV/3C. The natural substrates of thrombin are fibrinogen and Protein C, both of which contain extended exosite interactions that thrombin uses to discriminate substrates. On a synthetic LVPRGS site, thrombin lacks those exosite contacts and becomes promiscuous: it will cut at R↓X in almost any exposed loop containing a basic residue followed by a small amino acid.
Hidden cost: Thrombin is also a transamidase under some conditions and can rearrange peptide bonds.
Factor Xa
Consensus: IEGR↓ (or IDGR↓).
Specificity profile: Even looser than thrombin. The minimal pharmacophore is basically (I/L)-(E/D)-G-R↓, and at high enzyme:substrate ratios it cuts at almost any single Arg or Lys preceded by a hydrophobic-acidic-Gly stretch. Walker et al. (1994) systematically demonstrated that Factor Xa cleaves at numerous internal sites in non-cognate substrates, especially in disordered regions (Walker et al., 1994).
Reputation: Among experienced protein chemists, Factor Xa is considered a last-resort protease. It is shipped in many commercial kits because its cleavage leaves a clean N-terminus, but the trade-off in specificity is usually not worth it.
Enterokinase (Enteropeptidase)
Consensus: DDDDK↓ — five-residue motif, designed to be statistically rare.
Specificity profile: The DDDDK sequence is extremely uncommon in natural proteins, making it superficially attractive. However, enterokinase tolerates significant variation at P1–P5, and the P1' position is essentially unrestricted. Cleavage at DDXDK, DDDXK, and even basic-residue-preceded-by-acidic-cluster sites is well-documented.
Hidden problem: Many commercial enterokinase preparations are bovine-derived and contaminated with trypsin-like activities. Overnight incubation degrades disordered loops indiscriminately.

Why Cryptic Cleavage Happens: The Six Mechanisms
1. Cryptic Consensus Sequences
The most straightforward cause. Your protein contains a sequence that resembles the canonical recognition motif closely enough to be cleaved.
TEV example: A construct designed with the recognition site ENLYFQ↓S at the tag junction may also contain DNLYFQ-G (P6 E→D) internally. The internal site cleaves at perhaps 5% the rate of the engineered site — negligible over a 1-hour digest, devastating over 16 hours.
3C example: The P1 Gln is essential. But if your protein contains LEXLFQ-G or VEXLFQ-G anywhere, cleavage will happen.
How to find them: A regex search of your construct sequence against the consensus families:
TEV:
[ENDKQ][ND]?LY[FYL]Q[^P]
3C:
[LIV]E[VAILM][LIM]FQG
Thrombin:
[LV][VAL]PR[GS]
Factor Xa:
[ILV][ED]GR.
A construct with three Cys-rich loops and a glutamine-rich linker may have a dozen near-consensus sites.
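As a minimal sketch of that scan, the consensus families above can be dropped into Python's `re` module. The function name `scan_construct` and the toy sequence are hypothetical, and the patterns are screening heuristics for flagging candidates rather than validated specificity models. The scan is non-overlapping, so a hit nested inside another hit for the same protease is reported once.

```python
import re

# Patterns copied from the consensus families listed in this section.
# They flag candidates only; they say nothing about accessibility.
CONSENSUS = {
    "TEV": r"[ENDKQ][ND]?LY[FYL]Q[^P]",
    "3C": r"[LIV]E[VAILM][LIM]FQG",
    "Thrombin": r"[LV][VAL]PR[GS]",
    "Factor Xa": r"[ILV][ED]GR.",
}

def scan_construct(sequence):
    """Return (protease, 1-based start, matched text) for every hit."""
    seq = sequence.upper()
    hits = []
    for protease, pattern in CONSENSUS.items():
        for match in re.finditer(pattern, seq):
            hits.append((protease, match.start() + 1, match.group(0)))
    # Sort by position so hits read in sequence order.
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical toy construct: engineered ENLYFQS site at the tag junction,
# an accidental internal DNLYFQG, and a thrombin-like LVPRGS stretch.
toy = "MHHHHHHENLYFQSMKTAYIAKQRDNLYFQGSLVPRGSAA"
for protease, start, text in scan_construct(toy):
    print(f"{protease:10s} residue {start:3d}  {text}")
```

Every hit other than the engineered tag-junction site is a candidate cryptic site to check against structure or disorder predictions.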
2. P1' Permissiveness
The single most under-appreciated feature of TEV specificity. The P1' position is essentially a "wobble" position: any non-proline residue is acceptable. This dramatically expands the cryptic site pool.
For 3C, P1' Gly is preferred but not required. A protein with internal "...LEALFQ-A..." will be cut at low but non-zero rate.
3. Disorder and Exposed Loop Accessibility
This is the dominant factor when ranking cryptic sites by probability of actual cleavage.
The reason: proteases are large globular enzymes. The active site cleft requires the substrate to adopt an extended β-strand-like conformation across 6–8 residues. A buried, helical, or sheet-embedded ENLYFQ-G sequence is physically inaccessible to TEV. The same sequence on an exposed loop is a prime target.
Implication: A construct with 5 perfect consensus sites might cleave only at the 1 that is solvent-exposed and conformationally free. Conversely, a construct with zero perfect sites but several near-consensus sites in disordered loops can suffer extensive cryptic cleavage.
Predictor relevance: Disorder prediction (IUPred, MetaDisorder) and accessibility prediction (NetSurfP, structural mapping onto AlphaFold models) are far more predictive of cryptic cleavage than sequence consensus alone (Waugh, 2011).
4. His-Tag Protease Contamination
A non-obvious but well-documented mechanism. TEV and PreScission proteases are typically expressed in E. coli with His tags and purified by Ni-NTA. Residual E. coli proteases — particularly OmpT, Lon, and ClpXP fragments — can co-purify if the protease prep is not stringently cleaned.
Diagnostic clue: If you see cleavage at sites that bear no resemblance to the consensus, especially at K↓R, R↓R, or paired basic residues, suspect OmpT contamination.
Fix: Use commercial protease from a vendor with documented protease-free QC, or include a high-salt wash and ion-exchange polish step in your in-house TEV prep.
5. Promiscuity at High Enzyme:Substrate Ratios
Protease promiscuity scales with concentration. At a 1:100 E:S ratio (mass/mass), TEV is exquisitely specific. At 1:5 — sometimes used to force cleavage of a sluggish site — TEV begins to cut at sites it would normally ignore.
Quantitatively: Cleavage rate at a non-canonical site scales linearly with enzyme concentration. If your engineered site is preferred 1000× and you push enzyme to 1000× the standard amount, the cryptic site is cleaved as fast as the canonical site would be at the standard enzyme dose.
6. Prolonged Incubation
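Time × rate = total cleavage. To a first approximation, if the canonical site takes about an hour to cleave completely, a site that cuts at 0.1% of the canonical rate accumulates roughly 1.6% cleavage over a 16-hour incubation. Across many cryptic sites, and at sites with higher relative rates, this adds up.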
The temptation to "let it go overnight" to ensure complete cleavage of the engineered site is exactly the wrong move. Better: monitor by SDS-PAGE at 30 min, 1 h, 2 h, 4 h, and stop at the earliest time point that shows complete cleavage of the intended bond.
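The stop-early logic can be sketched numerically. The example below assumes simple first-order kinetics at both sites and uses hypothetical rate constants (a canonical site at 5 per hour and a cryptic site at 1% of that); the helper names are illustrative, and real digests should still be judged by gel or LC-MS.

```python
import math

def fraction_cleaved(k_per_hour, hours):
    """First-order cleavage: fraction of molecules cut after `hours`."""
    return 1.0 - math.exp(-k_per_hour * hours)

def earliest_stop(k_canonical, relative_rate, time_points, target=0.99):
    """Return (time, canonical fraction, cryptic fraction) at the first
    sampled time where the canonical site reaches `target`, else None."""
    k_cryptic = k_canonical * relative_rate
    for t in sorted(time_points):
        canonical = fraction_cleaved(k_canonical, t)
        if canonical >= target:
            return t, canonical, fraction_cleaved(k_cryptic, t)
    return None

# Hypothetical numbers, for illustration only.
time_points = [0.5, 1, 2, 4, 16]          # hours
stop = earliest_stop(5.0, 0.01, time_points)
if stop is not None:
    t, canonical, cryptic = stop
    print(f"stop at {t} h: canonical {canonical:.1%}, cryptic {cryptic:.1%}")
print(f"cryptic after 16 h: {fraction_cleaved(5.0 * 0.01, 16):.1%}")
```

With these assumed constants the canonical bond is done at the 1-hour sample with about 5% cryptic cleavage, while leaving the same reaction overnight pushes cryptic cleavage above half.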

A Sequence Survey of "Safe" vs "Dangerous" Constructs
| Sequence feature | Cryptic cleavage risk (TEV) | Cryptic cleavage risk (3C) | Notes |
|---|---|---|---|
| Q-rich disordered linkers | High | High | Q is P1 for both proteases |
| (E/D)-rich acidic patches | Moderate | Low | TEV P6 prefers E |
| Exposed FQ↓G/S motifs | High | High | Both proteases share P2-P1' geometry |
| Hydrophobic clusters near exposed loops | Moderate | Moderate | P4-P3 hydrophobic preference |
| All-helical bundle, no linkers | Very low | Very low | No accessible extended conformation |
| Single-domain β-sandwich | Low | Low | Internal Qs are mostly buried |
| Multi-domain with flexible interdomain linker | High | High | Linkers are obligate cleavage candidates |
Known Cryptic Site Motifs Reported in the Literature
These are sequences that have been independently observed to cause off-target cleavage in published protein production reports. The list is not exhaustive but flags well-known traps:
| Motif | Protease | Approximate relative rate vs canonical |
|---|---|---|
| DNLYFQ-G | TEV | 0.05–0.10 |
| ENLYLQ-G | TEV | 0.02–0.10 |
| ENVYFQ-G | TEV | 0.01–0.05 |
| ENLYFQ-P | TEV | <0.001 (P1' proline strongly disfavored; useful as a "stop codon" for the protease) |
| LEVLFQ-A | 3C | 0.05–0.15 |
| LEVLFQ-S | 3C | 0.01–0.05 |
| LEILFQ-G | 3C | 0.20–0.50 (nearly canonical) |
| VEVLFQ-G | 3C | 0.10–0.30 |
| LVPR-GS (internal) | Thrombin | Variable, often >0.5 if loop is exposed |
| Single exposed R after acidic patch | Factor Xa | 0.1–1.0 (unpredictable) |
The relative rates are order-of-magnitude estimates from kinetic studies and should be used to prioritize which sites to remove, not as quantitative predictions.

Diagnostic Workflow: Localizing the Cryptic Cut
When you suspect cryptic cleavage, do not guess. The diagnosis is straightforward and takes 1–2 days:
Step 1: Intact-Mass LC-MS
Run the post-cleavage protein on a denaturing LC-MS column (e.g., reversed-phase C4) coupled to a high-resolution mass spectrometer.
Read the spectrum:
The expected cleavage product has a calculable mass from the sequence
Each cryptic fragment is a separate species with a measurable mass
The mass difference between the expected and observed fragment localizes the cut to a small region
Example: Your protein is 28,450 Da expected. You see 23,180 Da and 5,290 Da. The sum (28,470 Da) is about 20 Da more than expected, which matches, within measurement error, the ~18 Da of water added when a single peptide bond is hydrolyzed: each fragment keeps its own terminus, so the two fragments together weigh one water more than the intact chain. The 5,290 Da fragment corresponds to roughly residues 213–end based on the sequence, so your cryptic site is around residue 212–213.
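The mass bookkeeping can be automated. The sketch below assumes an unmodified chain, standard average residue masses, and a single internal cut; `locate_cut` and the short test peptide are hypothetical. It lists every cut position whose N- or C-terminal fragment falls within a tolerance of an observed mass.

```python
# Standard average residue masses in Da; one water is added per free chain.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

def chain_mass(seq):
    """Average mass of an unmodified linear peptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def locate_cut(sequence, observed_mass, tolerance=5.0):
    """List (cut_after_residue, fragment_type, mass_error) for single cuts
    that give a fragment within `tolerance` Da of `observed_mass`."""
    candidates = []
    for i in range(1, len(sequence)):
        fragments = (("N-terminal", sequence[:i]), ("C-terminal", sequence[i:]))
        for kind, fragment in fragments:
            error = chain_mass(fragment) - observed_mass
            if abs(error) <= tolerance:
                candidates.append((i, kind, round(error, 2)))
    return candidates

# Hypothetical peptide; pretend the small fragment "GSIVLR" was observed.
peptide = "MKTAYIAKQRDNLYFQGSIVLR"
observed = chain_mass("GSIVLR")
print(locate_cut(peptide, observed, tolerance=0.5))
```

Disulfides, PTMs, or a missing initiator Met shift the masses, so any candidate position is a lead for N-terminal sequencing, not a final answer.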
Step 2: N-Terminal Edman Sequencing or Top-Down MS
The unwanted C-terminal fragment carries a new N-terminus generated by the protease. Sequencing the first 5 residues by Edman degradation pinpoints the cut bond to single-residue resolution.
Top-down MS alternative: ETD or EThcD fragmentation of the intact small fragment gives the N-terminal sequence in ~30 minutes if you have access to an Orbitrap with ETD.
Step 3: Map Against Construct
Once you know the N-terminus of the unwanted fragment, look at the upstream P1 position. Is it Q (TEV/3C), R (thrombin), K (enterokinase)? Look at P2–P6. Does the local sequence resemble the consensus? Is the site predicted to be in a disordered or exposed loop?
This three-step workflow converges in essentially every case. Once the cryptic site is known, mitigation is targeted, not guesswork.
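The mapping step is easy to script once the new N-terminus is known. This sketch assumes 1-based residue numbering for the fragment's first residue (the P1' position); `describe_cut` and the P1 lookup table are illustrative names, and the lookup only restates the P1 rules of thumb from Step 3.

```python
# P1 residue -> proteases whose consensus ends in that residue.
P1_HINTS = {
    "Q": "TEV or 3C",
    "R": "thrombin (or Factor Xa)",
    "K": "enterokinase",
}

def describe_cut(sequence, new_n_term_position):
    """Summarize the site upstream of a new N-terminus.

    `new_n_term_position` is the 1-based position of the first residue
    of the unwanted fragment, i.e. the P1' residue.
    """
    if not 2 <= new_n_term_position <= len(sequence):
        raise ValueError("position must lie inside the sequence, after residue 1")
    p1_prime_index = new_n_term_position - 1   # 0-based index of P1'
    p1_index = p1_prime_index - 1              # 0-based index of P1
    p1 = sequence[p1_index]
    return {
        "P6_to_P1": sequence[max(0, p1_index - 5): p1_index + 1],
        "P1": p1,
        "P1_prime": sequence[p1_prime_index],
        "suspect": P1_HINTS.get(
            p1, "no consensus match; consider contaminating proteases"
        ),
    }

# Hypothetical example: new N-terminus GSIVLR... begins at residue 11.
print(describe_cut("AAAAENLYFQGSIVLR", 11))
```

The returned P6 to P1 window is what to compare against the consensus, and the same position can then be checked against disorder or accessibility predictions.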

Mitigation: A Decision Tree
First: Try Condition Changes (No Cloning Required)
The fastest fixes are biophysical, not genetic.
Lower temperature. Cleave at 4°C instead of room temperature or 16°C. The canonical site, with optimal substrate geometry, still cleaves efficiently. Cryptic sites — which depend on the substrate happening to sample an extended conformation — slow down disproportionately because conformational dynamics are temperature-dependent. A 2–10× selectivity gain is typical.
Shorten incubation. Run a time course: 30 min, 1 h, 2 h, 4 h. Stop at the earliest time point with complete cleavage of the intended bond, judged by SDS-PAGE band-shift. Cryptic cleavage continues to accumulate after the canonical bond is fully cleaved.
Lower enzyme:substrate ratio. Drop from 1:20 to 1:50 to 1:100 (w/w). Canonical cleavage takes longer, but the amount of cryptic cleavage accumulated in a given time falls roughly in proportion to the enzyme dose.
On-column cleavage. Bind your tagged protein to Ni-NTA (or amylose for MBP, glutathione for GST), then flow protease over the column. The protein is anchored and presented to the protease in a constrained orientation, often reducing access to cryptic loop sites. The cleaved product flows through; the tag stays on the column. This is the single highest-yield mitigation technique and should be tried before anything else.
Buffer optimization. TEV is tolerant of most buffers but loses activity below pH 6.5 and above pH 9.0, and in the presence of high NaCl (>500 mM). Optimal: 50 mM Tris pH 8.0, 100 mM NaCl, 1 mM DTT, 0.5 mM EDTA. PreScission prefers similar conditions but is more salt-tolerant. Stay within these envelopes.
Second: Try Construct Modifications
If conditions can't rescue the cleavage, make a new construct.
Mutate the cryptic site. A single point mutation at P1 (Q→N or Q→E) abolishes TEV/3C cleavage entirely. If the cryptic site is in a disordered loop, the mutation is almost always functionally silent. Verify with AstraDDG that the mutation is predicted to be stable.
Truncate disordered regions. If the cryptic site falls in an N-terminal or C-terminal disordered tail (or in a large flexible linker), trimming the construct boundaries removes the problem entirely. This often improves crystallizability as a bonus.
Switch protease. A site that is cryptic for TEV is rarely also cryptic for 3C: both enzymes want F-Q at P2–P1, but their consensus sequences differ at P6–P3 (E-N-L-Y for TEV versus L-E-V-L for 3C). If your protein has a TEV-cryptic loop, switching to a SUMO-Ulp1 system or a 3C-cleavable construct often solves it.
Engineer a protease "stop codon." Insert an ENLYFQ-P sequence (P1' Pro completely blocks TEV) immediately C-terminal to a disordered region. This can act as a protective signal that draws the protease away from the cryptic site — though this strategy is more theoretical than well-validated.
Third: Switch to Non-Proteolytic Tag Removal
When proteases keep failing, eliminate the protease entirely.
SUMO/Ulp1 system. Ulp1 recognizes the fold of SUMO, not a linear sequence. There is no possible internal cryptic site because Ulp1 cannot bind a linear peptide. This is the cleanest option if you can tolerate a SUMO-fusion construct.
Sortase-mediated. Sortase A from Staphylococcus aureus recognizes LPXTG and performs a transpeptidation. The recognition motif is statistically rare and the chemistry is fundamentally different from serine/cysteine proteases — almost no cryptic sites exist in natural proteins. Sortase is increasingly popular for sensitive constructs.
Self-cleaving inteins. Intein-based tag removal (e.g., the IMPACT system) is triggered by pH/thiol changes, not by an exogenous enzyme. No off-target protease activity is possible.
Acid-cleavable linkers. Asp-Pro bonds cleave at low pH. Useful for some applications but typically requires harsh conditions incompatible with native protein.

Optimization Matrix: Conditions vs Selectivity
| Parameter | Standard | Selective (less cryptic) | Aggressive (faster) |
|---|---|---|---|
| Temperature | 16°C or 4°C, overnight | 4°C, short time | RT, overnight |
| E:S ratio (w/w) | 1:20 to 1:50 | 1:100 to 1:200 | 1:5 to 1:10 |
| Time | Overnight | Time course, stop early | Overnight or longer |
| Buffer pH | 8.0 | 7.5 (slightly suboptimal) | 8.0 |
| NaCl | 100–150 mM | 200–300 mM | 50–100 mM |
| DTT | 1 mM | 1 mM | 1 mM |
| Format | In-solution | On-column | In-solution |
| Monitoring | End-point SDS-PAGE | Time-course SDS-PAGE + LC-MS | End-point only |
The "Selective" column is the recommended starting point for any new construct, even before you know whether cryptic cleavage will occur. The marginal cost (one extra time course) is trivial; the upside is avoiding a one-month re-cloning loop.
The Kinetic Picture: Why Even 0.1% Matters
Protein chemists are often surprised that a cryptic site cleaving at 0.1% the rate of the canonical site can destroy a prep. The math is simple but worth working through, because it explains the dominant operational mistake — overnight incubations.
Suppose your canonical site has kcat/KM of 10⁵ M⁻¹s⁻¹ (typical for TEV on its preferred substrate). A cryptic site at 0.1% relative rate has kcat/KM ≈ 10² M⁻¹s⁻¹.
At a typical reaction setup:
Protein substrate: 1 mg/mL ≈ 30 μM (for a 30 kDa protein)
TEV protease: 1:50 w/w ≈ 0.02 mg/mL ≈ 0.7 μM
For pseudo-first-order kinetics at the cryptic site:
Rate constant k = (kcat/KM) × [enzyme] = 10² × 7×10⁻⁷ = 7×10⁻⁵ s⁻¹
Fraction cleaved at time t: 1 − exp(−kt)
After 1 hour (3600 s): 1 − exp(−0.25) ≈ 22% cryptic cleavage.
After 16 hours (overnight): 1 − exp(−4) ≈ 98% cryptic cleavage.
A 0.1% relative-rate site goes from a minority side product at 1 hour (about a fifth of molecules clipped) to "essentially complete" overnight. This is the single most important number to internalize. It also explains why on-column cleavage is so effective: the enzyme concentration in the column volume is lower (because most enzyme is in flow-through) and the effective time at any one substrate molecule is shorter (because the column has continuous flow exchange).
The corollary: if you must run an overnight digest, you must also run a parallel 1-hour digest and compare by LC-MS. If the two have different fragment patterns, cryptic cleavage is present. If they match, you are clean.
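The arithmetic in this section can be reproduced in a few lines. The sketch takes the numbers used above at face value (kcat/KM of 10² M⁻¹s⁻¹ at the cryptic site, 0.7 μM enzyme) and assumes the enzyme concentration stays constant and far below KM; the function names are illustrative.

```python
import math

def pseudo_first_order_k(kcat_over_km, enzyme_molar):
    """k (s^-1) = (kcat/KM in M^-1 s^-1) x [enzyme in M]."""
    return kcat_over_km * enzyme_molar

def fraction_cleaved_after(k_per_second, seconds):
    """Fraction of substrate cut at a site: 1 - exp(-k t)."""
    return 1.0 - math.exp(-k_per_second * seconds)

CRYPTIC_KCAT_KM = 1e2   # M^-1 s^-1, 0.1% of the canonical 1e5 used above
ENZYME = 0.7e-6         # M, the 1:50 w/w setup described above

k = pseudo_first_order_k(CRYPTIC_KCAT_KM, ENZYME)
print(f"k = {k:.1e} s^-1")
for label, hours in (("1 h", 1), ("16 h", 16)):
    print(f"{label}: {fraction_cleaved_after(k, hours * 3600):.0%} cryptic cleavage")

# Halving the enzyme (roughly 1:100 w/w) halves k at the cryptic site.
k_half = pseudo_first_order_k(CRYPTIC_KCAT_KM, ENZYME / 2)
print(f"16 h at half the enzyme: {fraction_cleaved_after(k_half, 16 * 3600):.0%}")
```

Swapping in your own enzyme concentration and an estimated relative rate gives a quick sanity check on whether an overnight digest is even worth attempting.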

A Note on Engineered "Super-Specific" Protease Variants
The field has not stood still. Several engineered variants of TEV with broader or narrower substrate scope have been published over the past decade:
TEV S219V "superTEV": The Waugh lab's autoproteolysis-resistant variant. Specificity is identical to wild-type TEV; the mutation replaces the serine at position 219, which sits at the internal site where wild-type TEV cleaves itself, with valine. Use this variant; there is no advantage to wild-type TEV (Kapust et al., 2001).
TEV variants with altered P1 specificity: Several papers report TEV mutants that cleave at sites other than Q-P1 — for example, an E-P1 variant. These are valuable for orthogonal cleavage strategies (cleave at two different sites with two different proteases) but do not solve cryptic-cleavage problems on the original site.
3C variants: Less work has been done on 3C engineering, partly because the commercial PreScission product (GST-3C) is a stable enzyme and partly because 3C's natural specificity is already considered adequate.
The practical implication: superTEV is the only variant most labs should consider, and the cryptic-cleavage problem is essentially independent of variant choice.

When Cryptic Cleavage Is Actually Your Friend
A small but real subset of constructs benefit from controlled internal cleavage. Examples:
Tandem domain proteins for crystallography: Sometimes you want only one domain after purification. Engineering a TEV site between domains and accepting that cleavage will be partial gives you a mixture you can separate by SEC.
Limited proteolysis mapping: Using TEV at very low E:S as a structural probe to identify exposed loops. Sites that cut are exposed; sites that resist are buried. This is a legitimate experiment, not a failure.
Activation of zymogen-like designs: Some engineered constructs use an internal protease site to activate a function upon tag removal — the protease performs two cuts: one to remove the tag and one to release an autoinhibitory peptide.
In each of these, the "problem" of internal cleavage is the design feature. The takeaway is that you should know whether your construct has internal sites by intent or by accident — never let it be a surprise.

A Case Study: The Disappearing Loop
A researcher producing a 32 kDa enzyme for crystallization noticed her purified protein consistently ran as a doublet on SDS-PAGE. The faster band was ~5 kDa smaller. She had used a standard His6-TEV-target construct, cleaved with TEV overnight at 4°C, and run reverse Ni-NTA to remove the tag.
Diagnosis: Intact-mass LC-MS showed the doublet was 32,150 Da and 27,420 Da. The 4,730 Da difference corresponded to a 42-residue fragment. Top-down MS gave the new N-terminus: GSIVLR..., starting at residue 246 of the construct.
Mapping: Upstream of residue 246 in the original sequence was ...ENLYFQ↓G..., a perfect TEV site at residues 240–245. The construct had been built by Gibson assembly, and the engineered TEV site was at residues 8–13 (between the His tag and the target). The internal ENLYFQ was an accident, a coincidence in the natural protein sequence.
AlphaFold check: Residues 240–250 mapped to a disordered loop with pLDDT < 50 — perfectly exposed, perfectly extended, perfectly accessible.
Mitigation tried:
On-column cleavage: reduced cryptic cleavage from ~30% to ~12%. Improvement but not enough.
Q→N mutation at residue 245: completely abolished cryptic cleavage. AstraDDG predicted ΔΔG = +0.3 kcal/mol (well within tolerance). Activity assay confirmed the mutation was functionally silent.
Time lost before diagnosis: 6 weeks. Time to fix after diagnosis: 5 days.
Common Mistakes and What They Cost
Mistake 1: Assuming "Highly Specific" Means "Absolutely Specific"
TEV and 3C are highly specific relative to thrombin and Factor Xa. They are not absolutely specific. The marketing literature has done structural biologists a disservice by reinforcing the absolute-specificity framing.
Cost: Months of confusion when an "impossible" off-target cleavage appears.
Mistake 2: Not Running a Pre-Cleavage Time Course
Most labs go straight to overnight digests because that's the protocol they were taught. A 2-hour time course on a small aliquot before scaling up costs about 30 minutes of hands-on time and prevents weeks of downstream pain.
Mistake 3: Buying TEV Without Testing the Lot
Commercial TEV varies in purity and specific activity by an order of magnitude across vendors and lots. Test new lots side-by-side with your previous lot on a known substrate. A lot with elevated contaminating protease activity will hit your construct's cryptic sites harder.
Mistake 4: Ignoring the Sequence-Level Risk Scan
You can scan your construct against the consensus regex in 30 seconds. Almost nobody does. The result is that cryptic sites are discovered only after the construct is made, expressed, purified, and partially cleaved.
Mistake 5: Removing the Tag When You Don't Need To
The simplest fix for "the protease keeps cutting my protein" is: don't use the protease. For many downstream applications — pull-downs, binding assays, even crystallography of large proteins — leaving the tag on is fine. Tag removal is often a habit, not a requirement.

A Quick Reference: Pre-Cleavage Checklist
Before you set up any tag-removal digest, walk through this list. The in-silico steps take about ten minutes, the pilot time course takes an afternoon, and together they prevent most cryptic-cleavage incidents.
Run your construct sequence through the protease consensus regex (TEV, 3C, thrombin, Factor Xa). Note every hit.
Overlay the hit positions on the AlphaFold predicted structure. Are any in disordered or exposed loops?
If yes, score the risk: high (loop pLDDT < 50), moderate (loop pLDDT 50–70), low (buried, pLDDT > 70).
Pick the protease whose cryptic sites are least accessible in your construct. TEV vs 3C often differ here.
Design the digest with the "Selective" column from the optimization matrix above.
Run a 30 min / 1 h / 2 h / 4 h time course on a small aliquot before scaling up.
Check both expected and unexpected bands by intact-mass LC-MS at each time point if possible.
Stop at the earliest time point that shows complete cleavage of the canonical bond.
If cryptic cleavage is observed, switch to on-column format before any other change.
If on-column does not solve it, diagnose the cryptic site by top-down MS and consider a P1 mutation.
This list is the difference between a clean protein on the first try and a months-long troubleshooting loop.
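The risk-scoring step of the checklist can be sketched as follows. The pLDDT bands are the ones given in the list; averaging pLDDT over the residues of each motif hit is an assumption made here for illustration, as are the function names and the input format (hits as protease, 1-based start, and motif text, plus a per-residue pLDDT lookup).

```python
def risk_from_plddt(plddt):
    """Checklist bands: <50 high, 50-70 moderate, >70 low."""
    if plddt < 50:
        return "high"
    if plddt <= 70:
        return "moderate"
    return "low"

def score_hits(hits, plddt_by_position):
    """Attach mean pLDDT and a risk label to each consensus hit.

    hits: iterable of (protease, start, motif) with 1-based start.
    plddt_by_position: dict mapping 1-based residue number to pLDDT;
    every residue covered by a hit must be present.
    """
    scored = []
    for protease, start, motif in hits:
        values = [plddt_by_position[pos] for pos in range(start, start + len(motif))]
        mean_plddt = sum(values) / len(values)
        scored.append((protease, start, motif, round(mean_plddt, 1),
                       risk_from_plddt(mean_plddt)))
    return scored

# Hypothetical data: one hit in a floppy loop, one in a confident core.
plddt = {pos: 40.0 for pos in range(1, 11)}
plddt.update({pos: 90.0 for pos in range(11, 21)})
hits = [("TEV", 3, "DNLYFQG"), ("3C", 12, "LEVLFQG")]
for row in score_hits(hits, plddt):
    print(row)
```

Low pLDDT is only a proxy for disorder and exposure, so hits labeled moderate still deserve a look on the predicted structure.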
The Bottom Line
| Symptom | Most likely cause | First fix to try |
|---|---|---|
| 5 kDa loss, single new fragment | Cryptic ENLYFQ-X site in exposed loop | On-column cleavage + LC-MS diagnosis |
| Multiple lower bands appearing over time | E:S ratio too high OR contaminating proteases | Reduce E:S, ion-exchange polish on protease prep |
| Sudden new fragments after switching protease lots | New lot has contaminating E. coli proteases | Test lot on known substrate; return if problematic |
| Cleavage at K↓X or R↓X with no consensus match | OmpT or trypsin-like contamination | High-salt wash on protease purification |
| Slow accumulation overnight, none at 2 hours | Cryptic site with rate ~0.01 of canonical | Stop incubation at end-of-canonical, not overnight |
| Cleavage in a flexible linker between domains | Disorder + accidental near-consensus | Mutate P1 Q→N in the linker (almost always silent) |
| Thrombin or Factor Xa cutting "everywhere" | Inherent promiscuity at high E:S | Switch to TEV/3C, accept extra N-terminal residues |
The non-negotiable rule: Before designing a new construct, scan it against the protease consensus regexes. Before scaling up a cleavage, run a time course. Before mutating, diagnose. Before switching protease, try on-column.

Designing Protease-Robust Constructs with Orbion
Constructs that are robust to cryptic proteolysis are constructs where the cleavage site sits on an accessible loop while internal near-consensus motifs sit in buried or structured regions. This is exactly the kind of multi-feature design problem that benefits from integrated structural prediction.
In Orbion's Construct Design module, you can sketch a tag-linker-target architecture and immediately see how each component overlays on the AlphaFold-predicted structure. AstraUNFOLD highlights disordered regions and exposed loops — the high-risk zones for cryptic cleavage — and flags them for visual inspection against your chosen protease's consensus. AstraPTM's recognition-motif scanning, originally built for PTM site detection, also scans for protease consensus sequences across TEV, 3C, thrombin, Factor Xa, enterokinase, and sortase, and reports a per-site accessibility score that combines sequence match strength with predicted local disorder.
The output is a single annotated construct view: every potential cryptic site mapped, ranked by accessibility, with suggested point mutations (filtered through AstraDDG for stability impact) that abolish the cryptic site without destabilizing the protein. The goal is to make "I cleaved overnight and lost a piece of my protein" a problem you diagnose before cloning, not after.
References
Kapust RB, Tözsér J, Copeland TD, Waugh DS. (2002). The P1' specificity of tobacco etch virus protease. Biochemical and Biophysical Research Communications, 294(5):949-955.
Kapust RB, Tözsér J, Fox JD, Anderson DE, Cherry S, Copeland TD, Waugh DS. (2001). Tobacco etch virus protease: mechanism of autolysis and rational design of stable mutants with wild-type catalytic proficiency. Protein Engineering, 14(12):993-1000.
Phan J, Zdanov A, Evdokimov AG, Tropea JE, Peters HK 3rd, Kapust RB, Li M, Wlodawer A, Waugh DS. (2002). Structural basis for the substrate specificity of tobacco etch virus protease. Journal of Biological Chemistry, 277(52):50564-50572.
Cordingley MG, Callahan PL, Sardana VV, Garsky VM, Colonno RJ. (1990). Substrate requirements of human rhinovirus 3C protease for peptide cleavage in vitro. Journal of Biological Chemistry, 265(16):9062-9065.
Waugh DS. (2011). An overview of enzymatic reagents for the removal of affinity tags. Protein Expression and Purification, 80(2):283-293. PMC3094691
Walker PA, Leong LE, Ng PW, Tan SH, Waller S, Murphy D, Porter AG. (1994). Efficient and rapid affinity purification of proteins using recombinant fusion proteases. Bio/Technology, 12(6):601-605.