MIP can design a novel protein from scratch for a target small molecule. Give it a SMILES string, and it generates an amino acid sequence together with a co-folded 3D structure, then independently refolds the design as a sanity check, all inside your chat. Enzyme design is powered by DISCO (DIffusion for Sequence-structure CO-design), a diffusion model that jointly generates sequence and structure conditioned on a ligand. Designs are refolded with NVIDIA Boltz-2 and rendered with Mol*, the same viewer used by RCSB PDB.
Starting a design
Ask MIP to design a protein for a substrate or ligand:
- “Design an enzyme that binds caffeine”
- “Design a protein that catalyses cyclopropanation of styrene — use ethyl diazoacetate as the carbene source”
- “Design 3 scaffolds for this SMILES: CN1C=NC2=C1C(=O)N(C)C(=O)N2C”
Input options
| Parameter | What it does | Default |
|---|---|---|
| ligandSmiles | SMILES of the target substrate or ligand. Required. | n/a |
| proteinSequence | Partial sequence with - at positions to design. Enables motif scaffolding. | Fully masked 150-residue protein |
| dnaTarget | ACGT sequence for DNA-binding designs | None |
| rnaTarget | ACGU sequence for RNA-binding designs | None |
| effort | fast (100 diffusion steps, 2 recycles) or max (200 steps, 4 recycles) | max |
| numDesigns | Independent designs to generate (1–5) | 3 |
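
The parameter table can be read as a small schema. The sketch below is only an illustration of those defaults and ranges: `DesignRequest` and `EFFORT_PRESETS` are hypothetical names, not part of MIP, which takes these values from your chat message. The field names are kept in the documented camelCase rather than Python's usual snake_case so they match the table.

```python
from dataclasses import dataclass
from typing import Optional

# Step and recycle counts per effort level, as listed in the table.
EFFORT_PRESETS = {
    "fast": {"diffusion_steps": 100, "recycles": 2},
    "max": {"diffusion_steps": 200, "recycles": 4},
}

DEFAULT_LENGTH = 150  # the default is a fully masked 150-residue protein


@dataclass
class DesignRequest:
    """Hypothetical container mirroring the documented parameters."""

    ligandSmiles: str
    proteinSequence: str = "-" * DEFAULT_LENGTH
    dnaTarget: Optional[str] = None
    rnaTarget: Optional[str] = None
    effort: str = "max"
    numDesigns: int = 3

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if not self.ligandSmiles.strip():
            raise ValueError("ligandSmiles is required")
        if self.effort not in EFFORT_PRESETS:
            raise ValueError("effort must be 'fast' or 'max'")
        if not 1 <= self.numDesigns <= 5:
            raise ValueError("numDesigns must be between 1 and 5")
        if self.dnaTarget is not None and set(self.dnaTarget) - set("ACGT"):
            raise ValueError("dnaTarget may only contain A, C, G, T")
        if self.rnaTarget is not None and set(self.rnaTarget) - set("ACGU"):
            raise ValueError("rnaTarget may only contain A, C, G, U")


# Caffeine SMILES from the prompt examples, with a faster effort setting.
request = DesignRequest(ligandSmiles="CN1C=NC2=C1C(=O)N(C)C(=O)N2C", effort="fast")
print(request.numDesigns, EFFORT_PRESETS[request.effort])
```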
Motif scaffolding
If you already know the catalytic residues you want to preserve, pass them inside proteinSequence. Fixed positions keep their amino acid; positions marked - are designed by DISCO.
- "----------S----------H----------D----------" pins a Ser/His/Asp triad and lets DISCO scaffold the rest.
- "MKGH----------------------------GGHM" fixes terminal residues and designs everything between.
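
Writing long masks by hand is error-prone, so it can help to build them from a position map. This is a minimal sketch under that idea; `build_motif_mask` and `fixed_positions` are hypothetical helpers, and the 1-based numbering is an assumption made for readability rather than something MIP requires.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_motif_mask(length: int, fixed: dict) -> str:
    """Return a mask of `length` with '-' everywhere except the fixed residues.

    `fixed` maps 1-based positions to one-letter amino acid codes.
    """
    mask = ["-"] * length
    for position, residue in fixed.items():
        if not 1 <= position <= length:
            raise ValueError(f"position {position} is outside 1..{length}")
        if residue not in AMINO_ACIDS:
            raise ValueError(f"{residue!r} is not a standard amino acid code")
        mask[position - 1] = residue
    return "".join(mask)


def fixed_positions(mask: str) -> dict:
    """Recover the 1-based fixed positions from an existing mask."""
    return {i + 1: aa for i, aa in enumerate(mask) if aa != "-"}


# A Ser/His/Asp triad with ten designed residues around each fixed one.
triad = build_motif_mask(43, {11: "S", 22: "H", 33: "D"})
print(triad)
print(fixed_positions(triad))
```

Round-tripping a mask through `fixed_positions` is a quick way to check that the residues you meant to pin ended up where you expected before starting a job.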
The result card
When the job completes, a result card opens inline:
Design target
The ligand’s 2D structure, rendered from its canonical SMILES. Click Show SMILES to copy the string.
3D structure
An interactive Mol* viewer with the designed protein backbone and the co-folded ligand. Residues whose Cα lies within 5 Å of the ligand are highlighted. Multi-design jobs get a tab strip at the top so you can switch between seeds.
DISCO outputs backbone-only structures (N, Cα, C, O) with no sidechains. Highlighted residues are positional candidates near the ligand, not confirmed catalytic residues. Experimental characterisation is always required.
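
The 5 Å highlighting rule can be reproduced on a downloaded structure with a few lines of code. The sketch below assumes that "within 5 Å of the ligand" means within 5 Å of any ligand atom and that the boundary is inclusive; the page does not spell out either detail. The coordinates are made-up toy values, and `near_ligand_residues` is a hypothetical helper.

```python
import math

CUTOFF_ANGSTROM = 5.0


def near_ligand_residues(ca_coords, ligand_coords, cutoff=CUTOFF_ANGSTROM):
    """Return residue numbers whose C-alpha is within `cutoff` of any ligand atom.

    ca_coords: dict mapping residue number -> (x, y, z) of the C-alpha atom.
    ligand_coords: list of (x, y, z) tuples, one per ligand atom.
    """
    near = []
    for residue_number, ca in sorted(ca_coords.items()):
        if any(math.dist(ca, atom) <= cutoff for atom in ligand_coords):
            near.append(residue_number)
    return near


# Toy backbone: four C-alpha atoms spaced 3.8 angstroms apart along x.
ca_atoms = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0), 4: (11.4, 0.0, 0.0)}
# Toy ligand: two atoms sitting 4 angstroms above the start of the chain.
ligand_atoms = [(0.0, 4.0, 0.0), (1.5, 4.0, 0.0)]

print(near_ligand_residues(ca_atoms, ligand_atoms))
```

Because the structures are backbone-only, a Cα distance is a coarse proxy: a residue can be flagged even when its eventual sidechain would point away from the ligand.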
Structure composition
A compact bar chart of helix, sheet, and coil fractions computed from the backbone with Biotite.
Physical and chemical properties
Four sequence-derived numbers: molecular weight, isoelectric point, hydropathicity (GRAVY), and Guruprasad instability index. A collapsible Sequence-derived heuristics section underneath shows charge at pH 7, extinction coefficient, aromatic fraction, and rough suggestions for ion-exchange buffer conditions. These are heuristics from the amino acid sequence, not predictions of expression behaviour.
Model confidence
DISCO’s native confidence scores: ranking score, inter-chain ipTM, and a steric-clash flag when present.
Boltz-2 validation
Every design is automatically refolded with Boltz-2 to independently predict the structure from sequence and ligand. MIP then computes the backbone RMSD between DISCO’s structure and Boltz-2’s prediction.
| RMSD | Verdict | Meaning |
|---|---|---|
| < 2.0 Å | PASS | The sequence encodes the intended fold (paper co-designability threshold) |
| 2.0–3.0 Å | Marginal | Worth visual inspection; may be a loop difference |
| > 3.0 Å | Fail | The sequence refolds to something different — consider regenerating |
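
If you export RMSD values and want to triage many designs at once, the table reduces to two thresholds. This is a sketch rather than MIP's own code: `refold_verdict` is a hypothetical helper, and treating exactly 2.0 Å and 3.0 Å as marginal is an assumption based on how the ranges are written in the table.

```python
PASS_BELOW = 2.0       # angstroms; RMSD below this is a pass
MARGINAL_UP_TO = 3.0   # angstroms; RMSD above this is a fail


def refold_verdict(rmsd_angstrom: float) -> str:
    """Map a DISCO-vs-Boltz-2 backbone RMSD to 'pass', 'marginal', or 'fail'."""
    if rmsd_angstrom < 0:
        raise ValueError("RMSD cannot be negative")
    if rmsd_angstrom < PASS_BELOW:
        return "pass"
    if rmsd_angstrom <= MARGINAL_UP_TO:
        return "marginal"
    return "fail"


for rmsd in (0.8, 2.4, 4.1):
    print(rmsd, refold_verdict(rmsd))
```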
Designed sequence
The amino acid sequence in monospace, color-coded by property group (hydrophobic, positive, negative, polar, Cys, Gly, Tyr). Residues with their Cα within 5 Å of the ligand are highlighted in emerald. Hover over any residue to see its position and near-ligand status.
Downloading results
The result card toolbar provides:
- FASTA: the designed protein sequence as FASTA, ready for gene synthesis
- CIF / PDB: the full co-folded structure, openable in Mol*, PyMOL, or ChimeraX
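
A downloaded FASTA file is easy to read back for scripting, for example to check sequence lengths before ordering genes. The parser below is a minimal sketch of the standard FASTA layout; the header names and sequences in the example are invented, since the exact headers MIP writes are not described here.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line)
    # Sequences may be wrapped across lines, so join the pieces.
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical file contents; real headers and sequences will differ.
example = """>design_1
MKGHSAELLK
AVGD
>design_2
MSTPLHDE
"""

for name, sequence in parse_fasta(example).items():
    print(name, len(sequence), sequence)
```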
How it works
- Input validation: MIP parses your SMILES with openchemlib, rejects invalid input before any GPU spins up, and uses the canonical form so equivalent SMILES hit the dedup cache.
- Submission: The DISCO input JSON is sent to a dedicated Cerebrium GPU (NVIDIA L40). Identical jobs submitted within 10 minutes return the existing result.
- Design: DISCO runs the number of diffusion steps and recycles set by effort. Typical runtime is 15–60 minutes for effort=max on a 150–250 residue protein, scaling linearly with numDesigns.
- Storage: Each design (CIF and FASTA) is uploaded to cloud storage.
- Boltz-2 validation: A Boltz-2 refolding job is dispatched automatically. Results are joined back to the DISCO design card as they complete.
- Billing: Charged at 10 credits per second of actual GPU time (including cold start), rounded up to the nearest minute.
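
To estimate what a job will cost, the billing rule can be worked through as arithmetic. The sketch below reads "rounded up to the nearest minute" as rounding the GPU time up to whole minutes before applying the per-second rate; that reading is an assumption, and `gpu_cost_credits` is a hypothetical helper rather than MIP's billing code.

```python
import math

CREDITS_PER_SECOND = 10


def gpu_cost_credits(gpu_seconds: float) -> int:
    """Estimate credits for a job, billing whole minutes of GPU time."""
    if gpu_seconds < 0:
        raise ValueError("gpu_seconds cannot be negative")
    billed_minutes = math.ceil(gpu_seconds / 60)
    return billed_minutes * 60 * CREDITS_PER_SECOND


# A short job, a 15-minute job, and a 1,000-second job.
for seconds in (45, 900, 1000):
    print(seconds, gpu_cost_credits(seconds))
```

Under this reading a 1,000-second job is billed as 17 minutes, or 10,200 credits, and a 15-minute job at the low end of the typical runtime costs 9,000 credits.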
Limits and guardrails
| Parameter | Limit |
|---|---|
| Protein sequence length | 1,000 residues (L40 GPU ceiling) |
| SMILES length | 5,000 characters |
| DNA / RNA target length | 10,000 bases |
| Concurrent GPU jobs per user | 3 |
| Minimum credit balance to start | 5,000 credits |
| Boltz-2 validation cost | 1,000 credits per design (skipped with a notification if the balance is too low) |
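
The limits above lend themselves to a pre-flight check before you ask for a design. This is a sketch only: `preflight_problems` and its arguments are hypothetical, and MIP enforces these limits itself. The Boltz-2 validation cost is left out because the page does not say how the balance check is applied across multiple designs.

```python
from typing import List, Optional

MAX_PROTEIN_LENGTH = 1_000      # residues
MAX_SMILES_LENGTH = 5_000       # characters
MAX_NUCLEIC_LENGTH = 10_000     # bases, DNA or RNA target
MAX_CONCURRENT_JOBS = 3         # GPU jobs per user
MIN_START_BALANCE = 5_000       # credits


def preflight_problems(
    protein_sequence: str,
    ligand_smiles: str,
    nucleic_target: Optional[str],
    running_jobs: int,
    balance: int,
) -> List[str]:
    """Return a list of documented limits that this request would violate."""
    problems = []
    if len(protein_sequence) > MAX_PROTEIN_LENGTH:
        problems.append("protein sequence exceeds 1,000 residues")
    if len(ligand_smiles) > MAX_SMILES_LENGTH:
        problems.append("SMILES exceeds 5,000 characters")
    if nucleic_target is not None and len(nucleic_target) > MAX_NUCLEIC_LENGTH:
        problems.append("DNA/RNA target exceeds 10,000 bases")
    if running_jobs >= MAX_CONCURRENT_JOBS:
        problems.append("already at 3 concurrent GPU jobs")
    if balance < MIN_START_BALANCE:
        problems.append("balance below the 5,000-credit minimum")
    return problems


ok = preflight_problems("-" * 150, "CN1C=NC2=C1C(=O)N(C)C(=O)N2C", None, 0, 6_000)
blocked = preflight_problems("-" * 1_001, "C", None, 3, 100)
print(ok)
print(blocked)
```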
When to use enzyme design vs structure prediction
| Scenario | Recommended approach |
|---|---|
| You have a target protein and want its 3D structure | Use structure prediction |
| You have a small molecule and want a protein built for it | Use enzyme design |
| You have a known scaffold and want to alter a few residues | Use structure prediction with mutant comparison |
| You have a known active-site motif and want a new fold around it | Use enzyme design with motif scaffolding via proteinSequence |
Designed enzymes are computational predictions, not proven catalysts. Plan for experimental characterisation — expression, purification, and activity assays — before treating a design as a working enzyme. The DISCO paper reports that one round of directed evolution on a designed enzyme can deliver several-fold activity improvements.
