1. Biomolecular Structure Prediction Tools


This page of BioMoDes lists state-of-the-art and emerging tools for Biomolecular Structure Prediction.


1.1. Protein (Monomer) Structure Prediction

2024
  • Genie 2: A generative model for structure-based protein design that sets a new state-of-the-art by outperforming existing models on various metrics. Genie 2 builds on Genie (version 1) to expand the structure space (size and diversity) captured by the model.
    Posted: May 24, 2024
    Preprint | Code (GitHub)

  • OpenFold: A fully open-source and trainable PyTorch reimplementation of AF2 with training code and data. OpenFold, trained from scratch, matched AF2 in accuracy and introduced some technical modifications that offered improved speed and memory efficiency over AF2.
    Published: May 14, 2024
    Paper | Code (GitHub) | Data (Open Data) | Documentation | Colab Notebook

  • AlphaFold 3: The latest version of the AlphaFold model that is capable of predicting, with high accuracy, the structures of complexes containing nearly all molecular types present in the PDB (protein, DNA, RNA, small-molecule ligands, ions, and modified residues).
    Published: April 29, 2024
    Paper 1 | Webserver

  • RaptorX-Single: A single-sequence protein structure prediction method that integrates a protein language model and an Evoformer-based structure generation module. RaptorX-Single outperforms MSA-based (AF2) and MSA-free methods in predicting structures of antibodies (fine-tuned), structures of orphan proteins, and effects of single mutations. RaptorX-Single also runs faster than MSA-based AF2.
    Published: March 20, 2024
    Paper | Code (GitHub)

  • Evo: A long-context foundation model that generalizes across the central dogma of biology: DNA, RNA, and proteins. Evo is a 7 billion parameter model trained to generate DNA sequences and is capable of prediction and generative tasks, from molecules to whole genomes.
    Posted: March 06, 2024
    Preprint | Code (GitHub) | Code (PyPI) | Blog | Playground | Colab Notebook

  • AlphaFind: A web-based search engine for finding structures similar to a given query in the entire proteome. AlphaFind queries the AlphaFold DB to find similar structures for any given input.
    Posted: Feb 18, 2024
    Preprint | Code (GitHub) | Wiki | Webserver

  • AlphaFlow/ESMFlow: Versions of AlphaFold/ESMFold that were fine-tuned and then further trained on MD simulation ensembles. AlphaFlow generates protein conformational ensembles, covering both experimental ensembles (as in the PDB) and molecular dynamics simulation ensembles.
    Posted: Feb 7, 2024
    Preprint | Code (GitHub)

  • ALBATROSS: A tool and webserver that combines sequence design, large-scale coarse-grained simulations and deep learning for the prediction of conformational properties (ensemble) of intrinsically disordered proteins.
    Published: Jan 31, 2024
    Paper 1 | Paper 2 | Preprint | Colab Notebooks | Code (GitHub) | Webserver

  • DMFold: A tool that integrates large genomic and metagenomic sequence databases for improved protein structure prediction.
    Published: Jan 02, 2024
    Paper | Code (DMFold) | Code (DeepMSA2) | Webserver (DMFold) | Webserver (DeepMSA2)

2023
  • Chroma: A diffusion model for protein design developed by Generate:Biomedicines. Chroma is a programmable generative model that can directly sample novel protein structures and sequences. Its capabilities include generating complexes and conditioning on symmetries, substructures, shapes, “semantics”, and even natural-language prompts.
    Published: Nov 15, 2023
    Paper | Preprint | Code (GitHub) | Colab Notebooks

  • AF-Cluster: An AlphaFold2-based method to predict multiple biologically relevant conformations of protein structures.
    Published: Nov 13, 2023
    Paper | Code (GitHub) | Colab Notebook
    Note: There is a preprint challenging the claims in the AF-Cluster paper. You can read it here.

  • ESMFold: A method to predict atomic-level protein structure from sequence using a large protein language model. ESMFold is nearly as accurate as alignment-based methods and considerably faster, which enabled the construction of the ESM Metagenomic Atlas, a database of predicted structures for more than 617 million metagenomic protein sequences, about one-third of which are high-confidence predictions.
    Published: March 16, 2023
    Paper | Preprint | Code (GitHub) | Webserver | ESM Metagenomic Atlas
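    The ESMFold entry above refers to "high confidence" predictions. Predicted models from AlphaFold-style tools usually carry a per-residue pLDDT score in the B-factor column of the PDB file, so a rough confidence check can be sketched as below. This is only an illustrative sketch: the function names, the helper that builds demo lines, and the 70-point cutoff are assumptions made here, not part of any tool listed on this page, and some outputs store pLDDT on a 0-1 scale instead of 0-100.

```python
def make_atom_line(serial, name, res, chain, resseq, x, y, z, b, occ=1.0):
    """Hypothetical helper: build one fixed-width PDB ATOM record for the demo."""
    return (f"ATOM  {serial:>5}  {name:<3} {res:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


def mean_plddt(pdb_text):
    """Average the B-factor field over CA atoms of a PDB-format string.

    Assumes the B-factor field (columns 61-66) holds per-residue pLDDT,
    which is the common convention for predicted models.
    """
    scores = []
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name in columns 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found in input")
    return sum(scores) / len(scores)


def is_high_confidence(pdb_text, threshold=70.0):
    """Return True if mean pLDDT exceeds the (assumed) threshold."""
    return mean_plddt(pdb_text) > threshold


if __name__ == "__main__":
    demo = "\n".join([
        make_atom_line(1, "N", "MET", "A", 1, 0.0, 0.0, 0.0, 10.0),
        make_atom_line(2, "CA", "MET", "A", 1, 1.0, 0.0, 0.0, 85.0),
        make_atom_line(3, "CA", "ALA", "A", 2, 2.0, 1.0, 0.0, 65.0),
    ])
    print(mean_plddt(demo), is_high_confidence(demo))
```

    Only CA atoms are averaged so that each residue counts once regardless of its side-chain size; a stricter filter could also consult other confidence outputs if the tool provides them.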


1.2. Protein Complex Structure Prediction

2024
  • Umol: A deep learning method for all-atom protein-ligand complex structure prediction from protein sequence and ligand SMILES string.
    Published: May 28, 2024
    Paper | Code (GitHub) | Colab Notebook

  • OpenFold: A fully open-source and trainable PyTorch reimplementation of AF2 with training code and data. OpenFold, trained from scratch, matched AF2 in accuracy and introduced some technical modifications that offered improved speed and memory efficiency over AF2.
    Published: May 14, 2024
    Paper | Preprint | Code (GitHub) | Data (Open Data) | Documentation | Colab Notebook

  • AlphaFold 3: The latest version of the AlphaFold model that is capable of predicting, with high accuracy, the structures of complexes containing nearly all molecular types present in the PDB (protein, DNA, RNA, small-molecule ligands, ions, and modified residues).
    Published: April 29, 2024
    Paper 1 | Webserver

  • FABind+: An improved version of FABind for predicting protein-ligand binding through pocket prediction and docking.
    Posted: March 29, 2024
    Preprint | Code (GitHub)

  • RoseTTAFold All-Atom: A network capable of predicting the structures of all atoms of a biological unit, including proteins, nucleic acids, small molecules, metals, and covalent modifications (covalently modified proteins). In other words, RF-AA can generate accurate models for complexes of proteins with other proteins and non-protein molecules. It’s basically a “predict-every-(bio)molecule” tool. RF-AA also provides error estimates of its predictions.
    Published: March 07, 2024
    Paper | Code (GitHub)

  • Evo: A long-context foundation model that generalizes across the central dogma of biology: DNA, RNA, and proteins. Evo is a 7 billion parameter model trained to generate DNA sequences and is capable of prediction and generative tasks, from molecules to whole genomes.
    Posted: March 06, 2024
    Preprint | Code (GitHub) | Code (PyPI) | Blog | Playground | Colab Notebook

  • GlycoSHIELD, GlycoALPHAFOLD, GlycoTRAJ, GlycoSASA,…: A set of tools and webserver for modeling glycoprotein morphology and structural dynamics.
    Published: Feb 29, 2024
    Paper | Code (GitLab) | Webserver

  • PocketGen: A method for generating full-atom ligand-binding pockets to design small molecule-binding proteins. PocketGen uses a co-design strategy that, given the ligand and the scaffold, simultaneously designs the sequence and structure of the protein pocket.
    Posted: Feb 28, 2024
    Preprint | Code (GitHub) | Blog

  • DiffDock/DiffDock-L: A diffusion generative model for blind molecular docking.
    Posted: Feb 11, 2023 (DiffDock) | Posted: Feb 28, 2024 (DiffDock-L)
    Preprint 1, DiffDock | Preprint 2, DiffDock-L | Code (GitHub) | Demo/DiffDock-Web (HuggingFace)

  • NeuralPLexer: A tool for predicting protein–ligand complex structures using protein sequence and ligand molecular graph inputs.
    Published: Feb 12, 2024
    Paper | Code (GitHub) | Code (Code Ocean)

  • CombFold: A combinatorial and hierarchical assembly algorithm combined with AlphaFold2 for predicting structures of large protein assemblies.
    Published: Feb 07, 2024
    Paper | Code (GitHub) | Code (Code Ocean) | Colab Notebook

  • DynamicBind: A generative model and webserver for predicting ligand-specific protein-ligand complex structure. DynamicBind is a “dynamic docking” method that attempts to overcome the limitations of traditional docking and MD simulation.
    Published: Feb 05, 2024
    Paper | Code (GitHub) | Webserver

  • DMFold-Multimer: A protein structure prediction tool that, by integrating large genomic and metagenomic sequence databases, outperformed 86 other methods in the complex modeling section of CASP15.
    Published: Jan 02, 2024
    Paper | Code (DMFold-Multimer) | Code (DeepMSA2-Multimer) | Webserver (DMFold-Multimer) | Webserver (DeepMSA2-Multimer)

2023
  • FragFold: An AlphaFold2-based method for high-throughput prediction of peptide binding to protein targets. It was employed for high-throughput computational discovery of inhibitory protein fragments.
    Posted: Dec 20, 2023
    Preprint | Code (GitHub)

  • Chroma: A diffusion model for protein design developed by Generate:Biomedicines. Chroma is a programmable generative model that can directly sample novel protein structures and sequences. Its capabilities include generating complexes and conditioning on symmetries, substructures, shapes, “semantics”, and even natural-language prompts.
    Published: Nov 15, 2023
    Paper | Preprint | Code (GitHub) | Colab Notebooks

  • LightDock: A protein-protein, protein-peptide, and protein-nucleic acid flexible docking framework based on the Glowworm Swarm Optimization (GSO) algorithm. LightDock is not just a protocol but a framework that accepts multiple user-selected scoring functions and force fields.
    Published: May 04, 2023
    Paper | Code (GitHub) - Home | Code (GitHub) - Python Implementation | Code (GitHub) - Rust Implementation | Webserver | Homepage
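    The LightDock entry above names the Glowworm Swarm Optimization (GSO) algorithm. The toy sketch below illustrates the general GSO scheme on an arbitrary objective function: each agent accumulates "luciferin" according to the objective value at its position, moves a small step toward a brighter neighbour chosen at random, and adapts its neighbourhood radius. It is not LightDock's implementation; the function name, all parameter values, and the clamping of positions to the search bounds are assumptions made for illustration.

```python
import math
import random


def glowworm_swarm(objective, bounds, n_worms=30, n_iters=100, rho=0.4,
                   gamma=0.6, beta=0.08, step=0.03, sensor_range=3.0,
                   n_desired=5, seed=0):
    """Toy GSO maximiser. Returns (best position seen, final positions)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_worms)]
    luciferin = [5.0] * n_worms          # same starting luciferin for all
    radius = [sensor_range] * n_worms    # per-worm decision radius
    best = max(pos, key=objective)[:]

    for _ in range(n_iters):
        # 1. Luciferin update: decay the old value, add the current fitness.
        luciferin = [(1 - rho) * l + gamma * objective(p)
                     for l, p in zip(luciferin, pos)]
        new_pos = [p[:] for p in pos]
        for i in range(n_worms):
            # 2. Neighbours are brighter worms inside the decision radius.
            nbrs = [j for j in range(n_worms)
                    if j != i and luciferin[j] > luciferin[i]
                    and math.dist(pos[i], pos[j]) < radius[i]]
            if nbrs:
                # 3. Pick a neighbour with probability proportional to the
                #    luciferin difference, then take a fixed step toward it.
                weights = [luciferin[j] - luciferin[i] for j in nbrs]
                j = rng.choices(nbrs, weights=weights)[0]
                d = math.dist(pos[i], pos[j])
                if d > 0:
                    new_pos[i] = [
                        min(hi, max(lo, a + step * (b - a) / d))  # clamp (assumption)
                        for a, b, (lo, hi) in zip(pos[i], pos[j], bounds)
                    ]
            # 4. Grow or shrink the radius toward the desired neighbour count.
            radius[i] = min(sensor_range,
                            max(0.0, radius[i] + beta * (n_desired - len(nbrs))))
        pos = new_pos
        candidate = max(pos, key=objective)
        if objective(candidate) > objective(best):
            best = candidate[:]
    return best, pos


if __name__ == "__main__":
    # Single peak at the origin; the swarm should drift toward it.
    peak = lambda p: -(p[0] ** 2 + p[1] ** 2)
    best, final = glowworm_swarm(peak, [(-3.0, 3.0), (-3.0, 3.0)])
    print(best, peak(best))
```

    In a docking setting the objective would be a scoring function over ligand poses rather than a 2D test function, and the position vector would encode translation, rotation, and flexibility terms; those details are outside this sketch.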

2022
  • AlphaFill: A tool that, using sequence and structure similarity, FILLS the gap in the AF2 protein structure database of all known protein sequences. AlphaFill populates AF2 structure models with relevant small-molecule ligands and cofactors by, essentially, transplanting ligands found in experimentally determined structures of homologous proteins.
    Published: Nov 24, 2022
    Paper | Code (GitHub) | Webserver

  • AlphaFold2-Multimer: AlphaFold2 retrained for the prediction of protein-protein complexes.
    Posted: March 10, 2022
    Preprint | Code (GitHub)


1.3. Antibody Structure Prediction

2024
  • GeoAB: A method for computational design and optimization (affinity maturation) of antibodies. GeoAB utilizes a co-design strategy, predicting the structure of a CDR together with optimized 1D sequences for that structure.
    Posted: May 17, 2024
    Preprint | Code (GitHub)

  • H3-OPT: A model for predicting antibody structures based on AlphaFold2 and a pre-trained protein language model.
    Posted: Mar 14, 2024
    Preprint | Code (GitHub)

  • DeepSP: A deep learning method to predict the stability of monoclonal antibodies from sequence.
    Posted: Mar 03, 2024
    Preprint | Code (GitHub)

  • tFold-Ab and tFold-Ag: Methods for antibody and antibody-antigen complex modelling and design by Tencent.
    Posted: Feb 08, 2024
    Preprint | Code (GitHub)


1.4. RNA Structure Prediction

2024
  • RNADiffFold: A generative model for RNA secondary structure prediction that leverages neural networks from RNA-FM and UFold for feature extraction.
    Posted: June 02, 2024
    Preprint | Code (GitHub)

  • Evo: A long-context foundation model that generalizes across the central dogma of biology: DNA, RNA, and proteins. Evo is a 7 billion parameter model trained to generate DNA sequences and is capable of prediction and generative tasks, from molecules to whole genomes.
    Posted: March 06, 2024
    Preprint | Code (GitHub) | Code (PyPI) | Blog | Playground | Colab Notebook


I try my best to make the information on this website as accurate as possible. If you find any errors on this page or any other page of this website, I would greatly appreciate it if you got in touch with me at contact@abeebyekeen.com.


