jbloomlab/nextstrain-prot-titers-tree

Build interactive nextstrain trees on protein sequences designed to display neutralization titer values

License: MIT

This repository contains a snakemake pipeline developed by the Bloom lab that builds interactive nextstrain trees of protein sequences that can be colored and analyzed in terms of additional data such as neutralization titers. The pipeline was designed for displaying high-throughput neutralization titer data for many strains, similar to the data described in Kikawa et al. (2025).

This pipeline is specifically tailored for the case where you want to build protein sequence trees in which divergence indicates the number of amino-acid mutations separating different proteins. Note that tree inference and ancestral reconstruction use a simple Poisson substitution model in which all amino-acid mutations are equally likely, not JTT92 or a more sophisticated model. This works well for densely sampled phylogenies where there is minimal ambiguity in ancestral reconstructions and you care mostly about how many mutations separate proteins. If you want more accurate phylogenetic reconstructions or have deep branches, you should prefer nucleotide models or other protein substitution models. Do not blindly use this pipeline without understanding this limitation.

Gaps (deletions) are treated as a distinct character state, not as missing data, so that a shared deletion is assigned to the common ancestor rather than independently to each descendant. This is achieved by using a custom Poisson GTR model (data/poisson_gap_aa.txt) with TreeTime's 22-state amino-acid alphabet (20 amino acids + stop + gap) rather than a built-in model like JTT92, which uses a 20-state alphabet that treats gaps as ambiguous.
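
As a rough illustration of what it means to count amino-acid mutations with gaps as a character state, the sketch below counts the differences between two aligned protein sequences, comparing `-` like any other character. The function name and the toy sequences are hypothetical and are not part of the pipeline, which derives mutations from tree inference and ancestral reconstruction rather than from pairwise comparison.

```python
def count_aa_differences(seq1: str, seq2: str) -> int:
    """Count sites where two aligned protein sequences differ.

    A gap ('-') is compared like any other character, so a deletion
    present in only one sequence counts as a difference, while a gap
    shared by both sequences does not.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


# Toy aligned sequences (hypothetical, for illustration only)
parent = "MKT-IALS"
child = "MKT-IVL-"
print(count_aa_differences(parent, child))  # prints 2
```

Here the shared gap at the fourth position contributes nothing, while the substitution (A to V) and the deletion of the final residue each count as one difference.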

Configuring the pipeline, running it, and viewing the results

To run the pipeline, you need to build a configuration file that specifies the configuration for the tree (input data, display options, etc.).

Here are the configuration files for the examples included in this repository:

You should build your own configuration file for your data mirroring those examples (the configuration files should be self-explanatory; particularly see the comments documenting config_example-flu-seqneut-2025.yaml).

Then run the pipeline with:

snakemake -j <nthreads> --configfile <path_to_your_configuration_file> --software-deployment-method conda

Note that running this requires snakemake to be installed, which you can do by building and activating the conda environment in environment.yml.

The tree-building step using IQ-TREE will use multiple threads (up to a maximum of 8 threads, or the number of cores specified with the -j argument to snakemake, whichever is smaller) to speed up the analysis.

The result of this is an auspice JSON file with the tree suitable for viewing either by uploading to https://auspice.us/ or via a Nextstrain Community Build. The auspice JSON trees for the examples are in ./auspice and can be viewed as a Nextstrain Community Build at:

If the metadata in the configuration file has titers, they are displayed on the tree. You can also show all amino-acid identities on the tree, color by amino-acid identity at a site, and show branch lengths either based on amino-acid mutations per site or time.

If you also specify per-serum titers (e.g., as in config_example-flu-seqneut-2025.yaml), then the pipeline will also produce a sidecar JSON with these titers (e.g., the files in ./auspice with the suffix *_measurements.json) that can be used to visualize per-serum titers in the Measurements panel when viewing the tree.
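
To check which trees have a measurements sidecar, something like the following sketch could be used. It assumes that each sidecar sits next to its tree and is named by appending `_measurements.json` to the tree's base name, and that the directory holds only tree JSONs and measurements sidecars. The function name is hypothetical and not part of the pipeline.

```python
from pathlib import Path


def find_measurement_sidecars(auspice_dir):
    """Map each tree JSON in auspice_dir to its sidecar name, or None."""
    suffix = "_measurements.json"
    pairs = {}
    for path in sorted(Path(auspice_dir).glob("*.json")):
        if path.name.endswith(suffix):
            continue  # this file is itself a sidecar, not a tree
        # Assumed naming: <tree>.json pairs with <tree>_measurements.json
        sidecar = path.with_name(path.stem + suffix)
        pairs[path.name] = sidecar.name if sidecar.exists() else None
    return pairs


# Returns an empty dict if the directory does not exist or has no JSONs
print(find_measurement_sidecars("auspice"))
```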

Using in a larger snakemake pipeline

The typical way to use this pipeline is as a submodule of a larger snakemake pipeline. See https://github.com/jbloomlab/flu-seqneut-2025 for an example of how that can be done.

Briefly, first add this repo as a git submodule of your larger pipeline repository. Running the following command inside that repository clones this repo and registers it as a submodule:

  git submodule add https://github.com/jbloomlab/nextstrain-prot-titers-tree

This creates a file called .gitmodules and adds the nextstrain-prot-titers-tree subdirectory, both of which can then be committed to your parent repo.

You can then use it as a module in your larger pipeline, for instance like this:

for subtype in config["subtypes"]:
    module_name = f"nextstrain-prot-titers-tree_{subtype}"
    module:
        name: module_name
        snakefile: "nextstrain-prot-titers-tree/Snakefile"
        config: config["nextstrain-prot-titers-tree_config"][subtype]
    use rule * from module_name as module_name*
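
The loop above expects the parent pipeline's config to contain a `subtypes` list and a `nextstrain-prot-titers-tree_config` mapping with one entry per subtype. A minimal sketch of that shape, written as the Python dictionary that snakemake exposes as `config`, might look like the following. The subtype names and the empty per-subtype dictionaries are placeholders; each entry would hold the same settings as a standalone configuration file for this pipeline.

```python
# Hypothetical parent-pipeline config, shown as the Python dict that
# snakemake exposes as `config`. Subtype names are placeholders.
config = {
    "subtypes": ["H1N1", "H3N2"],
    "nextstrain-prot-titers-tree_config": {
        # Each value would contain the settings of a standalone
        # configuration file for this pipeline (contents omitted here).
        "H1N1": {},
        "H3N2": {},
    },
}

# Check that every subtype has a module configuration, and show the
# module names the loop above would generate.
for subtype in config["subtypes"]:
    assert subtype in config["nextstrain-prot-titers-tree_config"]
    print(f"nextstrain-prot-titers-tree_{subtype}")
```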

Testing via GitHub Actions

When updating the pipeline, you should:

These checks are run automatically via the GitHub Action specified in .github/workflows/test.yaml.
