Build interactive `nextstrain` trees on protein sequences designed to display neutralization titer values

This repository contains a snakemake pipeline developed by the Bloom lab that builds interactive nextstrain trees of protein sequences that can be colored and analyzed in terms of additional data such as neutralization titers. The pipeline was designed for the use case of displaying high-throughput neutralization titer data for many strains similar to that described in Kikawa et al (2025).

This pipeline is specifically tailored for the case where you want to build protein sequence trees and have the divergence indicate the number of amino-acid mutations separating different proteins. Note that the tree inference and ancestral reconstruction use a simple Poisson substitution model in which all amino-acid mutations are equally likely, not JTT92 or a more sophisticated model. This works well for densely sampled phylogenies where there is minimal ambiguity in ancestral reconstructions and you care mostly about how many mutations separate proteins. If you want more accurate phylogenetic reconstructions or have deep branches, you should prefer nucleotide models or other protein substitution models. Do not blindly use this pipeline without understanding this limitation.

Gaps (deletions) are treated as a distinct character state, not as missing data, so that shared deletions are correctly assigned to their common ancestor rather than independently to each descendant. This is achieved by using a custom Poisson GTR model (data/poisson_gap_aa.txt) with TreeTime's 22-state amino-acid alphabet (20 amino acids + stop + gap), rather than a built-in model like JTT92, which uses a 20-state alphabet that treats gaps as ambiguous.
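As a rough illustration of what "number of amino-acid mutations" and "gap as a distinct state" mean for a pair of aligned sequences, here is a minimal Python sketch. It is not the pipeline's code: it compares two sequences directly instead of reconstructing ancestors on a tree, and the alphabet ordering and function names are illustrative assumptions. The distance uses the standard equal-rates (Jukes-Cantor-style) correction for an n-state Poisson model.

```python
import math

# 22-state alphabet: 20 amino acids + stop (*) + gap (-).
# The ordering is an illustrative assumption, not read from data/poisson_gap_aa.txt.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY*-"


def count_differences(seq1: str, seq2: str) -> int:
    """Count aligned sites that differ, treating '-' as its own state (not missing data)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def equal_rates_distance(seq1: str, seq2: str, n_states: int = len(ALPHABET)) -> float:
    """Corrected distance under an equal-rates (Poisson) model with n_states states.

    Generalizes the Jukes-Cantor correction: d = -b * ln(1 - p / b),
    where b = (n - 1) / n and p is the fraction of differing sites.
    """
    p = count_differences(seq1, seq2) / len(seq1)
    b = (n_states - 1) / n_states
    if p >= b:
        return math.inf  # saturated: the distance cannot be estimated
    return -b * math.log(1 - p / b)
```

Here `count_differences("MK-LV", "MKALV")` returns 1 because the deleted site counts as a difference; a counter that skipped gap sites would report 0. The corrected distance for that pair is slightly larger than the raw fraction of differing sites (0.2), reflecting the possibility of multiple substitutions at a site.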

Configuring the pipeline, running it, and viewing the results

To run the pipeline, you need to create a configuration file that specifies the tree (input data, display options, etc.).

Here are the configuration files for the examples included in this repository:

config_example-flu-seqneut-2025.yaml, an example configuration using the H3N2 data from Kikawa et al (2025) (which is stored in data/example-flu-seqneut-2025/).
config_example-H5NX-seqneutVSVdG.yaml, an example configuration for an H5NX phylogenetic tree.

You should build your own configuration file for your data mirroring those examples (the configuration files should be self-explanatory; in particular, see the comments documenting config_example-flu-seqneut-2025.yaml).

Then run the pipeline with:

snakemake -j <nthreads> --configfile <path_to_your_configuration_file> --software-deployment-method conda

Note that running this requires snakemake to be installed, which you can do by building and activating the conda environment in environment.yml.

The tree-building step using IQ-TREE will use multiple threads (up to a maximum of 8 threads, or the number of cores specified with the -j argument to snakemake, whichever is smaller) to speed up the analysis.
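The thread limit amounts to taking the smaller of 8 and the number of cores passed with `-j`. A minimal sketch of that rule follows; the function name is hypothetical. In Snakemake itself, a rule's `threads` value is scaled down to the number of cores available to the run, which is one way such a cap can arise, but this sketch does not show how the Snakefile actually implements it.

```python
def iqtree_threads(cores: int, max_threads: int = 8) -> int:
    """Threads for the tree-building step: at most max_threads, never more than cores."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return min(max_threads, cores)
```

For example, `snakemake -j 4 ...` would give the tree-building step 4 threads, while `-j 32` would still give it 8.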

The result of this is an auspice JSON file with the tree suitable for viewing either by uploading to https://auspice.us/ or via a Nextstrain Community Build. The auspice JSON trees for the examples are in ./auspice and can be viewed as a Nextstrain Community Build at:

If the metadata specified in the configuration file includes titers, they are displayed on the tree. You can also show all amino-acid identities on the tree, color by the amino-acid identity at a site, and show branch lengths either as amino-acid mutations per site or as time.
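Coloring by the amino-acid identity at a site amounts to looking up, for each strain, the residue in one alignment column. The sketch below assumes a plain `{strain: aligned_sequence}` mapping and 1-based site numbers; both are illustrative assumptions (the pipeline's site numbering may differ, for example if it is relative to a reference), and the function names are hypothetical.

```python
from collections import Counter


def identities_at_site(alignment: dict[str, str], site: int) -> dict[str, str]:
    """Map each strain to its residue at a 1-based alignment site ('-' means a gap)."""
    if site < 1:
        raise ValueError("sites are 1-based")
    return {strain: seq[site - 1] for strain, seq in alignment.items()}


def tally_site(alignment: dict[str, str], site: int) -> Counter:
    """Count how many strains carry each residue at the site (one color per residue)."""
    return Counter(identities_at_site(alignment, site).values())
```

With hypothetical strains `{"strainA": "MKAL", "strainB": "MKTL", "strainC": "MK-L"}`, site 3 yields `A`, `T`, and a gap once each, so a coloring at that site would have three categories.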

If you also specify per-serum titers (e.g., as in config_example-flu-seqneut-2025.yaml), the pipeline also produces a sidecar JSON with these titers (e.g., the files in ./auspice with the suffix _measurements.json) that can be used to visualize per-serum titers in the Measurements panel when viewing the tree.
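To show roughly what such a sidecar contains, here is a minimal sketch that turns `(strain, serum, titer)` records into a measurements JSON. It assumes the field names from the Nextstrain measurements JSON schema as we understand it (`collections`, each with `key`, `x_axis_label`, `groupings`, and `measurements`, and each record with `strain` and `value`); compare against the `*_measurements.json` files in `./auspice` before relying on it. The function names, strain and serum names, and titer scale are hypothetical.

```python
import json


def build_measurements(records, collection_key="titers",
                       x_axis_label="neutralization titer"):
    """Build a measurements-panel sidecar from (strain, serum, titer) records.

    Field names are assumed from the Nextstrain measurements schema; the
    pipeline's own output may include additional fields.
    """
    return {
        "measurements": {
            "collections": [
                {
                    "key": collection_key,
                    "x_axis_label": x_axis_label,
                    # Group per-strain values by serum in the Measurements panel.
                    "groupings": [{"key": "serum"}],
                    "measurements": [
                        {"strain": strain, "serum": serum, "value": titer}
                        for strain, serum, titer in records
                    ],
                }
            ]
        }
    }


def write_measurements(records, path):
    """Write the sidecar JSON to path."""
    with open(path, "w") as fh:
        json.dump(build_measurements(records), fh, indent=2)
```

Each record becomes one point in the Measurements panel, and the `serum` grouping lets the panel split points by serum.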

Using in a larger `snakemake` pipeline

The typical way to use this pipeline is as a submodule of a larger snakemake pipeline. See https://github.com/jbloomlab/flu-seqneut-2025 for an example of how that can be done.

Briefly, first add this repo as a git submodule of your larger pipeline's repository (this clones it into a subdirectory) with:

  git submodule add https://github.com/jbloomlab/nextstrain-prot-titers-tree

This creates a file called .gitmodules and adds the nextstrain-prot-titers-tree subdirectory, both of which can then be committed to your parent repo.

You can then use it as a module in your larger pipeline, for instance like this:

for subtype in config["subtypes"]:
    module_name = f"nextstrain-prot-titers-tree_{subtype}"
    module:
        name: module_name
        snakefile: "nextstrain-prot-titers-tree/Snakefile"
        config: config["nextstrain-prot-titers-tree_config"][subtype]
    use rule * from module_name as module_name*
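The loop above expects the parent pipeline's config to contain a `subtypes` list and a `nextstrain-prot-titers-tree_config` mapping keyed by subtype. A plain-Python sketch of that lookup, which fails early if a subtype has no per-subtype config, is below; `module_configs` is a hypothetical helper, and the subtype names and inner config contents in the usage are made up for illustration.

```python
def module_configs(config: dict) -> dict[str, dict]:
    """Return {module_name: per-subtype config}, mirroring the names the loop registers."""
    per_subtype = config["nextstrain-prot-titers-tree_config"]
    missing = [s for s in config["subtypes"] if s not in per_subtype]
    if missing:
        raise KeyError(f"no nextstrain-prot-titers-tree_config entry for: {missing}")
    return {
        f"nextstrain-prot-titers-tree_{subtype}": per_subtype[subtype]
        for subtype in config["subtypes"]
    }
```

For a config with `subtypes: [H3N2, H1N1]`, this yields modules named `nextstrain-prot-titers-tree_H3N2` and `nextstrain-prot-titers-tree_H1N1`, each receiving its own configuration.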

Testing via GitHub Actions

When updating the pipeline, you should:

lint code with ruff (ruff check .)
format code with black (black .)
lint Snakefile with snakemake --lint (snakemake --lint --configfile example_config.yaml)
format Snakefile with snakefmt (snakefmt .).
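To run the same four checks locally before pushing, a small Python wrapper like the one below can help. It is a sketch that assumes `ruff`, `black`, `snakemake`, and `snakefmt` are installed on your `PATH`; the command strings are copied from the list above, and `run_checks` is a hypothetical helper, not part of this repository. Note that `black .` and `snakefmt .` rewrite files in place.

```python
import shlex
import subprocess

# Commands copied from the checklist; order matches the list.
CHECKS = [
    "ruff check .",
    "black .",
    "snakemake --lint --configfile example_config.yaml",
    "snakefmt .",
]


def run_checks(checks=CHECKS, dry_run=False):
    """Run each check in order, stopping at the first non-zero exit code."""
    argv_list = [shlex.split(cmd) for cmd in checks]
    if dry_run:
        return argv_list  # only show what would run
    for argv in argv_list:
        result = subprocess.run(argv)
        if result.returncode != 0:
            raise SystemExit(f"check failed: {shlex.join(argv)}")
    return argv_list
```

Calling `run_checks(dry_run=True)` returns the parsed commands without executing anything.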

These checks are run automatically via the GitHub Action specified in .github/workflows/test.yaml.
