Update Documentation.

ccrepy · Hackable Diffusion Authors · commit ecd41f8b2642 · 2026-05-06T09:23:10.000-07:00
PiperOrigin-RevId: 911379267
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -134,6 +134,50 @@ print(f"Output shape: {output.shape}")
 # Output shape: (1, 64, 64, 3)
 ```
 
+### `DiT`
+
+(`lib/architecture/dit.py`)
+
+The `DiT` class implements a **Diffusion Transformer** backbone based on
+<https://arxiv.org/abs/2212.09748>. It uses adaptive layer norm zero
+(adaLN-Zero) as the conditioning mechanism. The architecture consists of
+repeated transformer blocks with optional encoder/decoder and absolute
+positional encoding.
+
+Key parameters:
+
+  * `num_blocks`: Number of DiT blocks.
+  * `block`: A DiT block module (e.g., `DiTBlockAdaLNZero`).
+  * `encoder`: Optional encoder (e.g., `Patchify` for image inputs).
+  * `decoder`: Optional decoder (e.g., `DePatchify` for image outputs).
+  * `absolute_posenc`: Optional positional encoding module.
+  * `use_padding_mask`: Whether to mask out padding tokens (for tokenized
+    inputs).
+
+The `DiT` expects an `ADAPTIVE_NORM` conditioning embedding. The `mnist_dit`
+notebook demonstrates its usage.
+
+## Diffusion Network
+
+(`lib/diffusion_network.py`)
+
+The **`DiffusionNetwork`** class is the primary entry point for constructing a
+complete diffusion model. It composes a backbone (e.g., `Unet` or `DiT`) with a
+`ConditioningEncoder` into a single Flax module that conforms to the
+`BaseDiffusionNetwork` protocol.
+
+  * **`DiffusionNetwork`**: Single-modal model. Takes `(time, xt,
+    conditioning)` and internally runs the conditioning encoder, applies any
+    input/time rescaling, and calls the backbone.
+  * **`MultiModalDiffusionNetwork`**: Generalizes `DiffusionNetwork` to
+    multi-modal PyTree data, allowing different prediction types and data
+    dtypes per leaf.
+  * **`SelfConditioningDiffusionNetwork`**: Adds self-conditioning, where the
+    model receives its own previous prediction as an additional input.
+
+These classes also support `InputRescaler` and `TimeRescaler` for
+schedule-dependent input preprocessing (e.g., EDM preconditioning).
+
 ### `ConditionalMLP`
 
 (`lib/architecture/mlp.py`)
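
Note on the `DiT` section added in this hunk: the adaLN-Zero conditioning it names can be illustrated with a small NumPy sketch. This is not the code in `lib/architecture/dit.py`; the function names (`layer_norm`, `adaln_zero_branch`) and the weight layout are assumptions made for illustration. The conditioning embedding is projected to a per-channel shift, scale, and gate, and the projection is zero-initialized so that each residual branch starts out as the identity.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalizes the last axis to zero mean and unit variance (no learned affine)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def adaln_zero_branch(x, cond, w, b, sublayer):
    """One adaLN-Zero residual branch (hypothetical helper, not the library API).

    x: (batch, tokens, dim) token activations.
    cond: (batch, cond_dim) conditioning embedding.
    w, b: projection from cond_dim to 3 * dim values (shift, scale, gate).
    sublayer: any function on tokens (attention or an MLP in a real block).
    """
    shift, scale, gate = np.split(cond @ w + b, 3, axis=-1)  # each (batch, dim)
    # Modulate the normalized tokens, broadcasting over the token axis.
    h = layer_norm(x) * (1.0 + scale[:, None, :]) + shift[:, None, :]
    # The gate scales the branch output before it is added to the residual.
    return x + gate[:, None, :] * sublayer(h)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8))
cond = rng.normal(size=(2, 5))

# Zero initialization: shift, scale, and gate are all zero at the start.
w = np.zeros((5, 3 * 8))
b = np.zeros(3 * 8)
out = adaln_zero_branch(x, cond, w, b, sublayer=np.tanh)
```

With the zero-initialized projection the gate is zero, so `out` equals `x`; the conditioning only starts to influence the block once training moves `w` and `b` away from zero.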
diff --git a/docs/corruption.md b/docs/corruption.md
@@ -367,10 +367,14 @@ The torus is a flat space with periodic boundary conditions.
 ### Example Usage
 
 ```python
+import jax
+import jax.numpy as jnp
 from hackable_diffusion.lib import manifolds
 from hackable_diffusion.lib.corruption.riemannian import RiemannianProcess
 from hackable_diffusion.lib.corruption.schedules import LinearRiemannianSchedule
 
+key = jax.random.PRNGKey(0)
+
 # 1. Define manifold and process
 manifold = manifolds.Sphere()
 schedule = LinearRiemannianSchedule()
@@ -379,6 +383,7 @@ process = RiemannianProcess(manifold=manifold, schedule=schedule)
 # 2. Corrupt data
 x0 = jnp.array([[1.0, 0.0, 0.0]]) # Point on S2
 time = jnp.array([0.5])
+key, subkey = jax.random.split(key)
 xt, target_info = process.corrupt(subkey, x0, time)
 
 # target_info['velocity'] is the regression target u_t
diff --git a/docs/index.md b/docs/index.md
@@ -80,13 +80,14 @@ encapsulates the call to the model and can be composed with other
 functionalities like classifier-free guidance. This allows the main sampling
 loop to be agnostic to the details of how a prediction is made.
 
-### [Diffusion Loss Functions](./loss.md)
+### [Training](./training.md)
 
 (`lib/training/`)
 
 This module provides flexible loss functions for training diffusion models. It
 includes highly configurable weighted MSE losses for Gaussian processes (like
-`SiD2Loss`) and cross-entropy losses for discrete data.
+`SiD2Loss`) and cross-entropy losses for discrete data. It also provides time
+sampling strategies for selecting training timesteps.
 
 ### [Sampling](./sampling.md)
 
@@ -107,14 +108,18 @@ excellent starting points for understanding the library's components in action.
     a simple 2D toy dataset.
 *   **`mnist.ipynb`**: Trains a standard continuous diffusion model (Gaussian
     process) on the MNIST dataset, demonstrating image data handling.
+*   **`mnist_dit.ipynb`**: Trains a Diffusion Transformer (DiT) on MNIST,
+    showcasing the DiT backbone as an alternative to U-Net.
 *   **`mnist_discrete.ipynb`**: Trains a discrete diffusion model on MNIST,
     treating pixel values as categorical data. This showcases the use of
     `CategoricalProcess`.
+*   **`mnist_simplicial.ipynb`**: Trains a simplicial diffusion model on MNIST
+    using `SimplicialProcess` with Dirichlet noise on the probability simplex.
 *   **`mnist_multimodal.ipynb`**: A more advanced example that trains a
     multimodal model to jointly generate MNIST images with discrete and
     continuous diffusion models, demonstrating the "Nested" design pattern in a
     practical setting.
+*   **`mnist_nn_and_nnx.ipynb`**: Demonstrates both the Flax Linen (`nn`) and
+    NNX (`nnx`) module styles for defining diffusion networks.
 *   **`riemannian_sphere_training.ipynb`**: Demonstrates Riemannian Flow
     Matching on the unit sphere S^2.
-*   **`riemannian_torus_ode_to_sde.ipynb`**: Shows how to use Riemannian Flow
-    Matching on the torus manifold for both ODE and SDE sampling.
diff --git a/docs/inference.md b/docs/inference.md
@@ -163,3 +163,54 @@ inference_fn = GuidedDiffusionInferenceFn(
 # predicted_x0 = final_prediction['x0']
 # ... use predicted_x0 to compute x_t_minus_1
 ```
+
+## Inference Wrappers
+
+(`lib/inference/wrappers.py`)
+
+In practice, you need a concrete way to convert a trained model into an
+`InferenceFn`. The library provides two wrappers and a conversion utility:
+
+### `FlaxLinenInferenceFn`
+
+Wraps a Flax `nn.Module` and its parameters into an `InferenceFn`. This is the
+most common wrapper for models defined with the Linen API.
+
+```python
+from hackable_diffusion.lib.inference.wrappers import FlaxLinenInferenceFn
+
+base_inference_fn = FlaxLinenInferenceFn(
+    network=my_diffusion_network,  # An nn.Module
+    params=restored_params,        # A pytree of model parameters
+)
+```
+
+### `FlaxNNXInferenceFn`
+
+Wraps an NNX module (converted from a Linen module) into an `InferenceFn`.
+
+```python
+from hackable_diffusion.lib.inference.wrappers import FlaxNNXInferenceFn
+
+base_inference_fn = FlaxNNXInferenceFn(
+    nnx_network=my_nnx_network,   # A ConvertedNNXDiffusionNetwork
+)
+```
+
+### `convert_flax_linen_module_with_params_to_nnx`
+
+A utility function that converts a Linen module and its pre-trained parameters
+into an NNX module:
+
+```python
+from hackable_diffusion.lib.inference.wrappers import (
+    convert_flax_linen_module_with_params_to_nnx
+)
+
+nnx_model = convert_flax_linen_module_with_params_to_nnx(
+    my_linen_module,  # linen_module (assumed to be the first parameter)
+    restored_params,  # restored_linen_params (assumed to be the second)
+    dummy_time, dummy_xt, dummy_conditioning, False,  # init args
+)
+```
+
diff --git a/docs/multimodal.md b/docs/multimodal.md
@@ -0,0 +1,145 @@
+# Multimodal Diffusion
+
+This document explains how to use Hackable Diffusion's "Nested" wrappers to
+build multimodal diffusion models that operate on PyTree-structured data.
+
+The multimodal wrappers are located in `lib/multimodal.py`.
+
+[TOC]
+
+## Overview
+
+Hackable Diffusion's core protocols (`CorruptionProcess`, `SamplerStep`,
+`DiffusionLoss`, etc.) are designed around single-modal arrays. To handle
+multimodal data — where different parts of the input (e.g., image + labels,
+continuous + discrete) require different diffusion treatments — the library
+provides **Nested wrappers**.
+
+Each wrapper takes a PyTree of single-modal components that matches the
+structure of your data. When called, it dispatches each method to the
+corresponding component-data pair.
+
+## Available Wrappers
+
+### Training
+
+*   **`NestedProcess`**: Applies different corruption processes per modality.
+*   **`NestedDiffusionLoss`**: Computes different loss functions per modality.
+*   **`NestedTimeSampler`**: Samples timesteps independently per modality.
+
+### Sampling
+
+*   **`NestedSamplerStep`**: Runs different sampler algorithms per modality.
+*   **`NestedTimeSchedule`**: Uses different time discretizations per modality.
+*   **`NestedGuidanceFn`**: Applies different guidance functions per modality.
+
+## Key Concept: Structure Matching
+
+The **structure of your Nested wrapper must match the structure of your data**.
+For example, if your data is a dictionary `{"image": ..., "label": ...}`, your
+`NestedProcess` must also be keyed by `{"image": ..., "label": ...}`.
+
+```python
+data = {
+    "image": jnp.zeros((batch, 32, 32, 3)),
+    "label": jnp.zeros((batch, 1), dtype=jnp.int32),
+}
+
+process = NestedProcess(
+    processes={
+        "image": GaussianProcess(schedule=CosineSchedule()),
+        "label": CategoricalProcess.masking_process(
+            schedule=LinearDiscreteSchedule(), num_categories=10,
+        ),
+    }
+)
+```
+
+## Example: Multimodal Training Setup
+
+```python
+from hackable_diffusion.lib.multimodal import (
+    NestedProcess,
+    NestedDiffusionLoss,
+    NestedTimeSampler,
+)
+from hackable_diffusion.lib.corruption.gaussian import GaussianProcess
+from hackable_diffusion.lib.corruption.discrete import CategoricalProcess
+from hackable_diffusion.lib.corruption.schedules import (
+    CosineSchedule,
+    LinearDiscreteSchedule,
+)
+from hackable_diffusion.lib.training.gaussian_loss import SiD2Loss
+from hackable_diffusion.lib.training.discrete_loss import MD4Loss
+from hackable_diffusion.lib.training.time_sampling import UniformTimeSampler
+
+# 1. Define per-modality corruption processes
+process = NestedProcess(
+    processes={
+        "image": GaussianProcess(schedule=CosineSchedule()),
+        "label": CategoricalProcess.masking_process(
+            schedule=LinearDiscreteSchedule(), num_categories=10,
+        ),
+    }
+)
+
+# 2. Define per-modality losses
+loss_fn = NestedDiffusionLoss(
+    losses={
+        "image": SiD2Loss(schedule=CosineSchedule()),
+        "label": MD4Loss(schedule=LinearDiscreteSchedule()),
+    }
+)
+
+# 3. Define per-modality time sampling (optional — can also share time)
+time_sampler = NestedTimeSampler(
+    time_samplers={
+        "image": UniformTimeSampler(),
+        "label": UniformTimeSampler(),
+    }
+)
+```
+
+## Example: Multimodal Sampling Setup
+
+```python
+from hackable_diffusion.lib.multimodal import (
+    NestedSamplerStep,
+    NestedTimeSchedule,
+)
+from hackable_diffusion.lib.sampling.gaussian_step_sampler import DDIMStep
+from hackable_diffusion.lib.sampling.discrete_step_sampler import DiscreteDDIMStep
+from hackable_diffusion.lib.sampling.time_scheduling import UniformTimeSchedule
+
+sampler_step = NestedSamplerStep(
+    sampler_steps={
+        "image": DDIMStep(
+            corruption_process=gaussian_process,  # the "image" process
+            stoch_coeff=0.0,
+        ),
+        "label": DiscreteDDIMStep(
+            corruption_process=categorical_process,  # the "label" process
+        ),
+    }
+)
+
+time_schedule = NestedTimeSchedule(
+    time_schedules={
+        "image": UniformTimeSchedule(),
+        "label": UniformTimeSchedule(),
+    }
+)
+```
+
+## How It Works
+
+Internally, Nested wrappers use `utils.lenient_map` to traverse the data and
+component PyTrees in parallel, calling the corresponding method on each
+component with its matching data leaf. This means:
+
+*   Any nesting depth works (dictionaries, named tuples, etc.).
+*   Single-modal and multimodal code share the same protocols.
+*   You can mix and match any combination of corruption processes, samplers, and
+    losses.
+
+The `mnist_multimodal` notebook provides a complete end-to-end example.
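
The dispatch described under "How It Works" can be sketched in plain Python. The helper `nested_call` and the toy `ScaleProcess` and `MaskProcess` classes are hypothetical stand-ins, not the library's `utils.lenient_map` or its real corruption processes; the sketch only shows how components and data with matching structure are paired up.

```python
def nested_call(components, data, method_name, *args):
    """Calls `method_name` on each component with its matching data leaf.

    `components` and `data` are (possibly nested) dicts with identical keys.
    """
    if isinstance(components, dict):
        if not isinstance(data, dict) or components.keys() != data.keys():
            raise ValueError("components and data must share the same structure")
        return {
            key: nested_call(components[key], data[key], method_name, *args)
            for key in components
        }
    # Leaf: a single-modal component paired with its data leaf.
    return getattr(components, method_name)(data, *args)


class ScaleProcess:
    """Toy stand-in: shrinks continuous values toward zero as time grows."""

    def corrupt(self, x, time):
        return [(1.0 - time) * v for v in x]


class MaskProcess:
    """Toy stand-in: replaces every token with a mask id once time >= 0.5."""

    def __init__(self, mask_id):
        self.mask_id = mask_id

    def corrupt(self, x, time):
        return [self.mask_id if time >= 0.5 else v for v in x]


data = {"image": [1.0, 2.0], "label": [3]}
processes = {"image": ScaleProcess(), "label": MaskProcess(mask_id=10)}
corrupted = nested_call(processes, data, "corrupt", 0.5)
print(corrupted)
```

Because the recursion only looks at dictionary keys, the same helper handles deeper nesting, and a mismatch between the component structure and the data structure fails early with a `ValueError`.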
diff --git a/docs/sampling.md b/docs/sampling.md
@@ -84,6 +84,9 @@ Implementations for **Gaussian** processes include:
 *   **`DDIMStep`**: Implements the popular Denoising Diffusion Implicit Models
     sampler. It can be deterministic (`stoch_coeff=0.0`) or stochastic
     (`stoch_coeff > 0.0`).
+*   **`AdjustedDDIMStep`**: An improved DDIM variant from
+    <https://arxiv.org/abs/2403.06807> that adjusts the update with an
+    estimated covariance term to reduce sampling error.
 *   **`SdeStep`**: A stochastic sampler based on discretizing the reverse-time
     Stochastic Differential Equation (SDE).
 *   **`VelocityStep`**: A sampler that operates using the velocity prediction
@@ -332,20 +335,21 @@ from hackable_diffusion.lib import manifolds
 from hackable_diffusion.lib.corruption.riemannian import RiemannianProcess
 from hackable_diffusion.lib.corruption.schedules import LinearRiemannianSchedule
 from hackable_diffusion.lib.sampling.riemannian_sampling import RiemannianFlowSamplerStep
+from hackable_diffusion.lib.sampling.time_scheduling import UniformTimeSchedule
 
 # 1. Define manifold and process
 manifold = manifolds.Sphere()
 process = RiemannianProcess(
     manifold=manifold,
     schedule=LinearRiemannianSchedule(),
 )
 
 # 2. Configure Sampler Step
 stepper = RiemannianFlowSamplerStep(corruption_process=process)
 
 # 3. Create the sampler
 sampler = DiffusionSampler(
-    time_schedule=UniformTimeSchedule(), # or EDM
+    time_schedule=UniformTimeSchedule(),
     stepper=stepper,
     num_steps=50,
 )
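
For the `DDIMStep` entry in the Gaussian sampler list of this file, here is a minimal NumPy sketch of one DDIM update under the parameterization `x_t = alpha_t * x0 + sigma_t * eps` with `alpha**2 + sigma**2 == 1`. The function `ddim_step` and its `eta` argument are assumptions for illustration. `eta` plays the role that `stoch_coeff` appears to play in the docs (0 for the deterministic sampler, larger values for more injected noise), but the library's exact definition may differ.

```python
import numpy as np


def ddim_step(xt, x0_hat, alpha_t, sigma_t, alpha_s, sigma_s, eta=0.0, rng=None):
    """Moves a sample from noise level t to a less noisy level s (hypothetical helper).

    xt: current noisy sample.
    x0_hat: the model's prediction of the clean data.
    alpha_*, sigma_*: signal and noise scales at the two levels.
    eta: 0.0 gives the deterministic update; eta > 0 injects fresh noise.
    """
    # Noise estimate implied by the x0 prediction.
    eps_hat = (xt - alpha_t * x0_hat) / sigma_t
    # Standard deviation of the fresh noise, as in the DDIM paper.
    noise_std = eta * (sigma_s / sigma_t) * np.sqrt(
        max(1.0 - (alpha_t / alpha_s) ** 2, 0.0)
    )
    # The remaining noise budget is spent along the predicted noise direction.
    direction_std = np.sqrt(max(sigma_s**2 - noise_std**2, 0.0))
    mean = alpha_s * x0_hat + direction_std * eps_hat
    if eta == 0.0:
        return mean
    rng = np.random.default_rng() if rng is None else rng
    return mean + noise_std * rng.standard_normal(xt.shape)


# Sanity check with a perfect x0 prediction.
alpha_t, sigma_t = 0.6, 0.8  # noisier level
alpha_s, sigma_s = 0.8, 0.6  # less noisy level
x0 = np.array([1.0, -2.0])
eps = np.array([0.5, 0.25])
xt = alpha_t * x0 + sigma_t * eps
xs = ddim_step(xt, x0, alpha_t, sigma_t, alpha_s, sigma_s)
```

When the prediction is exact and `eta=0.0`, the update lands on `alpha_s * x0 + sigma_s * eps`, that is, the same noise realization at the lower noise level.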
diff --git a/docs/sitemap.md b/docs/sitemap.md
@@ -3,6 +3,6 @@
 *   [Architecture](./architecture.md)
 *   [Corruption Processes](./corruption.md)
 *   [Inference Function](./inference.md)
-*   [Diffusion Loss Functions](./loss.md)
+*   [Multimodal](./multimodal.md)
 *   [Sampling](./sampling.md)
-
+*   [Training](./training.md)
diff --git a/docs/training.md b/docs/training.md