
Add Rust ABI support for WASM subgraphs #6462

Open
cargopete wants to merge 14 commits into graphprotocol:master from cargopete:rust-abi-support


@cargopete
Contributor

@cargopete cargopete commented Mar 28, 2026

Motivation

The AS ABI has structural problems that cannot be fixed without breaking existing subgraphs:

  1. No closures / stringly-typed bindings. AscPtr<T> encodes type information in Rust phantom generics with no compile-time enforcement on the guest side.
  2. Broken nullable handling. Nullability is represented inconsistently across types (AscNullableString, sentinel pointers, separate flags), leading to per-type special-casing in the host.
  3. Managed heap coupling. The host must call into the AS runtime allocator (__alloc, __new, __pin) to pass data to the mapping. An AS compiler change can silently break graph-node.
  4. Opaque errors. Deserialization failures surface as generic failed to read AscPtr traps with no field or type context.
  5. No versioning. apiVersion describes the AS class layout, not a wire protocol. Adding a host function requires ad-hoc compatibility code scattered across asc_abi/.

Rust targeting wasm32-unknown-unknown is already production-proven in the Substreams ecosystem with the same wasmtime runtime family and the same class of host-imported functions used here. There is no novel compiler or runtime risk.

What this PR does

Adds a parallel rust_abi/ serialization layer (~1,450 LOC) that sits next to asc_abi/ and is selected by manifest. The runtime — HostExports, store, chain ingestion, gas accounting — is unchanged.

The protocol in one sentence: the host serializes the trigger to a flat byte buffer, calls allocate(len) on the mapping, copies the bytes in, invokes handler(ptr, len), and calls reset_arena() after. The mapping owns its heap; the host never touches an AS-style allocator.
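The call sequence above can be sketched in plain Rust, with a `Vec<u8>` standing in for WASM linear memory. This is a minimal illustration, not the code in this PR: the `Arena` struct, its fields, and the `dispatch` helper are hypothetical names, and only `allocate` and `reset_arena` mirror the exports named in the description.

```rust
/// Hypothetical guest-side bump arena. Only the names `allocate` and
/// `reset_arena` come from the PR description; the layout is an assumption.
pub struct Arena {
    memory: Vec<u8>, // stands in for WASM linear memory
    next: usize,     // bump pointer: offset of the next free byte
}

impl Arena {
    pub fn new(capacity: usize) -> Self {
        Arena { memory: vec![0; capacity], next: 0 }
    }

    /// `allocate(len)`: reserve `len` bytes and return their offset,
    /// or `None` when the arena cannot hold them.
    pub fn allocate(&mut self, len: usize) -> Option<usize> {
        let end = self.next.checked_add(len)?;
        if end > self.memory.len() {
            return None;
        }
        let ptr = self.next;
        self.next = end;
        Some(ptr)
    }

    /// `reset_arena()`: release every allocation at once by rewinding
    /// the bump pointer.
    pub fn reset_arena(&mut self) {
        self.next = 0;
    }
}

/// Host-side sequence: allocate, copy the payload in, invoke the handler
/// with the (ptr, len) region, then reset the arena.
pub fn dispatch<F>(arena: &mut Arena, payload: &[u8], handler: F) -> Option<usize>
where
    F: Fn(&[u8]) -> usize,
{
    let ptr = arena.allocate(payload.len())?;
    arena.memory[ptr..ptr + payload.len()].copy_from_slice(payload);
    let result = handler(&arena.memory[ptr..ptr + payload.len()]);
    arena.reset_arena();
    Some(result)
}
```

Because the arena is rewound after every handler call, the host never needs to track or free individual allocations.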

Spec: docs/rust-abi-spec.md in this PR is the authoritative protocol reference: wire formats, TLV tag table, trigger layouts, host function signatures, versioning rules, and maintenance model.

Implementation summary

New files (runtime/wasm/src/rust_abi/):

  • mod.rs — MappingLanguage enum; from_kind("wasm/rust") parser
  • types.rs — ToRustWasm/FromRustWasm traits; ValueTag enum; impls for all scalar types
  • entity.rs — TLV serialize_entity / deserialize_entity_data covering all graph-core::Value variants including Timestamp
  • trigger.rs — ToRustBytes trait; fixed-layout RustLogTrigger, RustCallTrigger, RustBlockTrigger
  • host.rs — wasmtime linker wrappers for store_set/get/remove, crypto_keccak256, log_log, data_source_address/network/create, ipfs_cat, ethereum_call, abort; is_rust_module() namespace detection
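To illustrate the tag-length-value idea behind entity.rs, here is a small self-contained sketch. The tag numbers, the three-variant `Value` enum, the 4-byte little-endian length prefix, and the `encode`/`decode` function names are all assumptions made for the example; the authoritative tag table and layouts are the ones in docs/rust-abi-spec.md.

```rust
/// Cut-down stand-in for the real Value type (hypothetical subset).
#[derive(Debug, PartialEq)]
pub enum Value {
    String(String),
    Int(i32),
    Bytes(Vec<u8>),
}

// Hypothetical tag values, chosen only for this sketch.
const TAG_STRING: u8 = 0x01;
const TAG_INT: u8 = 0x02;
const TAG_BYTES: u8 = 0x03;

/// Append one value as: tag (1 byte) | length (u32 LE) | payload.
pub fn encode(value: &Value, out: &mut Vec<u8>) {
    let (tag, payload): (u8, Vec<u8>) = match value {
        Value::String(s) => (TAG_STRING, s.as_bytes().to_vec()),
        Value::Int(i) => (TAG_INT, i.to_le_bytes().to_vec()),
        Value::Bytes(b) => (TAG_BYTES, b.clone()),
    };
    out.push(tag);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&payload);
}

/// Decode one value from the front of `input`, returning it together
/// with the remaining bytes. Errors name the tag and the sizes involved.
pub fn decode(input: &[u8]) -> Result<(Value, &[u8]), String> {
    if input.len() < 5 {
        return Err(format!("truncated header: {} bytes", input.len()));
    }
    let tag = input[0];
    let len = u32::from_le_bytes([input[1], input[2], input[3], input[4]]) as usize;
    let rest = &input[5..];
    if rest.len() < len {
        return Err(format!("tag {:#04x}: need {} bytes, have {}", tag, len, rest.len()));
    }
    let (payload, rest) = rest.split_at(len);
    let value = match tag {
        TAG_STRING => Value::String(
            String::from_utf8(payload.to_vec())
                .map_err(|e| format!("tag {:#04x}: invalid UTF-8: {}", tag, e))?,
        ),
        TAG_INT => {
            if payload.len() != 4 {
                return Err(format!("Int payload must be 4 bytes, got {}", len));
            }
            Value::Int(i32::from_le_bytes([payload[0], payload[1], payload[2], payload[3]]))
        }
        TAG_BYTES => Value::Bytes(payload.to_vec()),
        other => return Err(format!("unknown tag {:#04x}", other)),
    };
    Ok((value, rest))
}
```

Returning the unread remainder lets a caller decode a sequence of fields in a loop, and every failure path carries the tag and the expected size.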

Modified files:

  • runtime/wasm/src/mapping.rs — skip parity_wasm gas-injection pipeline for Rust modules (parity_wasm cannot parse bulk-memory opcodes emitted by current Rust toolchains); configure wasmtime fuel metering instead
  • runtime/wasm/src/module/mod.rs — build_linker() dispatches on MappingLanguage; Rust path skips id_of_type, skips _start
  • runtime/wasm/src/module/instance.rs — handle_trigger_rust(): allocate → write → call → reset_arena; invoke_handler_rust() with trap/timeout/reorg/out-of-fuel handling
  • chain/ethereum/src/trigger.rs — ToRustBytes for all three Ethereum trigger types
  • chain/near/src/trigger.rs — stub ToRustBytes (unimplemented)
  • core/src/subgraph/instance_manager.rs — ToRustBytes trait bound propagation
  • manifest parsing — wasm/rust recognised as a distinct mapping.kind
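The manifest-driven dispatch described in these lists might look roughly like the sketch below. Only `MappingLanguage`, `from_kind`, and the `"wasm/rust"` string come from the PR text; the variant names, the `LinkerPlan` struct, the `linker_plan` function, and the assumption that the AssemblyScript path does not use fuel metering are illustrative and not taken from the diff.

```rust
/// Which serialization layer a mapping uses. Variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MappingLanguage {
    AssemblyScript,
    Rust,
}

impl MappingLanguage {
    /// Parse the manifest string; unknown strings are rejected rather
    /// than guessed at.
    pub fn from_kind(kind: &str) -> Option<MappingLanguage> {
        match kind {
            "wasm/assemblyscript" => Some(MappingLanguage::AssemblyScript),
            "wasm/rust" => Some(MappingLanguage::Rust),
            _ => None,
        }
    }
}

/// Hypothetical summary of the per-language setup steps listed above.
#[derive(Debug, PartialEq, Eq)]
pub struct LinkerPlan {
    pub resolve_id_of_type: bool,          // AS-specific export
    pub call_start: bool,                  // AS-specific `_start`
    pub inject_gas_with_parity_wasm: bool, // skipped for Rust modules
    pub use_fuel_metering: bool,           // wasmtime fuel instead
}

pub fn linker_plan(language: MappingLanguage) -> LinkerPlan {
    match language {
        MappingLanguage::AssemblyScript => LinkerPlan {
            resolve_id_of_type: true,
            call_start: true,
            inject_gas_with_parity_wasm: true,
            use_fuel_metering: false, // assumption for this sketch
        },
        MappingLanguage::Rust => LinkerPlan {
            resolve_id_of_type: false,
            call_start: false,
            inject_gas_with_parity_wasm: false,
            use_fuel_metering: true,
        },
    }
}
```

Keeping the decision in one `match` means a future language adds one arm instead of scattered conditionals.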

Performance

Benchmarks run against a Rust ERC20 Transfer indexer compiled with the Graphite SDK.

Binary size (wasm32-unknown-unknown --release):

| Variant | AS | Rust | Ratio |
|---|---|---|---|
| Raw release | 32.8 KB | 104.7 KB | 3.2× |
| wasm-opt -Oz | 18.8 KB | 75.4 KB | 4.0× |

The size delta is dominated by num_bigint and dlmalloc being statically linked into the Rust binary — functions that AS delegates to host calls. This is reducible; it is not a fundamental property of the ABI.

Handler throughput (Rust, isolation benchmark):

A wasmtime harness links no-op host stubs and loops allocate → memcpy → handle_transfer → reset_arena with an identical 212-byte Transfer payload. No gas metering, no store I/O.

| Run | Iterations | Throughput | Per event |
|---|---|---|---|
| 1 | 500,000 | 615k ev/s | 1,625 ns |
| 2 | 500,000 | 613k ev/s | 1,631 ns |
| 3 | 2,000,000 | 621k ev/s | 1,609 ns |

Steady state: ~617k Transfer events/sec, ~1.62 µs/event on Apple Silicon under wasmtime 29, no fuel metering.
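For readers who want to check the arithmetic, throughput is the reciprocal of the per-event time, and the steady-state figure is consistent with a plain mean over the three runs. The helper names below are hypothetical and are not part of the benchmark harness.

```rust
/// Events per second from nanoseconds per event (1 s = 1e9 ns).
pub fn events_per_second(ns_per_event: f64) -> f64 {
    1e9 / ns_per_event
}

/// Unweighted mean of the per-event times across runs.
pub fn mean_ns_per_event(runs: &[f64]) -> f64 {
    runs.iter().sum::<f64>() / runs.len() as f64
}

/// Steady-state estimate derived from the three reported runs.
pub fn steady_state_events_per_second() -> f64 {
    events_per_second(mean_ns_per_event(&[1625.0, 1631.0, 1609.0]))
}
```

The mean per-event time comes out near 1,622 ns, which matches the quoted ~1.62 µs and ~617k events/sec.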

Honest caveat: we do not benchmark the AS handler in isolation because invoking it outside graph-node requires reconstructing the entire asc_abi/ encoder (AscPtr object graph, managed-class headers, type IDs). The right end-to-end comparison is deploying both subgraphs against the same chain head and reading subgraph_indexing_handler_execution_time from graph-node's Prometheus metrics. The Rust side of that comparison is covered by the live integration test below.

Testing

Unit tests (14): Entity TLV round-trips for every Value variant; each trigger type; BigInt/BigDecimal/String/Bytes/Address primitives.

WASM integration test (tests/integration/tests/wasm_handler.rs): loads the compiled Rust ERC20 WASM into wasmtime, serializes a RustLogTrigger using the exact production binary format, invokes handle_transfer(ptr, len), and asserts that the resulting store_set call carries the expected entity fields (from, to, value, blockNumber, timestamp, transactionHash, id).

Live mainnet test (scripts/live-test.sh): deployed the ERC20 subgraph to a running fork of this graph-node, indexed real USDC Transfer events from Ethereum mainnet starting at block 24756400, and verified correct GraphQL query results for all entity fields.

Maintenance model

HostExports is already language-agnostic. Adding a new host function is:

  1. One impl in HostExports (shared, as today).
  2. One thin AS wrapper (existing pattern).
  3. One thin Rust wrapper in rust_abi/host.rs — read (ptr, len) args, deserialize, call HostExports, write output back. Typically 20–40 lines, no heap manipulation, no AscPtr<T> juggling.
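As a rough picture of what such a thin wrapper involves, the sketch below reads a key from a (ptr, len) region, calls a shared implementation, and writes the result back. Everything here is a simplified stand-in: this `HostExports` is a toy with a `HashMap` store, a byte slice replaces wasmtime memory, and the `store_get_wrapper` signature, the caller-supplied output buffer, and the `NOT_FOUND` sentinel are assumptions rather than the actual signatures in rust_abi/host.rs.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Toy stand-in for the shared, language-agnostic implementation.
pub struct HostExports {
    store: HashMap<String, Vec<u8>>,
}

impl HostExports {
    pub fn new(store: HashMap<String, Vec<u8>>) -> Self {
        HostExports { store }
    }

    pub fn store_get(&self, key: &str) -> Option<&[u8]> {
        self.store.get(key).map(|v| v.as_slice())
    }
}

/// Hypothetical sentinel returned when the key has no entity.
pub const NOT_FOUND: u32 = u32::MAX;

/// Bounds-check a (ptr, len) pair against the memory size.
fn region(memory_len: usize, ptr: u32, len: u32) -> Result<Range<usize>, String> {
    let start = ptr as usize;
    let end = start
        .checked_add(len as usize)
        .ok_or_else(|| "ptr + len overflows".to_string())?;
    if end > memory_len {
        return Err(format!("region {}..{} exceeds memory size {}", start, end, memory_len));
    }
    Ok(start..end)
}

/// Wrapper: read args, deserialize, call the shared impl, write output back.
/// Returns the number of bytes written, or NOT_FOUND.
pub fn store_get_wrapper(
    host: &HostExports,
    memory: &mut [u8],
    key_ptr: u32,
    key_len: u32,
    out_ptr: u32,
    out_cap: u32,
) -> Result<u32, String> {
    let key_range = region(memory.len(), key_ptr, key_len)?;
    let key = std::str::from_utf8(&memory[key_range])
        .map_err(|e| format!("key is not valid UTF-8: {}", e))?
        .to_owned();

    let value = match host.store_get(&key) {
        Some(v) => v,
        None => return Ok(NOT_FOUND),
    };
    if value.len() > out_cap as usize {
        return Err(format!("output needs {} bytes, buffer holds {}", value.len(), out_cap));
    }
    let out_range = region(memory.len(), out_ptr, value.len() as u32)?;
    memory[out_range].copy_from_slice(value);
    Ok(value.len() as u32)
}
```

The wrapper owns only bounds checks and byte copies; the lookup itself stays in the shared implementation, which is the point of the maintenance argument.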

The serialization layer (rust_abi/) is ~1,450 lines total and touches nothing outside the runtime/wasm crate boundary.

Out of scope (follow-ups)

  • NEAR ToRustBytes — stub only; requires a per-chain serialization impl
  • Offchain/subgraph triggers — currently serialized as empty bytes; needs a design
  • Shared graphite-abi crate — ValueTag constants are currently duplicated between graph-node and the SDK; a no_std types crate would eliminate drift risk

- Add Ethereum ToRustBytes impl for Log/Call/Block triggers
- Add NEAR ToRustBytes stub (unimplemented, Ethereum-only for now)
- Propagate ToRustBytes trait bounds through instance_manager
- Skip parity_wasm gas injection for Rust modules (can't parse bulk-memory opcodes)
- Skip AS-specific exports (id_of_type, _start) for Rust modules
- Add handle_trigger_rust and invoke_handler_rust calling convention
- Add Rust host function wrappers in module/context.rs
@lutter
Collaborator

lutter commented Mar 30, 2026

Can you explain a bit what the motivation for this is? Another ABI is a huge commitment in terms of maintenance etc. If another ABI is called for, it would be good to also see if we can avoid some of the mistakes the current ASC ABI makes.

@cargopete
Contributor Author

@lutter Thanks for the feedback!

Motivation
The core issue is that AssemblyScript's pain points aren't cosmetic — they're structural. Broken nullable handling (type narrowing only works on locals, not property accesses), no closures (.map()/.filter() crash the compiler), opaque compiler errors, and a debugging story that amounts to "comment everything out and uncomment line by line." These aren't fixable in the SDK layer; they're baked into the language.

Meanwhile, The Graph has already validated Rust→WASM as a first-class path with Substreams. A Rust mapping language is the natural extension of that investment to the subgraph layer: same WASM runtime, same graph-node infrastructure, just a different serialization boundary.

Maintenance argument
The maintenance surface is smaller than it might look. HostExports is already fully language-agnostic — it operates on native Rust types (String, HashMap<String, Value>, Vec). The AS coupling lives entirely in the serialization layer. rust_abi/ is a parallel serialization layer (~1,450 LOC), not a parallel runtime. Adding a new host function means implementing it once in HostExports and adding two thin wrappers — the existing AS one and the new Rust one. That's the full maintenance delta.

What we tried to learn from the ASC ABI's mistakes
The ASC ABI has a few known rough edges: AscPtr is stringly-typed in places, the managed heap requires graph-node to reach into AS memory and allocate on its behalf, BigInt endianness has been a source of bugs, and there's no clean versioning story.
The Rust ABI was designed to avoid these:

  • Simple ptr+len calling convention — no managed heap, no allocate-on-behalf-of-the-runtime. The Rust module owns a bump allocator; graph-node just writes into memory the module already allocated and resets the arena after each handler call.
  • Explicit TLV serialization — tagged binary format with a fixed, documented value tag table. No implicit type coercions, no silent endianness bugs (the BigInt LE fix we hit during live testing is already baked into the spec).
  • Language detection via manifest — language: wasm/rust in subgraph.yaml. Clean dispatch, no heuristics.
  • Versioned from day one — apiVersion: 0.0.1 in the manifest gives a migration path if the ABI needs to evolve.

Happy to discuss any of these design choices further, or to write a formal docs/rust-abi-spec.md or a forum post if that would help the review.

P1: Replace unimplemented!() panic in NearTrigger::to_rust_bytes() with Vec::new()
P2: Propagate PossibleReorg errors from ipfs_cat host fn instead of swallowing them
P2: Fix useless .into() conversions on anyhow::Error (clippy useless_conversion)
P3: Apply rustfmt to all changed files
P3: Clarify comment on byte-scan Rust detection heuristic in ValidModule::new()

Build: cargo build clean
Tests: 14 rust_abi + existing wasm tests all pass
Lint: cargo clippy --deny warnings clean
Format: cargo fmt --check clean
@cargopete
Contributor Author

Review Summary

Findings

| # | Severity | Finding | Resolution |
|---|---|---|---|
| 1 | P1 | NearTrigger::to_rust_bytes() calls unimplemented!() which panics at runtime if NEAR chain code is ever used with a Rust module dispatch path | Fixed — replaced with Vec::new() return and explanatory comment |
| 2 | P2 | ipfs_cat host function silently swallowed all errors (including PossibleReorg) by mapping them to Ok(u32::MAX), preventing block retry on reorg | Fixed — PossibleReorg errors now set possible_reorg = true and propagate as traps |
| 3 | P2 | Two clippy useless_conversion warnings: .into() on anyhow::Error in runtime_adapter.rs and context.rs | Fixed — removed redundant .into() calls |
| 4 | P3 | Rustfmt diffs across all changed files (long lines, trailing spaces in alignment comments) | Fixed — ran cargo fmt across all changed crates |
| 5 | P3 | Byte-scan heuristic for Rust module detection (raw_module.windows(8).any(...)) lacked explanation of why it's safe despite being imprecise | Fixed — clarified comment explaining false-positive consequences are caught downstream |
| 6 | P3 | MappingLanguage::from_kind() defined but never called | Dismissed — unused dead code in a new module; not worth churn, can be wired to manifest parsing in a follow-up |
| 7 | P3 | reset_arena call silently ignores errors | Dismissed — intentional; arena reset is best-effort cleanup after handler completion |
| 8 | Info | log_log mapped to HostExportError::Deterministic — confirmed correct since HostExports::log_log returns Result<(), DeterministicHostError> | No action |

Verification

  • Build: cargo build -p graph-runtime-wasm -p graph-chain-ethereum -p graph-chain-near -p graph-core — clean
  • Tests: 14 rust_abi unit tests + existing wasm host_exports tests — all pass
  • Lint: cargo clippy --deny warnings — clean
  • Format: cargo fmt --check — clean

Commit

86a07a8d16ef4efdbb857489d16942b890fbc6e1 — fix: resolve code review findings

@cargopete cargopete marked this pull request as ready for review April 6, 2026 12:15
2 participants