
bugfix: init ssm_cache by config ssm_cache_type and fix dimension order of g & beta for qwen3.5 #1259

Merged
yingxudeng merged 8 commits into jd-opensource:main from JC-ut0:qwen3_5_ssmcache
Apr 16, 2026

Conversation

@JC-ut0
Contributor

@JC-ut0 JC-ut0 commented Apr 10, 2026

This PR makes the following changes to the xLLM inference engine:

  1. Add mamba_ssm_dtype config field — A new mamba_ssm_dtype string property is added to ModelArgs, allowing the SSM (State Space Model) cache dtype to be specified independently of the model's primary dtype.
  2. Use mamba_ssm_dtype for KV cache capacity estimation — In llm_engine.cpp, the estimate_kv_cache_capacity() function now uses the mamba_ssm_dtype-derived byte size when calculating the SSM slot size, instead of always using the model dtype size.
  3. Fix g/beta tensor layout in GDN gating — In qwen3_gated_delta_net_base.cpp, after calling fused_gdn_gating(), the g and beta tensors are permuted from [seq_len, batch, heads] to [batch, seq_len, heads] layout via .permute({1, 0, 2}).contiguous().
  4. Load mamba_ssm_dtype from model config — The qwen3_5.h model registration macro now loads mamba_ssm_dtype from the JSON config (with text_config.mamba_ssm_dtype fallback) using LOAD_ARG_TEXT_OR_ROOT.
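The slot-size fix (point 2) and the layout fix (point 3) can be sketched in plain Python. This is an illustrative sketch only: the actual xLLM implementation is C++, and the names below are hypothetical, not xLLM identifiers.

```python
# Point 2: derive the SSM slot size from the configured mamba_ssm_dtype
# rather than the model's primary dtype (illustrative mapping).
DTYPE_SIZE = {"float32": 4, "bfloat16": 2, "float16": 2}

def ssm_slot_size_bytes(state_elems: int, mamba_ssm_dtype: str, model_dtype: str) -> int:
    # Before the fix, the model dtype's byte size was always used; after the
    # fix, the configured SSM cache dtype wins, falling back to the model dtype.
    dtype = mamba_ssm_dtype or model_dtype
    return state_elems * DTYPE_SIZE[dtype]

# Point 3: reorder g/beta from [seq_len, batch, heads] to [batch, seq_len, heads].
def permute_102(t):
    # Pure-Python equivalent of tensor.permute({1, 0, 2}) on a nested list.
    seq_len, batch = len(t), len(t[0])
    return [[t[s][b] for s in range(seq_len)] for b in range(batch)]

g = [[[1, 2], [3, 4]],   # seq position 0: batch 0, batch 1
     [[5, 6], [7, 8]]]   # seq position 1: batch 0, batch 1

print(ssm_slot_size_bytes(8, "float32", "bfloat16"))  # 32, not 16
print(permute_102(g))  # [[[1, 2], [5, 6]], [[3, 4], [7, 8]]]
```

With a bfloat16 model but a float32 SSM cache, the old logic would have undersized each slot by half; the permute matters because downstream kernels index g and beta batch-first.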

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements the LLMEngine and WorkerImpl components, providing the core infrastructure for distributed inference, KV cache management, and model lifecycle operations. It adds support for advanced architectures like DeepSeek v3 and Qwen3, along with optimizations such as XTensor memory management and rolling weight loading. The review feedback suggests enhancing type safety by using enums instead of string literals for model arguments. Additionally, the implementation should be refined to adhere to the project's style guide, particularly concerning fixed-width integer usage, constant naming conventions, and the appropriate use of the auto keyword.
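The reviewer's enum-over-string suggestion could look roughly like this. A sketch with illustrative names, not the actual xLLM ModelArgs API:

```python
# Sketch only: SsmCacheDtype and parse_ssm_dtype are hypothetical names,
# not part of xLLM.
from enum import Enum

class SsmCacheDtype(Enum):
    FLOAT32 = "float32"
    BFLOAT16 = "bfloat16"
    FLOAT16 = "float16"

def parse_ssm_dtype(s: str) -> SsmCacheDtype:
    # A config typo fails loudly once at parse time, rather than being
    # silently mismatched against string literals scattered through the engine.
    return SsmCacheDtype(s)  # raises ValueError for unknown values
```

Validating at the config boundary keeps the rest of the engine comparing enum values instead of repeating raw string literals.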

Comment thread xllm/models/llm/qwen3_5.h Outdated
@JC-ut0 JC-ut0 force-pushed the qwen3_5_ssmcache branch from d76d019 to e42f91b on April 10, 2026 13:44
@JC-ut0 JC-ut0 force-pushed the qwen3_5_ssmcache branch from e42f91b to 25ef04b on April 10, 2026 13:47
@yingxudeng yingxudeng changed the title fix qwen3.5 ssm_cache dtype to fp32 feat: init ssm_cache by config ssm_cache_type and unify compute precision to fp32. Apr 11, 2026
Comment thread xllm/core/distributed_runtime/llm_engine.cpp Outdated
@XuZhang99 XuZhang99 changed the title feat: init ssm_cache by config ssm_cache_type and unify compute precision to fp32. bugfix: init ssm_cache by config ssm_cache_type and unify compute precision to fp32. Apr 11, 2026
Comment thread xllm/core/layers/npu_torch/qwen3_gated_delta_net_base.cpp Outdated
@JC-ut0
Contributor Author

JC-ut0 commented Apr 14, 2026

/gemini-review

@JC-ut0
Contributor Author

JC-ut0 commented Apr 14, 2026

/gemini-review

@JC-ut0 JC-ut0 changed the title bugfix: init ssm_cache by config ssm_cache_type and unify compute precision to fp32. bugfix: init ssm_cache by config ssm_cache_type and fix dimension order of g & beta Apr 14, 2026
JimHsiung
JimHsiung previously approved these changes Apr 14, 2026
yingxudeng
yingxudeng previously approved these changes Apr 14, 2026
Collaborator

@yingxudeng yingxudeng left a comment


Great catch.

@yingxudeng yingxudeng changed the title bugfix: init ssm_cache by config ssm_cache_type and fix dimension order of g & beta bugfix: init ssm_cache by config ssm_cache_type and fix dimension order of g & beta. Apr 14, 2026
XuZhang99
XuZhang99 previously approved these changes Apr 14, 2026
@JC-ut0 JC-ut0 changed the title bugfix: init ssm_cache by config ssm_cache_type and fix dimension order of g & beta. bugfix: init ssm_cache by config ssm_cache_type and fix dimension order of g & beta for qwen3.5 Apr 14, 2026
@yingxudeng yingxudeng changed the title bugfix: init ssm_cache by config ssm_cache_type and fix dimension order of g & beta for qwen3.5 bugfix: init ssm_cache by config ssm_cache_type and fix dimension order of g & beta for qwen3.5. Apr 14, 2026
@JC-ut0 JC-ut0 dismissed stale reviews from XuZhang99, yingxudeng, and JimHsiung via 9423f68 April 15, 2026 02:18
@yingxudeng
Collaborator

CUDA hasn't finished compiling yet, but many of the non-NPU checks have already passed. This PR has been queued for two days, so let's merge it first; the CUDA build should compile fine.

@yingxudeng yingxudeng merged commit 32412da into jd-opensource:main Apr 16, 2026
16 of 33 checks passed


7 participants