feat: add configurable decode ACL-graph fallback threshold. by DongheJin · Pull Request #1233 · jd-opensource/xllm

DongheJin · 2026-04-08T13:11:29Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces a new global flag, acl_graph_decode_batch_size_limit, to manage the maximum batch size for ACL graph decoding. If the actual decode batch size surpasses this limit, the system automatically reverts to eager mode to prevent out-of-memory (OOM) issues.

This change necessitated refactoring various decoder layers (GLM4, GLM4-MoE, Qwen3, Qwen3-MoE) to support distinct parameter sets and execution nodes for both graph and eager modes, along with updates to their forward and build_node_variant_pack methods for dynamic mode selection. A new test case was also added to validate this fallback mechanism.

The review comments suggest improving the handling of the acl_graph_decode_batch_size_limit flag by either documenting its std::max(1, ...) behavior or treating non-positive inputs as errors. Additionally, the manual copying and overriding of enableAclGraphPagedAttention for eager decode parameters across multiple layers is identified as repetitive and error-prone; the review recommends encapsulating it through copy constructors, factory methods, or a shared utility for better safety and maintainability.
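The fallback rule described in this summary can be sketched roughly as follows. This is only an illustration under assumptions: DecodeMode and select_decode_mode are hypothetical names, raw_limit stands in for the flag value (assumed to be a signed 32-bit integer), and the PR's actual layer code is not reproduced here.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical names; the PR's real classes and members are not shown here.
enum class DecodeMode { kAclGraph, kEager };

// Choose the decode path for one step. `raw_limit` stands in for
// FLAGS_acl_graph_decode_batch_size_limit (assumed int32); it is clamped to
// at least 1, mirroring the std::max(1, ...) call quoted in the review.
inline DecodeMode select_decode_mode(uint32_t decode_batch_size,
                                     int32_t raw_limit) {
  const uint32_t limit =
      static_cast<uint32_t>(std::max<int32_t>(1, raw_limit));
  // Batches strictly above the limit fall back to eager mode to avoid OOM.
  return decode_batch_size > limit ? DecodeMode::kEager
                                   : DecodeMode::kAclGraph;
}
```

With a limit of 16, a batch of 16 would stay on the graph path and a batch of 17 would run eagerly; a non-positive limit behaves like a limit of 1 because of the clamp.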

gemini-code-assist · 2026-04-08T13:16:13Z

+  const uint32_t decode_batch_size_limit =
+      std::max(1, FLAGS_acl_graph_decode_batch_size_limit);


The use of std::max(1, ...) is a good safety measure, but it should be explicitly documented or handled as a configuration error if the user provides a non-positive threshold, as this silently overrides the user's intent.
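One way to treat a non-positive threshold as a configuration error, rather than clamping it silently, might look like the sketch below. The helper name validated_decode_batch_size_limit is hypothetical, the flag is assumed to be a signed 32-bit integer, and throwing an exception is a stand-in for whatever error-reporting mechanism the project actually uses.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: reject a non-positive limit instead of clamping it,
// so a misconfigured flag is reported rather than silently overridden.
inline uint32_t validated_decode_batch_size_limit(int32_t raw_limit) {
  if (raw_limit <= 0) {
    throw std::invalid_argument(
        "acl_graph_decode_batch_size_limit must be positive, got " +
        std::to_string(raw_limit));
  }
  return static_cast<uint32_t>(raw_limit);
}
```

The alternative mentioned in the comment, keeping std::max(1, ...) and documenting the clamp in the flag's help text, avoids a startup failure but still changes the value the user asked for.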

gemini-code-assist · 2026-04-08T13:16:13Z

+  decode_eager_param_ = decode_graph_param_;
+  decode_eager_param_.enableAclGraphPagedAttention = false;


Manual assignment of decode_eager_param_ from decode_graph_param_ followed by a specific member override is error-prone. Consider adding a copy constructor or a dedicated factory method to ChatglmLayerParam to handle this initialization safely.
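A factory method of the kind suggested here could be sketched as below. LayerParamSketch is a hypothetical stand-in for a struct such as ChatglmLayerParam; only enableAclGraphPagedAttention comes from the diff, while numAttentionHeads and to_eager_decode are invented for illustration.

```cpp
// Hypothetical stand-in for a layer parameter struct such as
// ChatglmLayerParam. Only enableAclGraphPagedAttention appears in the diff;
// the other field is illustrative.
struct LayerParamSketch {
  bool enableAclGraphPagedAttention = false;
  int numAttentionHeads = 0;  // illustrative extra field

  // Return a copy configured for eager decode: every field is preserved
  // except the ACL-graph paged-attention switch, which is turned off.
  LayerParamSketch to_eager_decode() const {
    LayerParamSketch eager = *this;
    eager.enableAclGraphPagedAttention = false;
    return eager;
  }
};
```

A call site would then read decode_eager_param_ = decode_graph_param_.to_eager_decode();, keeping the override in one place next to the struct definition.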

gemini-code-assist · 2026-04-08T13:16:13Z

+  decode_eager_param_ = decode_graph_param_;
+  decode_eager_param_.enableAclGraphPagedAttention = false;


Similar to other decoder implementations, manual copying and modification of decode_eager_param_ is fragile. Please encapsulate this logic within the parameter struct or a factory method to ensure consistency.

gemini-code-assist · 2026-04-08T13:16:13Z

+  decode_eager_param_ = decode_graph_param_;
+  decode_eager_param_.enableAclGraphPagedAttention = false;


The manual override of enableAclGraphPagedAttention after copying the parameter struct is prone to maintenance issues. Encapsulate this initialization logic to prevent future regressions.

gemini-code-assist · 2026-04-08T13:16:13Z

+  decode_eager_param_ = decode_graph_param_;
+  decode_eager_param_.enableAclGraphPagedAttention = false;


The manual initialization of decode_eager_param_ by copying and overriding a flag is repetitive across different layer implementations. Consider refactoring this into a shared utility or a constructor-based approach.
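If the layer parameter structs are unrelated types, a shared utility could be a small function template, sketched below. make_eager_decode_param and the two struct names are hypothetical; the only assumption taken from the diff is that each parameter type has an enableAclGraphPagedAttention member.

```cpp
// Hypothetical shared helper usable by any layer parameter struct that
// exposes an enableAclGraphPagedAttention member.
template <typename Param>
Param make_eager_decode_param(const Param& graph_param) {
  Param eager = graph_param;                   // copy every field
  eager.enableAclGraphPagedAttention = false;  // eager path skips ACL graph
  return eager;
}

// Illustrative stand-ins for two layers' parameter structs.
struct Glm4ParamSketch {
  bool enableAclGraphPagedAttention = true;
  int layerId = 0;
};
struct Qwen3ParamSketch {
  bool enableAclGraphPagedAttention = true;
  float rmsNormEps = 1e-6f;
};
```

Each layer could then write decode_eager_param_ = make_eager_decode_param(decode_graph_param_);, so a later change to what eager decode requires would be made once rather than in every decoder layer.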

feat: add configurable decode ACL-graph fallback threshold.

e285cb5

DongheJin requested review from JimHsiung, RobbieLeung, XuZhang99, liutongxuan, walsonyang and yq33victor as code owners April 8, 2026 13:11

DongheJin wants to merge 1 commit into jd-opensource:main from DongheJin:bugfix/aclgraph_oom_main
