feat: enable rec fast sampler for llm beam search. #1224

Open
RobbieLeung wants to merge 1 commit into jd-opensource:main from RobbieLeung:feat/beam_sample_kernel
Conversation

@RobbieLeung
Collaborator

No description provided.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a CUDA-optimized fast path for the RecSampler and integrates it into the LLM beam search logic. It adds two new configuration options, enable_block_copy_kernel and enable_rec_fast_sampler, and ensures their propagation across the distributed runtime and worker processes. The implementation includes a new CUDA kernel for efficient top-k post-processing and refactors the RecSampler to support instance-scoped fast path toggling. Review feedback identifies several violations of the repository style guide, including the use of plain int instead of fixed-width integers, the use of auto for simple types, and the presence of relative include paths.
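The three style points raised in the review can be illustrated with a minimal sketch. The repository's style guide is not quoted in this conversation, so the function names and include paths below are hypothetical and only show the general shape of each fix.

```cpp
#include <cstdint>
#include <vector>

// Include paths (hypothetical, shown as comments only):
//   flagged:   #include "../util/foo.h"      // relative path
//   preferred: #include "core/util/foo.h"    // path rooted at the project

// Flagged style: plain `int` return type and `auto` for simple types.
int count_positive_flagged(const std::vector<float>& scores) {
  auto count = 0;  // deduced as int; the width is not stated explicitly
  for (auto s : scores) {
    if (s > 0.0f) ++count;
  }
  return count;
}

// Preferred style: fixed-width integer and spelled-out simple types.
int32_t count_positive(const std::vector<float>& scores) {
  int32_t count = 0;
  for (float score : scores) {
    if (score > 0.0f) ++count;
  }
  return count;
}
```

Both functions return the same result; the second only makes the integer width and the element type explicit at the point of declaration.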

Comment thread xllm/core/distributed_runtime/spawn_worker_server/spawn_worker_server_process.cpp Outdated
Comment thread xllm/core/distributed_runtime/worker_server.cpp Outdated
Comment thread xllm/core/distributed_runtime/worker_server.cpp Outdated
Comment thread xllm/core/kernels/cuda/topk_postprocess.cu Outdated
Comment thread xllm/core/kernels/cuda/topk_postprocess.cu Outdated
Comment thread xllm/core/kernels/cuda/topk_postprocess.cu Outdated
Comment thread xllm/core/kernels/cuda/topk_postprocess.cu Outdated
DragonFive previously approved these changes Apr 8, 2026
Collaborator

@DragonFive DragonFive left a comment

LGTM

@RobbieLeung
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a fast-path sampling mechanism for LLM beam search, primarily targeting CUDA environments. Key changes include the refactoring of RecSampler to support instance-scoped fast-path enablement, the addition of a specialized CUDA kernel for top-k post-processing, and the integration of these components into the LLMWorkerImpl. Additionally, the PR includes new unit tests and improved hardware-specific logging. Feedback on the implementation highlights a critical need to extend the CUDA kernel's type support; currently, it uses a dispatch macro limited to float and double, which should be expanded to include Half and BFloat16 to support standard production models.
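The dtype limitation described above can be sketched in plain C++. The review does not name the dispatch macro; if it is PyTorch's `AT_DISPATCH_FLOATING_TYPES` (an assumption), the usual way to add the two reduced-precision types is `AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, ...)`. The enum, the stand-in 16-bit structs, and the launcher below are hypothetical and exist only so the sketch compiles without CUDA or PyTorch headers.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical dtype tags; a real runtime would use its tensor library's enum.
enum class DType : int32_t { kFloat, kDouble, kHalf, kBFloat16 };

// Stand-in 16-bit storage types (assumption: real code would use the
// library's half and bfloat16 types instead).
struct Half { uint16_t bits; };
struct BFloat16 { uint16_t bits; };

// Placeholder for a templated kernel launcher. It only reports the element
// size it was instantiated with, so the dispatch result is observable.
template <typename scalar_t>
int64_t launch_topk_postprocess() {
  return static_cast<int64_t>(sizeof(scalar_t));
}

// Dispatch restricted to float and double, mirroring the limitation the
// review describes: any other dtype is rejected at runtime.
int64_t dispatch_floating_only(DType dtype) {
  switch (dtype) {
    case DType::kFloat:
      return launch_topk_postprocess<float>();
    case DType::kDouble:
      return launch_topk_postprocess<double>();
    default:
      throw std::invalid_argument("unsupported dtype");
  }
}

// Extended dispatch: handles the two reduced-precision types first and
// falls back to the float/double path for everything else.
int64_t dispatch_with_reduced_precision(DType dtype) {
  switch (dtype) {
    case DType::kHalf:
      return launch_topk_postprocess<Half>();
    case DType::kBFloat16:
      return launch_topk_postprocess<BFloat16>();
    default:
      return dispatch_floating_only(dtype);
  }
}
```

With the restricted dispatch, a half-precision input fails at runtime instead of reaching the kernel, which is why the review treats the wider dispatch as necessary for models served in Half or BFloat16.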

Comment thread xllm/core/kernels/cuda/topk_postprocess.cu
@LMX-xin
Collaborator

LMX-xin commented Apr 15, 2026

LGTM

3 participants