feat: enable rec fast sampler for llm beam search. #1224
RobbieLeung wants to merge 1 commit into jd-opensource:main
Code Review
This pull request introduces a CUDA-optimized fast path for the `RecSampler` and integrates it into the LLM beam search logic. It adds two new configuration options, `enable_block_copy_kernel` and `enable_rec_fast_sampler`, and ensures their propagation across the distributed runtime and worker processes. The implementation includes a new CUDA kernel for efficient top-k post-processing and refactors the `RecSampler` to support instance-scoped fast path toggling. Review feedback identifies several violations of the repository style guide, including the use of plain `int` instead of fixed-width integers, the use of `auto` for simple types, and the presence of relative include paths.
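To make the instance-scoped toggle and the style points concrete, here is a minimal CPU-only sketch. It is not the PR's code: `SamplerOptions`, `ToySampler`, and `top_k` are hypothetical names, and only the two flag names are taken from the summary above. The fast path is chosen once per instance from the options, sizes and indices use `int32_t` rather than plain `int`, and simple types are spelled out instead of `auto`.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical options struct; only the two flag names come from the PR.
struct SamplerOptions {
  bool enable_block_copy_kernel = false;
  bool enable_rec_fast_sampler = false;
};

// Toy stand-in for a sampler whose fast path is selected per instance
// rather than through a process-wide global.
class ToySampler {
 public:
  explicit ToySampler(const SamplerOptions& options)
      : use_fast_path_(options.enable_rec_fast_sampler) {}

  bool uses_fast_path() const { return use_fast_path_; }

  // Returns the indices of the k largest scores, best first.
  // Ties are broken by the lower index so both paths agree.
  std::vector<int32_t> top_k(const std::vector<float>& scores,
                             int32_t k) const {
    const int32_t n = static_cast<int32_t>(scores.size());
    const int32_t kept = std::max<int32_t>(0, std::min(k, n));

    std::vector<int32_t> order(scores.size());
    std::iota(order.begin(), order.end(), 0);

    // `auto` is needed here because a lambda has no nameable type.
    const auto better = [&scores](int32_t a, int32_t b) {
      return scores[a] > scores[b] || (scores[a] == scores[b] && a < b);
    };

    if (use_fast_path_) {
      // Fast path: only order the first `kept` elements.
      std::partial_sort(order.begin(), order.begin() + kept, order.end(),
                        better);
    } else {
      // Reference path: full sort, then truncate.
      std::sort(order.begin(), order.end(), better);
    }
    order.resize(kept);
    return order;
  }

 private:
  bool use_fast_path_;
};
```

Because the flag lives in the instance, two samplers with different settings can coexist in one process, which also makes it straightforward to test the fast path against the reference path.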
8b186cb to d6936d4
/gemini review
Code Review
This pull request introduces a fast-path sampling mechanism for LLM beam search, primarily targeting CUDA environments. Key changes include the refactoring of `RecSampler` to support instance-scoped fast-path enablement, the addition of a specialized CUDA kernel for top-k post-processing, and the integration of these components into the `LLMWorkerImpl`. Additionally, the PR includes new unit tests and improved hardware-specific logging. Feedback on the implementation highlights a critical need to extend the CUDA kernel's type support; currently, it uses a dispatch macro limited to `float` and `double`, which should be expanded to include `Half` and `BFloat16` to support standard production models.
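The dtype concern can be illustrated with a small, self-contained sketch in plain C++. Everything here is hypothetical (`DType`, `BF16`, `argmax_narrow`, `argmax_wide`) and stands in for the real kernel and macro, which the review does not name. If the kernel uses PyTorch's ATen dispatch macros (an assumption), the analogous change would be moving from `AT_DISPATCH_FLOATING_TYPES` to `AT_DISPATCH_FLOATING_TYPES_AND2` with `at::ScalarType::Half` and `at::ScalarType::BFloat16`. Only a bfloat16 stand-in is shown to keep the example short.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical runtime dtype tag, standing in for a tensor's scalar type.
enum class DType : int32_t { kFloat, kDouble, kBFloat16 };

// Hypothetical 16-bit brain-float: the upper 16 bits of an IEEE float32.
struct BF16 {
  uint16_t bits;
};

inline BF16 bf16_from_float(float value) {
  uint32_t wide = 0;
  std::memcpy(&wide, &value, sizeof(wide));
  return BF16{static_cast<uint16_t>(wide >> 16)};  // truncation, no rounding
}

inline double to_double(float value) { return static_cast<double>(value); }
inline double to_double(double value) { return value; }
inline double to_double(BF16 value) {
  const uint32_t wide = static_cast<uint32_t>(value.bits) << 16;
  float out = 0.0f;
  std::memcpy(&out, &wide, sizeof(out));
  return static_cast<double>(out);
}

// The typed "kernel": index of the largest element.
template <typename T>
int64_t argmax_typed(const void* data, int64_t count) {
  const T* values = static_cast<const T*>(data);
  int64_t best = 0;
  for (int64_t i = 1; i < count; ++i) {
    if (to_double(values[i]) > to_double(values[best])) best = i;
  }
  return best;
}

// Narrow dispatch: mirrors a macro that only covers float and double.
inline int64_t argmax_narrow(DType dtype, const void* data, int64_t count) {
  switch (dtype) {
    case DType::kFloat:  return argmax_typed<float>(data, count);
    case DType::kDouble: return argmax_typed<double>(data, count);
    default: throw std::runtime_error("unsupported dtype");
  }
}

// Widened dispatch: the same kernel template, one extra case.
inline int64_t argmax_wide(DType dtype, const void* data, int64_t count) {
  switch (dtype) {
    case DType::kBFloat16: return argmax_typed<BF16>(data, count);
    default: return argmax_narrow(dtype, data, count);
  }
}
```

With the narrow dispatch, a bfloat16 input fails at runtime even though the templated kernel body would work unchanged; widening the dispatch is the only change needed in this sketch.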
LGTM
No description provided.