docs: update TRT-LLM speculative decoding guide to LLM API / PyTorch backend #150
Merged
whoisj merged 4 commits into triton-inference-server:main from … on Apr 21, 2026
Replace the legacy TRT engine backend approach (trtllm-build, inflight_batcher_llm, fill_template.py) with the modern LLM API / PyTorch backend workflow. Update the EAGLE section to use EAGLE-3 with Llama-3.1-8B-Instruct, add a deprecation notice for MEDUSA (unsupported on the PyTorch backend), and update the Draft Model section to use DraftTargetDecodingConfig via model.yaml.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
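The Draft Model section's DraftTargetDecodingConfig mode is a form of draft-then-verify speculative decoding: a smaller draft model proposes several tokens and the target model keeps the longest prefix it agrees with. The Python below is a toy, greedy-only sketch of that general pattern under stated assumptions. It is not TensorRT-LLM's implementation or API; the `target`/`draft` callables and the helpers `speculative_step`, `generate`, and `greedy` are hypothetical.

```python
from typing import Callable, List

# Hypothetical toy "model": maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """One greedy draft-then-verify step; always returns at least one token."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target verifies the proposal in order. A real engine scores all
    #    k positions in one forward pass; this toy calls it once per position.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # target's token replaces the first mismatch
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target adds one "bonus" token.
    accepted.append(target(ctx))
    return accepted


def generate(target: NextToken, draft: NextToken, prefix: List[int],
             k: int, max_new: int) -> List[int]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prefix) + max_new]


def greedy(target: NextToken, prefix: List[int], max_new: int) -> List[int]:
    """Plain greedy decoding with the target alone, for comparison."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(target(out))
    return out


# Usage with toy models: the target counts up mod 10; the draft guesses 0
# (wrongly) whenever the context length is a multiple of 3.
def count_up(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def flaky_draft(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10 if len(ctx) % 3 else 0


assert generate(count_up, flaky_draft, [0], k=4, max_new=12) == greedy(count_up, [0], 12)
```

With greedy acceptance the output is identical to greedy decoding with the target alone; in a real serving engine the speed-up comes from verifying all k drafted tokens in a single target forward pass, which this toy does not model.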
Author (Contributor):
Tested on H100x2. Query the server:
whoisj requested changes on Apr 15, 2026
…d code blocks render correctly in GFM
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Contributor:
LGTM, will approve if @yinggeh doesn't have any blockers.
yinggeh requested changes on Apr 20, 2026
- Standardize EAGLE-3 naming (hyphen) throughout; it was inconsistent
- Indent numbered list content 3 spaces so items render as a single list
- Fix launch_triton_server.py URL: was tensorrtllm_backend/scripts (404), now NVIDIA/TensorRT-LLM/triton_backend/scripts (correct repo)
- Fix engine backend archive links: point to tensorrtllm_backend#tensorrt-engine-backend instead of the deleted archive file

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
yinggeh approved these changes on Apr 21, 2026
Contributor:
The pre-commit error can be safely ignored.
whoisj approved these changes on Apr 21, 2026
Replace the legacy TRT engine backend workflow with the modern LLM API / PyTorch backend.