Commit 4fe2b90


docs: update TRT-LLM speculative decoding guide to LLM API / PyTorch backend (#150)

* docs: update speculative decoding guide to use LLM API / PyTorch backend

Replace the legacy TRT engine backend approach (trtllm-build, inflight_batcher_llm, fill_template.py) with the modern LLM API / PyTorch backend workflow. Update EAGLE section to use EAGLE 3 with Llama-3.1-8B-Instruct, add deprecation notice for MEDUSA (unsupported on PyTorch backend), and update Draft Model section to use DraftTargetDecodingConfig via model.yaml.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

* fix: indent numbered list content by 3 spaces so nested paragraphs and code blocks render correctly in GFM

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

* fix: standardize EAGLE 3 spelling to EAGLE-3 throughout the guide

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

* docs: address whoisj and yinggeh review comments

- Standardize EAGLE-3 naming (hyphen) throughout; it was inconsistent
- Indent numbered list content 3 spaces so items render as a single list
- Fix launch_triton_server.py URL: was tensorrtllm_backend/scripts (404), now NVIDIA/TensorRT-LLM/triton_backend/scripts (correct repo)
- Fix engine backend archive links: point to tensorrtllm_backend#tensorrt-engine-backend instead of deleted archive file

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

---------

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

1 parent 6b48ce4, commit 4fe2b90

1 file changed: Feature_Guide/Speculative_Decoding/TRT-LLM/README.md
