Commit 31efda7

docs: update speculative decoding guide to use LLM API / PyTorch backend

Replace the legacy TRT engine backend approach (trtllm-build, inflight_batcher_llm, fill_template.py) with the modern LLM API / PyTorch backend workflow. Update EAGLE section to use EAGLE 3 with Llama-3.1-8B-Instruct, add deprecation notice for MEDUSA (unsupported on PyTorch backend), and update Draft Model section to use DraftTargetDecodingConfig via model.yaml.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

1 parent 6b48ce4, commit 31efda7

1 file changed

Feature_Guide/Speculative_Decoding/TRT-LLM/README.md
