Commit 31efda7
docs: update speculative decoding guide to use LLM API / PyTorch backend
Replace the legacy TRT engine backend approach (trtllm-build, inflight_batcher_llm,
fill_template.py) with the modern LLM API / PyTorch backend workflow. Update EAGLE
section to use EAGLE 3 with Llama-3.1-8B-Instruct, add deprecation notice for
MEDUSA (unsupported on PyTorch backend), and update Draft Model section to use
DraftTargetDecodingConfig via model.yaml.
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
1 parent 6b48ce4
1 file changed
Lines changed: 95 additions & 287 deletions