Commit 4fe2b90
docs: update TRT-LLM speculative decoding guide to LLM API / PyTorch backend (#150)
* docs: update speculative decoding guide to use LLM API / PyTorch backend
Replace the legacy TRT engine backend approach (trtllm-build, inflight_batcher_llm,
fill_template.py) with the modern LLM API / PyTorch backend workflow. Update EAGLE
section to use EAGLE 3 with Llama-3.1-8B-Instruct, add deprecation notice for
MEDUSA (unsupported on PyTorch backend), and update Draft Model section to use
DraftTargetDecodingConfig via model.yaml.
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
* fix: indent numbered list content by 3 spaces so nested paragraphs and code blocks render correctly in GFM
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
* fix: standardize EAGLE 3 spelling to EAGLE-3 throughout the guide
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
* docs: address whoisj and yinggeh review comments
- Standardize EAGLE-3 naming (hyphen) throughout — was inconsistent
- Indent numbered list content 3 spaces so items render as a single list
- Fix launch_triton_server.py URL: was tensorrtllm_backend/scripts (404),
now NVIDIA/TensorRT-LLM/triton_backend/scripts (correct repo)
- Fix engine backend archive links: point to
tensorrtllm_backend#tensorrt-engine-backend instead of deleted archive file
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
---------
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
1 parent 6b48ce4 commit 4fe2b90
1 file changed
Lines changed: 118 additions & 310 deletions