Commit 29a12cf

cherry-pick speculative decoding related PRs #133 and #135 (#136)
* docs: move Constrained_Decoding and Function_Calling to Feature_Guide | rm AI_Agents_Guide folder (#135)
* docs: Add EAGLE/SpS Speculative Decoding support with vLLM (#133)
1 parent f6fd598 commit 29a12cf

20 files changed: 356 additions & 79 deletions

File tree

AI_Agents_Guide/README.md

Lines changed: 0 additions & 62 deletions
This file was deleted.

AI_Agents_Guide/Constrained_Decoding/artifacts/client.py renamed to Feature_Guide/Constrained_Decoding/artifacts/client.py

File renamed without changes.

AI_Agents_Guide/Constrained_Decoding/artifacts/client_utils.py renamed to Feature_Guide/Constrained_Decoding/artifacts/client_utils.py

File renamed without changes.

AI_Agents_Guide/Constrained_Decoding/artifacts/utils.py renamed to Feature_Guide/Constrained_Decoding/artifacts/utils.py

File renamed without changes.

AI_Agents_Guide/Function_Calling/artifacts/client_utils.py renamed to Feature_Guide/Function_Calling/artifacts/client_utils.py

File renamed without changes.

AI_Agents_Guide/Function_Calling/artifacts/system_prompt_schema.yml renamed to Feature_Guide/Function_Calling/artifacts/system_prompt_schema.yml

File renamed without changes.

Feature_Guide/Speculative_Decoding/README.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -54,4 +54,6 @@ may prove simpler than generating a summary for an article. [Spec-Bench](https:/
 shows the performance of different speculative decoding approaches on different tasks.
 
 ## Speculative Decoding with Triton Inference Server
-Follow [here](TRT-LLM/README.md) to learn how Triton Inference Server supports speculative decoding with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
+Triton Inference Server supports speculative decoding on different types of Triton backends. See what a Triton backend is [here](https://github.com/triton-inference-server/backend).
+- Follow [here](TRT-LLM/README.md) to learn how Triton Inference Server supports speculative decoding with [TensorRT-LLM Backend](https://github.com/triton-inference-server/tensorrtllm_backend).
+- Follow [here](vLLM/README.md) to learn how Triton Inference Server supports speculative decoding with [vLLM Backend](https://github.com/triton-inference-server/vllm_backend).
```
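For context on the vLLM path this commit documents: the Triton vLLM backend is typically configured through a `model.json` whose fields map to vLLM engine arguments. The sketch below is an assumption, not taken from this commit — the field names (`speculative_model`, `num_speculative_tokens`, `gpu_memory_utilization`) follow vLLM's engine arguments and should be checked against the vLLM and backend versions in use, and the model names are placeholders.

```python
import json

# Hedged sketch of a model.json for draft-model (SpS-style) speculative
# decoding with the Triton vLLM backend. Field names mirror vLLM engine
# arguments and may differ across versions; model paths are placeholders.
model_json = {
    "model": "facebook/opt-6.7b",              # target (verifier) model
    "speculative_model": "facebook/opt-125m",  # smaller draft model
    "num_speculative_tokens": 5,               # draft tokens proposed per step
    "gpu_memory_utilization": 0.8,             # fraction of GPU memory to use
}

# The backend would read this file from the model repository,
# e.g. model_repository/vllm_model/1/model.json.
print(json.dumps(model_json, indent=2))
```

The general pattern is that the draft model is much smaller than the target model, so proposing `num_speculative_tokens` cheap draft tokens per step and verifying them in one target-model pass can reduce latency when acceptance rates are high.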
