Commit 001f1d9 ("address comments")
1 parent 29d3075

12 files changed: 10 additions & 53 deletions

Feature_Guide/Speculative_Decoding/README.md (1 addition & 1 deletion)

@@ -54,6 +54,6 @@ may prove simpler than generating a summary for an article. [Spec-Bench](https:/
 shows the performance of different speculative decoding approaches on different tasks.

 ## Speculative Decoding with Triton Inference Server
-Triton Inference Server supports speculative decoding on different types of Triton backends. See what a Triton backend is [here](https://github.com/triton-inference-server/tensorrtllm_backend).
+Triton Inference Server supports speculative decoding on different types of Triton backends. See what a Triton backend is [here](https://github.com/triton-inference-server/backend).
 - Follow [here](TRT-LLM/README.md) to learn how Triton Inference Server supports speculative decoding with [TensorRT-LLM Backend](https://github.com/triton-inference-server/tensorrtllm_backend).
 - Follow [here](vLLM/README.md) to learn how Triton Inference Server supports speculative decoding with [vLLM Backend](https://github.com/triton-inference-server/vllm_backend).

Feature_Guide/Speculative_Decoding/TRT-LLM/README.md (1 addition & 1 deletion)

@@ -202,7 +202,7 @@ python3 /tensorrtllm_client/inflight_batcher_llm_client.py --request-output-len
 > ...
 > ```

-2. The [generate endpoint](https://github.com/triton-inference-server/tensorrtllm_backend/tree/release/0.5.0#query-the-server-with-the-triton-generate-endpoint).
+2. The [generate endpoint](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/protocol/extension_generate.html).

 ```bash
 curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is ML?", "max_tokens": 50, "bad_words": "", "stop_words": "", "pad_id": 2, "end_id": 2}'
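The generate-endpoint call in the hunk above can also be issued from Python. Below is a minimal sketch using only the standard library; the server address, model name, and payload follow the curl example, and a running Triton server is assumed for the commented-out call:

```python
import json
import urllib.request

# Same payload as the curl example above (TRT-LLM ensemble model).
payload = {
    "text_input": "What is ML?",
    "max_tokens": 50,
    "bad_words": "",
    "stop_words": "",
    "pad_id": 2,
    "end_id": 2,
}

req = urllib.request.Request(
    "http://localhost:8000/v2/models/ensemble/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment once a server is running on localhost:8000:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["text_output"])
```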

Feature_Guide/Speculative_Decoding/vLLM/README.md (1 addition & 1 deletion)

@@ -88,7 +88,7 @@ docker run --gpus all -it --net=host --rm -p 8001:8001 --shm-size=1G \

 ### Send Inference Requests

-Let's send an inference request to the [generate endpoint](https://github.com/triton-inference-server/tensorrtllm_backend/tree/release/0.5.0#query-the-server-with-the-triton-generate-endpoint).
+Let's send an inference request to the [generate endpoint](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/protocol/extension_generate.html).

 ```bash
 curl -X POST localhost:8000/v2/models/eagle_model/generate -d '{"text_input": "What is Triton Inference Server?", "parameters": {"stream": false, "temperature": 0}}' | jq
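The same vLLM request can be built in Python. A minimal standard-library sketch mirroring the curl call above; note that here sampling options sit under a nested "parameters" object rather than as flat top-level fields as in the TRT-LLM example:

```python
import json
import urllib.request

# Payload from the curl example above; sampling options go under "parameters".
payload = {
    "text_input": "What is Triton Inference Server?",
    "parameters": {"stream": False, "temperature": 0},
}
req = urllib.request.Request(
    "http://localhost:8000/v2/models/eagle_model/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Requires the server started in the docker run step above:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["text_output"])
```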
Lines changed: 1 addition & 1 deletion

@@ -1,3 +1,3 @@
 {
 "model": "/hf-models/Meta-Llama-3-8B-Instruct"
-}
+}

Feature_Guide/Speculative_Decoding/vLLM/model_repository/base_model/config.pbtxt (1 addition & 1 deletion)

@@ -34,4 +34,4 @@ instance_group [
 count: 1
 kind: KIND_MODEL
 }
-]
+]

Feature_Guide/Speculative_Decoding/vLLM/model_repository/eagle_model copy/1/model.json (0 additions & 6 deletions)

This file was deleted.

Feature_Guide/Speculative_Decoding/vLLM/model_repository/eagle_model copy/config.pbtxt (0 additions & 37 deletions)

This file was deleted.

Feature_Guide/Speculative_Decoding/vLLM/model_repository/eagle_model/1/model.json (1 addition & 1 deletion)

@@ -3,4 +3,4 @@
 "speculative_model": "/hf-models/EAGLE-LLaMA3-Instruct-8B",
 "speculative_draft_tensor_parallel_size": 1,
 "num_speculative_tokens": 5
-}
+}
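The keys in this model.json are forwarded to vLLM as speculative-decoding engine options (EAGLE draft model, 5 draft tokens per verification step). A quick sanity check of the fragment shown in the hunk; the diff only shows the tail of the file, so the snippet below reproduces just the visible keys as a hypothetical standalone fragment:

```python
import json

# Hypothetical fragment: only these keys are visible in the diff above;
# the file's leading lines (including the base "model" entry) are not shown.
fragment = """{
    "speculative_model": "/hf-models/EAGLE-LLaMA3-Instruct-8B",
    "speculative_draft_tensor_parallel_size": 1,
    "num_speculative_tokens": 5
}"""
config = json.loads(fragment)
# num_speculative_tokens controls how many draft tokens are proposed
# per verification step.
assert config["num_speculative_tokens"] == 5
assert config["speculative_draft_tensor_parallel_size"] == 1
```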

Feature_Guide/Speculative_Decoding/vLLM/model_repository/eagle_model/config.pbtxt (1 addition & 1 deletion)

@@ -34,4 +34,4 @@ instance_group [
 count: 1
 kind: KIND_MODEL
 }
-]
+]

Feature_Guide/Speculative_Decoding/vLLM/model_repository/opt_model/1/model.json (1 addition & 1 deletion)

@@ -3,4 +3,4 @@
 "speculative_model": "facebook/opt-125m",
 "tensor_parallel_size": 1,
 "num_speculative_tokens": 5
-}
+}
