
Commit 3ce91be

docs: Fix broken links in documents (#8636)
1 parent 265d5cb commit 3ce91be

25 files changed

Lines changed: 104 additions & 866 deletions

README.md

Lines changed: 4 additions & 4 deletions
@@ -54,8 +54,8 @@ Major features include:
 frameworks](https://github.com/triton-inference-server/fil_backend)
 - [Concurrent model
 execution](docs/user_guide/architecture.md#concurrent-model-execution)
-- [Dynamic batching](docs/user_guide/model_configuration.md#dynamic-batcher)
-- [Sequence batching](docs/user_guide/model_configuration.md#sequence-batcher) and
+- [Dynamic batching](docs/user_guide/batcher.md#dynamic-batcher)
+- [Sequence batching](docs/user_guide/batcher.md#sequence-batcher) and
 [implicit state management](docs/user_guide/architecture.md#implicit-state-management)
 for stateful models
 - Provides [Backend API](https://github.com/triton-inference-server/backend) that
@@ -70,8 +70,8 @@ Major features include:
 protocols](docs/customization_guide/inference_protocols.md) based on the community
 developed [KServe
 protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)
-- A [C API](docs/customization_guide/inference_protocols.md#in-process-triton-server-api) and
-[Java API](docs/customization_guide/inference_protocols.md#java-bindings-for-in-process-triton-server-api)
+- A [C API](docs/customization_guide/inprocess_c_api.md) and
+[Java API](docs/customization_guide/inprocess_java_api.md)
 allow Triton to link directly into your application for edge and other in-process use cases
 - [Metrics](docs/user_guide/metrics.md) indicating GPU utilization, server
 throughput, server latency, and more

docs/README.md

Lines changed: 6 additions & 6 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2018-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2018-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -111,17 +111,17 @@ The Model Configuration ModelOptimizationPolicy property is used to specify opti

 #### Scheduling and Batching

-Triton supports batching individual inference requests to improve compute resource utilization. This is extremely important as individual requests typically will not saturate GPU resources thus not leveraging the parallelism provided by GPUs to its extent. Learn more about Triton's [Batcher and Scheduler](user_guide/model_configuration.md#scheduling-and-batching).
-- [Default Scheduler - Non-Batching](user_guide/model_configuration.md#default-scheduler)
-- [Dynamic Batcher](user_guide/model_configuration.md#dynamic-batcher)
+Triton supports batching individual inference requests to improve compute resource utilization. This is extremely important as individual requests typically will not saturate GPU resources thus not leveraging the parallelism provided by GPUs to its extent. Learn more about Triton's [Batcher and Scheduler](#scheduling-and-batching).
+- [Default Scheduler - Non-Batching](user_guide/scheduler.md#default-scheduler)
+- [Dynamic Batcher](user_guide/batcher.md#dynamic-batcher)
 - [How to Configure Dynamic Batcher](user_guide/model_configuration.md#recommended-configuration-process)
-- [Delayed Batching](user_guide/model_configuration.md#delayed-batching)
+- [Delayed Batching](user_guide/batcher.md#delayed-batching)
 - [Preferred Batch Size](user_guide/model_configuration.md#preferred-batch-sizes)
 - [Preserving Request Ordering](user_guide/model_configuration.md#preserve-ordering)
 - [Priority Levels](user_guide/model_configuration.md#priority-levels)
 - [Queuing Policies](user_guide/model_configuration.md#queue-policy)
 - [Ragged Batching](user_guide/ragged_batching.md)
-- [Sequence Batcher](user_guide/model_configuration.md#sequence-batcher)
+- [Sequence Batcher](user_guide/batcher.md#sequence-batcher)
 - [Stateful Models](user_guide/model_execution.md#stateful-models)
 - [Control Inputs](user_guide/model_execution.md#control-inputs)
 - [Implicit State - Stateful Inference Using a Stateless Model](user_guide/implicit_state_management.md#implicit-state-management)
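
As a quick orientation for the batching options whose links are retargeted above, a minimal `config.pbtxt` sketch combining several of them (preferred batch sizes, delayed batching, priority levels, and a queue policy) could look like the following. The field names come from Triton's model configuration schema; the values are illustrative placeholders, not part of this commit.

```
# Hypothetical model config fragment -- values are illustrative only
max_batch_size: 8
dynamic_batching {
  # Preferred Batch Size: sizes the dynamic batcher tries to create
  preferred_batch_size: [ 4, 8 ]
  # Delayed Batching: wait up to 100 us for more requests before dispatching
  max_queue_delay_microseconds: 100
  # Preserving Request Ordering
  preserve_ordering: true
  # Priority Levels and a default Queuing Policy
  priority_levels: 2
  default_priority_level: 2
  default_queue_policy {
    max_queue_size: 16
  }
}
```

With a setup like this, the batcher waits at most the configured delay while trying to reach a preferred batch size, then dispatches whatever has queued.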

docs/contents.rst

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
 ..
-.. Copyright 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. Copyright 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 ..
 .. Redistribution and use in source and binary forms, with or without
 .. modification, are permitted provided that the following conditions
@@ -37,7 +37,7 @@
 :caption: Getting Started

 getting_started/quick_deployment_by_backend
-LLM With TRT-LLM <getting_started/trtllm_user_guide.md>
+LLM With TensorRT-LLM <getting_started/trtllm_user_guide.md>
 Multimodal model <../tutorials/Popular_Models_Guide/Llava1.5/llava_trtllm_guide.md>
 Stable diffusion <../tutorials/Popular_Models_Guide/StableDiffusion/README.md>

@@ -96,10 +96,10 @@
 :hidden:
 :caption: Backends

-TRT-LLM <tensorrtllm_backend/README>
+TensorRT-LLM <tensorrtllm_backend/README>
 vLLM <backend_guide/vllm>
 Python <python_backend/README>
-Pytorch <pytorch_backend/README>
+PyTorch <pytorch_backend/README>
 ONNX Runtime <onnxruntime_backend/README>
 TensorRT <tensorrt_backend/README>
 FIL <fil_backend/README>

docs/customization_guide/build.md

Lines changed: 1 addition & 3 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2018-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2018-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -59,8 +59,6 @@ to build Triton on a platform that is not listed here.

 * [Ubuntu 22.04, x86-64](#building-for-ubuntu-2204)

-* [Jetpack 4.x, NVIDIA Jetson (Xavier, Nano, TX2)](#building-for-jetpack-4x)
-
 * [Windows 10, x86-64](#building-for-windows-10)

 If you are developing or debugging Triton, see [Development and

docs/customization_guide/inference_protocols.md

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2018-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2018-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -31,7 +31,7 @@
 Clients can communicate with Triton using either an [HTTP/REST
 protocol](#httprest-and-grpc-protocols), a [GRPC
 protocol](#httprest-and-grpc-protocols), or by an [in-process C
-API](inprocess_c_api.md#in-process-triton-server-api) or its
+API](inprocess_c_api.md) or its
 [C++ wrapper](https://github.com/triton-inference-server/developer_tools/tree/main/server).

 ## HTTP/REST and GRPC Protocols
@@ -142,7 +142,7 @@ For client-side documentation, see [Client-Side GRPC Status Codes](https://githu

 #### GRPC Inference Handler Threads

-In general, using 2 threads per completion queue seems to give the best performance, see [gRPC Performance Best Practices] (https://grpc.io/docs/guides/performance/#c). However, in cases where the performance bottleneck is at the request handling step (e.g. ensemble models), increasing the number of gRPC inference handler threads may lead to a higher throughput.
+In general, using 2 threads per completion queue seems to give the best performance, see [gRPC Performance Best Practices](https://grpc.io/docs/guides/performance/#c). However, in cases where the performance bottleneck is at the request handling step (e.g. ensemble models), increasing the number of gRPC inference handler threads may lead to a higher throughput.

 * `--grpc-infer-thread-count`: 2 by default.

docs/examples/jetson/concurrency_and_dynamic_batching/README.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -326,6 +326,6 @@ dynamic_batching {
 }
 ```

-To try further options of dynamic batcher see the [documentation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#dynamic-batcher).
+To try further options of dynamic batcher see the [documentation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/batcher.md#dynamic-batcher).

 You can also try enabling both concurrent model execution and dynamic batching.
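
For reference, enabling both features mentioned in that closing sentence amounts to adding an `instance_group` and a `dynamic_batching` block to the model's `config.pbtxt`. The sketch below is a hypothetical fragment with illustrative values; it is not part of the example files touched by this commit.

```
# Hypothetical config.pbtxt fragment -- illustrative values only
# Concurrent model execution: run two instances of the model on each GPU
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
# Dynamic batching: group incoming requests, waiting briefly to fill a batch
dynamic_batching {
  preferred_batch_size: [ 4 ]
  max_queue_delay_microseconds: 100
}
```

The right instance counts and queue delays depend on the model and the hardware, so treat these numbers as placeholders to tune.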

docs/getting_started/llm.md

Lines changed: 7 additions & 7 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright (c) 2024-2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2024-2026, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -30,7 +30,7 @@

 This guide captures the steps to build Phi-3 with TRT-LLM and deploy with Triton Inference Server. It also shows a shows how to use GenAI-Perf to run benchmarks to measure model performance in terms of throughput and latency.

-This guide is tested on A100 80GB SXM4 and H100 80GB PCIe. It is confirmed to work with Phi-3-mini-128k-instruct and Phi-3-mini-4k-instruct (see [Support Matrix](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/phi) for full list) using TRT-LLM v0.11 and Triton Inference Server 24.07.
+This guide is tested on A100 80GB SXM4 and H100 80GB PCIe. It is confirmed to work with Phi-3-mini-128k-instruct and Phi-3-mini-4k-instruct (see [Support Matrix](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/phi) for full list) using TRT-LLM v0.11 and Triton Inference Server 24.07.

 - [Build and test TRT-LLM engine](#build-and-test-trt-llm-engine)
 - [Deploy with Triton Inference Server](#deploy-with-triton-inference-server)
@@ -76,7 +76,7 @@ Reference: <https://nvidia.github.io/TensorRT-LLM/installation/linux.html>

 ## Build the TRT-LLM Engine

-Reference: <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/phi>
+Reference: <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/phi>

 4. ## Download Phi-3-mini-4k-instruct

@@ -354,7 +354,7 @@ All config files inside /tensorrtllm\_backend/all\_models/inflight\_batcher\_llm
 <details>
 <summary><b> ensemble/config.pbtxt</b></summary>

-# Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -864,7 +864,7 @@ All config files inside /tensorrtllm\_backend/all\_models/inflight\_batcher\_llm
 <details>
 <summary><b>postprocessing/config.pbtxt</b></summary>

-# Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -993,7 +993,7 @@ All config files inside /tensorrtllm\_backend/all\_models/inflight\_batcher\_llm
 <details>
 <summary><b> preprocessing/config.pbtxt</b> </summary>

-# Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -1188,7 +1188,7 @@ All config files inside /tensorrtllm\_backend/all\_models/inflight\_batcher\_llm
 <summary> <b> tensorrt_llm/config.pbtxt </b></summary>


-# Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions

docs/getting_started/trtllm_user_guide.md

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -50,7 +50,7 @@ to prepare engines for your LLM models and serve them with Triton.
 ## How to use your custom TRT-LLM model

 All the supported models can be found in the
-[examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) folder in
+[examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core) folder in
 the TRT-LLM repo. Follow the examples to convert your models to TensorRT
 engines.

@@ -61,7 +61,7 @@ for Triton, and
 Only the *mandatory parameters* need to be set in the model config file. Feel free
 to modify the optional parameters as needed. To learn more about the
 parameters, model inputs, and outputs, see the
-[model config documentation](ttps://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/model_config.md) for more details.
+[model config documentation](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/model_config.md) for more details.

 ## Advanced Configuration Options and Deployment Strategies

@@ -95,7 +95,7 @@ to learn how to use GenAI-Perf to benchmark your LLM models.
 ## Performance Best Practices

 Check out the
-[Performance Best Practices guide](https://nvidia.github.io/TensorRT-LLM/performance/perf-best-practices.html)
+[Performance tuning guide](https://nvidia.github.io/TensorRT-LLM/performance/performance-tuning-guide/)
 to learn how to optimize your TensorRT-LLM models for better performance.

 ## Metrics

docs/index.md

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2023-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2023-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -58,9 +58,9 @@ architecture. The [model repository](user_guide/model_repository.md) is a
 file-system based repository of the models that Triton will make
 available for inferencing. Inference requests arrive at the server via
 either [HTTP/REST or GRPC](customization_guide/inference_protocols.md) or by the [C
-API](customization_guide/inference_protocols.md) and are then routed to the appropriate per-model
+API](customization_guide/inprocess_c_api.md) and are then routed to the appropriate per-model
 scheduler. Triton implements [multiple scheduling and batching
-algorithms](#models-and-schedulers) that can be configured on a
+algorithms](./user_guide/architecture.md#models-and-schedulers) that can be configured on a
 model-by-model basis. Each model's scheduler optionally performs
 batching of inference requests and then passes the requests to the
 [backend](https://github.com/triton-inference-server/backend/blob/main/README.md)
@@ -89,7 +89,7 @@ framework such as Kubernetes.
 Major features include:

 - [Supports multiple deep learning
-frameworks](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton)
+frameworks](backend/README.md#where-can-i-find-all-the-backends-that-are-available-for-triton)
 - [Supports multiple machine learning
 frameworks](https://github.com/triton-inference-server/fil_backend)
 - [Concurrent model

docs/introduction/index.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2023-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2023-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -60,7 +60,7 @@ available for inferencing. Inference requests arrive at the server via
 either [HTTP/REST or GRPC](../customization_guide/inference_protocols.md) or by the [C
 API](../customization_guide/inprocess_c_api.md) and are then routed to the appropriate per-model
 scheduler. Triton implements [multiple scheduling and batching
-algorithms](#models-and-schedulers) that can be configured on a
+algorithms](../user_guide/architecture.md#models-and-schedulers) that can be configured on a
 model-by-model basis. Each model's scheduler optionally performs
 batching of inference requests and then passes the requests to the
 [backend](https://github.com/triton-inference-server/backend/blob/main/README.md)
