
Commit d2cc2f8: added related pages
1 parent 2ae29e1 commit d2cc2f8

6 files changed: 17 additions & 1 deletion


Conceptual_Guide/Part_1-model_deployment/README.md (3 additions & 0 deletions)
@@ -1,5 +1,8 @@
 # Deploy models using Triton
 
+|Related Pages | [Model Repository](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md) | [Model Configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md) |
+| ------------ | --------------- | --------------- |
+
 Any deep learning inference serving solution needs to tackle two fundamental challenges:
 * Managing multiple models.
 * Versioning, loading, and unloading models.
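For context, the two challenges named above are what Triton's model repository layout (the first page linked in the new table) addresses: each model lives in its own directory, with numbered subdirectories per version that Triton can load and unload independently. A minimal sketch of that layout; the model name `text_detection` and the ONNX files are illustrative assumptions, not part of this commit:

```
model_repository/
└── text_detection/
    ├── config.pbtxt      # model configuration: name, backend, inputs/outputs
    ├── 1/                # version 1 of the model
    │   └── model.onnx
    └── 2/                # version 2; served versions are chosen by version_policy
        └── model.onnx
```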

Conceptual_Guide/Part_2-improving_resource_utilization/README.md (3 additions & 0 deletions)
@@ -1,5 +1,8 @@
 # Dynamic Batching & Concurrent Model Execution
 
+|Related Pages | [Performance Analyzer](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md) | [Model Configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md) |
+| ------------ | --------------- | --------------- |
+
 Part-1 of this series introduced the mechanisms to set up a Triton Inference Server. This iteration discusses the concepts of dynamic batching and concurrent model execution. These are important features that can be used to reduce latency and increase throughput through higher resource utilization.
 
 ## What is Dynamic Batching?
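Both features named in this README's title are enabled declaratively in a model's `config.pbtxt` (see the Model Configuration page linked in the table). A minimal sketch, assuming a GPU deployment; the specific values are illustrative, not from the commit:

```
# config.pbtxt (excerpt)

# Dynamic batching: let the server combine individual requests
# into larger batches, waiting up to 100 microseconds to do so.
dynamic_batching {
  max_queue_delay_microseconds: 100
}

# Concurrent model execution: run two instances of the model
# on the GPU so requests can be processed in parallel.
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
```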

Conceptual_Guide/Part_3-optimizing_triton_configuration/README.md (2 additions & 0 deletions)
@@ -1,4 +1,6 @@
 # Customizing deployment with Model Analyzer
+|Related Pages | [Model Analyzer](https://github.com/triton-inference-server/model_analyzer) | [Model Navigator](https://github.com/triton-inference-server/model_navigator) |
+| ------------ | --------------- | --------------- |
 
 Every inference deployment has its unique set of challenges. These challenges may arise from Service Level Agreements about maintaining latency, limited hardware resources, unique requirements of individual models, the nature and the volume of requests, or something completely different. Additionally, the Triton Inference Server has many features which can be leveraged to make tradeoffs between memory consumption and performance.
 
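Model Analyzer, the first link in the new table, is the tool this guide uses to navigate such tradeoffs: it sweeps candidate model configurations and measures latency, throughput, and memory for each. A sketch of a typical invocation; the repository paths and model name are placeholder assumptions:

```
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models text_detection \
    --output-model-repository-path /path/to/output_repository
```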
Conceptual_Guide/Part_4-inference_acceleration/README.md (3 additions & 0 deletions)
@@ -1,5 +1,8 @@
 # Accelerating Inference for Deep Learning Models
 
+|Related Pages | [TensorRT](https://github.com/NVIDIA/TensorRT) | [Model Navigator](https://github.com/triton-inference-server/model_navigator) | [Optimization Summary](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/optimization.md) |
+| ------------ | --------------- | --------------- | --------------- |
+
 Model acceleration is a complex, nuanced topic. The viability of techniques like graph optimization, pruning, knowledge distillation, and quantization depends heavily on the structure of the model. Each of these topics is a vast field of research in its own right, and building custom tools requires massive engineering investment.
 
 Rather than attempting an exhaustive outline of the ecosystem, for brevity and objectivity this discussion focuses on the tools and features recommended when deploying models with the Triton Inference Server.
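Among the Triton-native options covered by the Optimization Summary linked above is enabling a framework backend's TensorRT execution accelerator directly from the model configuration. A sketch assuming an ONNX Runtime model and FP16 precision; the parameter values are illustrative:

```
# config.pbtxt (excerpt)

# Offload supported subgraphs from ONNX Runtime to TensorRT,
# running them in FP16 with a 1 GB workspace.
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
        parameters { key: "max_workspace_size_bytes" value: "1073741824" }
      }
    ]
  }
}
```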

Conceptual_Guide/README.md (4 additions & 1 deletion)
@@ -1 +1,4 @@
-# Conceptual Guides
+# Conceptual Guides
+
+| Related Pages | [Server Docs](https://github.com/triton-inference-server/server/tree/main/docs#triton-inference-server-documentation) |
+| ------------ | --------------- |

Quick_Deploy/README.md (2 additions & 0 deletions)
@@ -1,2 +1,4 @@
 # Quickly deploy your models
 
+| Related Pages | [Server Quick Start Guide](https://github.com/triton-inference-server/server/blob/main/docs/getting_started/quickstart.md) |
+| ------------ | --------------- |
