|
1 | 1 | <!-- |
2 | | -# Copyright (c) 2024-2025, NVIDIA CORPORATION. All rights reserved. |
| 2 | +# Copyright (c) 2024-2026, NVIDIA CORPORATION. All rights reserved. |
3 | 3 | # |
4 | 4 | # Redistribution and use in source and binary forms, with or without |
5 | 5 | # modification, are permitted provided that the following conditions |
|
30 | 30 |
|
31 | 31 | This guide captures the steps to build Phi-3 with TRT-LLM and deploy with Triton Inference Server. It also shows how to use GenAI-Perf to run benchmarks to measure model performance in terms of throughput and latency. |
32 | 32 |
|
33 | | -This guide is tested on A100 80GB SXM4 and H100 80GB PCIe. It is confirmed to work with Phi-3-mini-128k-instruct and Phi-3-mini-4k-instruct (see [Support Matrix](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/phi) for full list) using TRT-LLM v0.11 and Triton Inference Server 24.07. |
| 33 | +This guide is tested on A100 80GB SXM4 and H100 80GB PCIe. It is confirmed to work with Phi-3-mini-128k-instruct and Phi-3-mini-4k-instruct (see [Support Matrix](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/phi) for full list) using TRT-LLM v0.11 and Triton Inference Server 24.07. |
34 | 34 |
|
35 | 35 | - [Build and test TRT-LLM engine](#build-and-test-trt-llm-engine) |
36 | 36 | - [Deploy with Triton Inference Server](#deploy-with-triton-inference-server) |
@@ -76,7 +76,7 @@ Reference: <https://nvidia.github.io/TensorRT-LLM/installation/linux.html> |
76 | 76 |
|
77 | 77 | ## Build the TRT-LLM Engine |
78 | 78 |
|
79 | | -Reference: <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/phi> |
| 79 | +Reference: <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/phi> |
80 | 80 |
|
81 | 81 | 4. ## Download Phi-3-mini-4k-instruct |
82 | 82 |
|
@@ -354,7 +354,7 @@ All config files inside /tensorrtllm\_backend/all\_models/inflight\_batcher\_llm |
354 | 354 | <details> |
355 | 355 | <summary><b> ensemble/config.pbtxt</b></summary> |
356 | 356 |
|
357 | | - # Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
| 357 | + # Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
358 | 358 | # |
359 | 359 | # Redistribution and use in source and binary forms, with or without |
360 | 360 | # modification, are permitted provided that the following conditions |
@@ -864,7 +864,7 @@ All config files inside /tensorrtllm\_backend/all\_models/inflight\_batcher\_llm |
864 | 864 | <details> |
865 | 865 | <summary><b>postprocessing/config.pbtxt</b></summary> |
866 | 866 |
|
867 | | - # Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
| 867 | + # Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
868 | 868 | # |
869 | 869 | # Redistribution and use in source and binary forms, with or without |
870 | 870 | # modification, are permitted provided that the following conditions |
@@ -993,7 +993,7 @@ All config files inside /tensorrtllm\_backend/all\_models/inflight\_batcher\_llm |
993 | 993 | <details> |
994 | 994 | <summary><b> preprocessing/config.pbtxt</b> </summary> |
995 | 995 |
|
996 | | - # Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
| 996 | + # Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
997 | 997 | # |
998 | 998 | # Redistribution and use in source and binary forms, with or without |
999 | 999 | # modification, are permitted provided that the following conditions |
@@ -1188,7 +1188,7 @@ All config files inside /tensorrtllm\_backend/all\_models/inflight\_batcher\_llm |
1188 | 1188 | <summary> <b> tensorrt_llm/config.pbtxt </b></summary> |
1189 | 1189 |
|
1190 | 1190 |
|
1191 | | - # Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
| 1191 | + # Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
1192 | 1192 | # |
1193 | 1193 | # Redistribution and use in source and binary forms, with or without |
1194 | 1194 | # modification, are permitted provided that the following conditions |
|