Commit 6e645e8

Merge pull request #3 from triton-inference-server/tvarshney_update_conceptual_guide
added additional conceptual guide
2 parents 57735c9 + b60ca5a commit 6e645e8

12 files changed

Lines changed: 730 additions & 1 deletion

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
<!--
# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Building Complex Pipelines: Stable Diffusion

*Note*: This tutorial aims to demonstrate ease of deployment and does not incorporate every optimization available in the NVIDIA ecosystem.

Before proceeding with the example, it is recommended to watch [this explainer video](https://youtu.be/JgP2WgNIq_w), which discusses the pipeline. This example showcases two of Triton Inference Server's features:
* Using multiple frameworks in the same inference pipeline. Refer to [this list](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton) for more information about supported frameworks.
* Using the Python Backend's [Business Logic Scripting](https://github.com/triton-inference-server/python_backend#business-logic-scripting) API to build complex non-linear pipelines.

## Using Multiple Backends

Building a pipeline powered by deep learning models is a collaborative effort that often involves multiple contributors, who frequently work in different development environments. This can cause problems when combining their work into a single pipeline. Triton users can address this by using the Python backend or a C++ backend together with the Business Logic Scripting (BLS) API to trigger model execution.

![Pipeline](./img/multiple_backends.PNG)

In this example, the models run on:
* ONNX Backend
* TensorRT Backend
* Python Backend

Both models deployed on a framework backend (ONNX and TensorRT) can be triggered from the Python backend using the following API:
```
# Build a BLS request for the "text_encoder" model
encoding_request = pb_utils.InferenceRequest(
    model_name="text_encoder",
    requested_output_names=["last_hidden_state"],
    inputs=[input_ids_1],
)

# Execute the request and read the requested output tensor
response = encoding_request.exec()
text_embeddings = pb_utils.get_output_tensor_by_name(response, "last_hidden_state")
```

Refer to `model.py` in the `pipeline` model for a complete example.
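
A BLS call can fail (for example, when the target model is not loaded), so it may be worth checking the response before reading its outputs. The following is a minimal sketch and not the code from `model.py`: `run_bls` is a hypothetical helper, and `pb_utils` is passed in as an argument only so the function can be exercised outside Triton. Inside a Python backend model it would be the imported `triton_python_backend_utils` module.

```python
def run_bls(pb_utils, model_name, inputs, output_name):
    """Run one model through BLS and return the requested output tensor.

    `pb_utils` is assumed to be the `triton_python_backend_utils` module,
    which is importable only inside the Triton Python backend.
    """
    request = pb_utils.InferenceRequest(
        model_name=model_name,
        requested_output_names=[output_name],
        inputs=inputs,
    )
    # exec() blocks until the target model has produced a response.
    response = request.exec()

    # Surface failures from the called model instead of reading a missing output.
    if response.has_error():
        raise pb_utils.TritonModelException(response.error().message())

    return pb_utils.get_output_tensor_by_name(response, output_name)
```

With such a helper, the call shown above could be written as `text_embeddings = run_bls(pb_utils, "text_encoder", [input_ids_1], "last_hidden_state")`.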

## Stable Diffusion Example

Before starting, clone this repository and navigate to the root folder. Using three separate terminals (server, model export, and client) makes the steps easier to follow.

### Step 1: Prepare the Server Environment
* First, run the Triton Inference Server container.
```
# Replace yy.mm with the year and month of the release, e.g. 22.08
docker run --gpus=all -it --shm-size=256m --rm -p8000:8000 -p8001:8001 -p8002:8002 -v ${PWD}:/workspace/ -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:yy.mm-py3 bash
```
* Next, install the dependencies required by the models running in the Python backend and log in with your [Hugging Face token](https://huggingface.co/settings/tokens) (a [Hugging Face](https://huggingface.co/) account is required).

```
# PyTorch & Transformers Lib
pip install torch torchvision torchaudio
pip install transformers ftfy scipy accelerate
pip install diffusers==0.9.0
pip install transformers[onnxruntime]
huggingface-cli login
```

### Step 2: Export and Convert the Models
Use the NGC PyTorch container to export and convert the models.

```
docker run -it --gpus all -p 8888:8888 -v ${PWD}:/mount nvcr.io/nvidia/pytorch:yy.mm-py3

pip install transformers ftfy scipy
pip install transformers[onnxruntime]
pip install diffusers==0.9.0
huggingface-cli login
cd /mount
python export.py

# Accelerating VAE with TensorRT
trtexec --onnx=vae.onnx --saveEngine=vae.plan --minShapes=latent_sample:1x4x64x64 --optShapes=latent_sample:4x4x64x64 --maxShapes=latent_sample:8x4x64x64 --fp16

# Place the models in the model repository
mkdir model_repository/vae/1
mkdir model_repository/text_encoder/1
mv vae.plan model_repository/vae/1/model.plan
mv encoder.onnx model_repository/text_encoder/1/model.onnx
```
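
The last commands in the block above arrange the converted files in the layout Triton expects: `<repository>/<model name>/<version>/<model file>`. As an illustration only, the same placement can be sketched in Python. `place_model` is a hypothetical helper and is not part of this repository.

```python
import shutil
from pathlib import Path


def place_model(repo, model_name, src, dest_name, version="1"):
    """Move `src` to <repo>/<model_name>/<version>/<dest_name> and return the new path."""
    version_dir = Path(repo) / model_name / version
    # Unlike a plain `mkdir`, this also creates missing parents and tolerates reruns.
    version_dir.mkdir(parents=True, exist_ok=True)
    target = version_dir / dest_name
    shutil.move(str(src), str(target))
    return target
```

Assuming `export.py` and `trtexec` have already produced the files, `place_model("model_repository", "vae", "vae.plan", "model.plan")` and `place_model("model_repository", "text_encoder", "encoder.onnx", "model.onnx")` would reproduce the shell commands.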

### Step 3: Launch the Server
From the server container, launch the Triton Inference Server.
```
tritonserver --model-repository=/models
```
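
Loading the models may take some time, so a client script can wait for the server to report ready before sending requests. A minimal sketch of such a wait loop follows. `wait_until_ready` is a hypothetical helper that accepts any zero-argument check, and the default timeout and interval are arbitrary assumptions.

```python
import time


def wait_until_ready(is_ready, timeout_s=120.0, interval_s=2.0):
    """Poll `is_ready()` until it returns True or `timeout_s` elapses.

    Exceptions raised by `is_ready` (e.g. connection refused while the
    server is still starting) are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if is_ready():
                return True
        except Exception:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With a `tritonclient.http.InferenceServerClient` named `client`, the check could be `lambda: client.is_server_ready() and client.is_model_ready("pipeline")`.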

### Step 4: Run the Client
Start the client container and run the client.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:yy.mm-py3-sdk bash

# Client with no GUI
python3 client.py

# Client with GUI
pip install gradio packaging
python3 gui/client.py --triton_url="localhost:8001"
```
Note: The first inference query may take longer than subsequent queries.
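
To observe the effect described in the note, each request can be timed on its own rather than timing a single run. The sketch below assumes nothing about Triton itself. `time_calls` is a hypothetical helper that times any zero-argument callable.

```python
import time


def time_calls(fn, n=3):
    """Call `fn()` n times and return the wall-clock duration of each call in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations
```

For example, passing the `main` function from `client.py` to `time_calls` returns one duration per query, so the first entry can be compared with the later ones.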
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import numpy as np
import time
from tritonclient.utils import np_to_triton_dtype
from PIL import Image
import tritonclient.http as httpclient


def main():
    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Text prompts are sent as a [batch, 1] tensor of Python objects (Triton BYTES)
    prompt = "Pikachu with a hat, 4k, 3d render"
    text_obj = np.array([prompt], dtype="object").reshape((-1, 1))

    input_text = httpclient.InferInput("prompt", text_obj.shape,
                                       np_to_triton_dtype(text_obj.dtype))
    input_text.set_data_from_numpy(text_obj)

    output_img = httpclient.InferRequestedOutput("generated_image")

    query_response = client.infer(model_name="pipeline",
                                  inputs=[input_text],
                                  outputs=[output_img])

    # Drop the batch dimension and convert to 8-bit pixels before saving
    image = query_response.as_numpy("generated_image")
    im = Image.fromarray(np.squeeze(image.astype(np.uint8)))
    im.save("generated_image2.jpg")


if __name__ == "__main__":
    start = time.time()
    main()
    end = time.time()

    print("Time taken:", end - start)
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

from diffusers import AutoencoderKL
from transformers import CLIPTextModel, CLIPTokenizer
import torch

prompt = "Draw a dog"
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4",
                                    subfolder="vae",
                                    use_auth_token=True)

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Export only the decoder half of the VAE (latents -> image)
vae.forward = vae.decode
torch.onnx.export(
    vae,
    (torch.randn(1, 4, 64, 64), False),
    "vae.onnx",
    input_names=["latent_sample", "return_dict"],
    output_names=["sample"],
    dynamic_axes={
        "latent_sample": {
            0: "batch",
            1: "channels",
            2: "height",
            3: "width"
        },
    },
    do_constant_folding=True,
    opset_version=14,
)

# Tokenize a sample prompt to provide example input for the text encoder export
text_input = tokenizer(
    prompt,
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)

torch.onnx.export(
    text_encoder,
    (text_input.input_ids.to(torch.int32)),
    "encoder.onnx",
    input_names=["input_ids"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={
        "input_ids": {
            0: "batch",
            1: "sequence"
        },
    },
    opset_version=14,
    do_constant_folding=True,
)
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
<!--
# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Stable Diffusion UI
A simple Gradio UI for communicating with Stable Diffusion on Triton.

## To deploy
```
pip install -r requirements.txt
python client.py --triton_url <YOUR_TRITON_SERVER_URL>
```
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import argparse

import gradio as gr
import numpy as np
import tritonclient.grpc as grpcclient
from PIL import Image
from tritonclient.utils import np_to_triton_dtype

parser = argparse.ArgumentParser()
parser.add_argument("--triton_url", default="localhost:8001")
args = parser.parse_args()

client = grpcclient.InferenceServerClient(url=args.triton_url)


def generate(prompt):
    # Text prompts are sent as a [batch, 1] tensor of Python objects (Triton BYTES)
    text_obj = np.array([prompt], dtype="object").reshape((-1, 1))
    input_text = grpcclient.InferInput("prompt", text_obj.shape,
                                       np_to_triton_dtype(text_obj.dtype))
    input_text.set_data_from_numpy(text_obj)

    output_img = grpcclient.InferRequestedOutput("generated_image")

    response = client.infer(model_name="pipeline",
                            inputs=[input_text],
                            outputs=[output_img])
    resp_img = response.as_numpy("generated_image")
    print(resp_img.shape)
    # Drop the batch dimension and convert to 8-bit pixels for display
    im = Image.fromarray(np.squeeze(resp_img.astype(np.uint8)))
    return im


with gr.Blocks() as app:
    prompt = gr.Textbox(label="Prompt")
    submit_btn = gr.Button("Generate")
    img_output = gr.Image().style(height=512)
    submit_btn.click(fn=generate, inputs=prompt, outputs=img_output)

app.launch(server_name="0.0.0.0")
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

gradio
tritonclient[grpc]