<!--
# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Building Complex Pipelines: Stable Diffusion

*Note*: This tutorial aims to demonstrate ease of deployment and does not incorporate all possible optimizations available in the NVIDIA ecosystem.

It is recommended to watch [this explainer video](https://youtu.be/JgP2WgNIq_w), which discusses the pipeline, before proceeding with the example. This example focuses on showcasing two of Triton Inference Server's features:
* Using multiple frameworks in the same inference pipeline. Refer to [this overview of available backends](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton) for more information about supported frameworks.
* Using the Python Backend's [Business Logic Scripting (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting) API to build complex, non-linear pipelines.

## Using Multiple Backends

Building a pipeline powered by deep learning models is a collaborative effort that often involves multiple contributors, and those contributors often work in differing development environments. This can lead to issues when assembling a single pipeline from their combined work. Triton users can solve this challenge by using the Python (or C++) backend together with the Business Logic Scripting (BLS) API to trigger the execution of models deployed on other backends.

In this example, the models run on the following backends:
* ONNX Backend
* TensorRT Backend
* Python Backend

Both of the models deployed on framework backends can be triggered from the Python backend using the following API:
```
encoding_request = pb_utils.InferenceRequest(
    model_name="text_encoder",
    requested_output_names=["last_hidden_state"],
    inputs=[input_ids_1],
)

response = encoding_request.exec()
text_embeddings = pb_utils.get_output_tensor_by_name(response, "last_hidden_state")
```

Refer to `model.py` in the `pipeline` model for a complete example.

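For orientation, the snippet above typically lives inside a Python backend model's `execute` method. The following is only a minimal sketch of that structure, not the actual implementation: the input tensor name `input_ids` and the returned output are illustrative placeholders, and `model.py` in the `pipeline` model remains the complete reference.
```
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Illustrative input name; the real pipeline tokenizes the prompt first.
            input_ids_1 = pb_utils.get_input_tensor_by_name(request, "input_ids")

            # BLS call into the ONNX text encoder, as shown above.
            encoding_request = pb_utils.InferenceRequest(
                model_name="text_encoder",
                requested_output_names=["last_hidden_state"],
                inputs=[input_ids_1],
            )
            response = encoding_request.exec()
            text_embeddings = pb_utils.get_output_tensor_by_name(
                response, "last_hidden_state"
            )

            # The denoising loop and a second BLS call into the TensorRT VAE
            # would follow here before the final image tensor is assembled.
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[text_embeddings])
            )
        return responses
```
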
## Stable Diffusion Example

Before starting, clone this repository and navigate to the root folder. Use three different terminals for an easier user experience.

### Step 1: Prepare the Server Environment
* First, run the Triton Inference Server container.
```
# Replace yy.mm with the year and month of the release, e.g. 22.08
docker run --gpus=all -it --shm-size=256m --rm -p8000:8000 -p8001:8001 -p8002:8002 -v ${PWD}:/workspace/ -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:yy.mm-py3 bash
```
* Next, install all the dependencies required by the models running in the Python backend, and log in with your [HuggingFace token](https://huggingface.co/settings/tokens) (an account on [HuggingFace](https://huggingface.co/) is required).

```
# PyTorch & Transformers Lib
pip install torch torchvision torchaudio
pip install transformers ftfy scipy accelerate
pip install diffusers==0.9.0
pip install transformers[onnxruntime]
huggingface-cli login
```

### Step 2: Exporting and Converting the Models
Use the NGC PyTorch container to export and convert the models.

```
docker run -it --gpus all -p 8888:8888 -v ${PWD}:/mount nvcr.io/nvidia/pytorch:yy.mm-py3

pip install transformers ftfy scipy
pip install transformers[onnxruntime]
pip install diffusers==0.9.0
huggingface-cli login
cd /mount
python export.py

# Accelerating VAE with TensorRT
trtexec --onnx=vae.onnx --saveEngine=vae.plan --minShapes=latent_sample:1x4x64x64 --optShapes=latent_sample:4x4x64x64 --maxShapes=latent_sample:8x4x64x64 --fp16

# Place the models in the model repository
mkdir model_repository/vae/1
mkdir model_repository/text_encoder/1
mv vae.plan model_repository/vae/1/model.plan
mv encoder.onnx model_repository/text_encoder/1/model.onnx
```

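After these commands, the model repository should look roughly like the sketch below. Only the version folders and the weight files are created by the steps above; the `config.pbtxt` files and `pipeline/1/model.py` are assumed to already be present in this repository's `model_repository` folder.
```
model_repository/
├── pipeline/
│   ├── 1/
│   │   └── model.py
│   └── config.pbtxt
├── text_encoder/
│   ├── 1/
│   │   └── model.onnx
│   └── config.pbtxt
└── vae/
    ├── 1/
    │   └── model.plan
    └── config.pbtxt
```
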
### Step 3: Launch the Server
From the server container, launch the Triton Inference Server.
```
tritonserver --model-repository=/models
```

### Step 4: Run the Client
Use the client container to run the client.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:yy.mm-py3-sdk bash

# Client with no GUI
python3 client.py

# Client with GUI
pip install gradio packaging
python3 gui/client.py --triton_url="localhost:8001"
```
*Note*: The first inference query may take more time than subsequent queries.
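
For reference, a bare-bones gRPC client for the `pipeline` model might look like the sketch below. The tensor names `prompt` and `generated_image` are hypothetical placeholders used only for illustration; check the `pipeline` model's `config.pbtxt` (or `client.py`) for the actual names and data types.
```
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
assert client.is_server_ready()

# "prompt" and "generated_image" are hypothetical tensor names; use the ones
# defined in the pipeline model's config.pbtxt.
text = np.array([b"a photo of an astronaut riding a horse on mars"], dtype=np.object_)
prompt = grpcclient.InferInput("prompt", text.shape, "BYTES")
prompt.set_data_from_numpy(text)

result = client.infer(
    model_name="pipeline",
    inputs=[prompt],
    outputs=[grpcclient.InferRequestedOutput("generated_image")],
)
image = result.as_numpy("generated_image")
print(image.shape)
```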