# Deploying a TensorFlow Model

This README demonstrates how to deploy a simple ResNet model on Triton Inference Server.

## Step 1: Export the Model

Export a TensorFlow model as a SavedModel.
```
# <xx.xx> is the yy.mm release of NVIDIA's TensorFlow
# container; e.g. 22.04

docker run -it --gpus all -v ${PWD}:/workspace nvcr.io/nvidia/tensorflow:<xx.xx>-tf2-py3
python export.py
```
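The contents of `export.py` are not reproduced here, but a minimal sketch of this kind of export, assuming a pretrained Keras ResNet50 saved directly into the repository layout shown in Step 2, might look like:
```
import tensorflow as tf

# Load a pretrained ResNet50 and write it out in the SavedModel format
# at the path Triton expects (see the model repository layout in Step 2).
model = tf.keras.applications.ResNet50(weights="imagenet")
model.save("model_repository/resnet50/1/model.savedmodel")
```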
## Step 2: Set Up Triton Inference Server

To use Triton, we need to build a model repository. The structure of the repository is as follows:
```
model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.savedmodel
            |
            +-- saved_model.pb
            +-- variables
                |
                +-- variables.data-00000-of-00001
                +-- variables.index
```
A sample configuration for the model is included with this demo as `config.pbtxt`. If you are new to Triton, we highly recommend [reviewing Part 1](../../Conceptual_Guide/Part_1-model_deployment/README.md) of the conceptual guide.
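For reference, a representative `config.pbtxt` might look like the sketch below. The input and output names match those used by the client code in Step 3; the exact dims and batching settings in the shipped file may differ:
```
name: "resnet50"
platform: "tensorflow_savedmodel"
max_batch_size: 0
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    dims: [-1, 224, 224, 3]
  }
]
output [
  {
    name: "predictions"
    data_type: TYPE_FP32
    dims: [-1, 1000]
  }
]
```

With the model repository in place, launch the server: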
```
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.xx>-py3 tritonserver --model-repository=/models --backend-config=tensorflow,version=2
```
## Step 3: Using a Triton Client to Query the Server

Install dependencies and download an example image to test inference.
```
pip install --upgrade tensorflow
pip install pillow
pip install nvidia-pyindex
pip install tritonclient[all]

wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
```
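The client snippets below assume the downloaded image has already been preprocessed into a NumPy array named `transformed_img` (the full code is in `client.py`). A minimal sketch of that preprocessing, assuming the model was exported from Keras' ResNet50, might look like:
```
import numpy as np
from PIL import Image
from tensorflow.keras.applications.resnet50 import preprocess_input

# Resize to the 224x224 input ResNet50 expects, apply the standard
# ResNet50 preprocessing, and add a batch dimension.
img = Image.open("img1.jpg").resize((224, 224))
transformed_img = preprocess_input(np.asarray(img, dtype=np.float32))[None, ...]
```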
Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server.
```
import tritonclient.http as httpclient

triton_client = httpclient.InferenceServerClient(url="localhost:8000")
```
Second, we specify the names of the input and output layer(s) of our model.
```
inputs = httpclient.InferInput("input_1", transformed_img.shape, datatype="FP32")
inputs.set_data_from_numpy(transformed_img, binary_data=True)

output = httpclient.InferRequestedOutput("predictions", binary_data=True, class_count=1000)
```
Finally, we send an inference request to the Triton Inference Server.
```
# Querying the server
results = triton_client.infer(model_name="resnet50", inputs=[inputs], outputs=[output])
predictions = results.as_numpy('predictions')
print(predictions)
```
The output should look like the following:
```
[b'0.301167:90' b'0.169790:14' b'0.161309:92' b'0.093105:94'
 b'0.058743:136' b'0.050185:11' b'0.033802:91' b'0.011760:88'
 b'0.008309:989' b'0.004927:95' b'0.004905:13' b'0.004095:317'
 b'0.004006:96' b'0.003694:12' b'0.003526:42' b'0.003390:313'
 ...
 b'0.000001:751' b'0.000001:685' b'0.000001:408' b'0.000001:116'
 b'0.000001:627' b'0.000001:933' b'0.000000:661' b'0.000000:148']
```
The output format here is `<confidence_score>:<classification_index>`. To learn how to map these to the label names and more, refer to our [documentation](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_classification.md). The client code above is available in `client.py`.
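As a quick illustration (not part of `client.py`), each entry can be split on the colon to recover the score and class index:
```
# Each entry has the form b"<confidence_score>:<classification_index>";
# show the five highest-confidence classes.
for entry in predictions.reshape(-1)[:5]:
    score, class_idx = entry.decode().split(":")[:2]
    print(f"class {class_idx} with confidence {float(score):.6f}")
```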