
Commit d2b8a83

Merge pull request #4 from triton-inference-server/tvarshney_quick_deploy
added Torch, TF and TRT quick deploy
2 parents 6e645e8 + 0bb48c8 commit d2b8a83

11 files changed

Lines changed: 269 additions & 4 deletions

File tree

HuggingFace/README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPModel.forward.example

Quick_Deploy/PyTorch/README.md

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
# Deploying a PyTorch Model

This README showcases how to deploy a simple ResNet model on Triton Inference Server.

## Step 1: Export the model

Save the PyTorch model.

```
# <xx.xx> is the yy.mm publishing tag for NVIDIA's PyTorch
# container; e.g. 22.04

docker run -it --gpus all -v ${PWD}:/workspace nvcr.io/nvidia/pytorch:<xx.xx>-py3
python export.py
```

## Step 2: Set Up Triton Inference Server

To use Triton, we need to build a model repository. The structure of the repository is as follows:
```
model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.pt
```

A sample configuration for the model is included with this demo as `config.pbtxt`. If you are new to Triton, it is highly recommended to [review Part 1](../../Conceptual_Guide/Part_1-model_deployment/README.md) of the conceptual guide.
```
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.xx>-py3 tritonserver --model-repository=/models
```
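
Before querying the model, it can help to confirm that the server came up and actually loaded it. A minimal check, assuming the default HTTP port mapped above and the `tritonclient` package installed in Step 3:
```
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Both should print True once the server is up and the model is loaded
print(client.is_server_ready())
print(client.is_model_ready("resnet50"))
```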

## Step 3: Using a Triton Client to Query the Server

Install dependencies & download an example image to test inference.

```
pip install torchvision
pip install attrdict
pip install nvidia-pyindex
pip install tritonclient[all]

wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
```

Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server.
```
client = httpclient.InferenceServerClient(url="localhost:8000")
```
Second, we specify the names of the input and output layer(s) of our model.
```
inputs = httpclient.InferInput("input__0", transformed_img.shape, datatype="FP32")
inputs.set_data_from_numpy(transformed_img, binary_data=True)

outputs = httpclient.InferRequestedOutput("output__0", binary_data=True, class_count=1000)
```
Lastly, we send an inference request to the Triton Inference Server.
```
# Querying the server
results = client.infer(model_name="resnet50", inputs=[inputs], outputs=[outputs])
predictions = results.as_numpy('output__0')
print(predictions[:5])
```
The output should look like this:
```
[b'12.468750:90' b'11.523438:92' b'9.664062:14' b'8.429688:136'
 b'8.234375:11']
```
The output format here is `<confidence_score>:<classification_index>`. To learn how to map these to the label names and more, refer to our [documentation](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_classification.md). The client code above is available in `client.py`.
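
As a rough illustration of that mapping, each returned entry can be split into a score and a class index, and the index looked up in a label list. A minimal sketch, assuming a hypothetical `imagenet_classes.txt` with one ImageNet label per line (not shipped with this demo):
```
# Assumed helper file: imagenet_classes.txt, one label per line
with open("imagenet_classes.txt") as f:
    labels = [line.strip() for line in f]

# Each entry looks like b'12.468750:90' -> (score, class index)
for entry in predictions[:5]:
    score, index = entry.decode().split(":")
    print(f"{labels[int(index)]}: {float(score):.3f}")
```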

Quick_Deploy/PyTorch/client.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
import numpy as np
from torchvision import transforms
from PIL import Image
import tritonclient.http as httpclient

# Preprocessing function: standard ImageNet resize, crop, and normalization
def rn50_preprocess(img_path="img1.jpg"):
    img = Image.open(img_path)
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    return preprocess(img).numpy()

transformed_img = rn50_preprocess()

# Setting up client
client = httpclient.InferenceServerClient(url="localhost:8000")

inputs = httpclient.InferInput("input__0", transformed_img.shape, datatype="FP32")
inputs.set_data_from_numpy(transformed_img, binary_data=True)

outputs = httpclient.InferRequestedOutput("output__0", binary_data=True, class_count=1000)

# Querying the server
results = client.infer(model_name="resnet50", inputs=[inputs], outputs=[outputs])
inference_output = results.as_numpy('output__0')
print(inference_output[:5])

Quick_Deploy/PyTorch/config.pbtxt

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
name: "resnet50"
platform: "pytorch_libtorch"
max_batch_size: 0
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1, 1000, 1, 1 ]
    reshape { shape: [ 1, 1000 ] }
  }
]

Quick_Deploy/PyTorch/export.py

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
import torch

torch.hub._validate_not_a_forked_repo = lambda a, b, c: True

# Load the pretrained model in eval mode on the GPU
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True).eval().to("cuda")

# Trace to TorchScript: Triton's pytorch_libtorch backend loads TorchScript
# files, not pickled eager-mode modules
traced_model = torch.jit.trace(model, torch.randn(1, 3, 224, 224).to("cuda"))
torch.jit.save(traced_model, "model.pt")
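
As a quick sanity check (a sketch, not part of the original script), the traced model can be loaded back the way the server's libtorch runtime will load it and run on a dummy batch:
```
import torch

# Reload the TorchScript file and run a dummy input through it
reloaded = torch.jit.load("model.pt")
out = reloaded(torch.randn(1, 3, 224, 224).to("cuda"))
print(out.shape)  # expected: torch.Size([1, 1000])
```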

Quick_Deploy/README.md

Lines changed: 0 additions & 4 deletions
This file was deleted.

Quick_Deploy/TensorFlow/README.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# Deploying a TensorFlow Model

This README showcases how to deploy a simple ResNet model on Triton Inference Server.

## Step 1: Export the model

Export a TensorFlow model as a saved model.

```
# <xx.xx> is the yy.mm publishing tag for NVIDIA's TensorFlow
# container; e.g. 22.04

docker run -it --gpus all -v ${PWD}:/workspace nvcr.io/nvidia/tensorflow:<xx.xx>-tf2-py3
python export.py
```

## Step 2: Set Up Triton Inference Server

To use Triton, we need to build a model repository. The structure of the repository is as follows:
```
model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.savedmodel
            |
            +-- saved_model.pb
            +-- variables
                |
                +-- variables.data-00000-of-00001
                +-- variables.index
```

A sample configuration for the model is included with this demo as `config.pbtxt`. If you are new to Triton, it is highly recommended to [review Part 1](../../Conceptual_Guide/Part_1-model_deployment/README.md) of the conceptual guide.
```
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.xx>-py3 tritonserver --model-repository=/models --backend-config=tensorflow,version=2
```
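
To confirm the SavedModel was picked up with the expected tensor names, the model metadata can be queried over HTTP. A minimal sketch, assuming the default port mapping above and the `tritonclient` package installed in Step 3:
```
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Returns a dict describing the loaded model's inputs and outputs;
# for this config it should list "input_1" and "predictions"
print(client.get_model_metadata("resnet50"))
```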

## Step 3: Using a Triton Client to Query the Server

Install dependencies & download an example image to test inference.

```
pip install --upgrade tensorflow
pip install pillow
pip install nvidia-pyindex
pip install tritonclient[all]

wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
```

Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server.
```
triton_client = httpclient.InferenceServerClient(url="localhost:8000")
```
Second, we specify the names of the input and output layer(s) of our model.
```
inputs = httpclient.InferInput("input_1", transformed_img.shape, datatype="FP32")
inputs.set_data_from_numpy(transformed_img, binary_data=True)

output = httpclient.InferRequestedOutput("predictions", binary_data=True, class_count=1000)
```
Lastly, we send an inference request to the Triton Inference Server.
```
# Querying the server
results = triton_client.infer(model_name="resnet50", inputs=[inputs], outputs=[output])
predictions = results.as_numpy('predictions')
print(predictions)
```
The output should look like this:
```
[b'0.301167:90' b'0.169790:14' b'0.161309:92' b'0.093105:94'
 b'0.058743:136' b'0.050185:11' b'0.033802:91' b'0.011760:88'
 b'0.008309:989' b'0.004927:95' b'0.004905:13' b'0.004095:317'
 b'0.004006:96' b'0.003694:12' b'0.003526:42' b'0.003390:313'
 ...
 b'0.000001:751' b'0.000001:685' b'0.000001:408' b'0.000001:116'
 b'0.000001:627' b'0.000001:933' b'0.000000:661' b'0.000000:148']
```
The output format here is `<confidence_score>:<classification_index>`. To learn how to map these to the label names and more, refer to our [documentation](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_classification.md). The client code above is available in `client.py`.
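
As a rough illustration (a sketch, not part of the shipped client), those byte strings can be decoded into `(score, index)` pairs for a cleaner top-k view:
```
# Each entry looks like b'0.301167:90' -> "<score>:<index>";
# flatten in case the batch dimension is kept in the returned array
top_k = [entry.decode().split(":") for entry in predictions.flatten()[:5]]
for score, index in top_k:
    print(f"class {index}: {float(score):.4f}")
```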

Quick_Deploy/TensorFlow/client.py

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input

import numpy as np
import tritonclient.http as httpclient

# Preprocessing function: load the image at ResNet50's expected size and
# apply Keras' standard ResNet50 preprocessing
def process_image(image_path="img1.jpg"):
    img = image.load_img(image_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    return preprocess_input(x)

transformed_img = process_image()

# Setting up client
triton_client = httpclient.InferenceServerClient(url="localhost:8000")

inputs = httpclient.InferInput("input_1", transformed_img.shape, datatype="FP32")
inputs.set_data_from_numpy(transformed_img, binary_data=True)

output = httpclient.InferRequestedOutput("predictions", binary_data=True, class_count=1000)

# Querying the server
results = triton_client.infer(model_name="resnet50", inputs=[inputs], outputs=[output])

predictions = results.as_numpy('predictions')
print(predictions)
Quick_Deploy/TensorFlow/config.pbtxt

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
name: "resnet50"
platform: "tensorflow_savedmodel"
max_batch_size: 0
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    dims: [ -1, 224, 224, 3 ]
  }
]
output [
  {
    name: "predictions"
    data_type: TYPE_FP32
    dims: [ -1, 1000 ]
  }
]

Quick_Deploy/TensorFlow/export.py

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50

# Load model
model = ResNet50(weights='imagenet')
model.save('resnet50_saved_model')
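
As a quick check (a sketch, not part of the original script), the exported SavedModel's serving signature can be inspected to see where the `input_1` and `predictions` tensor names used in `config.pbtxt` come from:
```
import tensorflow as tf

# Reload the SavedModel and print its default serving signature
loaded = tf.saved_model.load('resnet50_saved_model')
sig = loaded.signatures['serving_default']
print(sig.structured_input_signature)  # should mention input_1
print(sig.structured_outputs)          # should mention predictions
```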
