SageMaker single-model endpoint returns 500 for empty-body POST to /invocations (should be 400)
When running Triton in SageMaker single-model mode (--allow-sagemaker=true), an empty-body POST to /invocations returns HTTP 500 instead of 400.
This happens when a client ECS service boots up and its SageMakerRuntimeClient sends an empty or null body to /invocations during initialization. The 500 inflates server error metrics and can trigger false alarms in production monitoring.
Both a null-byte body (\x00, parse fails "at 0") and a whitespace-only body (" ", parse fails "at 1") trigger the same 500 response.
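The "at 0" versus "at 1" difference can be illustrated with a small Python model. It assumes the parser skips leading JSON whitespace and then reports an empty document if it reaches the end of the buffer or a NUL byte, which is my reading of RapidJSON's behavior rather than something taken from Triton's source. The function name `document_empty_offset` is hypothetical.

```python
JSON_WHITESPACE = b" \t\n\r"  # the four whitespace bytes JSON allows


def document_empty_offset(body: bytes):
    """Return the offset where an 'empty document' error would be reported,
    or None if the body contains something other than whitespace/NUL first."""
    offset = 0
    # Skip leading JSON whitespace, as the parser is assumed to do.
    while offset < len(body) and body[offset] in JSON_WHITESPACE:
        offset += 1
    # End of buffer or a NUL byte at this point counts as "document is empty".
    if offset == len(body) or body[offset] == 0:
        return offset
    return None


for payload in (b"\x00", b" ", b""):
    print(repr(payload), "->", document_empty_offset(payload))
```

Under this model the null-byte body stops at offset 0 and the single-space body at offset 1, matching the two error suffixes reported above.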
Root Cause
In single-model mode, /invocations routes through:
SagemakerAPIServer::Handle() → parent HTTPAPIServer::HandleInfer() → EVRequestToJsonImpl()
When the body is empty, EVRequestToJsonImpl is called with allows_empty_body=false (hardcoded in EVRequestToJson). The empty buffer falls through to RapidJSON's Parse(), which fails with "The document is empty". The resulting error is created with TRITONSERVER_ERROR_INTERNAL, and HttpCodeFromError() maps TRITONSERVER_ERROR_INTERNAL → EVHTP_RES_SERVERR (500).
In contrast, the MME path (SageMakerMMEHandleInfer in sagemaker_server.cc) uses the HTTP_RESPOND_IF_ERR macro, which hardcodes EVHTP_RES_BADREQ (400) for all parse errors. So the same empty-body request returns 400 in MME mode but 500 in single-model mode.
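The divergence between the two paths can be sketched as a behavioral model in Python. This is not Triton's code: Python's `json` module stands in for RapidJSON, a 200 stands in for a successful inference, and the helper names (`parse_request_json`, `single_model_status`, `mme_status`) are hypothetical. Only the error-code names and the 400/500 mapping come from the description above.

```python
import json
from enum import Enum


class ErrorCode(Enum):
    INTERNAL = "TRITONSERVER_ERROR_INTERNAL"
    INVALID_ARG = "TRITONSERVER_ERROR_INVALID_ARG"


# Model of HttpCodeFromError(): INTERNAL -> 500, INVALID_ARG -> 400.
HTTP_CODE_FROM_ERROR = {ErrorCode.INTERNAL: 500, ErrorCode.INVALID_ARG: 400}


def parse_request_json(body: bytes):
    """Stand-in for EVRequestToJsonImpl: return an error code, or None on success."""
    try:
        json.loads(body.decode("utf-8"))
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return ErrorCode.INTERNAL  # parse failures are tagged INTERNAL
    return None


def single_model_status(body: bytes) -> int:
    """Single-model path: the status is derived from the error code."""
    err = parse_request_json(body)
    return 200 if err is None else HTTP_CODE_FROM_ERROR[err]


def mme_status(body: bytes) -> int:
    """MME path: any error is answered with a hardcoded 400."""
    err = parse_request_json(body)
    return 200 if err is None else 400


for payload in (b"", b"\x00", b" "):
    print(repr(payload), single_model_status(payload), mme_status(payload))
```

The same unparseable body produces 500 on one path and 400 on the other, purely because of how each path turns the error into a status code.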
Relevant Code
src/http_server.cc — EVRequestToJsonImpl():
// When n == 0 (empty body) and allows_empty_body == false,
// the function falls through to json_buffer.Parse() which fails
// with TRITONSERVER_ERROR_INTERNAL ("The document is empty")
src/http_server.cc — HttpCodeFromError():
case TRITONSERVER_ERROR_INTERNAL:
  return EVHTP_RES_SERVERR;  // 500
src/sagemaker_server.cc — HTTP_RESPOND_IF_ERR (MME path):
#define HTTP_RESPOND_IF_ERR(REQ, X) \
  do { \
    TRITONSERVER_Error* err__ = (X); \
    if (err__ != nullptr) { \
      EVBufferAddErrorJson((REQ)->buffer_out, err__); \
      evhtp_send_reply((REQ), EVHTP_RES_BADREQ); /* 400: correct */ \
      ...
Suggested Fix
Option 1 (minimal): In EVRequestToJsonImpl, when n == 0 and allows_empty_body == false, return TRITONSERVER_ERROR_INVALID_ARG instead of letting it fall through to the JSON parser. INVALID_ARG maps to 400 via HttpCodeFromError.
Option 2 (better): In SagemakerAPIServer::Handle(), add an early return for empty-body POST to /invocations:
if (RE2::FullMatch(std::string(req->uri->path->full), invocations_regex_)) {
  if (evbuffer_get_length(req->buffer_in) == 0) {
    evhtp_send_reply(req, EVHTP_RES_BADREQ);  // 400, not 500
    return;
  }
  // ... existing HandleInfer logic
}
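One thing worth checking before picking an option: as described, both guards fire only for a zero-length body, while the SDK reproductions below send one-byte bodies. The following Python sketch models just the length check (the function name is hypothetical, and it assumes the guard is implemented exactly as written above) to show which payloads it would intercept.

```python
def rejected_by_length_check(body: bytes) -> bool:
    """Model of the proposed guard: only a zero-length body is rejected early."""
    return len(body) == 0


bodies = {
    "zero-length": b"",
    "null byte": b"\x00",
    "single space": b" ",
}

# True means the guard answers 400 itself; False means the body still
# reaches the JSON parser and takes the existing error path.
coverage = {name: rejected_by_length_check(b) for name, b in bodies.items()}
print(coverage)
```

If that reading is right, the null-byte and whitespace-only bodies would still reach the parser and return 500, so covering them would also require classifying the parse failure itself as a client error. Whether that is desirable is a decision for the maintainers.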
Environment
- Triton container: sagemaker-tritonserver:25.04-py3
- Mode: SageMaker single-model (--allow-sagemaker=true --allow-http=false)
- Deployment: SageMaker real-time endpoint
Steps to Reproduce
Via boto3, the AWS SDK for Python (confirmed on a live endpoint):
import boto3
from botocore.exceptions import ClientError

client = boto3.client('sagemaker-runtime', region_name='us-west-2')

# b'\x00' (null byte) triggers "at 0"; b' ' (whitespace) triggers "at 1".
# Each call raises, so catch the error to let both bodies run.
for body in (b'\x00', b' '):
    try:
        client.invoke_endpoint(
            EndpointName='<your-triton-endpoint>',
            ContentType='application/json',
            Body=body,
        )
    except ClientError as err:
        print(repr(body), err)

# Each call raises ModelError (HTTP 424) with the message:
# Received server error (500) from primary with message
# {"error":"failed to parse the request JSON buffer: The document is empty. at 0"}  (null byte)
# {"error":"failed to parse the request JSON buffer: The document is empty. at 1"}  (whitespace)
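Because SageMaker wraps the container's response in a ModelError with HTTP 424, the status the container actually returned has to be read from the error message. The helper below is a hypothetical sketch that extracts it with a regular expression. It assumes the message keeps the "Received server error (500) from primary" shape quoted above; the "client error" wording for a 4xx is a guess on my part.

```python
import re

# Matches e.g. "Received server error (500)"; the word before "error" is
# assumed to vary with the status class.
_STATUS_RE = re.compile(r"Received \w+ error \((\d{3})\)")


def upstream_status(message: str):
    """Return the container's HTTP status embedded in a ModelError message,
    or None if the message does not follow the expected pattern."""
    match = _STATUS_RE.search(message)
    return int(match.group(1)) if match else None


example = (
    'Received server error (500) from primary with message '
    '{"error":"failed to parse the request JSON buffer: '
    'The document is empty. at 0"}'
)
print(upstream_status(example))
```

With boto3, the message should be available as `err.response["Error"]["Message"]` inside an `except ClientError as err:` block, so a regression check could assert that the extracted status is 400 once the fix lands.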
Or via curl directly against the container:
# Start Triton in SageMaker mode with any model
docker run --rm -p 8080:8080 \
763104351884.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tritonserver:25.04-py3 \
tritonserver --allow-sagemaker=true --model-repository=/opt/ml/model
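# Null body (bash cannot keep a NUL in a string, so $'\x00' expands to an empty argument and curl posts a zero-length body)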
curl -v -X POST http://localhost:8080/invocations --data-binary $'\x00'
# Actual: HTTP 500 {"error":"failed to parse the request JSON buffer: The document is empty. at 0"}
# Expected: HTTP 400