SageMaker single-model endpoint returns 500 for empty-body POST to /invocations (should be 400) #8762

@geeky-cheekly

Description

When running Triton in SageMaker single-model mode (--allow-sagemaker=true), an empty-body POST to /invocations returns HTTP 500 instead of 400.

This happens when a client ECS service boots up and its SageMakerRuntimeClient sends an empty or null body to /invocations during initialization. The 500 inflates server error metrics and can trigger false alarms in production monitoring.

A single null byte (\x00, parse fails "at 0") and a whitespace-only body (" ", parse fails "at 1") both trigger the same 500 response.
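The two offsets are consistent with a parser that skips leading whitespace and then finds no value. The sketch below is a rough Python model of that behavior, not Triton or RapidJSON code. It assumes the parser skips space, tab, CR, and LF, and treats either end of input or a NUL byte as the end of the document. The name `document_empty_offset` is hypothetical.

```python
from typing import Optional

# Assumption: the whitespace set the JSON parser skips before looking for a value.
JSON_WHITESPACE = b" \t\r\n"


def document_empty_offset(body: bytes) -> Optional[int]:
    """Return the offset where an 'empty document' error would be reported.

    Returns None when the body contains something other than whitespace
    before the end of input or a NUL byte.
    """
    pos = 0
    # Skip leading whitespace, as a JSON parser does before reading a value.
    while pos < len(body) and body[pos] in JSON_WHITESPACE:
        pos += 1
    # End of input or a NUL terminator means no value was found.
    if pos == len(body) or body[pos] == 0:
        return pos
    return None


for sample in (b"", b"\x00", b" ", b"{}"):
    print(repr(sample), document_empty_offset(sample))
```

Under these assumptions, `b""` and `b"\x00"` both stop at offset 0 and `b" "` stops at offset 1, matching the "at 0" and "at 1" messages. It also suggests that a check keyed only on a zero-length buffer may not cover the null-byte and whitespace cases.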

Root Cause

In single-model mode, /invocations routes through:

SagemakerAPIServer::Handle() → parent HTTPAPIServer::HandleInfer() → EVRequestToJsonImpl()

When the body is empty, EVRequestToJsonImpl is called with allows_empty_body=false (hardcoded in EVRequestToJson). The empty buffer falls through to RapidJSON's Parse(), which fails with "The document is empty". This error is created with TRITONSERVER_ERROR_INTERNAL, and HttpCodeFromError() maps TRITONSERVER_ERROR_INTERNAL → EVHTP_RES_SERVERR (500).

In contrast, the MME path (SageMakerMMEHandleInfer in sagemaker_server.cc) uses the HTTP_RESPOND_IF_ERR macro, which hardcodes EVHTP_RES_BADREQ (400) for all parse errors. As a result, the same empty-body request returns 400 in MME mode but 500 in single-model mode.
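The difference between the two paths can be summarized in a small model. This is an illustrative Python sketch, not Triton code: it encodes only the two mappings named in this issue (INTERNAL → 500, INVALID_ARG → 400), and the names `ErrorCode`, `single_model_status`, and `mme_status` are hypothetical.

```python
from enum import Enum


class ErrorCode(Enum):
    """Stand-ins for the two TRITONSERVER error codes discussed in this issue."""
    INTERNAL = "TRITONSERVER_ERROR_INTERNAL"
    INVALID_ARG = "TRITONSERVER_ERROR_INVALID_ARG"


# Only the mappings stated in this issue are modeled here.
_STATUS_FROM_ERROR = {
    ErrorCode.INTERNAL: 500,     # EVHTP_RES_SERVERR
    ErrorCode.INVALID_ARG: 400,  # EVHTP_RES_BADREQ
}


def single_model_status(code: ErrorCode) -> int:
    """Single-model path: the status depends on the error code."""
    return _STATUS_FROM_ERROR[code]


def mme_status(code: ErrorCode) -> int:
    """MME path: the macro replies 400 regardless of the error code."""
    return 400


# The parse failure is created as INTERNAL, so the two paths disagree.
print(single_model_status(ErrorCode.INTERNAL), mme_status(ErrorCode.INTERNAL))
```

In this model, changing the error code of the parse failure to INVALID_ARG is enough to make both paths agree on 400.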

Relevant Code

src/http_server.cc, EVRequestToJsonImpl():

// When n == 0 (empty body) and allows_empty_body == false,
// the function falls through to json_buffer.Parse() which fails
// with TRITONSERVER_ERROR_INTERNAL ("The document is empty")

src/http_server.cc, HttpCodeFromError():

case TRITONSERVER_ERROR_INTERNAL:
    return EVHTP_RES_SERVERR;  // 500

src/sagemaker_server.cc, HTTP_RESPOND_IF_ERR (MME path):

#define HTTP_RESPOND_IF_ERR(REQ, X)                   \
  do {                                                \
    TRITONSERVER_Error* err__ = (X);                  \
    if (err__ != nullptr) {                           \
      EVBufferAddErrorJson((REQ)->buffer_out, err__); \
      /* 400: correct */                              \
      evhtp_send_reply((REQ), EVHTP_RES_BADREQ);      \
      ...

Suggested Fix

Option 1 (minimal): In EVRequestToJsonImpl, when n == 0 and allows_empty_body == false, return TRITONSERVER_ERROR_INVALID_ARG instead of letting it fall through to the JSON parser. INVALID_ARG maps to 400 via HttpCodeFromError.

Option 2 (better): In SagemakerAPIServer::Handle(), add an early return for empty-body POST to /invocations:

if (RE2::FullMatch(std::string(req->uri->path->full), invocations_regex_)) {
    if (evbuffer_get_length(req->buffer_in) == 0) {
        evhtp_send_reply(req, EVHTP_RES_BADREQ);  // 400, not 500
        return;
    }
    // ... existing HandleInfer logic
}

Environment

  • Triton container: sagemaker-tritonserver:25.04-py3
  • Mode: SageMaker single-model (--allow-sagemaker=true --allow-http=false)
  • Deployment: SageMaker real-time endpoint

Steps to Reproduce

Via boto3 (confirmed on a live endpoint):

import boto3
from botocore.exceptions import ClientError

client = boto3.client('sagemaker-runtime', region_name='us-west-2')

# invoke_endpoint raises for each body, so catch the error to run both cases.
# Null byte triggers "at 0"; whitespace triggers "at 1".
for body in (b'\x00', b' '):
    try:
        client.invoke_endpoint(
            EndpointName='<your-triton-endpoint>',
            ContentType='application/json',
            Body=body,
        )
    except ClientError as err:
        print(err.response['Error']['Code'], err.response['Error']['Message'])

# Both calls fail with ModelError (HTTP 424):
# Message: Received server error (500) from primary with message
#   {"error":"failed to parse the request JSON buffer: The document is empty. at 0"}  (null byte)
#   {"error":"failed to parse the request JSON buffer: The document is empty. at 1"}  (whitespace)

Or via curl directly against the container:

# Start Triton in SageMaker mode with any model
docker run --rm -p 8080:8080 \
    763104351884.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tritonserver:25.04-py3 \
    tritonserver --allow-sagemaker=true --model-repository=/opt/ml/model

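# Empty body: a NUL byte cannot be passed through a command-line argument,
# so $'\x00' reaches curl as an empty string and curl posts zero bytes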
curl -v -X POST http://localhost:8080/invocations --data-binary $'\x00'
# Actual: HTTP 500 {"error":"failed to parse the request JSON buffer: The document is empty. at 0"}
# Expected: HTTP 400
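The same check can be scripted so that the empty, null-byte, and whitespace bodies are sent exactly as intended. This is a minimal sketch using only the Python standard library; it assumes the container from the docker command above is listening on localhost:8080, and the helper name `post_status` is hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def post_status(url: str, body: bytes, timeout: float = 5.0) -> int:
    """POST raw bytes to url and return the HTTP status code."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(
        parts.hostname, parts.port or 80, timeout=timeout
    )
    try:
        conn.request(
            "POST",
            parts.path or "/",
            body=body,  # sent byte-for-byte, including NUL bytes
            headers={"Content-Type": "application/json"},
        )
        return conn.getresponse().status
    finally:
        conn.close()
```

For example, looping over `(b"", b"\x00", b" ")` and calling `post_status("http://localhost:8080/invocations", body)` for each should print 500 three times if the behavior described above holds, and 400 once a fix covers all three cases.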
