
fix(build): fix tritonfrontend wheel tags, auditwheel, venv consolidation#8752

Draft
mc-nv wants to merge 19 commits into main from mchornyi/TRI-983-tritonfrontend-platform-tag

Conversation

mc-nv (Contributor) commented Apr 21, 2026

What does the PR do?

Fixes the tritonfrontend Python wheel platform tagging, consolidates all tool installs (cmake, patchelf) into a single /opt/venv-tritonserver virtualenv, and adds PYPI_RELEASE support.

Changes:

  • src/python/build_wheel.py: Add PYPI_RELEASE env-var support. Use CI_PIPELINE_ID as the PEP 427 build tag. Pass --plat-name to setup.py bdist_wheel for a correct linux_x86_64/linux_aarch64 tag. Run auditwheel repair to upgrade the tag to manylinux_2_28_*.
  • build.py: Propagate CI_PIPELINE_ID, NVIDIA_UPSTREAM_VERSION, and PYPI_RELEASE into the Docker build container via -e. Replace ln -s with ln -sf for the patchelf symlink (fixes the RHEL "File exists" failure). Consolidate patchelf and cmake into /opt/venv-tritonserver.
  • Dockerfile.QA: Replace bare pip3 install (blocked by PEP 668 on Ubuntu Noble) with the /opt/venv-tritonserver venv. Set ENV PATH="/opt/venv-tritonserver/bin:${PATH}" in both the cibase and main stages.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging.
  • All template sections are filled out.

Commit Type:

  • fix
  • build

Related PRs:

triton-inference-server/core#494
dl/dgx/tritonserver!1745

Where should the reviewer start?

src/python/build_wheel.py for wheel-tag fixes; Dockerfile.QA lines 66–72 and 350–382 for venv consolidation; build.py for env propagation and patchelf.

Test plan:

  • CI Pipeline ID: 49163608
  • CI (internal): [#49163608](http://tritonserver.local/ci/pipelines/49163608)

Caveats:

  • auditwheel may refuse to bundle CUDA libraries that are on its exclusion list; in that case the wheel keeps the linux_x86_64 tag.
  • PYPI_RELEASE=true strips the local-version suffix; the wheels are not yet published on PyPI.

Background

The `pip3 install cmake==4.0.3` step in Dockerfile.QA was blocked by PEP 668 on Ubuntu Noble, and the patchelf `ln -s` failed with "File exists" on RHEL.

Related Issues:

  • Resolves: TRI-983

mc-nv added 3 commits April 21, 2026 13:26
The tritonfrontend wheel ships an arch-specific CPython extension
(tritonfrontend/_c/<pybind>.so) but is produced with the default
"none-any" platform tag, which violates PEP 425 and breaks
hash-locked package managers (Poetry, pip-tools, uv) that see two
wheels with the same filename but different SHA256 across arches.

setup.py already honors a --plat-name flag and sets
root_is_pure = False, but build_wheel.py never passed one. Derive the
platform via sysconfig.get_platform() and forward it so the wheel is
tagged e.g. linux_x86_64 / linux_aarch64.

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
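
A minimal sketch of the change this message describes, assuming build_wheel.py shells out to setup.py (helper name hypothetical):

```python
import subprocess
import sys
import sysconfig

def build_platform_wheel() -> None:
    # sysconfig.get_platform() yields e.g. "linux-x86_64"; wheel platform
    # tags use underscores, so normalize before forwarding.
    plat_name = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    subprocess.check_call(
        [sys.executable, "setup.py", "bdist_wheel", "--plat-name", plat_name]
    )
```
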
Raw linux_x86_64 / linux_aarch64 wheels are not accepted by canonical
PyPI — the platform tag must be manylinux_2_X_<arch>. Port the pattern
established for tritonclient in TRI-286: after bdist_wheel emits a
linux_<arch> wheel, run `auditwheel repair` to auto-discover the
minimum manylinux tag from the embedded .so's glibc symbol
dependencies, with a `python -m wheel tags --platform-tag
manylinux_2_28_<arch>` fallback for the "no ELF" pure-Python case
(documented in TRI-286 follow-up).

When auditwheel is not available on PATH (e.g. local non-container
builds), keep the linux_<arch> wheel and log a warning so builds do
not regress; the Poetry / pip-tools lock-file problem is already
solved by the distinct filename.

Also install `auditwheel` in the buildbase stage via build.py so the
container build image has the tool the wheel script expects.

Leaves a NOTE in setup.py.get_tag: the embedded binding .so is
CPython-ABI-specific, so the wheel will need cp<XY>-cp<XY> python+abi
tags once consumers are ready to gate installs on the exact
interpreter version.

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983, TRI-286
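
A sketch of the repair-with-fallback flow under the assumptions above (function name and error handling hypothetical):

```python
import shutil
import subprocess
import sys

def repair_wheel(wheel_path: str, out_dir: str, arch: str) -> None:
    if shutil.which("auditwheel") is None:
        # Local non-container build: keep the linux_<arch> wheel.
        print("WARNING: auditwheel not on PATH; keeping linux tag",
              file=sys.stderr)
        return
    try:
        # auditwheel derives the minimum manylinux tag from the embedded
        # .so's glibc symbol versions and bundles external libraries.
        subprocess.check_call(
            ["auditwheel", "repair", wheel_path, "--wheel-dir", out_dir]
        )
    except subprocess.CalledProcessError:
        # "No ELF" pure-Python case: retag instead of repairing.
        subprocess.check_call(
            [sys.executable, "-m", "wheel", "tags",
             "--platform-tag", f"manylinux_2_28_{arch}", wheel_path]
        )
```
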
Adopt PEP 427's optional build-tag slot so two wheels of the same
version (e.g. successive reruns of a CI pipeline) can coexist in the
same index without filename collision. Preferred source is GitLab's
CI_PIPELINE_ID with a BUILD_NUMBER fallback for other CI systems;
both are guaranteed to start with a digit as required by PEP 427.

Matches the build-tag slot already used by the RHEL .zip artifact
naming convention in .gitlab-ci.yml. Build-arg handoff through
build.py is a separate follow-up; this change is a no-op in local
non-CI builds since neither env var is set.

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
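
A sketch of the build-tag resolution at this point in the series (helper name hypothetical); the result could be forwarded via bdist_wheel's stock --build-number flag:

```python
import os
from typing import Optional

def resolve_build_tag() -> Optional[str]:
    for var in ("CI_PIPELINE_ID", "BUILD_NUMBER"):
        value = os.environ.get(var, "").strip()
        # PEP 427 requires a build tag to start with a digit.
        if value and value[0].isdigit():
            return value
    return None  # local non-CI build: omit the build-tag slot entirely
```
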
mc-nv added 16 commits April 21, 2026 14:49
…v (TRI-983)

Three related fixes to make the tritonfrontend wheel filename match
the existing RHEL zip artifact convention and match the tritonserver
wheel's auto-derivation behaviour:

1. Remove the get_tag override in src/python/setup.py so setuptools
   derives cp<XY>-cp<XY>-<plat> automatically from root_is_pure=False.
   The override was hard-coding "py3-none-<PLATFORM_FLAG>", which
   under-specifies the wheel: the embedded binding .so is
   CPython-ABI-specific and fails to load on any other interpreter.
   Drop the stale --plat-name argv parsing along with it — bdist_wheel's
   stock finalize_options already picks up the flag forwarded by
   build_wheel.py.

2. Compose a PEP 440 local-version segment in build_wheel.py via a new
   _compose_version() helper. Appends "+nv<NVIDIA_UPSTREAM_VERSION>"
   and ".cu<MAJORMINOR>" when the corresponding env vars are set, so
   the wheel filename carries the same nv<X>.cu<Y> identifiers already
   used by the RHEL .zip artifact naming in .gitlab-ci.yml.

3. Propagate the wheel-naming env vars from the host into the build
   container via "-e NAME" on the docker-run invocation in build.py.
   CI_PIPELINE_ID and BUILD_NUMBER feed the PEP 427 build-tag slot;
   NVIDIA_UPSTREAM_VERSION and CUDA_VERSION feed the local-version
   segment.

Also adds wheel, setuptools, and auditwheel to the Ubuntu buildbase
pip install list (they were missing from the non-RHEL path, which is
why the first pipeline produced linux_<arch> instead of the expected
manylinux_2_28_<arch> tag).

Expected wheel filename under full CI with auditwheel present:
  tritonfrontend-<TRITON_VERSION>+nv<NV>.cu<CUDA>-<CI_PIPELINE_ID>-cp<XY>-cp<XY>-manylinux_2_28_<arch>.whl

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983, TRI-286
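
A sketch of the _compose_version() helper from point 2 (env-var names from the message; parsing details assumed):

```python
import os

def _compose_version(base_version: str) -> str:
    local = []
    nv = os.environ.get("NVIDIA_UPSTREAM_VERSION", "")
    if nv:
        local.append(f"nv{nv}")
    cuda = os.environ.get("CUDA_VERSION", "")
    parts = cuda.split(".")
    if len(parts) >= 2:
        local.append(f"cu{parts[0]}{parts[1]}")  # "13.2.1" -> "cu132"
    if not local:
        return base_version
    # PEP 440 local version: "+" then dot-separated segments, e.g.
    # "2.69.0" -> "2.69.0+nv26.04.cu132".
    return base_version + "+" + ".".join(local)
```
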
…TRI-983)

Two refinements on the env-propagation path for wheel naming:

1. Use NVIDIA_BUILD_ID (from --build-id) instead of a separate
   CI_PIPELINE_ID / BUILD_NUMBER env var. .gitlab-ci.yml already
   passes `--build-id=${CI_JOB_ID}` to build.py per the existing
   Triton convention, so the wheel build-tag slot now aligns with the
   same identifier used elsewhere in the Triton build system instead
   of introducing a parallel env var. build.py forwards FLAGS.build_id
   into the build container via `-e NVIDIA_BUILD_ID=<value>` only
   when --build-id was supplied; build_wheel.py skips the build-tag
   slot when NVIDIA_BUILD_ID is unset or non-numeric ("<unknown>"
   default, non-digit leading char) to satisfy PEP 427.

2. Stop propagating CUDA_VERSION via docker-run. The CUDA base image
   already exports CUDA_VERSION as an ENV inside the container, while
   the host / CI runner does not — `-e CUDA_VERSION` with an empty
   host value would override (and erase) the container value.
   build_wheel.py now reads CUDA_VERSION from the container-local env
   with a /usr/local/cuda/version.json fallback (canonical location
   for the installed toolkit).

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
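
A sketch of the container-local CUDA_VERSION lookup from point 2, with the version.json fallback (the JSON key path is an assumption about the toolkit's layout):

```python
import json
import os

def _cuda_version() -> str:
    cuda = os.environ.get("CUDA_VERSION", "")
    if cuda:
        return cuda
    try:
        # Canonical location written by the CUDA toolkit installer.
        with open("/usr/local/cuda/version.json") as f:
            return json.load(f)["cuda"]["version"]
    except (OSError, KeyError, ValueError):
        return ""
```
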
… (TRI-983)

Switch the NVIDIA_UPSTREAM_VERSION passthrough to an explicit
-e NAME=VALUE form sourced from FLAGS.upstream_container_version,
matching the NVIDIA_BUILD_ID pattern introduced in the previous
commit. The value is well-defined in both CI (.gitlab-ci.yml passes
--upstream-container-version=${NVIDIA_UPSTREAM_VERSION}) and local
builds (falls back to DEFAULT_TRITON_VERSION_MAP's upstream default),
so the wheel's +nv<X> local-version segment is never empty by
accident. The previous "-e NAME" (inherit-from-host) form would have
propagated an empty string in local builds where the env var is not
exported, erasing any value set inside the container.

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
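
A sketch of the explicit passthrough on the docker-run command line, assuming build.py's argparse namespace (FLAGS) as described above:

```python
def wheel_naming_env_args(flags) -> list:
    """Hypothetical helper: env flags build.py appends to docker run."""
    args = []
    if flags.build_id and flags.build_id != "<unknown>":
        args += ["-e", f"NVIDIA_BUILD_ID={flags.build_id}"]
    # NAME=VALUE form: an unset host env var can no longer erase the
    # value, unlike the inherit-from-host "-e NAME" form.
    args += ["-e",
             f"NVIDIA_UPSTREAM_VERSION={flags.upstream_container_version}"]
    return args
```
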
Replace the distro-managed system Python + PIP_BREAK_SYSTEM_PACKAGES=1
escape hatch with a dedicated virtualenv at /opt/venv-tritonserver in
each Dockerfile stage. Subsequent RUN steps pick up the venv's pip /
python / cmake / auditwheel / etc. via PATH without further wiring.

Stages converted:
- RHEL buildbase (create_dockerfile_buildbase_rhel): venv created
  after change_default_python_version_rhel so the venv inherits the
  pyenv-installed interpreter rather than the distro system Python.
- Ubuntu buildbase (create_dockerfile_buildbase): venv created after
  apt-get installs python3-related packages; pybind11[global] is kept
  only in this stage (it is only needed during wheel builds).
- Linux runtime RHEL python-backend branch (dockerfile_prepare_container_linux):
  venv created after pyenv, same pattern as RHEL buildbase.
- Linux runtime Ubuntu python-backend branch: split the combined
  apt-get + pip install into apt-get (adding python3-venv), venv
  creation, pip install.

PIP_BREAK_SYSTEM_PACKAGES=1 is retained only in the production-stage
Dockerfile (create_dockerfile_linux) where an early `pip3 install
patchelf==0.17.2` runs in the RHEL common-deps section before the
venv is created; removing it would break that install. The cibase
Dockerfile (create_dockerfile_cibase) inherits the buildbase image
and therefore the buildbase venv via PATH, so PIP_BREAK_SYSTEM_PACKAGES
is dropped there.

Windows build is left untouched (uses python3 -m pip directly; no
venv conversion needed for the Windows pip site).
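
A minimal sketch of the per-stage block, in the Dockerfile-generating string style build.py uses (package list illustrative):

```python
df = """
# Dedicated venv replaces PIP_BREAK_SYSTEM_PACKAGES=1 against system Python.
RUN python3 -m venv /opt/venv-tritonserver
ENV PATH="/opt/venv-tritonserver/bin:${PATH}"
# Later RUN steps now resolve pip/python/cmake/auditwheel from the venv.
RUN pip install --upgrade pip setuptools wheel auditwheel
"""
```
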
… venv

Finishes the migration from system-Python + PIP_BREAK_SYSTEM_PACKAGES
to a clean venv model by addressing the two remaining pip3 installs
that previously ran against the distro system Python:

* create_dockerfile_linux (production stage): drop the
  `ENV PIP_BREAK_SYSTEM_PACKAGES=1` that was only needed as an escape
  hatch for the two patchelf installs handled below.
* dockerfile_prepare_container_linux / RHEL common deps: replace
  `pip3 install patchelf==0.17.2` with an ephemeral venv at
  /tmp/patchelf-venv, install patchelf into it, copy the binary to
  /usr/local/bin/patchelf, then remove the venv. Runs before the
  python-backend branch (where pyenv may later recreate the main
  /opt/venv-tritonserver venv), so this pattern survives the venv
  swap without needing to reinstall patchelf.
* add_cpu_libs_to_linux_dockerfile / pytorch CPU path: same
  ephemeral-venv pattern. The Ubuntu apt install now also pulls in
  python3-venv so `python3 -m venv` works in the runtime image.

The main /opt/venv-tritonserver venv is still only created inside the
`if "python" in backends:` branches, because the pyenv interpreter is
only required when Triton's python backend is built. The ephemeral
patchelf venv is independent of that gating and always runs when the
corresponding platform branch (RHEL common deps / CPU+pytorch) fires.
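
The ephemeral pattern, sketched in the same Dockerfile-string style (the raw string keeps the shell line continuations literal):

```python
df = r"""
RUN python3 -m venv /tmp/patchelf-venv \
    && /tmp/patchelf-venv/bin/pip install patchelf==0.17.2 \
    && cp /tmp/patchelf-venv/bin/patchelf /usr/local/bin/patchelf \
    && rm -rf /tmp/patchelf-venv
"""
```
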
Mirror the tritonserver setup.py fix: replace the ineffective
bdist_wheel.root_is_pure=False override with a Distribution subclass
whose has_ext_modules() returns True. Modern setuptools (>=70) ignores
overrides registered against wheel.bdist_wheel, so Root-Is-Purelib
stayed "true" in the WHEEL metadata despite our override — causing
auditwheel to reject the repair on the paired tritonserver wheel path
and the same issue is imminent for tritonfrontend. has_ext_modules()
is the canonical setuptools hook for declaring a wheel as binary.

Drop the --plat-name argv parsing (superseded by setuptools's auto
platform derivation) and the wheel.bdist_wheel override block.
build_wheel.py still forwards --plat-name via bdist_wheel's stock
flag, which is honored by setuptools's own bdist_wheel command.

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
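
The canonical hook named above, as a minimal setup.py sketch (metadata omitted):

```python
from setuptools import setup
from setuptools.dist import Distribution

class BinaryDistribution(Distribution):
    def has_ext_modules(self):
        # Marks the wheel as binary: Root-Is-Purelib becomes "false" and
        # bdist_wheel derives cp<XY>-cp<XY>-<plat> tags automatically.
        return True

setup(
    name="tritonfrontend",
    distclass=BinaryDistribution,
)
```
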
Four related refinements on top of the existing venv-based pip model:

1. create_dockerfile_linux production stage: add an unconditional
   `python3 -m venv /opt/venv-tritonserver` + PATH block right after
   `dockerfile_prepare_container_linux()` returns. The subsequent
   wheel installs (`pip install tritonserver-*.whl` /
   `tritonfrontend-*.whl`) and `pip install -r openai/requirements.txt`
   now use the venv's pip, removing the last place that relied on the
   legacy PIP_BREAK_SYSTEM_PACKAGES=1 escape hatch. Derived images
   (Dockerfile.QA) inherit the venv transparently via PATH — no
   changes needed there. Re-running `python3 -m venv` when the
   python-backend branch already created the venv is a safe no-op.

2. dockerfile_prepare_container_linux Ubuntu common-deps: add
   python3-venv to the apt install list so `python3 -m venv` works on
   minimal builds that omit the python backend (where python3-venv
   was previously only added inside the python-backend-specific
   branch).

3. Both patchelf install blocks (RHEL common-deps + CPU-only pytorch
   path): switch from cp-and-discard (/tmp/patchelf-venv) to
   symlink-from-persistent-venv (/opt/patchelf-venv). Makes future
   patchelf upgrades idempotent (`pip install -U patchelf` in the
   venv, symlink already points at the right place), and avoids the
   recreate-venv-just-to-copy-a-binary dance.

4. Change `pip3 install -r python/openai/requirements.txt` to
   `pip install` to match the rest of the venv-aware invocations in
   the same stage (both resolve to /opt/venv-tritonserver/bin/pip
   via PATH, but the style is now consistent).

Refs: Linear TRI-983
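
The symlink-from-persistent-venv pattern of point 3, sketched in the same Dockerfile-string style:

```python
df = r"""
RUN python3 -m venv /opt/patchelf-venv \
    && /opt/patchelf-venv/bin/pip install patchelf==0.17.2 \
    && ln -s /opt/patchelf-venv/bin/patchelf /usr/local/bin/patchelf
"""
```

Upgrades then reduce to `pip install -U patchelf` inside /opt/patchelf-venv with the symlink still pointing at the right binary; a later commit in this PR switches `ln -s` to `ln -sf` because the RHEL base image already ships /usr/local/bin/patchelf.
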
Wheels produced in pipeline 49141836 came out as
`tritonserver-2.69.0.dev0-<build>-cp312-cp312-linux_x86_64.whl` — no
`+nv26.04.cu132` local-version segment. The missing `+nv<X>` is
because the docker-run `-e NVIDIA_UPSTREAM_VERSION=<value>` path is
fragile: if build.py's `FLAGS.upstream_container_version` evaluates
to empty (e.g. `--upstream-container-version=` passed with no RHS)
the `-e` arg is skipped and the container-local env var is unset
when `python3 build_wheel.py` runs.

Two-pronged hardening:

1. `_compose_version()` now consults three env sources, first
   non-empty wins:
     - NVIDIA_UPSTREAM_VERSION (docker-run -e; fragile)
     - NVIDIA_TRITON_SERVER_VERSION (ENV in the buildbase image via
       the `TRITON_CONTAINER_VERSION` ARG -> ENV wiring; baked in at
       image-build time, survives even when -e forwarding fails)
     - TRITON_CONTAINER_VERSION (same value in CI, extra safety)
   In CI all three carry the same value so the effective output is
   unchanged when things work; when the -e hop fails, the image-ENV
   fallback keeps the `+nv<X>` suffix intact.

2. Diagnostic logging:
     - build.py logs wheel-naming inputs (build-id and
       upstream-container-version) alongside the existing container-
       version log line, so a missing value is obvious on the host
       before docker run even starts.
     - _compose_version() prints the env-var dict it saw plus the
       resolved nv/cuda tuple to stderr. Any future gap in the chain
       now surfaces in the wheel-build log rather than silently
       losing a version suffix.

Refs: Linear TRI-983
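
A sketch of the three-source fallback with the stderr diagnostic (helper name hypothetical; source order per the list above):

```python
import os
import sys

def _resolve_nv_version() -> str:
    sources = (
        "NVIDIA_UPSTREAM_VERSION",       # docker-run -e; fragile
        "NVIDIA_TRITON_SERVER_VERSION",  # baked into the image as ENV
        "TRITON_CONTAINER_VERSION",      # same value in CI, extra safety
    )
    seen = {name: os.environ.get(name, "") for name in sources}
    # Diagnostic: any gap in the chain shows up in the wheel-build log.
    print(f"wheel-naming env: {seen}", file=sys.stderr)
    return next((v for v in seen.values() if v), "")
```
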
Two changes so the produced wheel carries a deterministic PEP 427
build tag matching the GitLab pipeline convention, with parity
between the tritonfrontend (this repo) and tritonserver (core repo)
wheels:

1. build.py's docker-run invocation now forwards CI_PIPELINE_ID
   from the host env into the build container via `-e`. Falls back
   to the host NVIDIA_UPSTREAM_VERSION env var when the CLI flag
   --upstream-container-version is empty.

2. build_wheel.py's build-tag resolution now prefers CI_PIPELINE_ID
   over NVIDIA_BUILD_ID (falls back further to BUILD_NUMBER for
   generic CI systems). CI_PIPELINE_ID is pipeline-scoped, matches
   the identifier used in the RHEL .zip artifact naming convention,
   and keeps all wheels in a pipeline sharing one build tag.

Also filters the "<unknown>" default build.py emits for local
builds without --build-id and adds a stderr diagnostic so any gap
in build-tag propagation is self-announcing.

Refs: Linear TRI-983
- build.py now propagates CI_JOB_ID (preferred build tag) and
  PYPI_RELEASE into the build container via docker run -e; removes
  CI_PIPELINE_ID propagation (replaced by CI_JOB_ID).
- build_wheel.py build tag source updated: CI_JOB_ID -> NVIDIA_BUILD_ID
  -> BUILD_NUMBER.
- _compose_version() returns bare base_version when PYPI_RELEASE=true,
  stripping the +nv<X>.cu<Y> local segment that PyPI rejects on upload.
All wheels from one pipeline share the same build tag so tritonserver
and tritonfrontend filenames stay consistent. CI should pass
--build-id=${CI_PIPELINE_ID} to build.py.
PyPI release wheels must have no build-tag segment so the filename is
the canonical bare form:
  tritonfrontend-2.69.0-cp312-cp312-manylinux_2_28_x86_64.whl
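
A sketch of the PYPI_RELEASE handling described above (truthiness check assumed):

```python
import os

def _strip_for_pypi(version: str) -> str:
    # PyPI rejects PEP 440 local versions on upload, so a release build
    # drops everything from "+" on: "2.69.0+nv26.04.cu132" -> "2.69.0".
    if os.environ.get("PYPI_RELEASE", "").lower() in ("1", "true"):
        return version.split("+", 1)[0]
    return version
```
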
The RHEL base image already ships patchelf at /usr/local/bin/patchelf,
so ln -s fails with 'File exists'. Use -f to force-replace it with
the venv-managed binary.
Replace bare pip3 install and per-tool dedicated venvs (patchelf-venv,
cmake-venv) with a single /opt/venv-tritonserver whose bin/ is added to
PATH via ENV. This fixes PEP 668 externally-managed-environment errors on
Ubuntu Noble base images and provides a consistent install pattern across
Dockerfile.QA (cibase + main stages) and build.py (RHEL and cpu-pytorch
final-image stages).
mc-nv changed the title from "fix: tag tritonfrontend wheel with arch, platform, auditwheel, and build number (TRI-983)" to "fix(build): fix tritonfrontend wheel tags, auditwheel, venv consolidation" on Apr 22, 2026
