fix(build): fix tritonfrontend wheel tags, auditwheel, venv consolidation #8752
Draft
Conversation
The tritonfrontend wheel ships an arch-specific CPython extension (tritonfrontend/_c/<pybind>.so) but is produced with the default "none-any" platform tag, which violates PEP 425 and breaks hash-locked package managers (Poetry, pip-tools, uv) that see two wheels with the same filename but different SHA256 across arches. setup.py already honors a --plat-name flag and sets root_is_pure = False, but build_wheel.py never passed one. Derive the platform via sysconfig.get_platform() and forward it so the wheel is tagged e.g. linux_x86_64 / linux_aarch64. Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
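A minimal sketch of that forwarding, assuming a helper in build_wheel.py that shells out to setup.py (the function name and call site are illustrative, not the actual script):

```python
import subprocess
import sys
import sysconfig


def build_platform_tagged_wheel(setup_dir: str) -> None:
    # sysconfig.get_platform() yields e.g. "linux-x86_64" / "linux-aarch64";
    # bdist_wheel normalizes the dash to an underscore in the wheel filename.
    plat_name = sysconfig.get_platform()
    subprocess.run(
        [sys.executable, "setup.py", "bdist_wheel", "--plat-name", plat_name],
        cwd=setup_dir,
        check=True,
    )
```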
Raw linux_x86_64 / linux_aarch64 wheels are not accepted by canonical PyPI — the platform tag must be manylinux_2_X_<arch>. Port the pattern established for tritonclient in TRI-286: after bdist_wheel emits a linux_<arch> wheel, run `auditwheel repair` to auto-discover the minimum manylinux tag from the embedded .so's glibc symbol dependencies, with a `python -m wheel tags --platform-tag manylinux_2_28_<arch>` fallback for the "no ELF" pure-Python case (documented in TRI-286 follow-up). When auditwheel is not available on PATH (e.g. local non-container builds), keep the linux_<arch> wheel and log a warning so builds do not regress; the Poetry / pip-tools lock-file problem is already solved by the distinct filename. Also install `auditwheel` in the buildbase stage via build.py so the container build image has the tool the wheel script expects. Leaves a NOTE in setup.py.get_tag: the embedded binding .so is CPython-ABI-specific, so the wheel will need cp<XY>-cp<XY> python+abi tags once consumers are ready to gate installs on the exact interpreter version. Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983, TRI-286
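A rough sketch of that repair-or-fallback chain; the helper name, arguments, and warning text are illustrative rather than the real build_wheel.py code:

```python
import shutil
import subprocess
import sys


def repair_linux_wheel(wheel_path: str, out_dir: str, arch: str) -> None:
    """Upgrade a linux_<arch> wheel to a manylinux tag when possible."""
    if shutil.which("auditwheel") is None:
        # Local non-container build: keep the linux_<arch> wheel and warn.
        print(
            f"WARNING: auditwheel not found; leaving {wheel_path} tagged linux_{arch}",
            file=sys.stderr,
        )
        return
    try:
        # auditwheel inspects the embedded .so's glibc symbols and picks
        # the minimum manylinux_2_X_<arch> tag automatically.
        subprocess.run(
            ["auditwheel", "repair", wheel_path, "-w", out_dir], check=True
        )
    except subprocess.CalledProcessError:
        # "No ELF" pure-Python corner case: retag explicitly instead.
        subprocess.run(
            [
                sys.executable, "-m", "wheel", "tags",
                "--platform-tag", f"manylinux_2_28_{arch}",
                wheel_path,
            ],
            check=True,
        )
```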
Adopt PEP 427's optional build-tag slot so two wheels of the same version (e.g. successive reruns of a CI pipeline) can coexist in the same index without filename collision. Preferred source is GitLab's CI_PIPELINE_ID with a BUILD_NUMBER fallback for other CI systems; both are guaranteed to start with a digit as required by PEP 427. Matches the build-tag slot already used by the RHEL .zip artifact naming convention in .gitlab-ci.yml. Build-arg handoff through build.py is a separate follow-up; this change is a no-op in local non-CI builds since neither env var is set. Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
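A hedged sketch of the build-tag lookup (names illustrative):

```python
import os


def resolve_build_tag() -> str | None:
    """Return a PEP 427 build tag from CI env vars, or None for local builds."""
    for var in ("CI_PIPELINE_ID", "BUILD_NUMBER"):  # preferred source first
        value = os.environ.get(var, "").strip()
        if value and value[0].isdigit():  # PEP 427: build tag must start with a digit
            return value
    return None  # local build: omit the build-tag slot entirely
```

A resolved tag can then be handed to `setup.py bdist_wheel --build-number <tag>`, which fills the optional build-tag slot in the wheel filename.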
…v (TRI-983) Three related fixes to make the tritonfrontend wheel filename match the existing RHEL zip artifact convention and match the tritonserver wheel's auto-derivation behaviour:
1. Remove the get_tag override in src/python/setup.py so setuptools derives cp<XY>-cp<XY>-<plat> automatically from root_is_pure=False. The override was hard-coding "py3-none-<PLATFORM_FLAG>", which under-specifies the wheel: the embedded binding .so is CPython-ABI-specific and fails to load on any other interpreter. Drop the stale --plat-name argv parsing along with it; bdist_wheel's stock finalize_options already picks up the flag forwarded by build_wheel.py.
2. Compose a PEP 440 local-version segment in build_wheel.py via a new _compose_version() helper. Appends "+nv<NVIDIA_UPSTREAM_VERSION>" and ".cu<MAJORMINOR>" when the corresponding env vars are set, so the wheel filename carries the same nv<X>.cu<Y> identifiers already used by the RHEL .zip artifact naming in .gitlab-ci.yml.
3. Propagate the wheel-naming env vars from the host into the build container via "-e NAME" on the docker-run invocation in build.py. CI_PIPELINE_ID and BUILD_NUMBER feed the PEP 427 build-tag slot; NVIDIA_UPSTREAM_VERSION and CUDA_VERSION feed the local-version segment. Also adds wheel, setuptools, and auditwheel to the Ubuntu buildbase pip install list (they were missing from the non-RHEL path, which is why the first pipeline produced linux_<arch> instead of the expected manylinux_2_28_<arch> tag).
Expected wheel filename under full CI with auditwheel present: tritonfrontend-<TRITON_VERSION>+nv<NV>.cu<CUDA>-<CI_PIPELINE_ID>-cp<XY>-cp<XY>-manylinux_2_28_<arch>.whl
Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983, TRI-286
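A sketch of the local-version composition described in item 2, assuming the env-var formats implied by the expected filename above (the exact normalization in the real `_compose_version()` may differ):

```python
import os


def _compose_version(base_version: str) -> str:
    """Append a PEP 440 local-version segment such as '+nv26.04.cu132'."""
    parts = []
    nv = os.environ.get("NVIDIA_UPSTREAM_VERSION", "").strip()
    if nv:
        parts.append(f"nv{nv}")
    cuda = os.environ.get("CUDA_VERSION", "").strip()
    if cuda:
        # Keep only the major.minor digits, e.g. "13.2" -> "cu132" (assumed format).
        parts.append("cu" + "".join(cuda.split(".")[:2]))
    return f"{base_version}+{'.'.join(parts)}" if parts else base_version
```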
…TRI-983)
Two refinements on the env-propagation path for wheel naming:
1. Use NVIDIA_BUILD_ID (from --build-id) instead of a separate
CI_PIPELINE_ID / BUILD_NUMBER env var. .gitlab-ci.yml already
passes `--build-id=${CI_JOB_ID}` to build.py per the existing
Triton convention, so the wheel build-tag slot now aligns with the
same identifier used elsewhere in the Triton build system instead
of introducing a parallel env var. build.py forwards FLAGS.build_id
into the build container via `-e NVIDIA_BUILD_ID=<value>` only
when --build-id was supplied; build_wheel.py skips the build-tag
slot when NVIDIA_BUILD_ID is unset or non-numeric ("<unknown>"
default, non-digit leading char) to satisfy PEP 427.
2. Stop propagating CUDA_VERSION via docker-run. The CUDA base image
already exports CUDA_VERSION as an ENV inside the container, while
the host / CI runner does not — `-e CUDA_VERSION` with an empty
host value would override (and erase) the container value.
build_wheel.py now reads CUDA_VERSION from the container-local env
with a /usr/local/cuda/version.json fallback (canonical location
for the installed toolkit).
Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
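A sketch of the container-local CUDA lookup with the version.json fallback described in item 2; the JSON layout shown is an assumption based on recent CUDA toolkits:

```python
import json
import os


def _get_cuda_version() -> str | None:
    """CUDA_VERSION from the container env, else from the installed toolkit."""
    cuda = os.environ.get("CUDA_VERSION", "").strip()
    if cuda:
        return cuda
    version_file = "/usr/local/cuda/version.json"  # canonical toolkit location
    if os.path.exists(version_file):
        with open(version_file) as f:
            data = json.load(f)
        # Assumed layout: {"cuda": {"version": "13.2.0"}, ...}
        return data.get("cuda", {}).get("version")
    return None
```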
… (TRI-983)
Switch the NVIDIA_UPSTREAM_VERSION passthrough to an explicit
-e NAME=VALUE form sourced from FLAGS.upstream_container_version,
matching the NVIDIA_BUILD_ID pattern introduced in the previous
commit. The value is well-defined in both CI (.gitlab-ci.yml passes
--upstream-container-version=${NVIDIA_UPSTREAM_VERSION}) and local
builds (falls back to DEFAULT_TRITON_VERSION_MAP's upstream default),
so the wheel's +nv<X> local-version segment is never empty by
accident. The previous "-e NAME" (inherit-from-host) form would have
propagated an empty string in local builds where the env var is not
exported, erasing any value set inside the container.
Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
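An illustrative helper showing the explicit `-e NAME=VALUE` form; the arguments stand in for `FLAGS.upstream_container_version` and `FLAGS.build_id` in build.py:

```python
def wheel_naming_env_args(upstream_container_version: str, build_id: str) -> list[str]:
    """docker-run '-e NAME=VALUE' args for wheel naming (sketch)."""
    args = []
    # Explicit NAME=VALUE: an unset host env var can never override (and
    # erase) a value already exported inside the container.
    if upstream_container_version:
        args += ["-e", f"NVIDIA_UPSTREAM_VERSION={upstream_container_version}"]
    if build_id and build_id != "<unknown>":
        args += ["-e", f"NVIDIA_BUILD_ID={build_id}"]
    return args
```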
Replace the distro-managed system Python + PIP_BREAK_SYSTEM_PACKAGES=1 escape hatch with a dedicated virtualenv at /opt/venv-tritonserver in each Dockerfile stage. Subsequent RUN steps pick up the venv's pip / python / cmake / auditwheel / etc. via PATH without further wiring.
Stages converted:
- RHEL buildbase (create_dockerfile_buildbase_rhel): venv created after change_default_python_version_rhel so the venv inherits the pyenv-installed interpreter rather than the distro system Python.
- Ubuntu buildbase (create_dockerfile_buildbase): venv created after apt-get installs python3-related packages; pybind11[global] is kept only in this stage (it is only needed during wheel builds).
- Linux runtime RHEL python-backend branch (dockerfile_prepare_container_linux): venv created after pyenv, same pattern as RHEL buildbase.
- Linux runtime Ubuntu python-backend branch: split the combined apt-get + pip install into apt-get (adding python3-venv), venv creation, pip install.
PIP_BREAK_SYSTEM_PACKAGES=1 is retained only in the production-stage Dockerfile (create_dockerfile_linux) where an early `pip3 install patchelf==0.17.2` runs in the RHEL common-deps section before the venv is created; removing it would break that install. The cibase Dockerfile (create_dockerfile_cibase) inherits the buildbase image and therefore the buildbase venv via PATH, so PIP_BREAK_SYSTEM_PACKAGES is dropped there. Windows build is left untouched (uses python3 -m pip directly; no venv conversion needed for the Windows pip site).
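For reference, the Dockerfile fragment each converted stage gains is roughly the following, written here as a build.py-style string helper (the helper name and comment wording are assumptions, not the actual generated text):

```python
def venv_dockerfile_fragment(python_bin: str = "python3") -> str:
    """Dockerfile lines creating /opt/venv-tritonserver and putting it on PATH."""
    return f"""
# Dedicated virtualenv so later pip installs never touch the distro Python.
RUN {python_bin} -m venv /opt/venv-tritonserver
ENV PATH="/opt/venv-tritonserver/bin:${{PATH}}"
"""
```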
… venv Finishes the migration from system-Python + PIP_BREAK_SYSTEM_PACKAGES to a clean venv model by addressing the two remaining pip3 installs that previously ran against the distro system Python:
* create_dockerfile_linux (production stage): drop the `ENV PIP_BREAK_SYSTEM_PACKAGES=1` that was only needed as an escape hatch for the two patchelf installs handled below.
* dockerfile_prepare_container_linux / RHEL common deps: replace `pip3 install patchelf==0.17.2` with an ephemeral venv at /tmp/patchelf-venv, install patchelf into it, copy the binary to /usr/local/bin/patchelf, then remove the venv. Runs before the python-backend branch (where pyenv may later recreate the main /opt/venv-tritonserver venv), so this pattern survives the venv swap without needing to reinstall patchelf.
* add_cpu_libs_to_linux_dockerfile / pytorch CPU path: same ephemeral-venv pattern. The Ubuntu apt install now also pulls in python3-venv so `python3 -m venv` works in the runtime image.
The main /opt/venv-tritonserver venv is still only created inside the `if "python" in backends:` branches, because the pyenv interpreter is only required when Triton's python backend is built. The ephemeral patchelf venv is independent of that gating and always runs when the corresponding platform branch (RHEL common deps / CPU+pytorch) fires.
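The ephemeral-venv pattern described above, again sketched as a build.py-style string (helper name illustrative):

```python
def patchelf_dockerfile_fragment() -> str:
    """Dockerfile lines: install patchelf via a throwaway venv, keep only the binary."""
    return """
RUN python3 -m venv /tmp/patchelf-venv && \\
    /tmp/patchelf-venv/bin/pip install patchelf==0.17.2 && \\
    cp /tmp/patchelf-venv/bin/patchelf /usr/local/bin/patchelf && \\
    rm -rf /tmp/patchelf-venv
"""
```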
Mirror the tritonserver setup.py fix: replace the ineffective bdist_wheel.root_is_pure=False override with a Distribution subclass whose has_ext_modules() returns True. Modern setuptools (>=70) ignores overrides registered against wheel.bdist_wheel, so Root-Is-Purelib stayed "true" in the WHEEL metadata despite our override; this caused auditwheel to reject the repair on the paired tritonserver wheel, and the same issue is imminent for tritonfrontend. has_ext_modules() is the canonical setuptools hook for declaring a wheel as binary. Drop the --plat-name argv parsing (superseded by setuptools's auto platform derivation) and the wheel.bdist_wheel override block. build_wheel.py still forwards --plat-name via bdist_wheel's stock flag, which is honored by setuptools's own bdist_wheel command. Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
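The canonical setuptools pattern referenced here, in minimal form (the real setup.py passes many more arguments):

```python
from setuptools import Distribution, setup


class BinaryDistribution(Distribution):
    """Mark the distribution as binary so bdist_wheel emits platform/ABI tags."""

    def has_ext_modules(self) -> bool:
        return True  # Root-Is-Purelib becomes "false" in the WHEEL metadata


setup(
    name="tritonfrontend",  # illustrative; other metadata omitted
    distclass=BinaryDistribution,
)
```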
Four related refinements on top of the existing venv-based pip model:
1. create_dockerfile_linux production stage: add an unconditional `python3 -m venv /opt/venv-tritonserver` + PATH block right after `dockerfile_prepare_container_linux()` returns. The subsequent wheel installs (`pip install tritonserver-*.whl` / `tritonfrontend-*.whl`) and `pip install -r openai/requirements.txt` now use the venv's pip, removing the last place that relied on the legacy PIP_BREAK_SYSTEM_PACKAGES=1 escape hatch. Derived images (Dockerfile.QA) inherit the venv transparently via PATH; no changes needed there. Re-running `python3 -m venv` when the python-backend branch already created the venv is a safe no-op.
2. dockerfile_prepare_container_linux Ubuntu common-deps: add python3-venv to the apt install list so `python3 -m venv` works on minimal builds that omit the python backend (where python3-venv was previously only added inside the python-backend-specific branch).
3. Both patchelf install blocks (RHEL common-deps + CPU-only pytorch path): switch from cp-and-discard (/tmp/patchelf-venv) to symlink-from-persistent-venv (/opt/patchelf-venv). Makes future patchelf upgrades idempotent (`pip install -U patchelf` in the venv, symlink already points at the right place), and avoids the recreate-venv-just-to-copy-a-binary dance.
4. Change `pip3 install -r python/openai/requirements.txt` to `pip install` to match the rest of the venv-aware invocations in the same stage (both resolve to /opt/venv-tritonserver/bin/pip via PATH, but the style is now consistent).
Refs: Linear TRI-983
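The symlink-from-persistent-venv pattern from item 3, sketched the same way (shown with the `-sf` form adopted later in this PR, since the RHEL base image already ships a patchelf binary):

```python
def patchelf_persistent_venv_fragment() -> str:
    """Dockerfile lines: keep the venv so 'pip install -U patchelf' stays idempotent."""
    return """
RUN python3 -m venv /opt/patchelf-venv && \\
    /opt/patchelf-venv/bin/pip install patchelf==0.17.2 && \\
    ln -sf /opt/patchelf-venv/bin/patchelf /usr/local/bin/patchelf
"""
```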
Wheels produced in pipeline 49141836 came out as
`tritonserver-2.69.0.dev0-<build>-cp312-cp312-linux_x86_64.whl` — no
`+nv26.04.cu132` local-version segment. The missing `+nv<X>` is
because the docker-run `-e NVIDIA_UPSTREAM_VERSION=<value>` path is
fragile: if build.py's `FLAGS.upstream_container_version` evaluates
to empty (e.g. `--upstream-container-version=` passed with no RHS)
the `-e` arg is skipped and the container-local env var is unset
when `python3 build_wheel.py` runs.
Two-pronged hardening:
1. `_compose_version()` now consults three env sources, first
non-empty wins:
- NVIDIA_UPSTREAM_VERSION (docker-run -e; fragile)
- NVIDIA_TRITON_SERVER_VERSION (ENV in the buildbase image via
the `TRITON_CONTAINER_VERSION` ARG -> ENV wiring; baked in at
image-build time, survives even when -e forwarding fails)
- TRITON_CONTAINER_VERSION (same value in CI, extra safety)
In CI all three carry the same value so the effective output is
unchanged when things work; when the -e hop fails, the image-ENV
fallback keeps the `+nv<X>` suffix intact.
2. Diagnostic logging:
- build.py logs wheel-naming inputs (build-id and
upstream-container-version) alongside the existing container-
version log line, so a missing value is obvious on the host
before docker run even starts.
- _compose_version() prints the env-var dict it saw plus the
resolved nv/cuda tuple to stderr. Any future gap in the chain
now surfaces in the wheel-build log rather than silently
losing a version suffix.
Refs: Linear TRI-983
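A sketch of the first-non-empty-wins lookup plus the stderr diagnostic (names and log format illustrative):

```python
import os
import sys


def _resolve_upstream_version() -> str:
    """First non-empty source wins, in the order described above."""
    candidates = (
        "NVIDIA_UPSTREAM_VERSION",       # docker-run -e; fragile
        "NVIDIA_TRITON_SERVER_VERSION",  # ENV baked into the buildbase image
        "TRITON_CONTAINER_VERSION",      # same value in CI, extra safety
    )
    seen = {name: os.environ.get(name, "") for name in candidates}
    print(f"wheel-naming env vars: {seen}", file=sys.stderr)  # diagnostic
    for name in candidates:
        if seen[name].strip():
            return seen[name].strip()
    return ""
```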
Two changes so the produced wheel carries a deterministic PEP 427 build tag matching the GitLab pipeline convention, with parity between the tritonfrontend (this repo) and tritonserver (core repo) wheels:
1. build.py's docker-run invocation now forwards CI_PIPELINE_ID from the host env into the build container via `-e`. Falls back to the host NVIDIA_UPSTREAM_VERSION env var when the CLI flag --upstream-container-version is empty.
2. build_wheel.py's build-tag resolution now prefers CI_PIPELINE_ID over NVIDIA_BUILD_ID (falls back further to BUILD_NUMBER for generic CI systems). CI_PIPELINE_ID is pipeline-scoped, matches the identifier used in the RHEL .zip artifact naming convention, and keeps all wheels in a pipeline sharing one build tag. Also filters the "<unknown>" default build.py emits for local builds without --build-id and adds a stderr diagnostic so any gap in build-tag propagation is self-announcing.
Refs: Linear TRI-983
- build.py now propagates CI_JOB_ID (preferred build tag) and PYPI_RELEASE into the build container via docker run -e; removes CI_PIPELINE_ID propagation (replaced by CI_JOB_ID).
- build_wheel.py build-tag source updated: CI_JOB_ID -> NVIDIA_BUILD_ID -> BUILD_NUMBER.
- _compose_version() returns the bare base_version when PYPI_RELEASE=true, stripping the +nv<X>.cu<Y> local segment that PyPI rejects on upload.
All wheels from one pipeline share the same build tag so tritonserver
and tritonfrontend filenames stay consistent. CI should pass
--build-id=${CI_PIPELINE_ID} to build.py.
PyPI release wheels must have no build-tag segment so the filename is the canonical bare form: tritonfrontend-2.69.0-cp312-cp312-manylinux_2_28_x86_64.whl
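A sketch of the PYPI_RELEASE gate on the local-version segment (function name and arguments are illustrative):

```python
import os


def _final_wheel_version(base_version: str, local_segment: str) -> str:
    """Drop the '+nv<X>.cu<Y>' local segment for PyPI release builds."""
    if os.environ.get("PYPI_RELEASE", "").strip().lower() == "true":
        return base_version  # canonical bare form accepted by PyPI
    return base_version + local_segment  # e.g. "2.69.0" + "+nv26.04.cu132"
```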
The RHEL base image already ships patchelf at /usr/local/bin/patchelf, so ln -s fails with 'File exists'. Use -f to force-replace it with the venv-managed binary.
Replace bare pip3 install and per-tool dedicated venvs (patchelf-venv, cmake-venv) with a single /opt/venv-tritonserver whose bin/ is added to PATH via ENV. This fixes PEP 668 externally-managed-environment errors on Ubuntu Noble base images and provides a consistent install pattern across Dockerfile.QA (cibase + main stages) and build.py (RHEL and cpu-pytorch final-image stages).
What does the PR do?
Fixes the `tritonfrontend` Python wheel platform tagging, consolidates all tool installs (cmake, patchelf) into a single `/opt/venv-tritonserver` virtualenv, and adds `PYPI_RELEASE` support.
Changes:
- `src/python/build_wheel.py`: Add `PYPI_RELEASE` env-var support. Use `CI_PIPELINE_ID` as the PEP 427 build tag. Pass `--plat-name` to `setup.py bdist_wheel` for a correct `linux_x86_64` / `linux_aarch64` tag. Run `auditwheel repair` to upgrade to `manylinux_2_28_*`.
- `build.py`: Propagate `CI_PIPELINE_ID`, `NVIDIA_UPSTREAM_VERSION`, and `PYPI_RELEASE` into the Docker container via `-e`. Replace `ln -s` with `ln -sf` for the patchelf symlink (fixes the RHEL "File exists" failure). Consolidate patchelf and cmake into `/opt/venv-tritonserver`.
- `Dockerfile.QA`: Replace bare `pip3 install` (blocked by PEP 668 on Ubuntu Noble) with the `/opt/venv-tritonserver` venv. `ENV PATH="/opt/venv-tritonserver/bin:${PATH}"` in both the `cibase` and main stages.
Checklist
`<commit_type>: <Title>`
Commit Type:
Related PRs:
triton-inference-server/core#494
dl/dgx/tritonserver!1745
Where should the reviewer start?
`src/python/build_wheel.py` for wheel-tag fixes; `Dockerfile.QA` lines 66–72 and 350–382 for venv consolidation; `build.py` for env propagation and patchelf.
Test plan:
- CI Pipeline ID: 49163608
- CI (internal): [#49163608](http://tritonserver.local/ci/pipelines/49163608)
Caveats:
- `auditwheel` may refuse to bundle CUDA libraries if on its exclusion list; the wheel then keeps the `linux_x86_64` tag.
- `PYPI_RELEASE=true` strips the local-version suffix; wheels are not yet on PyPI.
Background
In `Dockerfile.QA`, `pip3 install cmake==4.0.3` was blocked by PEP 668 on Ubuntu Noble. The patchelf `ln -s` failed with "File exists" on RHEL.
Related Issues: