Skip to content

Arm backend: Split MLSDK setup flows and handle shaderFloat64 workaround#18942

Open
perheld wants to merge 1 commit intopytorch:mainfrom
perheld:ph-split-mlsdk
Open

Arm backend: Split MLSDK setup flows and handle shaderFloat64 workaround#18942
perheld wants to merge 1 commit intopytorch:mainfrom
perheld:ph-split-mlsdk

Conversation

@perheld
Copy link
Copy Markdown
Collaborator

@perheld perheld commented Apr 16, 2026

Keep examples/arm/setup.sh focused on the pip-installed MLSDK flow and move source builds into the dedicated setup-mlsdk-from-source.sh script. Retain the old MLSDK source-build-related setup.sh options as deprecated compatibility shims that guide users to the dedicated source-build flow.

When the Vulkan ICD reports shaderFloat64 = false, the pip-installed emulation layer is no longer accepted for that setup because it cannot be configured with the required workaround. In that case, setup.sh now points users to the source-build script, and the source-build script builds the emulation layer with VMEL_USE_FLOAT_AS_DOUBLE=ON.

Also simplify pip-based emulation-layer environment setup by deriving paths from the installed package layout instead of scraping shell export output from the emulation_layer helper.

Update the Arm README to document the split between the default pip workflow and the advanced source-build workflow.

Change-Id: I4bebe171ee3b3728fb126c9d00d73881cb332253

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

@perheld perheld requested a review from digantdesai as a code owner April 16, 2026 12:31
Copilot AI review requested due to automatic review settings April 16, 2026 12:31
@perheld perheld added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk release notes: arm Changes to the ARM backend delegate labels Apr 16, 2026
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot bot commented Apr 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18942

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 18 New Failures, 3 Unrelated Failures

As of commit 27991ae with merge base a43675c (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 16, 2026
Keep examples/arm/setup.sh focused on the pip-installed MLSDK flow and
move source builds into the dedicated setup-mlsdk-from-source.sh script.
Retain the old MLSDK source-build-related setup.sh options as deprecated
compatibility shims that guide users to the dedicated source-build flow.

When the Vulkan ICD reports shaderFloat64 = false, the pip-installed
emulation layer is no longer accepted for that setup because it cannot
be configured with the required workaround. In that case, setup.sh now
points users to the source-build script, and the source-build script
builds the emulation layer with VMEL_USE_FLOAT_AS_DOUBLE=ON.

Also simplify pip-based emulation-layer environment setup by deriving
paths from the installed package layout instead of scraping shell export
output from the emulation_layer helper.

Update the Arm README to document the split between the default pip
workflow and the advanced source-build workflow.

Change-Id: I4bebe171ee3b3728fb126c9d00d73881cb332253
Signed-off-by: Per Held <per.held@arm.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures the Arm backend setup flows by making examples/arm/setup.sh focus on the default pip-installed MLSDK path while moving MLSDK source builds (including the shaderFloat64 workaround build flag) into a new dedicated setup-mlsdk-from-source.sh script.

Changes:

  • Split MLSDK setup into a default pip workflow (examples/arm/setup.sh) and a dedicated source-build workflow (backends/arm/scripts/setup-mlsdk-from-source.sh), with deprecated shims in setup.sh.
  • Add a shaderFloat64 compatibility probe for the pip flow and enable VMEL_USE_FLOAT_AS_DOUBLE when building the emulation layer from source.
  • Simplify pip emulation-layer env setup by deriving paths from the installed Python package layout; update Arm README to document the split.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
examples/arm/setup.sh Removes source-build flow, adds compatibility checks and deprecated option handling for MLSDK setup via pip.
backends/arm/scripts/vulkan_utils.sh Removes unused MLSDK manifest dir variable from Vulkan utilities.
backends/arm/scripts/setup-mlsdk-from-source.sh New script to build MLSDK components from source and apply the shaderFloat64 workaround via CMake flag.
backends/arm/scripts/mlsdk_utils.sh Drops source-build helpers and adds vulkaninfo probing + pip package layout–based emulation layer environment setup.
backends/arm/README.md Documents the default pip workflow and the advanced source-build workflow entrypoint.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

setup_vulkan_sdk
fi

setup_mlsdk_source "${mlsdk_manifest_dir}"
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setup_mlsdk_source is invoked unconditionally, so the script will sync the MLSDK manifest even when the user only requests --enable-vulkan-sdk (or disables all MLSDK components). Consider guarding this call so the manifest sync/build only runs when at least one of model-converter/vgf-lib/emulation-layer is enabled.

Suggested change
setup_mlsdk_source "${mlsdk_manifest_dir}"
if [[ "${enable_model_converter}" -eq 1 || "${enable_vgf_lib}" -eq 1 || "${enable_emulation_layer}" -eq 1 ]]; then
setup_mlsdk_source "${mlsdk_manifest_dir}"
fi

Copilot uses AI. Check for mistakes.
Comment on lines +151 to +164
if [[ ! -f repo ]]; then
log_step "mlsdk" "Fetching repo tool"
curl https://storage.googleapis.com/git-repo-downloads/repo > repo
chmod u+x repo
fi

./repo init \
--depth=1 \
--no-repo-verify \
--manifest-url "${mlsdk_manifest_url}" \
--manifest-branch "${mlsdk_manifest_tag}" \
-g model-converter,emulation-layer,vgf-library

./repo sync --force-sync -j"${parallel_jobs}"
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script downloads the repo tool over the network without integrity verification and then runs repo init with --no-repo-verify, which disables signature checks. This is a supply-chain risk for anyone running the setup script; consider pinning and verifying a checksum/signature for the downloaded repo binary and avoiding --no-repo-verify unless there’s a documented, unavoidable reason.

Copilot uses AI. Check for mistakes.
if [[ "${cached_tag}" == "refs/tags/v2025.10.0" && \
"${mlsdk_manifest_tag}" == "refs/tags/v2025.12.0" ]]; then
pushd "${manifest_dir}/.." >/dev/null || return 1
log_step "mlsdk" "Deleting ${mlsdk_manifest_dir} and starting fresh"
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the v2025.10.0 -> v2025.12.0 migration workaround, the log message says it is deleting ${mlsdk_manifest_dir}, but the code actually deletes $(basename "${manifest_dir}") under $(dirname "${manifest_dir}"). If the user passed a custom --manifest-dir, this message can be inaccurate; consider logging the actual manifest_dir (or its basename) being removed.

Suggested change
log_step "mlsdk" "Deleting ${mlsdk_manifest_dir} and starting fresh"
log_step "mlsdk" "Deleting $(basename "${manifest_dir}") and starting fresh"

Copilot uses AI. Check for mistakes.
Comment thread examples/arm/setup.sh
Comment on lines +160 to +176
--mlsdk-manifest-url)
if [[ $# -lt 2 ]]; then
print_usage "$@"
exit 1
fi
log_step "mlsdk" \
"Deprecated option '--mlsdk-manifest-url' selected; use it with ./backends/arm/scripts/setup-mlsdk-from-source.sh instead"
shift 2
;;
--mlsdk-manifest-tag)
if [[ $# -lt 2 ]]; then
print_usage "$@"
exit 1
fi
log_step "mlsdk" \
"Deprecated option '--mlsdk-manifest-tag' selected; use it with ./backends/arm/scripts/setup-mlsdk-from-source.sh instead"
shift 2
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deprecated flags --mlsdk-manifest-url / --mlsdk-manifest-tag are accepted but then ignored (only a log + shift 2). This can silently fall back to the pip flow even though the user explicitly tried to influence the source-build manifest. Consider exiting non-zero (similar to --install-mlsdk-deps-from-src) after printing the guidance, so callers don’t think the flag took effect.

Copilot uses AI. Check for mistakes.
Comment on lines +96 to 97
echo "[mlsdk_utils] Failed to query emulation_layer environment; skipping"
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback error message is now misleading: this code no longer queries the emulation_layer CLI output/environment, it locates the Python package directory and checks for a deploy/ folder. Update the message (and ideally write it to stderr) so failures are actionable (e.g., package not installed / no deploy dir found).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm release notes: arm Changes to the ARM backend delegate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants