Summary
In environments where no Metal device is visible (headless/sandboxed/virtualized/macOS automation sessions),
MLX initialization aborts the process with an uncaught Objective-C exception instead of returning
a recoverable Python error.
This makes downstream tooling fail hard, including quality/lint/test pipelines that do not need GPU execution.
Reproduction
- Run in a session with no visible Metal device.
- Execute:
python -c "import mlx.core as mx; print(mx.default_device())"
(Equivalent failure also occurs during import mlx via dependency probes.)
Actual behavior
Process is terminated by SIGABRT (reported as exit status -6) and prints an uncaught exception similar to:
NSRangeException: -[__NSArray0 objectAtIndex:]: index 0 beyond bounds for empty array
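For tooling that has to survive this today, one workaround is to run the probe in a child interpreter so the abort cannot take down the parent. The following is a minimal sketch, not part of MLX: `run_probe` and `MLX_PROBE` are hypothetical names, and it assumes a POSIX system, where Python's `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys

# The one-liner from the reproduction steps; it needs MLX installed to be meaningful.
MLX_PROBE = "import mlx.core as mx; print(mx.default_device())"


def run_probe(code, timeout=60):
    """Run `code` in a child interpreter and classify how the child ended.

    Returns a (status, detail) tuple. status is "ok" for a clean exit,
    "aborted" if the child was killed by a signal, and "error" for any
    other non-zero exit (for example an ordinary Python exception).
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok", proc.stdout.strip()
    if proc.returncode < 0:
        # POSIX only: a negative return code is the negated signal number.
        return "aborted", signal.Signals(-proc.returncode).name
    stderr_lines = proc.stderr.strip().splitlines()
    # The last stderr line of a traceback is the exception itself.
    return "error", (stderr_lines[-1] if stderr_lines else "")
```

Calling `run_probe(MLX_PROBE)` in an affected session should return an `"aborted"` status rather than crash the caller. With the behavior requested below, the same call would return `"error"` with the exception message instead.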
Expected behavior
- No hard abort.
- Either:
  - raise a clear Python exception (e.g., RuntimeError: No Metal device available), or
  - allow a documented CPU/no-op mode for import-time checks.
- Error path should be machine-detectable so CI tools can handle it gracefully.
Why this matters
Hard aborts break non-inference workflows (lint/type/test/packaging checks) when MLX is installed
but GPU is unavailable. A recoverable error would allow callers to skip MLX-dependent runtime tests
without crashing the whole process.
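As an illustration of what a recoverable error would enable, here is a sketch of a caller-side guard. It assumes the proposed behavior (a `RuntimeError` when no Metal device is visible), which MLX does not provide today. `default_device_or_none` and its `import_module` parameter are hypothetical and exist only for illustration and testing.

```python
import importlib


def default_device_or_none(import_module=importlib.import_module):
    """Return MLX's default device, or None if MLX cannot initialize.

    ASSUMPTION: Metal probing raises RuntimeError when no device is
    visible, as proposed in this issue. Currently the process aborts
    instead, so the except clause is never reached in that case.
    """
    try:
        mx = import_module("mlx.core")
        return mx.default_device()
    except (ImportError, RuntimeError):
        # MLX is not installed, or (under the proposed behavior) no Metal device.
        return None
```

A test suite could then evaluate `default_device_or_none() is None` once and use it as the condition for something like `pytest.mark.skipif`, so MLX-dependent tests are skipped while lint, type, and packaging checks keep running.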
Suggested fix
- Guard the zero-device path before indexing into device arrays.
  load_device() in mlx/backend/metal/device.cpp is the primary fix site (empty-device guard + graceful error path).
- Convert this failure path to a typed Python exception instead of process termination.
- Optionally provide an env flag to skip Metal probing at import time in CI/headless contexts.
The main code path is:
- mlx/backend/metal/device.cpp, load_device()
  It currently does devices->object(0) without checking if CopyAllDevices() returned an empty list, which matches the NSRangeException we're seeing.
- mlx/backend/metal/device.cpp, Device::Device()
  This calls load_device() during backend init.
- mlx/device.cpp
  default_device_ is initialized at static init time via metal::is_available(), so any hard failure in Metal probing can crash import-time flows instead of surfacing a recoverable error.
Related existing report: Issue #2691.