Headless/no-GPU environments crash on MLX import (NSRangeException) instead of failing gracefully #3148


### Summary

In environments where no Metal device is visible (headless/sandboxed/virtualized/macOS automation sessions),
MLX initialization aborts the process with an uncaught Objective-C exception instead of returning
a recoverable Python error.

This makes downstream tooling fail hard, including quality/lint/test pipelines that do not need GPU execution.

### Reproduction

1. Run in a session with no visible Metal device.
2. Execute:

```bash
python -c "import mlx.core as mx; print(mx.default_device())"
```

(An equivalent failure also occurs during `import mlx` via dependency probes.)
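The abort kills the interpreter, so a caller cannot catch it with `try`/`except`. One workaround for tooling is to run the reproduction in a child interpreter and inspect the result. The following is a minimal standard-library sketch; `run_isolated` is a hypothetical helper name, not part of MLX.

```python
import subprocess
import sys

# The reproduction snippet from this issue, executed in a child interpreter.
REPRO = "import mlx.core as mx; print(mx.default_device())"


def run_isolated(snippet: str = REPRO, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run `snippet` in a fresh Python process so an abort cannot kill the caller.

    Raises subprocess.TimeoutExpired if the child does not finish in `timeout` seconds.
    """
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,  # keep the child's stdout/stderr for logging
        text=True,
        timeout=timeout,
    )
```

In the affected environments, `run_isolated().returncode` would be expected to be `-6`, matching the abort described under "Actual behavior".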

### Actual behavior

The process aborts (exit status `-6`, i.e. SIGABRT) with an uncaught exception similar to:

```text
NSRangeException: -[__NSArray0 objectAtIndex:]: index 0 beyond bounds for empty array
```
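Python's `subprocess` module reports a child killed by a signal as a negative return code (POSIX only). A status of `-6` therefore means signal 6, SIGABRT, which is consistent with an uncaught Objective-C exception ending in `abort()`. A small helper can make that readable in CI logs; `describe_exit` is a hypothetical name used for illustration.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a human-readable description."""
    if returncode < 0:
        # Negative codes mean "terminated by signal N" (POSIX convention in subprocess).
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known to this platform's signal module.
            return f"signal {-returncode}"
    return f"exit status {returncode}"
```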

### Expected behavior

- No hard abort.
- Either:
  1. raise a clear Python exception (e.g., `RuntimeError: No Metal device available`), or
  2. allow a documented CPU/no-op mode for import-time checks.
- Error path should be machine-detectable so CI tools can handle it gracefully.
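To make "machine-detectable" concrete: with the requested behavior, a failing probe would exit with an ordinary non-zero status (Python exits with status 1 on an uncaught exception), whereas today it dies from a signal. The sketch below classifies a probe's return code into those cases. `ProbeOutcome` and `classify` are illustrative names, not MLX APIs.

```python
from enum import Enum


class ProbeOutcome(Enum):
    AVAILABLE = "available"        # probe exited cleanly: MLX/Metal usable
    PYTHON_ERROR = "python_error"  # recoverable Python-level failure (the behavior requested here)
    HARD_ABORT = "hard_abort"      # child killed by a signal (the current behavior)


def classify(returncode: int) -> ProbeOutcome:
    """Map a child-process return code to a CI-friendly outcome."""
    if returncode == 0:
        return ProbeOutcome.AVAILABLE
    if returncode < 0:
        # Negative return codes mean the child was terminated by a signal.
        return ProbeOutcome.HARD_ABORT
    return ProbeOutcome.PYTHON_ERROR
```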

### Why this matters

Hard aborts break non-inference workflows (lint, type-checking, test, and packaging checks) when MLX is installed
but no Metal GPU is available. A recoverable error would let callers skip MLX-dependent runtime tests
without crashing the whole process.
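Until the abort is fixed, a test suite can gate MLX runtime tests on a probe that runs in a separate process, so an abort only kills the probe. The following is a minimal `unittest` sketch. `mlx_runtime_usable` and `MLXRuntimeTests` are hypothetical names, and the probe snippet is the reproduction from this issue.

```python
import subprocess
import sys
import unittest


def mlx_runtime_usable() -> bool:
    """Probe MLX in a child process; any failure (including an abort) means 'not usable'."""
    try:
        probe = subprocess.run(
            [sys.executable, "-c", "import mlx.core as mx; mx.default_device()"],
            capture_output=True,
            timeout=60,
        )
    except subprocess.TimeoutExpired:
        return False
    return probe.returncode == 0


# Evaluated once when the test module is loaded.
MLX_USABLE = mlx_runtime_usable()


@unittest.skipUnless(MLX_USABLE, "MLX/Metal not usable in this environment")
class MLXRuntimeTests(unittest.TestCase):
    def test_default_device(self):
        import mlx.core as mx  # only imported when the probe succeeded

        self.assertIsNotNone(mx.default_device())
```

Run it with `python -m unittest`. On a machine without a visible Metal device, or without MLX installed, the class is reported as skipped instead of crashing the run.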

### Suggested fix

- Check for an empty device list before indexing into it. `load_device()` in `mlx/backend/metal/device.cpp` is the primary fix site (empty-device guard plus a graceful error path).
- Convert this failure path to a typed Python exception instead of process termination.
- Optionally provide an env flag to skip Metal probing at import time in CI/headless contexts.
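The proposed guard and typed error can be modeled in Python to show the intended control flow. This is a sketch only. `NoMetalDeviceError`, `metal_is_available`, and the environment variable `MLX_SKIP_METAL_PROBE` are hypothetical names (no such flag exists in MLX today), and the plain list stands in for the array returned by `CopyAllDevices()`. The real fix would live in the C++ code listed under the code paths.

```python
import os


class NoMetalDeviceError(RuntimeError):
    """Hypothetical typed error; MLX does not define this today."""


# Hypothetical opt-out flag for CI/headless contexts; not an existing MLX setting.
SKIP_PROBE_ENV = "MLX_SKIP_METAL_PROBE"


def load_device(devices: list):
    """Model of the proposed guard: check for an empty list before indexing."""
    if not devices:
        raise NoMetalDeviceError("No Metal device available")
    return devices[0]


def metal_is_available(devices: list) -> bool:
    """Model of an import-time probe that reports False instead of aborting."""
    if os.environ.get(SKIP_PROBE_ENV) == "1":
        return False  # caller asked to skip Metal probing entirely
    try:
        load_device(devices)
    except NoMetalDeviceError:
        return False
    return True
```

Because the empty case is checked before indexing, the caller receives a catchable `RuntimeError` subclass rather than an `NSRangeException` that terminates the process.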

The relevant code paths are:

- [`mlx/backend/metal/device.cpp` `load_device()`](https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/device.cpp#L1623-L1629)  
  It currently does `devices->object(0)` without checking if `CopyAllDevices()` returned an empty list, which matches the `NSRangeException` we're seeing.

- [`mlx/backend/metal/device.cpp` `Device::Device()`](https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/device.cpp#L1911-L1917)  
  This calls `load_device()` during backend init.

- [`mlx/device.cpp`](https://github.com/ml-explore/mlx/blob/main/mlx/device.cpp)  
  `default_device_` is initialized at static init time via `metal::is_available()`, so any hard failure in Metal probing can crash import-time flows instead of surfacing a recoverable error.

Related existing report: [Issue #2691](https://github.com/ml-explore/mlx/issues/2691).
