Description
Hello all, thanks for the nice library!
After installing cuda-core==0.6.0 and cuda-bindings==13.1.1 (the latest release on PyPI, conda-forge, and the NVIDIA conda channel), CUDA_BINDINGS_NVML_IS_COMPATIBLE is always False, and Device is never exported from cuda.core.system.
Tracing through the source: _system.pyx at the 0.6.0 tag checks _BINDINGS_VERSION >= (13, 1, 2), but cuda-bindings 13.1.2 was never published. PR #1404 bumped the source version to unblock cuda-core development, and issue #1521 (tracking the actual release) was closed as not_planned. As a result, Device is silently unavailable for every user of cuda-core==0.6.0.
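The gate fails because Python compares version tuples element by element, so (13, 1, 1) sorts below (13, 1, 2). Here is a minimal sketch of that comparison, assuming the version tuple is parsed from a dotted version string. parse_version_tuple is a hypothetical helper for illustration, not cuda-core's actual code:

```python
import re


def parse_version_tuple(version: str) -> tuple[int, ...]:
    """Hypothetical helper: "13.1.1" -> (13, 1, 1).

    Stops at the first component with no leading digits (e.g. "dev0"),
    and keeps only the leading digits of a component like "2rc1".
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


REQUIRED = (13, 1, 2)  # the minimum _system.pyx in cuda-core 0.6.0 checks for

published = parse_version_tuple("13.1.1")  # latest cuda-bindings actually released
print(published >= REQUIRED)  # False: first two elements tie, then 1 < 2
print(parse_version_tuple("13.1.2") >= REQUIRED)  # True, but no such release exists
```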
Additionally, six symbols that _device.pyx, _fan.pxi, _event.pxi, and _system_events.pyx alias at import time are missing from _nvml in 13.1.1: DeviceArch, FieldId, ClocksEventReasons, EventType, FanControlPolicy, and SystemEventType.
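A quick way to confirm which of these names a given _nvml build lacks is a hasattr scan. This is a minimal sketch: missing_attributes is a hypothetical helper, and a stand-in module replaces the real cuda.bindings._nvml so the snippet runs without CUDA installed:

```python
import types
from collections.abc import Iterable

# The six names reported above as aliased at import time by cuda-core 0.6.0.
EXPECTED_NAMES = (
    "DeviceArch",
    "FieldId",
    "ClocksEventReasons",
    "EventType",
    "FanControlPolicy",
    "SystemEventType",
)


def missing_attributes(module: types.ModuleType, names: Iterable[str]) -> list[str]:
    """Hypothetical helper: return the names `module` does not define, in order."""
    return [name for name in names if not hasattr(module, name)]


# Stand-in module for illustration; with a real install you would pass the
# imported cuda.bindings._nvml module instead.
fake_nvml = types.ModuleType("fake_nvml")
fake_nvml.EventType = object()

print(missing_attributes(fake_nvml, EXPECTED_NAMES))
```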
I found a workaround: create cuda/bindings/nvml.py, which re-exports _nvml and adds stubs for the six missing symbols, and patch _version.py to report 13.1.2:
```python
# cuda/bindings/_version.py
__version__ = "13.1.2"
__version_tuple__ = (13, 1, 2)
```

```python
# cuda/bindings/nvml.py
import enum

# Re-export the compiled module, then stub the six names it lacks in 13.1.1.
from cuda.bindings._nvml import *

class DeviceArch(enum.IntEnum):
    DEVICE_ARCH_AMPERE = 7; DEVICE_ARCH_ADA = 8; DEVICE_ARCH_HOPPER = 9; ...
class FieldId(enum.IntEnum): pass
class ClocksEventReasons(enum.IntFlag): GpuIdle = 0x01; ...
class EventType(enum.IntFlag): SingleBitEccError = 0x1; ...
class SystemEventType(enum.IntFlag): GpuDriverUnbind = 0x1; ...
class FanControlPolicy(enum.IntEnum): TEMPERATURE_CONTINOUS_SW = 0; MANUAL = 1
```

After that, Device(index=0).name, .memory_info, .temperature.sensor(), etc. all work correctly against 13.1.1.
Suggested fix: either publish cuda-bindings 13.1.2, or lower the version check to match what's actually available in 13.1.1.
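For the second option, going by the missing symbols reported above, lowering the minimum alone would likely still fail at import, so a check might pair the lowered version with a presence test for the aliased names. A minimal sketch follows. MIN_BINDINGS_VERSION and nvml_is_compatible are hypothetical names, and stand-in modules replace the real _nvml:

```python
import types
from collections.abc import Iterable

MIN_BINDINGS_VERSION = (13, 1, 1)  # hypothetical lowered minimum

NVML_ALIASED_NAMES = (
    "DeviceArch",
    "FieldId",
    "ClocksEventReasons",
    "EventType",
    "FanControlPolicy",
    "SystemEventType",
)


def nvml_is_compatible(
    bindings_version: tuple[int, ...],
    nvml_module: types.ModuleType,
    required_names: Iterable[str] = NVML_ALIASED_NAMES,
) -> bool:
    """True only if the version is new enough AND every required name exists."""
    if bindings_version < MIN_BINDINGS_VERSION:
        return False
    # Feature detection: don't trust the version number alone.
    return all(hasattr(nvml_module, name) for name in required_names)


# Stand-in modules: one lacking the names (as _nvml does in 13.1.1),
# one providing all of them.
bare = types.ModuleType("bare_nvml")
full = types.ModuleType("full_nvml")
for name in NVML_ALIASED_NAMES:
    setattr(full, name, object())

print(nvml_is_compatible((13, 1, 1), bare))  # False: names missing
print(nvml_is_compatible((13, 1, 1), full))  # True
print(nvml_is_compatible((13, 0, 0), full))  # False: version too old
```

Checking for the names directly would keep Device disabled cleanly (rather than failing at import) on any bindings build that lacks them, regardless of its version string.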