
Fix CI crash on RTX6000 #1184

Merged
rgsl888prabhu merged 3 commits into NVIDIA:main from nguidotti:fix-rtx-6000-crash
May 8, 2026

Conversation

@nguidotti
Contributor

@nguidotti nguidotti commented May 6, 2026

This PR disables the warpspeed scan in CUB, which is causing the CI test on RTX6000 to crash. More specifically, there is a Warp MMU Fault in cub::detail::scan::DeviceScanKernel during thrust::inclusive_scan(..., thrust::maximum<int>{}) called from trivial_presolve.cuh:124.
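For readers unfamiliar with the call that crashes: an inclusive scan with a maximum operator computes a running maximum. The sketch below is a host-side analogue using `std::inclusive_scan` from the C++17 standard library, not Thrust, and the helper name `running_max` is hypothetical. It only illustrates the semantics of the operation, not the code in trivial_presolve.cuh.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: returns the running maximum of `values`.
// out[i] is the largest element in values[0..i], which is what an
// inclusive scan with a "maximum" operator produces.
std::vector<int> running_max(const std::vector<int>& values) {
    std::vector<int> out(values.size());
    std::inclusive_scan(values.begin(), values.end(), out.begin(),
                        [](int a, int b) { return a > b ? a : b; });
    return out;
}
```

For example, the input `{3, 1, 4, 1, 5}` yields `{3, 3, 4, 4, 5}`.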

CCCL 3.4.0 introduced an SM90+ "warpspeed" scan kernel that uses Hopper/Blackwell TMA (cp_async_bulk). When computing the byte mask for a partial TMA copy (cp_async_bulk_cp_mask), the code has two branches:

  #if _CCCL_CUDA_COMPILER(NVCC, >=, 13, 2)
      byteMaskSmall = byteMaskStart & byteMaskEnd;              // correct
  #else
      // can produce a non-contiguous mask on NVCC 13.1
      byteMaskSmall = byteMaskStart & (byteMask >> (16 - (ptrGmemEnd - ptrGmemStartAlignDown)));
  #endif

On NVCC 13.1, the #else formula can produce a non-contiguous byte mask. Blackwell's TMA hardware requires a strictly contiguous bit range in the mask — a non-contiguous mask causes a hardware MMU fault.
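To make "strictly contiguous" concrete, here is a minimal host-side sketch of a check for whether a 16-bit byte mask consists of a single unbroken run of set bits. The function name `is_contiguous_mask` is hypothetical and is not part of CCCL. Treating an empty mask as non-contiguous is an assumption made for this illustration. The sketch does not reproduce the faulty computation; it only shows the property the hardware is described as requiring.

```cpp
#include <cstdint>

// Hypothetical helper, not part of CCCL.
// Returns true when `mask` is one unbroken run of 1 bits, e.g. 0x0FF0.
// Assumption for this sketch: an empty mask is reported as non-contiguous.
constexpr bool is_contiguous_mask(std::uint16_t mask) {
    if (mask == 0) {
        return false;
    }
    const std::uint32_t m = mask;            // widen so the carry cannot overflow
    const std::uint32_t low = m & (~m + 1u); // isolate the lowest set bit
    // Adding `low` carries through the lowest run of ones and clears it.
    // If any original bit survives the addition, there was a second run.
    return ((m + low) & m) == 0;
}

static_assert(is_contiguous_mask(0x0FF0), "single run of ones");
static_assert(!is_contiguous_mask(0x0F0F), "two runs separated by a gap");
```

A check like this could be used in a debug assertion around mask construction, which is one way to catch a gap before it reaches the hardware.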

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti nguidotti added this to the 26.06 milestone May 6, 2026
@nguidotti nguidotti self-assigned this May 6, 2026
@nguidotti nguidotti requested review from a team as code owners May 6, 2026 17:29
@nguidotti nguidotti requested a review from rgsl888prabhu May 6, 2026 17:29
@nguidotti nguidotti added the bug (Something isn't working) and non-breaking (Introduces a non-breaking change) labels May 6, 2026
@coderabbitai

coderabbitai Bot commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0dc99e8a-3baf-47bc-928a-aa1874dcf534

📥 Commits

Reviewing files that changed from the base of the PR and between 861c324 and 4508695.

📒 Files selected for processing (1)
  • cpp/CMakeLists.txt

📝 Walkthrough

Walkthrough

A compile-time preprocessor definition CCCL_DISABLE_WARPSPEED_SCAN is added to cpp/CMakeLists.txt, guarded by a CUDA version check (<= 13.2) with explanatory comments to disable CCCL warp-speed scanning and avoid Warp MMU faults. No public APIs or exported signatures were changed.

Changes

Build/CMake change

| Layer / File(s) | Summary |
|---|---|
| Build Definition: cpp/CMakeLists.txt | Adds CCCL_DISABLE_WARPSPEED_SCAN to target compile definitions. |
| Version Guard / Rationale: cpp/CMakeLists.txt | Wraps the define with a CUDA compiler version check (<= 13.2) and includes explanatory comments about Warp MMU fault behavior. |
| Documentation in-code: cpp/CMakeLists.txt | Comments inserted near existing DEFINE_PDLP_VERBOSE_MODE block to explain the reason for the compile-time toggle. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Fix CI crash on RTX6000' directly describes the main change: disabling warpspeed scan to resolve a hardware-specific crash issue. |
| Description check | ✅ Passed | The description clearly explains the problem (Warp MMU Fault), root cause (non-contiguous byte mask in NVCC 13.1), and solution (disabling warpspeed scan), all directly relevant to the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
cpp/CMakeLists.txt (1)

166-166: ⚡ Quick win

Gate CCCL_DISABLE_WARPSPEED_SCAN to CUDA 13.1 and earlier to avoid unnecessary performance loss on newer toolchains.

Your own comments confirm this workaround is required only for CUDA 13.1 and fixed in NVCC ≥ 13.2. Applying it unconditionally disables the optimized warpspeed scan path on all later versions. Gate the definition by CUDA compiler version to preserve performance where the bug doesn't exist.

Suggested patch
-add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN)
+if (CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 13.2)
+    add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN)
+endif ()
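The `VERSION_LESS 13.2` gate in the suggested patch can be restated as a small predicate on the compiler's major and minor version. The sketch below is an illustration in C++ and is not part of the PR; the function name `needs_warpspeed_workaround` is hypothetical. In a CUDA translation unit the two arguments would typically come from NVCC's `__CUDACC_VER_MAJOR__` and `__CUDACC_VER_MINOR__` macros.

```cpp
// Hypothetical helper, not part of the PR: mirrors the CMake condition
// `CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 13.2` on a (major, minor) pair.
// Returns true when the workaround define should be applied.
constexpr bool needs_warpspeed_workaround(int major, int minor) {
    return major < 13 || (major == 13 && minor < 2);
}

static_assert(needs_warpspeed_workaround(13, 1), "13.1 is affected");
static_assert(!needs_warpspeed_workaround(13, 2), "13.2 keeps warpspeed scan");
```

Gating on "less than 13.2" rather than "equal to 13.1" also covers earlier toolchains, which matches the suggestion's stated intent of applying the workaround to CUDA 13.1 and earlier.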
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/CMakeLists.txt` at line 166, Gate the
add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN) so it only applies to NVCC <=
13.1: check the CUDA compiler version (CMAKE_CUDA_COMPILER_VERSION) and wrap the
add_definitions(...) call in an if block that only executes when
CMAKE_CUDA_COMPILER_VERSION VERSION_LESS "13.2" (or VERSION_LESS_EQUAL "13.1"),
leaving the definition out for >= 13.2 so the optimized warpspeed scan remains
enabled; update CMakeLists.txt around the existing
add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN) line accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bbf808d6-5876-4748-b966-14128204a756

📥 Commits

Reviewing files that changed from the base of the PR and between 285990b and 861c324.

📒 Files selected for processing (1)
  • cpp/CMakeLists.txt

Comment thread cpp/CMakeLists.txt Outdated
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@rgsl888prabhu rgsl888prabhu merged commit f3ab099 into NVIDIA:main May 8, 2026
99 of 106 checks passed

Labels

bug (Something isn't working), non-breaking (Introduces a non-breaking change)


5 participants