
fix(non_uniform_work_group): respect per-dimension work item size limits when selecting localSize #2666

Open
Lurie97 wants to merge 1 commit into KhronosGroup:main from Lurie97:fix_non_uniform_work_group

@Lurie97 commented Apr 17, 2026

Problem:
Test cases chose localSize using only CL_KERNEL_WORK_GROUP_SIZE and ignored the per-dimension limits reported by CL_DEVICE_MAX_WORK_ITEM_SIZES.

This caused a mismatch because:

  1. The constructor rounds globalSize based on the original localSize.
  2. prepareDevice() later trims _enqueuedLocalSize to the device limits.
  3. As a result, globalSize has been rounded with a localSize that differs from the one actually enqueued, so the two no longer match.

For example, with maxWorkItemSizes=[256,256,64] and localSize={512,1}:

  • globalSize rounded assuming localSize[0]=512
  • prepareDevice() limits localSize[0] to 256
  • globalSize and localSize no longer match
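The sequence above can be replayed for dimension 0 with the example numbers. This is a minimal sketch, not CTS code: `roundUpToMultiple`, `groupCount`, `Dim0State` and `replayOldOrder` are hypothetical names, and it assumes, for illustration only, that "rounding" means rounding globalSize up to a multiple of localSize and that the test derives its expected work-group count from that same localSize. The requested global size of 1000 is an arbitrary illustrative value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the CTS): round n up to a multiple of m.
std::size_t roundUpToMultiple(std::size_t n, std::size_t m) {
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

// Number of work-groups along one dimension, counting a trailing partial group.
std::size_t groupCount(std::size_t global, std::size_t local) {
    assert(local > 0);
    return (global + local - 1) / local;
}

// Hypothetical snapshot of the values involved for dimension 0.
struct Dim0State {
    std::size_t globalSize;      // rounded in the "constructor"
    std::size_t expectedGroups;  // derived from the original localSize
    std::size_t enqueuedLocal;   // after the "prepareDevice()" trim
    std::size_t actualGroups;    // groups launched with the trimmed size
};

// Replays the old order of operations: round first, trim later.
Dim0State replayOldOrder(std::size_t requestedGlobal, std::size_t localSize,
                         std::size_t maxWorkItemSize) {
    Dim0State s{};
    // 1. Constructor: rounding uses the original, untrimmed localSize.
    s.globalSize = roundUpToMultiple(requestedGlobal, localSize);
    s.expectedGroups = groupCount(s.globalSize, localSize);
    // 2. prepareDevice(): the enqueued local size is trimmed to the device limit.
    s.enqueuedLocal = std::min(localSize, maxWorkItemSize);
    s.actualGroups = groupCount(s.globalSize, s.enqueuedLocal);
    return s;
}
```

With `replayOldOrder(1000, 512, 256)`, globalSize becomes 1024 and the expected work-group count is 2, but the trimmed local size of 256 launches 4 groups along dimension 0.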

Fix:
Query CL_DEVICE_MAX_WORK_ITEM_SIZES in each test case and, for each dimension, compute

  effectiveMax = min(maxWgSize, maxWorkItemSizes[dim])

before selecting localSize for that dimension.

This ensures localSize is valid before rounding, so globalSize and localSize remain consistent throughout the test.
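A minimal sketch of the fixed order, for illustration only: `effectiveMaxPerDim`, `Dim0Fixed` and `replayNewOrder` are hypothetical names, maxWgSize is assumed to be the CL_KERNEL_WORK_GROUP_SIZE value, and the limits are passed in as plain numbers rather than queried with clGetKernelWorkGroupInfo and clGetDeviceInfo. Rounding is assumed to mean rounding globalSize up to a multiple of localSize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// effectiveMax = min(maxWgSize, maxWorkItemSizes[dim]) for every dimension.
// Assumption: in a real test, maxWgSize would come from a CL_KERNEL_WORK_GROUP_SIZE
// query and maxWorkItemSizes from CL_DEVICE_MAX_WORK_ITEM_SIZES.
std::vector<std::size_t> effectiveMaxPerDim(
    std::size_t maxWgSize, const std::vector<std::size_t>& maxWorkItemSizes) {
    std::vector<std::size_t> result;
    result.reserve(maxWorkItemSizes.size());
    for (std::size_t limit : maxWorkItemSizes) {
        result.push_back(std::min(maxWgSize, limit));
    }
    return result;
}

// Hypothetical helper: round n up to a multiple of m.
std::size_t roundUpToMultiple(std::size_t n, std::size_t m) {
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

// Hypothetical snapshot of the values involved for dimension 0.
struct Dim0Fixed {
    std::size_t localSize;      // clamped before rounding
    std::size_t globalSize;     // rounded with the clamped localSize
    std::size_t enqueuedLocal;  // result of a later prepareDevice()-style trim
};

// New order for dimension 0: clamp localSize first, then round globalSize.
Dim0Fixed replayNewOrder(std::size_t requestedGlobal, std::size_t desiredLocal,
                         std::size_t maxWgSize,
                         const std::vector<std::size_t>& maxWorkItemSizes) {
    assert(!maxWorkItemSizes.empty());
    const std::size_t limit = effectiveMaxPerDim(maxWgSize, maxWorkItemSizes)[0];
    Dim0Fixed s{};
    // Keep localSize within [1, effectiveMax] before it is used for rounding.
    s.localSize = std::max<std::size_t>(1, std::min(desiredLocal, limit));
    s.globalSize = roundUpToMultiple(requestedGlobal, s.localSize);
    // The later trim is now a no-op because localSize already fits the limit.
    s.enqueuedLocal = std::min(s.localSize, maxWorkItemSizes[0]);
    return s;
}
```

With limits {256, 256, 64} and maxWgSize 512, localSize[0] is clamped to 256 before rounding, so globalSize (1024) is computed from the size that is actually enqueued and the later trim leaves it unchanged.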

Affected files:

  • test_advanced_2d.cpp
  • test_advanced_3d.cpp
  • test_advanced_other.cpp
  • test_basic.cpp


CLAassistant commented Apr 17, 2026

CLA assistant check
All committers have signed the CLA.

@Lurie97 force-pushed the fix_non_uniform_work_group branch from 0cec4a4 to 80b52dd on April 17, 2026 03:57
@rjodinchr (Contributor)

I wonder if #2598 is fixing the same issue or not.

Signed-off-by: jiajia Qian <jiajia.qian@nxp.com>
@Lurie97 (Author)

Lurie97 commented Apr 20, 2026

Yes, PR #2598 fixes the same issue. That said, I believe both changes can coexist: my patch addresses the root cause at test-generation time, while the prepareDevice() change in #2598 provides a runtime safeguard. Together, they improve robustness by both preventing the mismatch and handling it if it still occurs.
