fix(non_uniform_work_group): respect per-dimension work item size lim…#2666
Open
Lurie97 wants to merge 1 commit into KhronosGroup:main from
0cec4a4 to 80b52dd
Contributor
I wonder if #2598 is fixing the same issue or not.
…its when selecting localSize
Problem:
Test cases selected localSize based only on CL_KERNEL_WORK_GROUP_SIZE,
without considering CL_DEVICE_MAX_WORK_ITEM_SIZES per-dimension limits.
This caused issues because:
1. Constructor rounds globalSize based on the original localSize
2. prepareDevice() later trims _enqueuedLocalSize to device limits
3. Result: globalSize was rounded with wrong localSize, causing mismatch
For example, with maxWorkItemSizes=[256,256,64] and localSize={512,1}:
- globalSize rounded assuming localSize[0]=512
- prepareDevice() limits localSize[0] to 256
- globalSize and localSize no longer match
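The ordering problem in the example above can be sketched for a single dimension. This is only an illustration, not the test suite's code: `roundDownToMultiple`, `roundThenTrim`, the `Sizes` struct, and the global size of 1000 are hypothetical, and rounding down (rather than up) is an assumption about how the constructor rounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: round `global` down to a multiple of `local`.
// The rounding direction used by the real constructor is an assumption.
std::size_t roundDownToMultiple(std::size_t global, std::size_t local)
{
    assert(local > 0);
    return (global / local) * local;
}

// Hypothetical pair of sizes for one dimension.
struct Sizes
{
    std::size_t global;
    std::size_t local;
};

// Mimics the order described above: the global size is rounded with the
// requested local size first, and the local size is trimmed afterwards.
Sizes roundThenTrim(std::size_t global, std::size_t requestedLocal,
                    std::size_t maxWorkItemSize)
{
    Sizes s;
    s.global = roundDownToMultiple(global, requestedLocal); // constructor step
    s.local = std::min(requestedLocal, maxWorkItemSize);    // prepareDevice() step
    return s;
}
```

With these assumptions, `roundThenTrim(1000, 512, 256)` yields a global size of 512 (rounded against 512) paired with a local size of 256, whereas rounding against the trimmed local size would have given 768.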
Fix:
Query CL_DEVICE_MAX_WORK_ITEM_SIZES in test cases and compute:
effectiveMax = min(maxWgSize, maxWorkItemSizes[dim])
before selecting localSize for each dimension.
This ensures localSize is valid before rounding, so globalSize and
localSize remain consistent throughout the test.
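A minimal sketch of the `effectiveMax` computation, assuming the two limits have already been queried (`CL_KERNEL_WORK_GROUP_SIZE` via `clGetKernelWorkGroupInfo`, `CL_DEVICE_MAX_WORK_ITEM_SIZES` via `clGetDeviceInfo`). The function names and the `clampLocalSize` helper are hypothetical and are not taken from the patch itself.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// maxWgSize:        value reported for CL_KERNEL_WORK_GROUP_SIZE
// maxWorkItemSizes: values reported for CL_DEVICE_MAX_WORK_ITEM_SIZES
// Returns the largest local size usable in dimension `dim`.
std::size_t effectiveMax(std::size_t maxWgSize,
                         const std::array<std::size_t, 3>& maxWorkItemSizes,
                         std::size_t dim)
{
    assert(dim < maxWorkItemSizes.size());
    return std::min(maxWgSize, maxWorkItemSizes[dim]);
}

// Hypothetical helper: clamp a requested local size to the effective
// maximum before it is used to round the global size.
std::size_t clampLocalSize(std::size_t requested, std::size_t maxWgSize,
                           const std::array<std::size_t, 3>& maxWorkItemSizes,
                           std::size_t dim)
{
    return std::min(requested,
                    effectiveMax(maxWgSize, maxWorkItemSizes, dim));
}
```

For the example limits `[256,256,64]` and a kernel work-group size of 512, a requested `localSize[0]` of 512 would be clamped to 256 before any rounding takes place.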
Affected files:
- test_advanced_2d.cpp
- test_advanced_3d.cpp
- test_advanced_other.cpp
- test_basic.cpp
Signed-off-by: jiajia Qian <jiajia.qian@nxp.com>
Author
Yes, PR #2598 fixes the same issue. That said, I believe both changes can coexist. My patch addresses the root cause at test generation time, while the prepareDevice() change provides a runtime safeguard. Together, they improve robustness by both preventing and handling the issue.