TL;DR: Most math_brute_force tests compute 4 billion (1<<32) values, which results in excessive execution times, particularly on mobile GPUs. This issue proposes a strategy to reduce these workloads while maintaining a high level of confidence in the correctness of the implementations.
Current Implementation
1. Tests evaluating all 64K (1<<16) values:
(This behavior is correct and working as intended)
unary_half
unary_two_results_half
unary_two_results_i_half
i_unary_half
macro_unary_half
2. Tests evaluating 4 billion (1<<32) values redundantly:
(These test the exact same inputs multiple times)
unary_u_half
3. Tests evaluating 4 billion (1<<32) values using combinations:
(These test all possible combinations of special values, followed by combinations of randomly selected values)
binary_double
binary_float
binary_half
binary_i_double
binary_i_float
binary_i_half
binary_operator_double
binary_operator_float
binary_operator_half
macro_binary_double
macro_binary_float
macro_binary_half
ternary_double
ternary_float
ternary_half
4. Tests evaluating 4 billion (1<<32) values using purely random combinations:
(These completely ignore special values)
binary_two_results_i_double
binary_two_results_i_float
binary_two_results_i_half
mad_double
mad_float
mad_half
unary_u_double
5. Tests evaluating 4 billion (1<<32) values uniformly:
(These spread values uniformly across the range, but completely ignore special values)
i_unary_double
macro_unary_double
unary_double
unary_two_results_double
unary_two_results_i_double
6. Tests evaluating all 4 billion (1<<32) values exhaustively:
i_unary_float
macro_unary_float
unary_float
unary_two_results_float
unary_two_results_i_float
unary_u_float
Identified Issues
- Redundancy: Group 2 tests the exact same values multiple times.
- Missing Edge Cases: Groups 4 and 5 completely ignore special values (e.g., NaN, Infinity, zero).
- Poor Coverage of Mixed Cases: Group 3 combines special values with other special values, but never tests special values against randomly selected values.
- Performance Bottleneck: Groups 3, 4, 5, and 6 evaluate 1<<32 values, which takes an excessively long time to execute.
- Code Duplication: Although these tests perform similar operations, they do not share common code. This leads to heavy duplication of test logic and redundantly copy-pasted special value arrays.
Proposal
- Standardize Unary Half Tests: Modify all unary half-precision tests to evaluate 64K values (specifically, fix unary_u_half).
- Consolidate Special Values:
  - Merge all FP32 special values into a single C++ array shared across all tests.
  - Merge all FP64 special values into a single C++ array shared across all tests.
- Revamp Unary Testing: For unary tests, use all special values and fill the rest of the buffer with randomly selected values spread uniformly across the range, up to a total of n values.
- Revamp Binary/Ternary Testing: Test n total combinations. To ensure comprehensive mixing of values:
  - We need m = n^(1/2) unique values for binary operations, and m = n^(1/3) unique values for ternary operations.
  - These m values will consist of s special values, plus r random values spread uniformly across the range (where r = m - s).
- Determine n: Figure out the optimal baseline size for n. Note that this value may vary depending on the number of inputs (unary/binary/ternary) and/or the data type (FP16/FP32/FP64).
- Retain Exhaustive Testing Option: Keep a command-line flag or option to run the full 1<<32 value suite.
  - Note: Running 1<<32 would now use this new dataset generation methodology, except for unary float tests where 1<<32 covers the entire exhaustive range (making special/random selection unnecessary).
- Update Execution Modes: "Wimpy" and "Embedded" modes will simply scale down the value of n.
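
The dataset generation described in this proposal could be sketched roughly as follows. This is only an illustration, not existing CTS code: the names SpecialValuesFP32, PoolSize, BuildPoolFP32 and Operand are hypothetical, the special-value table is a small illustrative subset, m is rounded up so that m^k covers at least n combinations, and "spread uniformly across the range" is interpreted here as one random bit pattern per equal-sized slice of the 32-bit pattern space. All of these are assumptions open to discussion.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <random>
#include <vector>

// Hypothetical shared FP32 special-value table (illustrative subset only,
// not the list currently used by the tests).
inline const std::vector<float>& SpecialValuesFP32()
{
    static const std::vector<float> values = {
        0.0f,
        -0.0f,
        1.0f,
        -1.0f,
        std::numeric_limits<float>::infinity(),
        -std::numeric_limits<float>::infinity(),
        std::numeric_limits<float>::quiet_NaN(),
        std::numeric_limits<float>::min(),        // smallest normal
        std::numeric_limits<float>::denorm_min(), // smallest subnormal
        std::numeric_limits<float>::max(),
    };
    return values;
}

// Integer power helper: base^exp (caller keeps the result below 2^64).
inline std::uint64_t IntPow(std::uint64_t base, unsigned exp)
{
    std::uint64_t p = 1;
    for (unsigned i = 0; i < exp; ++i) p *= base;
    return p;
}

// Smallest m such that m^arity >= n, i.e. m = ceil(n^(1/arity)).
// arity is 1 (unary), 2 (binary) or 3 (ternary).
inline std::size_t PoolSize(std::uint64_t n, unsigned arity)
{
    assert(n >= 1 && arity >= 1 && arity <= 3);
    auto m = static_cast<std::uint64_t>(
        std::llround(std::pow(static_cast<double>(n), 1.0 / arity)));
    if (m < 1) m = 1;
    // Correct any floating-point rounding error in either direction.
    while (m > 1 && IntPow(m - 1, arity) >= n) --m;
    while (IntPow(m, arity) < n) ++m;
    return static_cast<std::size_t>(m);
}

// Pool of m values: all s special values first, then r = m - s random values,
// one drawn from each of r equal slices of the 32-bit pattern space.
// If m <= s, the pool is just the special values (r = 0).
inline std::vector<float> BuildPoolFP32(std::size_t m, std::uint32_t seed)
{
    const std::vector<float>& specials = SpecialValuesFP32();
    std::vector<float> pool(specials.begin(), specials.end());
    const std::uint64_t range = 1ull << 32;
    const std::uint64_t r = m > pool.size() ? m - pool.size() : 0;
    assert(r < range); // keeps range * (i + 1) below 2^64
    std::mt19937 rng(seed);
    for (std::uint64_t i = 0; i < r; ++i)
    {
        const std::uint64_t lo = range * i / r;
        const std::uint64_t hi = range * (i + 1) / r; // exclusive bound
        std::uniform_int_distribution<std::uint64_t> dist(lo, hi - 1);
        const auto bits = static_cast<std::uint32_t>(dist(rng));
        float f;
        std::memcpy(&f, &bits, sizeof f); // reinterpret the bits as a float
        pool.push_back(f);
    }
    return pool;
}

// Operand number `operand` (0-based) of combination j, where j runs over
// [0, m^arity): j is read as a base-m number, one digit per operand.
inline float Operand(const std::vector<float>& pool, std::uint64_t j,
                     unsigned operand)
{
    std::uint64_t q = j;
    for (unsigned i = 0; i < operand; ++i) q /= pool.size();
    return pool[q % pool.size()];
}
```

With n = 1<<32, this sizing gives m = 65536 for binary tests and m = 1626 for ternary tests. Because every combination of pool entries is visited, special values are paired with other special values and with random values alike.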
Next Steps
- Discuss the proposal.
- Agree on an action plan.
- Implement the agreed-upon plan.
Note: This is a long-standing issue as described in KhronosGroup/OpenCL-CTS#1054. This issue addresses only math_brute_force.