math_brute_force: Optimize test execution time and improve coverage methodology #2669

**TL;DR:** Most `math_brute_force` tests compute 4 billion (`1<<32`) values, which results in excessive execution times, particularly on mobile GPUs. This issue proposes a strategy to reduce these workloads while maintaining a high level of confidence in the correctness of the implementations.

### Current Implementation

**1. Tests evaluating all 64K (`1<<16`) values:**
*(This behavior is correct and working as intended)*
- `unary_half`
- `unary_two_results_half`
- `unary_two_results_i_half`
- `i_unary_half`
- `macro_unary_half`

**2. Tests evaluating 4 billion (`1<<32`) values redundantly:**
*(Half precision has only 64K distinct bit patterns, so these test the exact same inputs multiple times)*
- `unary_u_half`

**3. Tests evaluating 4 billion (`1<<32`) values using combinations:**
*(These test all possible combinations of special values, followed by combinations of randomly selected values)*
- `binary_double`
- `binary_float`
- `binary_half`
- `binary_i_double`
- `binary_i_float`
- `binary_i_half`
- `binary_operator_double`
- `binary_operator_float`
- `binary_operator_half`
- `macro_binary_double`
- `macro_binary_float`
- `macro_binary_half`
- `ternary_double`
- `ternary_float`
- `ternary_half`

**4. Tests evaluating 4 billion (`1<<32`) values using purely random combinations:**
*(These completely ignore special values)*
- `binary_two_results_i_double`
- `binary_two_results_i_float`
- `binary_two_results_i_half`
- `mad_double`
- `mad_float`
- `mad_half`
- `unary_u_double`

**5. Tests evaluating 4 billion (`1<<32`) values uniformly:**
*(These spread values uniformly across the range, but completely ignore special values)*
- `i_unary_double`
- `macro_unary_double`
- `unary_double`
- `unary_two_results_double`
- `unary_two_results_i_double`

**6. Tests evaluating all 4 billion (`1<<32`) values exhaustively:**
- `i_unary_float`
- `macro_unary_float`
- `unary_float`
- `unary_two_results_float`
- `unary_two_results_i_float`
- `unary_u_float`

### Identified Issues
- **Redundancy:** Group 2 tests the exact same values multiple times.
- **Missing Edge Cases:** Groups 4 and 5 completely ignore special values (e.g., NaN, Infinity, zero).
- **Poor Coverage of Mixed Cases:** Group 3 combines special values with other special values, but never tests special values against randomly selected values.
- **Performance Bottleneck:** Groups 3, 4, 5, and 6 evaluate `1<<32` values, which takes an excessively long time to execute.
- **Code Duplication:** Although these tests perform similar operations, they do not share common code. This leads to heavy duplication of test logic and redundantly copy-pasted special value arrays.

### Proposal
- **Standardize Unary Half Tests:** Modify all unary half-precision tests to evaluate 64K values (specifically, fix `unary_u_half`).
- **Consolidate Special Values:**
  - Merge all FP32 special values into a single C++ array shared across all tests.
  - Merge all FP64 special values into a single C++ array shared across all tests.
- **Revamp Unary Testing:** For unary tests, use all special values and fill the rest of the buffer with randomly selected values spread uniformly across the range, up to a total of `n` values.
- **Revamp Binary/Ternary Testing:** Test `n` total combinations. To ensure comprehensive mixing of values:
  - We need `m = n^(1/2)` unique values for binary operations, and `m = n^(1/3)` unique values for ternary operations.
  - These `m` values will consist of `s` special values, plus `r` random values spread uniformly across the range (where `r = m - s`).
- **Determine `n`:** Choose a suitable baseline value for `n`. Note that this value may vary depending on the number of inputs (unary/binary/ternary) and/or the data type (FP16/FP32/FP64).
- **Retain Exhaustive Testing Option:** Keep a command-line flag or option to run the full `1<<32` value suite.
  - *Note:* Running with `n = 1<<32` would use the new dataset generation methodology, except for unary float tests, where `1<<32` values cover every FP32 bit pattern (making special/random selection unnecessary).
- **Update Execution Modes:** "Wimpy" and "Embedded" modes will simply scale down the value of `n`.
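
To make the sizing in the proposal concrete, here is a minimal C++ sketch of how the per-operand value set could be built. Everything in it is an assumption for discussion rather than existing CTS code: the names `kSpecialValuesFP32`, `UniqueValuesPerOperand`, and `BuildOperandValues` are hypothetical, the special-value table is only an illustrative subset, and "random values spread uniformly across the range" is interpreted here as uniformly random 32-bit patterns.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <random>
#include <vector>

// Hypothetical shared FP32 special-value table (illustrative subset only).
static const float kSpecialValuesFP32[] = {
    0.0f,
    -0.0f,
    1.0f,
    -1.0f,
    std::numeric_limits<float>::infinity(),
    -std::numeric_limits<float>::infinity(),
    std::numeric_limits<float>::quiet_NaN(),
    std::numeric_limits<float>::min(),        // smallest positive normal
    std::numeric_limits<float>::denorm_min(), // smallest positive subnormal
    std::numeric_limits<float>::max(),
};
static const size_t kNumSpecialFP32 =
    sizeof(kSpecialValuesFP32) / sizeof(kSpecialValuesFP32[0]);

// Integer power helper; callers keep base^exp within 64 bits.
static uint64_t IntPow(uint64_t base, unsigned exp)
{
    uint64_t result = 1;
    for (unsigned i = 0; i < exp; ++i) result *= base;
    return result;
}

// Largest m such that m^arity <= n (arity: 1 = unary, 2 = binary,
// 3 = ternary). The floating-point root is only a first guess and is
// corrected with exact integer arithmetic.
uint64_t UniqueValuesPerOperand(uint64_t n, unsigned arity)
{
    uint64_t m = static_cast<uint64_t>(
        std::floor(std::pow(static_cast<double>(n), 1.0 / arity)));
    while (IntPow(m + 1, arity) <= n) ++m;
    while (m > 0 && IntPow(m, arity) > n) --m;
    return m;
}

// Builds m operand values: the s special values first, then r = m - s
// values drawn as uniformly random bit patterns (an assumption about what
// "spread uniformly across the range" should mean).
std::vector<float> BuildOperandValues(size_t m, uint32_t seed)
{
    std::vector<float> values;
    values.reserve(m);
    for (size_t i = 0; i < kNumSpecialFP32 && values.size() < m; ++i)
        values.push_back(kSpecialValuesFP32[i]);

    std::mt19937 rng(seed);
    std::uniform_int_distribution<uint32_t> bits; // full 32-bit range
    while (values.size() < m)
    {
        uint32_t u = bits(rng);
        float f;
        std::memcpy(&f, &u, sizeof f); // reinterpret the bits as a float
        values.push_back(f);
    }
    return values;
}
```

A binary test would then iterate over every `(values[i], values[j])` pair, which covers special/special, special/random, and random/random combinations in one pass. With `n = 1<<32` this gives `m = 65536` for binary and `m = 1625` for ternary operations; for unary tests `m` is simply `n`.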

### Next Steps
1. Discuss the proposal.
2. Agree on an action plan.
3. Implement the agreed-upon plan.

***

**Note:** This is a long-standing issue as described in [KhronosGroup/OpenCL-CTS#1054](https://github.com/KhronosGroup/OpenCL-CTS/issues/1054). This issue addresses only `math_brute_force`.
