
math_brute_force: Optimize test execution time and improve coverage methodology #2669

@rjodinchr

Description


TL;DR: Most math_brute_force tests compute 4 billion (1<<32) values, which results in excessive execution times, particularly on mobile GPUs. This issue proposes a strategy to reduce these workloads while maintaining a high level of confidence in the correctness of the implementations.

Current Implementation

1. Tests evaluating all 64K (1<<16) values:
(This behavior is correct and working as intended)

  • unary_half
  • unary_two_results_half
  • unary_two_results_i_half
  • i_unary_half
  • macro_unary_half
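Exhaustive 64K testing works well for half precision because 1<<16 inputs cover every binary16 bit pattern, so every special value (zeros, subnormals, infinities, NaNs) is tested automatically. The following minimal sketch enumerates and classifies all patterns. The helper names are hypothetical and this is not CTS code.

```cpp
#include <cassert>
#include <cstdint>

// Number of binary16 bit patterns in each IEEE 754 class.
struct HalfClassCounts {
    unsigned zeros = 0, subnormals = 0, normals = 0, infinities = 0, nans = 0;
};

// Classifies one binary16 pattern: 1 sign bit, 5 exponent bits, 10 mantissa bits.
void classify_half(uint16_t bits, HalfClassCounts &c) {
    const unsigned exponent = (bits >> 10) & 0x1Fu;
    const unsigned mantissa = bits & 0x3FFu;
    if (exponent == 0) {
        if (mantissa == 0) ++c.zeros; else ++c.subnormals;
    } else if (exponent == 0x1Fu) {
        if (mantissa == 0) ++c.infinities; else ++c.nans;
    } else {
        ++c.normals;
    }
}

// Visits every one of the 1<<16 half bit patterns exactly once.
HalfClassCounts enumerate_all_halves() {
    HalfClassCounts c;
    // A 32-bit counter lets the loop include 0xFFFF and still terminate.
    for (uint32_t i = 0; i <= 0xFFFFu; ++i) classify_half(static_cast<uint16_t>(i), c);
    assert(c.zeros + c.subnormals + c.normals + c.infinities + c.nans == (1u << 16));
    return c;
}
```

Calling `enumerate_all_halves()` reports 2 zeros, 2046 subnormals, 61440 normals, 2 infinities and 2046 NaNs, which add up to 65,536.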

2. Tests evaluating 4 billion (1<<32) values redundantly:
(The input is only 16 bits wide and has just 65,536 (1<<16) distinct values, so evaluating 1<<32 inputs tests each value 65,536 times on average)

  • unary_u_half

3. Tests evaluating 4 billion (1<<32) values using combinations:
(These test all possible combinations of special values, followed by combinations of randomly selected values)

  • binary_double
  • binary_float
  • binary_half
  • binary_i_double
  • binary_i_float
  • binary_i_half
  • binary_operator_double
  • binary_operator_float
  • binary_operator_half
  • macro_binary_double
  • macro_binary_float
  • macro_binary_half
  • ternary_double
  • ternary_float
  • ternary_half
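The following hypothetical model of the scheme described for this group shows where the mixed-case gap comes from. With s special values, the first s×s indices enumerate every special/special pair, and every later index pairs two random values. This is a sketch of the description above, not the actual CTS implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Where an operand comes from in this illustrative model.
enum class Source { Special, Random };

// Hypothetical model of the current binary scheme (not the CTS code):
// indices [0, s*s) pair specials[index / s] with specials[index % s],
// and all later indices pair two randomly selected values.
std::pair<Source, Source> current_pair_source(uint64_t index, uint64_t s) {
    if (index < s * s) return {Source::Special, Source::Special};
    return {Source::Random, Source::Random};
}

// Counts indices in [0, n) whose two operands mix a special and a random value.
uint64_t count_mixed_pairs(uint64_t n, uint64_t s) {
    assert(s * s <= n && "all special pairs must fit in the workload");
    uint64_t mixed = 0;
    for (uint64_t i = 0; i < n; ++i) {
        const std::pair<Source, Source> p = current_pair_source(i, s);
        if (p.first != p.second) ++mixed;
    }
    return mixed;
}
```

In this model, `count_mixed_pairs` is always zero regardless of n. Increasing the workload only adds more random/random pairs and never pairs a special value with a random value.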

4. Tests evaluating 4 billion (1<<32) values using purely random combinations:
(These completely ignore special values)

  • binary_two_results_i_double
  • binary_two_results_i_float
  • binary_two_results_i_half
  • mad_double
  • mad_float
  • mad_half
  • unary_u_double

5. Tests evaluating 4 billion (1<<32) values uniformly:
(These spread values uniformly across the range, but completely ignore special values)

  • i_unary_double
  • macro_unary_double
  • unary_double
  • unary_two_results_double
  • unary_two_results_i_double

6. Tests evaluating all 4 billion (1<<32) values exhaustively:

  • i_unary_float
  • macro_unary_float
  • unary_float
  • unary_two_results_float
  • unary_two_results_i_float
  • unary_u_float

Identified Issues

  • Redundancy: Group 2 tests the exact same values multiple times.
  • Missing Edge Cases: Groups 4 and 5 completely ignore special values (e.g., NaN, Infinity, zero).
  • Poor Coverage of Mixed Cases: Group 3 combines special values with other special values, but never tests special values against randomly selected values.
  • Performance Bottleneck: Groups 3, 4, 5, and 6 evaluate 1<<32 values, which takes an excessively long time to execute.
  • Code Duplication: Although these tests perform similar operations, they do not share common code. This leads to heavy duplication of test logic and redundantly copy-pasted special value arrays.

Proposal

  • Standardize Unary Half Tests: Modify all unary half-precision tests to evaluate 64K values (specifically, fix unary_u_half).
  • Consolidate Special Values:
    • Merge all FP32 special values into a single C++ array shared across all tests.
    • Merge all FP64 special values into a single C++ array shared across all tests.
  • Revamp Unary Testing: For unary tests, use all special values and fill the rest of the buffer with randomly selected values spread uniformly across the range, up to a total of n values.
  • Revamp Binary/Ternary Testing: Test n total combinations. To ensure comprehensive mixing of values:
    • We need m = n^(1/2) unique values for binary operations and m = n^(1/3) unique values for ternary operations, so that all m^2 (or m^3) combinations add up to n.
    • These m values will consist of s special values, plus r random values spread uniformly across the range (where r = m - s).
  • Determine n: Figure out the optimal baseline size for n. Note that this value may vary depending on the number of inputs (unary/binary/ternary) and/or the data type (FP16/FP32/FP64).
  • Retain Exhaustive Testing Option: Keep a command-line flag or option to run the full 1<<32 value suite.
    • Note: Running 1<<32 would now use this new dataset generation methodology, except for unary float tests where 1<<32 covers the entire exhaustive range (making special/random selection unnecessary).
  • Update Execution Modes: "Wimpy" and "Embedded" modes will simply scale down the value of n.
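The following minimal C++ sketch shows how the proposed binary/ternary dataset could be generated. It makes several assumptions: `kSharedFloatSpecials` is a hypothetical, abbreviated stand-in for the merged FP32 array (an FP64 version would be analogous), "spread uniformly across the range" is read as stratified sampling (one random bit pattern per equal-width bucket of the 32-bit space), and m is rounded down to the largest integer with m^k ≤ n. None of the names below exist in the CTS.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <random>
#include <vector>

// Illustrative shared FP32 special-value table (hypothetical name and contents).
constexpr std::array<float, 12> kSharedFloatSpecials = {
    0.0f, -0.0f,
    std::numeric_limits<float>::infinity(), -std::numeric_limits<float>::infinity(),
    std::numeric_limits<float>::quiet_NaN(),
    std::numeric_limits<float>::min(), -std::numeric_limits<float>::min(),
    std::numeric_limits<float>::max(), -std::numeric_limits<float>::max(),
    std::numeric_limits<float>::denorm_min(), -std::numeric_limits<float>::denorm_min(),
    1.0f};

// base^k, saturating at UINT64_MAX instead of overflowing.
uint64_t ipow_saturating(uint64_t base, int k) {
    uint64_t result = 1;
    for (int j = 0; j < k; ++j) {
        if (base != 0 && result > UINT64_MAX / base) return UINT64_MAX;
        result *= base;
    }
    return result;
}

// Largest m with m^k <= n, i.e. floor(n^(1/k)). Assumes k >= 1 and n < UINT64_MAX.
uint64_t integer_root(uint64_t n, int k) {
    assert(k >= 1);
    uint64_t m = static_cast<uint64_t>(std::pow(static_cast<double>(n), 1.0 / k));
    // Correct any floating-point rounding in the initial estimate.
    while (m > 0 && ipow_saturating(m, k) > n) --m;
    while (ipow_saturating(m + 1, k) <= n) ++m;
    return m;
}

// Builds the m unique operand values: the s specials first, then r = m - s
// random bit patterns, one drawn from each of r equal-width buckets.
std::vector<float> build_operand_values(uint64_t m, uint32_t seed) {
    std::vector<float> values(kSharedFloatSpecials.begin(), kSharedFloatSpecials.end());
    if (m <= values.size()) {
        values.resize(m);
        return values;
    }
    const uint64_t r = m - values.size();
    assert(r <= (uint64_t{1} << 32));
    const uint64_t width = (uint64_t{1} << 32) / r;  // bit patterns per bucket
    std::mt19937 rng(seed);
    std::uniform_int_distribution<uint64_t> offset(0, width - 1);
    for (uint64_t j = 0; j < r; ++j) {
        const uint32_t bits = static_cast<uint32_t>(j * width + offset(rng));
        float f;
        std::memcpy(&f, &bits, sizeof f);  // reinterpret the bit pattern as a float
        values.push_back(f);
    }
    return values;
}

// Decodes combination index i in [0, m^k) into k operand indices (base-m digits),
// so that every value, special or random, meets every other value.
std::array<uint64_t, 3> decode_combination(uint64_t i, uint64_t m, int k) {
    assert(m > 0 && k >= 1 && k <= 3);
    std::array<uint64_t, 3> idx{};
    for (int d = 0; d < k; ++d) {
        idx[d] = i % m;
        i /= m;
    }
    return idx;
}
```

As an example (n = 1<<24 is an arbitrary choice, since the proposal leaves n undetermined), a binary test would get m = 4096 values (12 specials plus 4084 random ones) covering 4096 × 4096 = 16,777,216 pairs, and a ternary test would get m = 256. Index 12 × 4096 decodes to operands (0, 12), which pairs the first special value with the first random value, so the mixed cases missing from group 3 are covered. Scaling down for Wimpy or Embedded mode would then only mean passing a smaller n.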

Next Steps

  1. Discuss the proposal.
  2. Agree on an action plan.
  3. Implement the agreed-upon plan.

Note: This is a long-standing issue as described in KhronosGroup/OpenCL-CTS#1054. This issue addresses only math_brute_force.
