Commit 5c9b590

update cl_intel_subgroup_matrix_multiply_accumulate to v1.1 (KhronosGroup#1296)
* initial draft adding SPIR-V support
* update copyright dates
* fix table column widths
1 parent 362f919 commit 5c9b590

1 file changed

Lines changed: 174 additions & 2 deletions

extensions/cl_intel_subgroup_matrix_multiply_accumulate.asciidoc

@@ -36,8 +36,8 @@ Complete

 == Version

-Built On: {docdate} +
-Revision: 1.0.0
+Built On: 2025-01-07 +
+Revision: 1.1.0

 == Dependencies

@@ -343,6 +343,177 @@ int2 intel_sub_group_i8_i8_matrix_mad_k32(int2 a, int8 b, int2 acc)
 }
 ----

+== Modifications to the OpenCL SPIR-V Environment Specification
+
+[NOTE]
+====
+SPIR-V support was added in extension version 1.1.0.
+====
+
+=== Add a new section 5.2.X - `cl_intel_subgroup_matrix_multiply_accumulate`
+
+If the OpenCL environment supports the extension `cl_intel_subgroup_matrix_multiply_accumulate` then the environment must accept modules that declare use of the extension `SPV_INTEL_subgroup_matrix_multiply_accumulate` and that declare the SPIR-V capability *SubgroupMatrixMultiplyAccumulateINTEL*.
+
+For devices where the minimum subgroup size is 8, the following matrix dimensions and types are supported.
+For these devices, the subgroup size must be 8 (the minimum subgroup size).
+Behavior is undefined if these functions are called on other devices or from kernels with a different subgroup size:
+
+[cols="^1,^1,^1,^2,^2,^2,^2",width="100%"]
+[options="header"]
+|=====
+| M Dimension | N Dimension | K Dimension | Result Type | Matrix A Type | Matrix B Type | Matrix C Type
+
+// i32 = i8 x i8 + i32
+// i32 = i8 x u8 + i32
+// i32 = u8 x i8 + i32
+// i32 = u8 x u8 + i32
+7+<| *8-bit integer matrix sources (signed and unsigned), 32-bit integer accumulator*: +
+| 1, 2, 4, 8 | 8 | 32 | `M x int32_t`
+| `M x int32_t` with *MatrixAPackedInt8INTEL* and *MatrixASignedComponentsINTEL*
+| `8 x int32_t` with *MatrixBPackedInt8INTEL* and *MatrixBSignedComponentsINTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 8 | 32 | `M x int32_t`
+| `M x int32_t` with *MatrixAPackedInt8INTEL* and *MatrixASignedComponentsINTEL*
+| `8 x int32_t` with *MatrixBPackedInt8INTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 8 | 32 | `M x int32_t`
+| `M x int32_t` with *MatrixAPackedInt8INTEL*
+| `8 x int32_t` with *MatrixBPackedInt8INTEL* and *MatrixBSignedComponentsINTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 8 | 32 | `M x int32_t`
+| `M x int32_t` with *MatrixAPackedInt8INTEL*
+| `8 x int32_t` with *MatrixBPackedInt8INTEL*
+| `M x int32_t`
+
+// i32 = i4 x i4 + i32
+// i32 = i4 x u4 + i32
+// i32 = u4 x i4 + i32
+// i32 = u4 x u4 + i32
+7+<| *4-bit integer matrix sources (signed and unsigned), 32-bit integer accumulator*: +
+| 1, 2, 4, 8 | 8 | 64 | `M x int32_t`
+| `M x int32_t` with *MatrixAPackedInt4INTEL* and *MatrixASignedComponentsINTEL*
+| `8 x int32_t` with *MatrixBPackedInt4INTEL* and *MatrixBSignedComponentsINTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 8 | 64 | `M x int32_t`
+| `M x int32_t` with *MatrixAPackedInt4INTEL* and *MatrixASignedComponentsINTEL*
+| `8 x int32_t` with *MatrixBPackedInt4INTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 8 | 64 | `M x int32_t`
+| `M x int32_t` with *MatrixAPackedInt4INTEL*
+| `8 x int32_t` with *MatrixBPackedInt4INTEL* and *MatrixBSignedComponentsINTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 8 | 64 | `M x int32_t`
+| `M x int32_t` with *MatrixAPackedInt4INTEL*
+| `8 x int32_t` with *MatrixBPackedInt4INTEL*
+| `M x int32_t`
+
+// f32 = f16 x f16 + f32
+7+<| *fp16 matrix sources, fp32 accumulator*:
+| 1, 2, 4, 8 | 8 | 16 | `M x float32_t` | `M x int32_t` with *MatrixAPackedFloat16INTEL* | `8 x int32_t` with *MatrixBPackedFloat16INTEL* | `M x float32_t`
+
+// f32 = bf16 x bf16 + f32
+7+<| *bf16 matrix sources, fp32 accumulator*:
+| 1, 2, 4, 8 | 8 | 16 | `M x float32_t` | `M x int32_t` with *MatrixAPackedBFloat16INTEL* | `8 x int32_t` with *MatrixBPackedBFloat16INTEL* | `M x float32_t`
+
+|=====
+
+For devices where the minimum subgroup size is 16, the following matrix dimensions and types are supported.
+For these devices, the subgroup size must be 16 (the minimum subgroup size).
+Behavior is undefined if these functions are called on other devices or from kernels with a different subgroup size:
+
+[cols="^1,^1,^1,^2,^2,^2,^2",width="100%"]
+[options="header"]
+|=====
+| M Dimension | N Dimension | K Dimension | Result Type | Matrix A Type | Matrix B Type | Matrix C Type
+
+// i32 = i8 x i8 + i32
+// i32 = i8 x u8 + i32
+// i32 = u8 x i8 + i32
+// i32 = u8 x u8 + i32
+7+<| *8-bit integer matrix sources (signed and unsigned), 32-bit integer accumulator*: +
+| 1, 2, 4, 8 | 16 | 32 | `M x int32_t`
+| `M x int16_t` with *MatrixAPackedInt8INTEL* and *MatrixASignedComponentsINTEL*
+| `8 x int32_t` with *MatrixBPackedInt8INTEL* and *MatrixBSignedComponentsINTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 16 | 32 | `M x int32_t`
+| `M x int16_t` with *MatrixAPackedInt8INTEL* and *MatrixASignedComponentsINTEL*
+| `8 x int32_t` with *MatrixBPackedInt8INTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 16 | 32 | `M x int32_t`
+| `M x int16_t` with *MatrixAPackedInt8INTEL*
+| `8 x int32_t` with *MatrixBPackedInt8INTEL* and *MatrixBSignedComponentsINTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 16 | 32 | `M x int32_t`
+| `M x int16_t` with *MatrixAPackedInt8INTEL*
+| `8 x int32_t` with *MatrixBPackedInt8INTEL*
+| `M x int32_t`
+
+// i32 = i4 x i4 + i32
+// i32 = i4 x u4 + i32
+// i32 = u4 x i4 + i32
+// i32 = u4 x u4 + i32
+7+<| *4-bit integer matrix sources (signed and unsigned), 32-bit integer accumulator*: +
+| 1, 2, 4, 8 | 16 | 64 | `M x int32_t`
+| `M x int16_t` with *MatrixAPackedInt4INTEL* and *MatrixASignedComponentsINTEL*
+| `8 x int32_t` with *MatrixBPackedInt4INTEL* and *MatrixBSignedComponentsINTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 16 | 64 | `M x int32_t`
+| `M x int16_t` with *MatrixAPackedInt4INTEL* and *MatrixASignedComponentsINTEL*
+| `8 x int32_t` with *MatrixBPackedInt4INTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 16 | 64 | `M x int32_t`
+| `M x int16_t` with *MatrixAPackedInt4INTEL*
+| `8 x int32_t` with *MatrixBPackedInt4INTEL* and *MatrixBSignedComponentsINTEL*
+| `M x int32_t`
+
+| 1, 2, 4, 8 | 16 | 64 | `M x int32_t`
+| `M x int16_t` with *MatrixAPackedInt4INTEL*
+| `8 x int32_t` with *MatrixBPackedInt4INTEL*
+| `M x int32_t`
+
+// f32 = f16 x f16 + f32
+7+<| *fp16 matrix sources, fp32 accumulator*:
+| 1, 2, 4, 8 | 16 | 16 | `M x float32_t`
+| `M x int16_t` with *MatrixAPackedFloat16INTEL*
+| `8 x int32_t` with *MatrixBPackedFloat16INTEL*
+| `M x float32_t`
+
+// f32 = bf16 x bf16 + f32
+7+<| *bf16 matrix sources, fp32 accumulator*:
+| 1, 2, 4, 8 | 16 | 16 | `M x float32_t`
+| `M x int16_t` with *MatrixAPackedBFloat16INTEL*
+| `8 x int32_t` with *MatrixBPackedBFloat16INTEL*
+| `M x float32_t`
+
+// f16 = f16 x f16 + f16
+7+<| *fp16 matrix sources, fp16 accumulator*:
+| 1, 2, 4, 8 | 16 | 16 | `M x float16_t`
+| `M x int16_t` with *MatrixAPackedFloat16INTEL*
+| `8 x int32_t` with *MatrixBPackedFloat16INTEL*
+| `M x float16_t`
+
+// bf16 = bf16 x bf16 + bf16
+7+<| *bf16 matrix sources, bf16 accumulator*:
+| 1, 2, 4, 8 | 16 | 16 | `M x int16_t` with *MatrixResultBFloat16INTEL*
+| `M x int16_t` with *MatrixAPackedBFloat16INTEL*
+| `8 x int32_t` with *MatrixBPackedBFloat16INTEL*
+| `M x int16_t` with *MatrixCBFloat16INTEL*
+
+// Note: other types (e.g. tf32) will be described in their respective extension documents.
+
+|=====
+
 == Issues

 . Should this extension use signed or unsigned types to represent fp16 and bf16 data?
@@ -362,6 +533,7 @@ Applications are encouraged to use `as_type` to reinterpret unsigned data as sig
 |Rev|Date|Author|Changes
 |1.0.0|2022-05-18|Ben Ashbaugh|*Initial public revision*
 |1.0.0|2024-06-06|Ben Ashbaugh|Document additional functions.
+|1.1.0|2025-01-07|Ben Ashbaugh|Added SPIR-V support.
 |========================================

 //************************************************************************
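
The operand types in the two tables added by this commit appear to follow a simple packing rule, sketched below in C. This is an observation drawn from the table values, not wording from the extension: the sketch assumes the K components of a Matrix A row are spread evenly across the subgroup, and that each work-item holds one full column of Matrix B packed into 32-bit words. The helper names `a_bits_per_work_item` and `b_words_per_work_item` are hypothetical.

```c
#include <stdint.h>

/* Bits of Matrix A data each work-item holds per row, assuming the K
 * components of a row are divided evenly across the subgroup.
 * A result of 32 matches the `M x int32_t` rows in the tables,
 * a result of 16 matches the `M x int16_t` rows. */
static int a_bits_per_work_item(int k, int component_bits, int subgroup_size)
{
    return (k * component_bits) / subgroup_size;
}

/* Number of 32-bit words of Matrix B data each work-item holds, assuming
 * each work-item owns one column of K packed components.
 * A result of 8 matches the `8 x int32_t` entries in the tables. */
static int b_words_per_work_item(int k, int component_bits)
{
    return (k * component_bits) / 32;
}
```

For example, 8-bit sources with K = 32 give 32 bits per work-item at subgroup size 8 and 16 bits at subgroup size 16, which lines up with the change from `M x int32_t` to `M x int16_t` for Matrix A between the two tables. Matrix B works out to eight 32-bit words in every listed configuration.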

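
The four 8-bit integer rows in each table differ only in whether *MatrixASignedComponentsINTEL* and *MatrixBSignedComponentsINTEL* are present. A minimal host-side sketch of what that distinction means for a single packed 32-bit word follows. It is a scalar illustration only: the function names are hypothetical, and the component order within a word (lowest byte first) is an assumption that should be checked against the `SPV_INTEL_subgroup_matrix_multiply_accumulate` specification.

```c
#include <stdint.h>

/* Extract 8-bit component i (0..3) from a packed 32-bit word.
 * Assumption: component 0 lives in the lowest byte.
 * is_signed selects sign extension (signed components) or
 * zero extension (unsigned components). */
static int32_t unpack8(uint32_t word, int i, int is_signed)
{
    int32_t raw = (int32_t)((word >> (8 * i)) & 0xFFu);
    if (is_signed && raw >= 128) {
        raw -= 256; /* two's complement sign extension, done portably */
    }
    return raw;
}

/* Accumulate the products of the four 8-bit components packed in a and b
 * into a 32-bit accumulator. a_signed and b_signed play the role of the
 * signed-components operands for Matrix A and Matrix B respectively. */
static int32_t dot4_packed8(uint32_t a, int a_signed,
                            uint32_t b, int b_signed,
                            int32_t acc)
{
    for (int i = 0; i < 4; ++i) {
        acc += unpack8(a, i, a_signed) * unpack8(b, i, b_signed);
    }
    return acc;
}
```

With `a = 0x000000FF` and `b = 0x000000FF`, treating both as signed yields a product of 1, treating only `a` as signed yields -255, and treating both as unsigned yields 65025, which is why the same packed storage type needs the separate signedness operands.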