Skip to content

Commit 22ea113

Browse files
authored
cl_khr_extended_bit_ops, cl_khr_suggested_local_work_size (#605)
* publish two KHR extensions cl_khr_extended_bit_ops cl_khr_suggested_local_work_size * update XML file for extensions * remove placeholder text * remove test plans
1 parent ddb23a1 commit 22ea113

5 files changed

Lines changed: 266 additions & 0 deletions

File tree

OpenCL_Ext.txt

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -86,6 +86,9 @@ include::ext/cl_khr_subgroup_extensions.asciidoc[]
8686

8787
include::ext/cl_khr_pci_bus_info.asciidoc[]
8888

89+
include::ext/cl_khr_extended_bit_ops.asciidoc[]
90+
include::ext/cl_khr_suggested_local_work_size.asciidoc[]
91+
8992
// NOTE: To keep meaningful section numbers, new
9093
// extension documents should be added above here!
9194

Lines changed: 143 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,143 @@
1+
// Copyright 2018-2021 The Khronos Group. This work is licensed under a
2+
// Creative Commons Attribution 4.0 International License; see
3+
// http://creativecommons.org/licenses/by/4.0/
4+
5+
[[cl_khr_extended_bit_ops]]
6+
== Extended Bit Operations
7+
8+
This extension adds OpenCL C functions for performing extended bit operations.
9+
Specifically, the following functions are added:
10+
11+
* bitfield insert: insert bits from one source operand into another source operand.
12+
* bitfield extract: extract bits from a source operand, with sign- or zero-extension.
13+
* bit reverse: reverse the bits of a source operand.
14+
15+
=== General Information
16+
17+
==== Name Strings
18+
19+
`cl_khr_extended_bit_ops`
20+
21+
==== Version History
22+
23+
[cols="1,1,3",options="header",]
24+
|====
25+
| *Date* | *Version* | *Description*
26+
| 2021-04-22 | 1.0.0 | Initial version.
27+
|====
28+
29+
==== Dependencies
30+
31+
This extension is written against the OpenCL 3.0 C Language Specification and the OpenCL SPIR-V Environment Specification Version V3.0.6.
32+
33+
This extension requires OpenCL 1.0.
34+
35+
=== New OpenCL C Functions
36+
37+
[source]
38+
----
39+
gentype bitfield_insert( gentype base, gentype insert, uint offset, uint count )
40+
igentype bitfield_extract_signed( gentype base, uint offset, uint count )
41+
ugentype bitfield_extract_unsigned( gentype base, uint offset, uint count )
42+
gentype bit_reverse( gentype base )
43+
----
44+
45+
=== Modifications to the OpenCL C Specification
46+
47+
==== Modify Section 6.15.3. Integer Functions:
48+
49+
Add a new Section 6.15.3.X. Extended Bit Operations: ::
50+
+
51+
--
52+
The functions described in the following table can be used with built-in scalar or vector integer types to perform extended bit operations.
53+
The functions that operate on vector types operate component-wise.
54+
The description is per-component.
55+
56+
In the table below, the generic type name `gentype` refers to the built-in integer types `char`, `char__n__`, `uchar`, `uchar__n__`, `short`, `short__n__`, `ushort`, `ushort__n__`, `int`, `int__n__`, `uint`, `uint__n__`, `long`, `long__n__`, `ulong`, and `ulong__n__`.
57+
The generic type name `igentype` refers to the built-in signed integer types `char`, `char__n__`, `short`, `short__n__`, `int`, `int__n__`, `long`, and `long__n__`.
58+
The generic type name `ugentype` refers to the built-in unsigned integer types `uchar`, `uchar__n__`, `ushort`, `ushort__n__`, `uint`, `uint__n__`, `ulong`, and `ulong__n__`.
59+
_n_ is 2, 3, 4, 8, or 16.
60+
61+
.Built-in Scalar and Vector Extended Bit Operations
62+
[cols="1a,1", options="header"]
63+
|===
64+
|*Function*
65+
|*Description*
66+
67+
|[source,c]
68+
----
69+
gentype bitfield_insert(
70+
gentype base, gentype insert,
71+
uint offset, uint count)
72+
----
73+
74+
|Returns a copy of _base_, with a modified bitfield that comes from _insert_.
75+
76+
Any bits of the result value numbered outside [_offset_, _offset_ + _count_ - 1] (inclusive) will come from the corresponding bits in _base_.
77+
78+
Any bits of the result value numbered inside [_offset_, _offset_ + _count_ - 1] (inclusive) will come from the bits numbered [0, _count_ - 1] (inclusive) of _insert_.
79+
80+
_count_ is the number of bits to be modified.
81+
If _count_ equals 0, the return value will be equal to _base_.
82+
83+
If _count_ or _offset_ or _offset_ + _count_ is greater than number of bits in `gentype` (for scalar types) or components of `gentype` (for vector types), the result is undefined.
84+
85+
|[source,c]
86+
----
87+
igentype bitfield_extract_signed(
88+
gentype base,
89+
uint offset, uint count)
90+
----
91+
92+
|Returns an extracted bitfield from _base_ with sign extension.
93+
The type of the return value is always a signed type.
94+
95+
The bits of _base_ numbered in [_offset_, _offset_ + _count_ - 1] (inclusive) are returned as the bits numbered in [0, _count_ - 1] (inclusive) of the result.
96+
The remaining bits in the result will be sign extended by replicating the bit numbered _offset_ + _count_ - 1 of _base_.
97+
98+
_count_ is the number of bits to be extracted.
99+
If _count_ equals 0, the result is 0.
100+
101+
If the _count_ or _offset_ or _offset_ + _count_ is greater than number of bits in `gentype` (for scalar types) or components of `gentype` (for vector types), the result is undefined.
102+
103+
|[source,c]
104+
----
105+
ugentype bitfield_extract_unsigned(
106+
gentype base,
107+
uint offset, uint count)
108+
----
109+
110+
|Returns an extracted bitfield from _base_ with zero extension.
111+
The type of the return value is always an unsigned type.
112+
113+
The bits of _base_ numbered in [_offset_, _offset_ + _count_ - 1] (inclusive) are returned as the bits numbered in [0, _count_ - 1] (inclusive) of the result.
114+
The remaining bits in the result will be zero.
115+
116+
_count_ is the number of bits to be extracted.
117+
If _count_ equals 0, the result is 0.
118+
119+
If the _count_ or _offset_ or _offset_ + _count_ is greater than number of bits in `gentype` (for scalar types) or components of `gentype` (for vector types), the result is undefined.
120+
121+
|[source,c]
122+
----
123+
gentype bit_reverse(
124+
gentype base)
125+
----
126+
127+
|Returns the value of _base_ with reversed bits.
128+
That is, the bit numbered _n_ of the result value will be taken from the bit numbered _width_ - _n_ - 1 of _base_ (for scalar types) or a component of _base_ (for vector types), where _width_ is number of bits of `gentype` (for scalar types) or components of `gentype` (for vector types).
129+
130+
|===
131+
--
132+
133+
=== Modifications to the OpenCL SPIR-V Environment Specification
134+
135+
==== Add to Section 5 - OpenCL Extensions
136+
137+
Add a new Section 5.2.X - `cl_khr_extended_bit_ops`: ::
138+
+
139+
--
140+
If the OpenCL environment supports the extension `cl_khr_extended_bit_ops`, then the environment must accept modules that declare use of the extension `SPV_KHR_bit_instructions` via *OpExtension*.
141+
142+
If the OpenCL environment supports the extension `cl_khr_extended_bit_ops` and use of the SPIR-V extension `SPV_KHR_bit_instructions` is declared in the module via *OpExtension*, then the environment must accept modules that declare the *BitInstructions* capability.
143+
--
Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,95 @@
1+
// Copyright 2018-2021 The Khronos Group. This work is licensed under a
2+
// Creative Commons Attribution 4.0 International License; see
3+
// http://creativecommons.org/licenses/by/4.0/
4+
5+
[[cl_khr_suggested_local_work_size]]
6+
== Suggested Local Work Size Query
7+
8+
This extension adds the ability to query a suggested local work group size for a kernel running on a device for a specified global work size and global work offset.
9+
The suggested local work group size will match the work group size that would be chosen if the kernel were enqueued with the specified global work size and global work offset and a `NULL` local work size.
10+
11+
By using the suggested local work group size query an application has greater insight into the local work group size chosen by the OpenCL implementation, and the OpenCL implementation need not re-compute the local work group size if the same kernel is enqueued multiple times with the same parameters.
12+
13+
=== General Information
14+
15+
==== Name Strings
16+
17+
`cl_khr_suggested_local_work_size`
18+
19+
==== Version History
20+
21+
[cols="1,1,3",options="header",]
22+
|====
23+
| *Date* | *Version* | *Description*
24+
| 2021-04-22 | 1.0.0 | Initial version.
25+
|====
26+
27+
==== Dependencies
28+
29+
This extension is written against the OpenCL API Specification Version V3.0.6.
30+
31+
This extension requires OpenCL 1.0.
32+
33+
=== New API Functions
34+
35+
[source]
36+
----
37+
cl_int clGetKernelSuggestedLocalWorkSizeKHR(
38+
cl_command_queue command_queue,
39+
cl_kernel kernel,
40+
cl_uint work_dim,
41+
const size_t *global_work_offset,
42+
const size_t *global_work_size,
43+
size_t *suggested_local_work_size);
44+
----
45+
46+
=== Modifications to the OpenCL API Specification
47+
48+
==== Section 5.9 - Kernel Objects:
49+
50+
===== New Section 5.9.4.X - Suggested Local Work Size Query
51+
52+
To query a suggested local work size for a kernel object, call the function
53+
54+
[source]
55+
----
56+
cl_int clGetKernelSuggestedLocalWorkSizeKHR(
57+
cl_command_queue command_queue,
58+
cl_kernel kernel,
59+
cl_uint work_dim,
60+
const size_t *global_work_offset,
61+
const size_t *global_work_size,
62+
size_t *suggested_local_work_size);
63+
----
64+
65+
The returned suggested local work size is expected to match the local work size that would be chosen if the specified kernel object, with the same kernel arguments, were enqueued into the specified command queue with the specified global work size, specified global work offset, and with a `NULL` local work size.
66+
67+
* _command_queue_ specifies the command queue and device for the query.
68+
* _kernel_ specifies the kernel object and kernel arguments for the query.
69+
The OpenCL context associated with _kernel_ and _command_queue_ must the same.
70+
* _work_dim_ specifies the number of work dimensions in the input global work offset and global work size, and the output suggested local work size.
71+
* _global_work_offset_ can be used to specify an array of at least _work_dim_ global ID offset values for the query.
72+
This is optional and may be `NULL` to indicate there is no global ID offset.
73+
* _global_work_size_ is an array of at least _work_dim_ values describing the global work size for the query.
74+
* _suggested_local_work_size_ is an output array of at least _work_dim_ values that will contain the result of the query.
75+
76+
*clGetKernelSuggestedLocalWorkSizeKHR* returns `CL_SUCCESS` if the query executed successfully.
77+
Otherwise, it returns one of the following errors:
78+
79+
* `CL_INVALID_COMMAND_QUEUE` if _command_queue_ is not a valid host command queue.
80+
* `CL_INVALID_KERNEL` if _kernel_ is not a valid kernel object.
81+
* `CL_INVALID_CONTEXT` if the context associated with _kernel_ is not the same as the context associated with _command_queue_.
82+
* `CL_INVALID_PROGRAM_EXECUTABLE` if there is no successfully built program executable available for _kernel_ for the device associated with _command_queue_.
83+
* `CL_INVALID_KERNEL_ARGS` if all argument values for _kernel_ have not been set.
84+
* `CL_MISALIGNED_SUB_BUFFER_OFFSET` if a sub-buffer object is set as an argument to _kernel_ and the offset specified when the sub-buffer object was created is not aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN` for the device associated with _command_queue_.
85+
* `CL_INVALID_IMAGE_SIZE` if an image object is set as an argument to _kernel_ and the image dimensions are not supported by device associated with _command_queue_.
86+
* `CL_IMAGE_FORMAT_NOT_SUPPORTED` if an image object is set as an argument to _kernel_ and the image format is not supported by the device associated with _command_queue_.
87+
* `CL_INVALID_OPERATION` if an SVM pointer is set as an argument to _kernel_ and the device associated with _command_queue_ does not support SVM or the required SVM capabilities for the SVM pointer.
88+
* `CL_INVALID_WORK_DIMENSION` if _work_dim_ is not a valid value (i.e. a value between 1 and `CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS`).
89+
* `CL_INVALID_GLOBAL_WORK_SIZE` if _global_work_size_ is NULL or if any of the values specified in _global_work_size_ are 0.
90+
* `CL_INVALID_GLOBAL_WORK_SIZE` if any of the values specified in _global_work_size_ exceed the maximum value representable by `size_t` on the device associated with _command_queue_.
91+
* `CL_INVALID_GLOBAL_OFFSET` if the value specified in _global_work_size_ plus the corresponding value in _global_work_offset_ for dimension exceeds the maximum value representable by `size_t` on the device associated with _command_queue_.
92+
* `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the device.
93+
* `CL_OUT_OF_HOST_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the host.
94+
95+
NOTE: These error conditions are consistent with error conditions for *clEnqueueNDRangeKernel*.

ext/quick_reference.asciidoc

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,10 @@
6565
| 2D and 3D Async Copies
6666
| Provisional Extension
6767

68+
| <<cl_khr_extended_bit_ops,cl_khr_extended_bit_ops>>
69+
| Bit Insert, Extract, and Reverse Operations
70+
| Extension
71+
6872
| <<cl_khr_extended_versioning,cl_khr_extended_versioning>>
6973
| Extend versioning of platform, devices, extensions, etc.
7074
| Extension
@@ -197,6 +201,10 @@
197201
| Relative Shuffles Among Sub-Groupings of Work Items
198202
| Extension
199203

204+
| <<cl_khr_suggested_local_work_size,cl_khr_suggested_local_work_size>>
205+
| Query a Suggested Local Work Size
206+
| Extension
207+
200208
| <<cl_khr_terminate_context,cl_khr_terminate_context>>
201209
| Terminate an OpenCL Context
202210
| Extension

xml/cl.xml

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2306,6 +2306,15 @@ server's OpenCL/api-docs repository.
23062306
<param><type>void</type>* <name>param_value</name></param>
23072307
<param><type>size_t</type>* <name>param_value_size_ret</name></param>
23082308
</command>
2309+
<command suffix="CL_API_SUFFIX__VERSION_3_0">
2310+
<proto><type>cl_int</type> <name>clGetKernelSuggestedLocalWorkSizeKHR</name></proto>
2311+
<param><type>cl_command_queue</type> <name>command_queue</name></param>
2312+
<param><type>cl_kernel</type> <name>kernel</name></param>
2313+
<param><type>cl_uint</type> <name>work_dim</name></param>
2314+
<param>const <type>size_t</type>* <name>global_work_offset</name></param>
2315+
<param>const <type>size_t</type>* <name>global_work_size</name></param>
2316+
<param><type>size_t</type>* <name>suggested_local_work_size</name></param>
2317+
</command>
23092318
<command suffix="CL_API_SUFFIX__VERSION_1_0">
23102319
<proto><type>cl_mem</type> <name>clImportMemoryARM</name></proto>
23112320
<param><type>cl_context</type> <name>context</name></param>
@@ -6083,5 +6092,13 @@ server's OpenCL/api-docs repository.
60836092
<enum name="CL_DEVICE_PCI_BUS_INFO_KHR"/>
60846093
</require>
60856094
</extension>
6095+
<extension name="cl_khr_suggested_local_work_size" supported="opencl">
6096+
<require>
6097+
<type name="CL/cl.h"/>
6098+
</require>
6099+
<require>
6100+
<command name="clGetKernelSuggestedLocalWorkSizeKHR"/>
6101+
</require>
6102+
</extension>
60866103
</extensions>
60876104
</registry>

0 commit comments

Comments
 (0)