Commit 75df78c

spec source for cl_khr_kernel_clock (#1103)
* spec source for cl_khr_kernel_clock
* updated after March 26th teleconference
  Clarified that this is a provisional extension
  Removed ext from feature names and feature test macros
  Added undefined behavior description to the SPIR-V environment spec
* fix a few more places where the extension should be marked provisional
* clarify in a few more places that this extension is provisional
* remove provisional_notice.asciidoc, since it should not be used anymore
1 parent 2515b1d commit 75df78c

10 files changed

Lines changed: 255 additions & 16 deletions

OpenCL_API.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,7 @@ include::config/version-local-links.asciidoc[]
3939
// Formatting and links for API functions and enums.
4040
include::api/dictionary.asciidoc[]
4141

42-
// Feature Dictionary - used by some extensions.
42+
// Feature Dictionary.
4343
include::c/feature-dictionary.asciidoc[]
4444

4545
// External Footnotes

OpenCL_C.txt

Lines changed: 80 additions & 2 deletions
@@ -224,14 +224,28 @@ ifdef::cl_khr_integer_dot_product[]
224224
(when the `<<cl_khr_integer_dot_product>>` extension macro is defined)
225225

226226
| The OpenCL C compiler supports built-in functions that perform dot
227-
products on 4x8 bit packed integer vectors
227+
products on 4x8 bit packed integer vectors.
228228

229229
| {opencl_c_integer_dot_product_input_4x8bit} +
230230
(when the `<<cl_khr_integer_dot_product>>` extension macro is defined)
231231
| The OpenCL C compiler supports built-in functions that perform dot
232-
products on 4x8 bit integer vectors
232+
products on 4x8 bit integer vectors.
233233
endif::cl_khr_integer_dot_product[]
234234

235+
ifdef::cl_khr_kernel_clock[]
236+
| {opencl_c_kernel_clock_scope_device}
237+
| The OpenCL C compiler supports built-in functions that sample the value from a
238+
clock shared by all work-items executing on the device.
239+
240+
| {opencl_c_kernel_clock_scope_work_group}
241+
| The OpenCL C compiler supports built-in functions that sample the value from a
242+
clock shared by all work-items executing in the same work-group.
243+
244+
| {opencl_c_kernel_clock_scope_sub_group}
245+
| The OpenCL C compiler supports built-in functions that sample the value from a
246+
clock shared by all work-items executing in the same sub-group.
247+
endif::cl_khr_kernel_clock[]
248+
235249
|====
236250

237251
In OpenCL C 3.0 or newer, feature macros must expand to the value `1` if the
@@ -462,6 +476,16 @@ The extension provides new <<table-builtin-functions, built-in vector
462476
integer argument functions>> operating on these types.
463477
endif::cl_khr_integer_dot_product[]
464478

479+
ifdef::cl_khr_kernel_clock[]
480+
[[cl_khr_kernel_clock,cl_khr_kernel_clock]]
481+
==== Kernel Clock
482+
483+
The `cl_khr_kernel_clock` extension adds support for SPIR-V instructions and
484+
OpenCL C built-in functions to sample the value from one of three clocks
485+
provided by compute units. The extension provides the following functions:
486+
487+
* <<table-kernel-clock-functions,Built-in Kernel Clock Functions>>
488+
endif::cl_khr_kernel_clock[]
465489

466490
ifdef::cl_khr_local_int32_base_atomics[]
467491
[[cl_khr_local_int32_base_atomics,cl_khr_local_int32_base_atomics]]
@@ -15306,6 +15330,60 @@ endif::cl_khr_subgroup_shuffle_relative[]
1530615330

1530715331
|====
1530815332

15333+
ifdef::cl_khr_kernel_clock[]
15334+
[[kernel-clock-functions]]
15335+
=== Kernel Clock Functions
15336+
15337+
NOTE: The functionality described in this section <<unified-spec, requires>>
15338+
support for the `<<cl_khr_kernel_clock>>` extension. +
15339+
The `clock_read_device` and `clock_read_hilo_device` functions require support
15340+
for the {opencl_c_kernel_clock_scope_device} feature.
15341+
The `clock_read_work_group` and `clock_read_hilo_work_group` functions require
15342+
support for the {opencl_c_kernel_clock_scope_work_group} feature.
15343+
The `clock_read_sub_group` and `clock_read_hilo_sub_group` functions require
15344+
support for the {opencl_c_kernel_clock_scope_sub_group} feature.
15345+
15346+
This section describes OpenCL C built-in functions that sample the value from
15347+
one of three clocks provided by compute units.
15348+
15349+
[[table-kernel-clock-functions]]
15350+
.Built-in Kernel Clock Functions
15351+
[cols="1a,1",options="header",]
15352+
|====
15353+
| Function | Description
15354+
15355+
|[source,opencl_c]
15356+
----
15357+
ulong clock_read_device();
15358+
ulong clock_read_work_group();
15359+
ulong clock_read_sub_group();
15360+
----
15361+
| Returns a sampled value of a clock as seen by the compute unit.
15362+
15363+
An idealized clock is an unbounded unsigned scalar integer tick count
15364+
increasing monotonically over time. A clock’s rate of progress may vary
15365+
within the lifetime of a work-item, may vary across different
15366+
executions of the program, and may be affected by conditions beyond the
15367+
control of the programmer. The sampled value read by this function consists of
15368+
the least significant bits of the idealized clock’s tick count at the time the
15369+
instruction was executed. In particular, an observer may see sampled values wrap
15370+
around zero.
15371+
15372+
|[source,opencl_c]
15373+
----
15374+
uint2 clock_read_hilo_device();
15375+
uint2 clock_read_hilo_work_group();
15376+
uint2 clock_read_hilo_sub_group();
15377+
----
15378+
| Performs the same operation as the corresponding `clock_read_*` function, but returns the value as a
15379+
`uint2` whose `.lo` component contains the 32 least significant bits of the
15380+
result and `.hi` component contains the 32 most significant bits of the
15381+
result.
15382+
15383+
|====
15384+
15385+
endif::cl_khr_kernel_clock[]
15386+
1530915387

1531015388
[[opencl-numerical-compliance]]
1531115389
= OpenCL Numerical Compliance
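
Reviewer note on the `clock_read_hilo_*` rows added above: the relationship between the `uint2` result and the `ulong` result, and the wrap-around remark in the description, can be sketched on the host in plain C. This is only an illustration. `hilo_sample`, `combine_hilo`, `elapsed_ticks`, and `wraparound_example` are hypothetical names that do not appear in the specification, and the sketch assumes the sampled value occupies all 64 bits of the return type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for the OpenCL C uint2 value returned by
 * the clock_read_hilo_* built-ins. */
typedef struct {
    uint32_t lo; /* 32 least significant bits of the sample */
    uint32_t hi; /* 32 most significant bits of the sample */
} hilo_sample;

/* Reassemble the 64-bit value that the matching ulong variant would return. */
static uint64_t combine_hilo(hilo_sample s)
{
    return ((uint64_t)s.hi << 32) | (uint64_t)s.lo;
}

/* Tick difference between two samples of the same clock. Unsigned
 * subtraction is performed modulo 2^64, so the difference is still the
 * true tick count if the sampled value wrapped around zero once between
 * the two reads (assumption: the sample is a full 64 bits wide). */
static uint64_t elapsed_ticks(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Usage: one sample taken 16 ticks before the wrap, one 16 ticks after. */
static void wraparound_example(void)
{
    hilo_sample before = { 0xFFFFFFF0u, 0xFFFFFFFFu };
    hilo_sample after  = { 0x00000010u, 0x00000000u };

    assert(combine_hilo(before) == 0xFFFFFFFFFFFFFFF0ull);
    assert(elapsed_ticks(combine_hilo(before), combine_hilo(after)) == 32u);
}
```

The difference is a tick count, not a duration: the description states that the clock's rate of progress may vary, so converting ticks to time would need information this extension does not provide.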

api/appendix_e.asciidoc

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -598,3 +598,8 @@ Changes from *v3.0.14*:
598598
** Restricted semaphores to a single associated device, see {khronos-opencl-pr}/996[#996].
599599
* `<<cl_khr_subgroup_rotate>>`:
600600
** Clarified that only rotating within a subgroup is supported, see {khronos-opencl-pr}/967[#967].
601+
602+
Changes from *v3.0.15*:
603+
604+
* Added new extensions:
605+
** `<<cl_khr_kernel_clock>>` (provisional)

api/cl_khr_kernel_clock.asciidoc

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
1+
// Copyright 2024 The Khronos Group Inc.
2+
// SPDX-License-Identifier: CC-BY-4.0
3+
4+
include::{generated}/meta/{refprefix}cl_khr_kernel_clock.txt[]
5+
6+
=== Other Extension Metadata
7+
8+
*Last Modified Date*::
9+
2024-03-25
10+
*IP Status*::
11+
No known IP claims.
12+
*Contributors*::
13+
- Kevin Petit, Arm Ltd. +
14+
- Paul Fradgley, Imagination Technologies +
15+
- Jeremy Kemp, Imagination Technologies +
16+
- Ben Ashbaugh, Intel +
17+
- Balaji Calidas, Qualcomm Technologies, Inc. +
18+
- Ruihao Zhang, Qualcomm Technologies, Inc.
19+
20+
=== Description
21+
22+
`cl_khr_kernel_clock` adds the ability for a kernel to sample the value from one
23+
of three clocks provided by compute units.
24+
25+
OpenCL C compilers supporting this extension will define the extension macro
26+
`cl_khr_kernel_clock`, and may define corresponding feature macros
27+
{opencl_c_kernel_clock_scope_device},
28+
{opencl_c_kernel_clock_scope_work_group}, and
29+
{opencl_c_kernel_clock_scope_sub_group} depending on the reported
30+
capabilities.
31+
32+
See the link:{OpenCLCSpecURL}#cl_khr_kernel_clock[Kernel Clock] section of the
33+
OpenCL C specification for more information.
34+
35+
=== Interactions With Other Extensions
36+
37+
On devices that implement the `EMBEDDED` profile, the `cles_khr_int64` extension
38+
is required for the `clock_read_device`, `clock_read_work_group` and
39+
`clock_read_sub_group` functions to be present.
40+
41+
Support for sub-groups is required for the `clock_read_sub_group` and
42+
`clock_read_hilo_sub_group` functions to be present.
43+
44+
// The 'New ...' section can be auto-generated
45+
46+
=== New Types
47+
48+
* {cl_device_kernel_clock_capabilities_khr_TYPE}
49+
50+
=== New Enums
51+
52+
* {cl_device_info_TYPE}
53+
** {CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR}
54+
* {cl_device_kernel_clock_capabilities_khr_TYPE}
55+
** {CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR}
56+
** {CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR}
57+
** {CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR}
58+
59+
=== Version History
60+
61+
* Revision 0.9.0, 2024-03-25
62+
** First assigned version (provisional).

api/opencl_platform_layer.asciidoc

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1985,6 +1985,26 @@ include::{generated}/api/version-notes/CL_DEVICE_INTEGER_DOT_PRODUCT_ACCELERATIO
19851985
is missing before version 2.0 of the extension.
19861986
endif::cl_khr_integer_dot_product[]
19871987

1988+
ifdef::cl_khr_kernel_clock[]
1989+
| {CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR_anchor}
1990+
1991+
include::{generated}/api/version-notes/CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR.asciidoc[]
1992+
| {cl_device_kernel_clock_capabilities_khr_TYPE}
1993+
| Returns the kernel clock capabilities of the device. +
1994+
1995+
{CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR_anchor} is set when kernels are
1996+
allowed to call the `clock_read_device` and `clock_read_hilo_device`
1997+
OpenCL-C functions.
1998+
1999+
{CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR_anchor} is set when kernels
2000+
are allowed to call the `clock_read_work_group` and
2001+
`clock_read_hilo_work_group` OpenCL-C functions.
2002+
2003+
{CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR_anchor} is set when kernels
2004+
are allowed to call the `clock_read_sub_group` and
2005+
`clock_read_hilo_sub_group` OpenCL-C functions.
2006+
endif::cl_khr_kernel_clock[]
2007+
19882008
ifdef::cl_khr_pci_bus_info[]
19892009
| {CL_DEVICE_PCI_BUS_INFO_KHR_anchor}
19902010

@@ -2080,6 +2100,23 @@ returned for {CL_DEVICE_INTEGER_DOT_PRODUCT_CAPABILITIES_KHR}:
20802100
|====
20812101
endif::cl_khr_integer_dot_product[]
20822102

2103+
ifdef::cl_khr_kernel_clock[]
2104+
OpenCL 3 devices must report the following feature macros via
2105+
{CL_DEVICE_OPENCL_C_FEATURES} when the corresponding bit is set in the bitfield
2106+
returned for {CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR}:
2107+
2108+
[cols="1,1",options="header"]
2109+
|====
2110+
| Feature Bit | Feature Macro
2111+
| {CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR}
2112+
| {opencl_c_kernel_clock_scope_device}
2113+
| {CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR}
2114+
| {opencl_c_kernel_clock_scope_work_group}
2115+
| {CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR}
2116+
| {opencl_c_kernel_clock_scope_sub_group}
2117+
|====
2118+
endif::cl_khr_kernel_clock[]
2119+
20832120
ifdef::cl_khr_external_semaphore[]
20842121
One of the two queries {CL_DEVICE_SEMAPHORE_IMPORT_HANDLE_TYPES_KHR} and
20852122
{CL_DEVICE_SEMAPHORE_EXPORT_HANDLE_TYPES_KHR} must return a non-empty list
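
Reviewer note on the Feature Bit / Feature Macro table added in this file: the mapping can be sketched as a small C helper. To keep it compilable without the OpenCL headers, the sketch defines local stand-ins rather than using the real `CL_DEVICE_KERNEL_CLOCK_SCOPE_*_KHR` names. The bit positions are taken from the `cl.xml` hunk in this commit, the macro names from `c/feature-dictionary.asciidoc`, and `expected_feature_macros` is a hypothetical helper, not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for cl_device_kernel_clock_capabilities_khr and its bits.
 * cl_bitfield is a 64-bit unsigned type; bit positions 0, 1, 2 follow the
 * cl.xml hunk in this commit. */
typedef uint64_t kernel_clock_caps;
#define CLOCK_SCOPE_DEVICE     ((kernel_clock_caps)1 << 0)
#define CLOCK_SCOPE_WORK_GROUP ((kernel_clock_caps)1 << 1)
#define CLOCK_SCOPE_SUB_GROUP  ((kernel_clock_caps)1 << 2)

/* Hypothetical helper: stores in out[] the feature macro names an OpenCL 3
 * device would be expected to report for the given capability bits, and
 * returns how many names were stored (at most 3). */
static size_t expected_feature_macros(kernel_clock_caps caps,
                                      const char *out[3])
{
    size_t n = 0;

    assert(out != NULL);
    if (caps & CLOCK_SCOPE_DEVICE)
        out[n++] = "__opencl_c_kernel_clock_scope_device";
    if (caps & CLOCK_SCOPE_WORK_GROUP)
        out[n++] = "__opencl_c_kernel_clock_scope_work_group";
    if (caps & CLOCK_SCOPE_SUB_GROUP)
        out[n++] = "__opencl_c_kernel_clock_scope_sub_group";
    return n;
}

/* Usage: a device that reports device and sub-group scope only. */
static void feature_macro_example(void)
{
    const char *names[3];
    size_t n = expected_feature_macros(
        CLOCK_SCOPE_DEVICE | CLOCK_SCOPE_SUB_GROUP, names);

    assert(n == 2);
    assert(strcmp(names[0], "__opencl_c_kernel_clock_scope_device") == 0);
    assert(strcmp(names[1], "__opencl_c_kernel_clock_scope_sub_group") == 0);
}
```

In a real host program the capability value would presumably come from a `clGetDeviceInfo` query for `CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR`; that call is left out here so the sketch has no OpenCL dependency.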

c/feature-dictionary.asciidoc

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -145,3 +145,27 @@ endif::[]
145145
ifndef::backend-html5[]
146146
:opencl_c_integer_dot_product_input_4x8bit_packed: pass:q[`\__opencl_c_&#8203;integer_&#8203;dot_&#8203;product_&#8203;input_&#8203;4x8bit_&#8203;packed`]
147147
endif::[]
148+
149+
// opencl_c_kernel_clock_scope_device
150+
ifdef::backend-html5[]
151+
:opencl_c_kernel_clock_scope_device: pass:q[`\__opencl_c_<wbr>kernel_<wbr>clock_<wbr>scope_<wbr>device`]
152+
endif::[]
153+
ifndef::backend-html5[]
154+
:opencl_c_kernel_clock_scope_device: pass:q[`\__opencl_c_&#8203;kernel_&#8203;clock_&#8203;scope_&#8203;device`]
155+
endif::[]
156+
157+
// opencl_c_kernel_clock_scope_work_group
158+
ifdef::backend-html5[]
159+
:opencl_c_kernel_clock_scope_work_group: pass:q[`\__opencl_c_<wbr>kernel_<wbr>clock_<wbr>scope_<wbr>work_<wbr>group`]
160+
endif::[]
161+
ifndef::backend-html5[]
162+
:opencl_c_kernel_clock_scope_work_group: pass:q[`\__opencl_c_&#8203;kernel_&#8203;clock_&#8203;scope_&#8203;work_&#8203;group`]
163+
endif::[]
164+
165+
// opencl_c_kernel_clock_scope_sub_group
166+
ifdef::backend-html5[]
167+
:opencl_c_kernel_clock_scope_sub_group: pass:q[`\__opencl_c_<wbr>kernel_<wbr>clock_<wbr>scope_<wbr>sub_<wbr>group`]
168+
endif::[]
169+
ifndef::backend-html5[]
170+
:opencl_c_kernel_clock_scope_sub_group: pass:q[`\__opencl_c_&#8203;kernel_&#8203;clock_&#8203;scope_&#8203;sub_&#8203;group`]
171+
endif::[]

env/extensions.asciidoc

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -379,6 +379,22 @@ Otherwise, for the *GroupUniformArithmeticKHR* scan and reduction instructions,
379379
** *OpTypeInt* with _Width_ equal to `32` or `64` (equivalent to `int`, `uint`, `long`, and `ulong`)
380380
** *OpTypeFloat* (equivalent to `half`, `float`, and `double`)
381381

382+
==== `cl_khr_kernel_clock`
383+
384+
If the OpenCL environment supports the extension `cl_khr_kernel_clock`, then the environment must accept modules that declare use of the extension `SPV_KHR_shader_clock` via *OpExtension*.
385+
386+
If the OpenCL environment supports the extension `cl_khr_kernel_clock` and use of the SPIR-V extension `SPV_KHR_shader_clock` is declared in the module via *OpExtension*, then the environment must accept modules that declare the following SPIR-V capability:
387+
388+
* *ShaderClockKHR*
389+
390+
For the *OpReadClockKHR* instruction requiring this capability, supported values for _Scope_ are:
391+
392+
* *Device*, if `CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR` is supported
393+
* *Workgroup*, if `CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR` is supported
394+
* *Subgroup*, if `CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR` is supported
395+
396+
For unsupported _Scope_ values, the behavior of *OpReadClockKHR* is undefined.
397+
382398
=== Embedded Profile Extensions
383399

384400
==== `cles_khr_int64`
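
Reviewer note on the `cl_khr_kernel_clock` environment text added in this file: the _Scope_ rule for *OpReadClockKHR* amounts to a lookup from the scope operand to a capability bit. Below is a minimal sketch of such a check, as a SPIR-V consumer or validator might perform it. The enum values are the numeric _Scope_ values from the SPIR-V specification, the capability bits use the positions from the `cl.xml` hunk, and `read_clock_scope_supported` and `scope_check_example` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Numeric Scope values as defined by the SPIR-V specification. */
enum spirv_scope {
    SCOPE_CROSS_DEVICE = 0,
    SCOPE_DEVICE       = 1,
    SCOPE_WORKGROUP    = 2,
    SCOPE_SUBGROUP     = 3,
    SCOPE_INVOCATION   = 4
};

/* Local stand-ins for the CL_DEVICE_KERNEL_CLOCK_SCOPE_*_KHR bits
 * (bit positions 0, 1, 2 per the cl.xml hunk in this commit). */
#define CAP_CLOCK_DEVICE     (UINT64_C(1) << 0)
#define CAP_CLOCK_WORK_GROUP (UINT64_C(1) << 1)
#define CAP_CLOCK_SUB_GROUP  (UINT64_C(1) << 2)

/* Hypothetical check: true if OpReadClockKHR with the given Scope has
 * defined behavior on a device reporting the given capability bits. Any
 * other Scope value falls into the "undefined behavior" case. */
static bool read_clock_scope_supported(enum spirv_scope scope, uint64_t caps)
{
    switch (scope) {
    case SCOPE_DEVICE:    return (caps & CAP_CLOCK_DEVICE) != 0;
    case SCOPE_WORKGROUP: return (caps & CAP_CLOCK_WORK_GROUP) != 0;
    case SCOPE_SUBGROUP:  return (caps & CAP_CLOCK_SUB_GROUP) != 0;
    default:              return false;
    }
}

/* Usage: a device that only exposes the work-group clock. */
static void scope_check_example(void)
{
    uint64_t caps = CAP_CLOCK_WORK_GROUP;

    assert(read_clock_scope_supported(SCOPE_WORKGROUP, caps));
    assert(!read_clock_scope_supported(SCOPE_DEVICE, caps));
    assert(!read_clock_scope_supported(SCOPE_INVOCATION, caps));
}
```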

ext/provisional_notice.asciidoc

Lines changed: 0 additions & 12 deletions
This file was deleted.

ext/quick_reference.asciidoc

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -208,6 +208,10 @@ Language Specifications.
208208
| Integer dot product operations
209209
| Extension
210210

211+
| [[cl_khr_kernel_clock]] link:{APISpecURL}#cl_khr_kernel_clock[`cl_khr_kernel_clock`]
212+
| Sample Clock Values Within a Kernel
213+
| Extension
214+
211215
| [[cl_khr_mipmap_image]] link:{APISpecURL}#cl_khr_mipmap_image[`cl_khr_mipmap_image`]
212216
| Create and Use Images with Mipmaps
213217
| Extension

xml/cl.xml

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -254,6 +254,7 @@ server's OpenCL/api-docs repository.
254254
<type category="define">typedef <type>cl_uint</type> <name>cl_image_requirements_info_ext</name>;</type>
255255
<type category="define">typedef <type>cl_bitfield</type> <name>cl_platform_command_buffer_capabilities_khr</name>;</type>
256256
<type category="define">typedef <type>cl_bitfield</type> <name>cl_mutable_dispatch_asserts_khr</name></type>
257+
<type category="define">typedef <type>cl_bitfield</type> <name>cl_device_kernel_clock_capabilities_khr</name>;</type>
257258

258259
<comment>Structure types</comment>
259260
<type category="struct" name="cl_dx9_surface_info_khr">
@@ -1386,6 +1387,13 @@ server's OpenCL/api-docs repository.
13861387
<unused start="19" end="63"/>
13871388
</enums>
13881389

1390+
<enums name="cl_device_kernel_clock_capabilities_khr" vendor="Khronos" type="bitmask">
1391+
<enum bitpos="0" name="CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR"/>
1392+
<enum bitpos="1" name="CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR"/>
1393+
<enum bitpos="2" name="CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR"/>
1394+
<unused start="3" end="63"/>
1395+
</enums>
1396+
13891397
<enums start="0x10000" end="0x1FFFF" name="cl_khronos_vendor_id" vendor="Khronos">
13901398
<comment>
13911399
In order to synchronize vendor IDs across Khronos APIs, Vulkan's vk.xml
@@ -1545,7 +1553,8 @@ server's OpenCL/api-docs repository.
15451553
<enum value="0x1073" name="CL_DEVICE_INTEGER_DOT_PRODUCT_CAPABILITIES_KHR"/>
15461554
<enum value="0x1074" name="CL_DEVICE_INTEGER_DOT_PRODUCT_ACCELERATION_PROPERTIES_8BIT_KHR"/>
15471555
<enum value="0x1075" name="CL_DEVICE_INTEGER_DOT_PRODUCT_ACCELERATION_PROPERTIES_4x8BIT_PACKED_KHR"/>
1548-
<unused start="0x1076" end="0x107F" comment="Reserved for cl_device_info"/>
1556+
<enum value="0x1076" name="CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR"/>
1557+
<unused start="0x1077" end="0x107F" comment="Reserved for cl_device_info"/>
15491558
<enum value="0x1080" name="CL_CONTEXT_REFERENCE_COUNT"/>
15501559
<enum value="0x1081" name="CL_CONTEXT_DEVICES"/>
15511560
<enum value="0x1082" name="CL_CONTEXT_PROPERTIES"/>
@@ -7477,5 +7486,21 @@ server's OpenCL/api-docs repository.
74777486
<command name="clCancelCommandsIMG"/>
74787487
</require>
74797488
</extension>
7489+
<extension name="cl_khr_kernel_clock" supported="opencl" ratified="opencl" provisional="true">
7490+
<require>
7491+
<type name="CL/cl.h"/>
7492+
</require>
7493+
<require comment="cl_device_info">
7494+
<enum name="CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR"/>
7495+
</require>
7496+
<require>
7497+
<type name="cl_device_kernel_clock_capabilities_khr"/>
7498+
</require>
7499+
<require comment="cl_device_kernel_clock_capabilities_khr">
7500+
<enum name="CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR"/>
7501+
<enum name="CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR"/>
7502+
<enum name="CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR"/>
7503+
</require>
7504+
</extension>
74807505
</extensions>
74817506
</registry>
