
Commit c81008b

Ewan Crawford committed
Redefine command-buffer simultaneous-use
As discussed in KhronosGroup#891, the current definition of [simultaneous use](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#_command_buffers) is hard for users to reason about. A better model is one where simultaneous use isn't a constraint on command-buffer submission, but on its execution. That is, simultaneous use occurs when a command-buffer is enqueued for execution while any previous submission of the command-buffer is still in flight, and there are no scheduling dependencies expressed to serialize execution. This means that pipelined submissions of a command-buffer, where an in-order queue, barrier, or cl_events serialize execution, are always valid usage without the optional simultaneous-use device feature. To avoid the runtime incurring the overhead of monitoring when simultaneous use occurs so that it can report an error, violating this valid usage for command-buffers not created with simultaneous use is undefined behavior.

Following from this, and related to KhronosGroup#1311, the pending state has also been removed from the command-buffer lifecycle, leaving only the binary states of recording and executable. This is because a user can use the existing OpenCL mechanisms of host waits and event queries to inspect the state of individual command-buffer enqueues to avoid simultaneous use, and storing this as a command-buffer state incurs the runtime overhead of tracking previous submissions. The pending count concept also now makes less sense.

The implication for updating a command-buffer is that the error behavior is removed from update, so simultaneous use is a property of execution rather than enqueue.

See CTS test ideas for how this could work. I think the CTS changes required for this are as follows, but I could create a separate CTS issue to track the work once/if this PR merges. Or we could try to prototype the CTS changes to give confidence in the spec change.
* Remove the test for the pending state query.
* Either rework the existing simultaneous-use tests so that they test the new definition, or delete them and create new, more suitable tests without the tech debt from the old definition.
* Add tests for pipelined submission of a command-buffer not created with simultaneous use, ideally stressing indirect dependencies as well as direct ones:
** in-order queue to express dependencies
** out-of-order queue with event dependencies to express dependencies
** barrier to express dependencies
* Add cl_khr_command_buffer_mutable_dispatch tests for updating and enqueueing pipelined submissions of a non-simultaneous-use command-buffer with dependencies between each enqueue, and only do a blocking wait at the end.
* Add cl_khr_command_buffer_mutable_dispatch tests for a simultaneous-use command-buffer, where two invocations are scheduled such that they can run concurrently, but the second invocation is updated such that it uses different inputs/outputs to avoid race conditions in the kernel.
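A minimal sketch of the proposed rule, written as a self-contained C model rather than against a real OpenCL runtime. The `submission` type, its `deps` bitmask of direct prerequisites, and the `complete` flag (standing in for a submission having reached CL_COMPLETE) are all hypothetical. A new submission exhibits simultaneous use if some earlier, incomplete submission of the same command-buffer is not among its prerequisites, where indirect prerequisites count as well as direct ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the proposed simultaneous-use rule; none of
 * these names are part of the OpenCL API. */
#define MAX_SUBMISSIONS 32

typedef struct {
    unsigned deps;  /* bit i set: direct prerequisite on earlier submission i */
    bool complete;  /* stands in for the submission having reached CL_COMPLETE */
} submission;

/* Collect the direct and indirect prerequisites of submission k.
 * Prerequisites only point at earlier submissions, so this terminates. */
static unsigned prerequisites(const submission *subs, int k) {
    assert(k >= 0 && k < MAX_SUBMISSIONS);
    unsigned all = 0u;
    for (int i = 0; i < k; ++i) {
        if (subs[k].deps & (1u << i)) {
            all |= (1u << i) | prerequisites(subs, i);
        }
    }
    return all;
}

/* Submission k exhibits simultaneous use if any earlier submission that is
 * not yet complete is missing from its (transitive) prerequisites. */
static bool is_simultaneous_use(const submission *subs, int k) {
    unsigned prereqs = prerequisites(subs, k);
    for (int i = 0; i < k; ++i) {
        if (!subs[i].complete && !(prereqs & (1u << i))) {
            return true;
        }
    }
    return false;
}
```

For example, three submissions to an in-order queue, where each depends on the one before, are not simultaneous use even if none has completed, because submission 0 reaches submission 2 indirectly through submission 1. Two submissions to an out-of-order queue with no events or barriers are simultaneous use unless the first has already completed, for instance after a host wait.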
1 parent 85da0d1 commit c81008b

4 files changed

Lines changed: 54 additions & 66 deletions


api/cl_khr_command_buffer.asciidoc

Lines changed: 15 additions & 8 deletions
@@ -4,15 +4,15 @@
 include::{generated}/meta/{refprefix}cl_khr_command_buffer.txt[]
 
 // *Revision*::
-// 0.9.6
+// 0.9.7
 // *Extension and Version Dependencies*::
 // This extension requires OpenCL 1.2 or later.
 // Buffering of SVM commands requires OpenCL 2.0 or later.
 
 === Other Extension Metadata
 
 *Last Modified Date*::
-2024-12-13
+2025-07-10
 *IP Status*::
 No known IP claims.
 *Contributors*::
@@ -126,11 +126,16 @@ from the sync-point values returned is implementation defined.
 
 ==== Simultaneous Use
 
-The optional simultaneous use capability was added to the extension so that
-vendors can support pipelined workflows, where command-buffers are repeatedly
-enqueued without blocking in user code. However, simultaneous use may result in
-command-buffers being more expensive to enqueue than in a sequential model, so
-the capability is optional to enable optimizations on command-buffer recording.
+The optional <<simultaneous-use, simultaneous use>> capability was added to the
+extension so that vendors could support concurrent execution of the same
+command-buffer. However, simultaneous use may result in command-buffers having
+a larger overhead to implement, so the capability is optional to enable
+optimizations when this usage isn't required by a user.
+
+Instead, the goal for the base level of functionality provided by the extension
+is to support pipelined workflows, where a command-buffer is repeatedly
+enqueued, with each enqueue expressing dependencies on any previous submissions,
+without the enqueue calls blocking in user code.
 
 === Interactions With Other Extensions
 
@@ -257,7 +262,6 @@ features:
 * {cl_command_buffer_state_khr_TYPE}
 ** {CL_COMMAND_BUFFER_STATE_RECORDING_KHR}
 ** {CL_COMMAND_BUFFER_STATE_EXECUTABLE_KHR}
-** {CL_COMMAND_BUFFER_STATE_PENDING_KHR}
 * {cl_command_type_TYPE}
 ** {CL_COMMAND_COMMAND_BUFFER_KHR}
 * New Error Codes
@@ -470,3 +474,6 @@ features:
 * 0.9.7, 2024-12-13
 ** Refactor queue compatibility between command-buffer creation and enqueue
 (experimental).
+* 0.9.8, 2025-07-10
+** Rework simultaneous use definition and remove pending state
+(experimental).

api/opencl_runtime_layer.asciidoc

Lines changed: 38 additions & 55 deletions
@@ -14618,13 +14618,14 @@ on one or more command-queues without any application code interaction.
 Grouping the operations together allows efficient enqueuing of repetitive
 operations, as well as enabling driver optimizations.
 
-Command-buffers are _sequential use_ by default, but may also be set to
-_simultaneous use_ on creation if the device optionally supports this
-capability.
-A sequential use command-buffer must have a <<pending_count, Pending Count>>
-of 0 or 1.
-The simultaneous use capability removes this restriction and allows
-command-buffers to have a <<pending_count, Pending Count>> greater than 1.
+Upon creation, a command-buffer is defined as being in the <<recording,
+Recording>> state. In order for the command-buffer to be enqueued,
+it must first be finalized using {clFinalizeCommandBufferKHR}, after which no
+further commands can be recorded. A command-buffer is submitted for execution
+on command-queues with a call to {clEnqueueCommandBufferKHR}. It is
+always valid to call {clEnqueueCommandBufferKHR} with a command-buffer that
+has previously been enqueued, provided the call doesn't violate the definition
+of <<simultaneous-use, simultaneous use>>.
 
 Command-buffers are created using an ordered list of command-queues that
 commands are recorded to and execute on by default. All these queue objects
@@ -14690,6 +14691,24 @@ If using layered extension {cl_khr_command_buffer_mutable_dispatch_EXT},
 usage>>.
 ====
 
+Simultaneous use is defined using the _prerequisite_ terminology from the
+<<_execution_model, execution model>>, and is an optional feature for devices
+to support concurrent executions of a command-buffer. A command-buffer must
+be created with {CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR} to avoid undefined
+behavior if a simultaneous use usage pattern occurs.
+
+[[simultaneous-use]]
+Simultaneous Use:: When a command-buffer is submitted for
+execution without a prerequisite on all the previous submissions of the same
+command-buffer which are not in the {CL_COMPLETE} state.
+
+An example of simultaneous use would be two submissions of the same
+command-buffer to a single out-of-order queue, without any events or barriers
+used to express a dependency between the two enqueue calls. Using a single
+in-order queue, events, or barriers to express dependencies between submissions
+of the same command-buffer would each be ways to avoid simultaneous use and are
+valid usage of command-buffers created without the
+{CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR} flag.
 
 ifdef::cl_khr_command_buffer_multi_device[]
 === Command-Buffers and Multiple Devices
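To illustrate the serialization mechanisms named in the simultaneous-use example, the following C sketch models a single command-queue, again without calling OpenCL; `queue_model`, `queue_init`, `enqueue`, `enqueue_barrier` and `waits_on` are hypothetical names. In this model an in-order queue adds an implicit wait on the previous command, an explicit event wait list is passed as a bitmask, and a barrier with an empty wait list waits on every earlier command and is waited on by every later one. `waits_on` reports whether a later submission has an earlier one as a direct or indirect prerequisite.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one command-queue; none of these names are
 * OpenCL API. waits[i] is a bitmask of the commands command i directly
 * waits on. */
#define MAX_COMMANDS 32

typedef struct {
    bool in_order;
    int count;
    int last_barrier; /* index of the most recent barrier, or -1 */
    unsigned waits[MAX_COMMANDS];
} queue_model;

static void queue_init(queue_model *q, bool in_order) {
    q->in_order = in_order;
    q->count = 0;
    q->last_barrier = -1;
    for (int i = 0; i < MAX_COMMANDS; ++i) q->waits[i] = 0u;
}

/* Enqueue a command (e.g. a command-buffer submission) that waits on the
 * commands in event_mask, like an event wait list. Returns its index. */
static int enqueue(queue_model *q, unsigned event_mask) {
    assert(q->count < MAX_COMMANDS);
    int id = q->count++;
    unsigned w = event_mask;
    if (q->in_order && id > 0) w |= 1u << (id - 1);       /* implicit order */
    if (q->last_barrier >= 0) w |= 1u << q->last_barrier; /* held by barrier */
    q->waits[id] = w;
    return id;
}

/* A barrier with an empty wait list waits on every earlier command. */
static int enqueue_barrier(queue_model *q) {
    assert(q->count < MAX_COMMANDS);
    int id = q->count++;
    q->waits[id] = (1u << id) - 1u;
    q->last_barrier = id;
    return id;
}

/* True if `later` has `earlier` as a direct or indirect prerequisite. */
static bool waits_on(const queue_model *q, int later, int earlier) {
    assert(0 <= earlier && earlier < later && later < q->count);
    if (q->waits[later] & (1u << earlier)) return true;
    for (int i = earlier + 1; i < later; ++i) {
        if ((q->waits[later] & (1u << i)) && waits_on(q, i, earlier)) return true;
    }
    return false;
}
```

In this model, two submissions of the same command-buffer for which `waits_on` is false would be simultaneous use while the earlier one is still in flight, which matches the out-of-order queue example in the text.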
@@ -14733,7 +14752,9 @@ endif::cl_khr_command_buffer_multi_device[]
 
 === Command-Buffer Lifecycle
 
-A command-buffer is always in one of the following states:
+A command-buffer is created in the recording state and transitions to the
+executable state when finalized, at which point it cannot move back to
+the recording state.
 
 [[recording]]
 Recording:: Initial state of a command-buffer on creation, where commands can be
@@ -14743,11 +14764,6 @@ recorded to the command-buffer.
 Executable:: State after command recording has finished with
 {clFinalizeCommandBufferKHR} and the command-buffer may be enqueued.
 
-[[pending]]
-Pending:: Once a command-buffer has been enqueued to a command-queue it enters
-the Pending state until completion, at which point it moves back to the
-<<executable, Executable>> state.
-
 // Image generated from the following mermaid diagram description using https://mermaid.live
 // Ideally we'd use the asciidoctor-diagram extension to generate the rendered diagram, but
 // there are issues installing the gem with ruby 2.3.3
@@ -14757,21 +14773,10 @@ the Pending state until completion, at which point it moves back to the
 // stateDiagram-v2
 // [*] --> Recording: Create
 // Recording -->Executable: Finalize
-// Executable --> Pending: Enqueue
-// Pending --> Executable: Completion
 // ....
 
 image::images/commandbuffer_lifecycle.png[align="center", title="Lifecycle of a command-buffer."]
 
-[[pending_count]]
-The Pending Count is the number of copies of the command
-buffer in the <<pending, Pending>> state.
-By default a command-buffer's Pending Count must be 0 or 1.
-If the command-buffer was created with
-{CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR} then the command-buffer may have a
-Pending Count greater than 1.
-
-
 === Creating Command-Buffer Objects
 
 [open,refpage='clCreateCommandBufferKHR',desc='Create a command-buffer',type='protos']
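With the pending state removed, the lifecycle reduces to a one-way transition. The C sketch below models it; `cb_model` and the `cb_*` functions are hypothetical, and only the enumerator values are taken from the `cl_command_buffer_state_khr` entries kept in `xml/cl.xml` (recording is 0, executable is 1). Recording is rejected once the command-buffer is executable, enqueueing is accepted only after finalization, and, as an assumption of this model, a second finalize is also rejected. Enqueue neither changes the state nor counts in-flight submissions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two-state lifecycle; only the enumerator
 * values come from cl_command_buffer_state_khr in xml/cl.xml. */
typedef enum {
    MODEL_STATE_RECORDING = 0,
    MODEL_STATE_EXECUTABLE = 1
} cb_state;

typedef struct {
    cb_state state;
    int num_commands;
} cb_model;

static cb_model cb_create(void) {
    cb_model cb = {MODEL_STATE_RECORDING, 0}; /* created in recording state */
    return cb;
}

/* No further commands may be recorded once finalized. */
static bool cb_record(cb_model *cb) {
    if (cb->state != MODEL_STATE_RECORDING) return false;
    cb->num_commands++;
    return true;
}

/* One-way transition; this model also rejects finalizing twice. */
static bool cb_finalize(cb_model *cb) {
    if (cb->state != MODEL_STATE_RECORDING) return false;
    cb->state = MODEL_STATE_EXECUTABLE;
    return true;
}

/* Enqueue only requires finalization. There is no pending state, so the
 * state is left unchanged and no submission is tracked. */
static bool cb_enqueue(const cb_model *cb) {
    assert(cb != NULL);
    return cb->state == MODEL_STATE_EXECUTABLE;
}
```

Whether a particular submission has finished is then a question for that enqueue's event or a host wait, not for the command-buffer's state.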
@@ -14813,9 +14818,9 @@ include::{generated}/api/version-notes/CL_COMMAND_BUFFER_FLAGS_KHR.asciidoc[]
 | This is a bitfield and can be set to a combination of the following values:
 
 {CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR_anchor} - Allow multiple
-instances of the command-buffer to be submitted to the device for
-execution.
-If set, devices must support
+instances of the command-buffer to be scheduled for execution on the
+device in a usage pattern that exhibits <<simultaneous-use,
+simultaneous use>>. If set, devices must support
 {CL_COMMAND_BUFFER_CAPABILITY_SIMULTANEOUS_USE_KHR}.
 
 include::{generated}/api/version-notes/CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR.asciidoc[]
@@ -14898,16 +14903,6 @@ ifdef::cl_khr_command_buffer_multi_device[]
 |====
 endif::cl_khr_command_buffer_multi_device[]
 
-[NOTE]
-====
-Upon creation the command-buffer is defined as being in the
-<<recording, Recording>> state, in order for the command-buffer to be enqueued
-it must first be finalized using {clFinalizeCommandBufferKHR} after which no
-further commands can be recorded.
-A command-buffer is submitted for execution on command-queues with a call to
-{clEnqueueCommandBufferKHR}.
-====
-
 // refError
 
 {clCreateCommandBufferKHR} returns a valid non-zero command-buffer and
@@ -15089,9 +15084,6 @@ execution was successfully queued, or one of the errors below:
 * {CL_INVALID_COMMAND_BUFFER_KHR} if _command_buffer_ is not a valid
 command-buffer.
 * {CL_INVALID_OPERATION} if _command_buffer_ has not been finalized.
-* {CL_INVALID_OPERATION} if _command_buffer_ was not created with the
-{CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR} flag and is in the <<pending,
-Pending>> state.
 * {CL_INVALID_VALUE} if _queues_ is `NULL` and _num_queues_ is > 0, or
 _queues_ is not `NULL` and _num_queues_ is 0.
 * {CL_INVALID_VALUE} if _num_queues_ is > 0 and not the same value as
@@ -15125,6 +15117,10 @@ execution was successfully queued, or one of the errors below:
 required by the OpenCL implementation on the host.
 --
 
+Calling {clEnqueueCommandBufferKHR} in a usage pattern that exhibits
+<<simultaneous-use, simultaneous use>> when _command_buffer_ was not created
+with the {CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR} flag results in undefined
+behavior.
 
 === Recording Commands to a Command-Buffer
 
@@ -16553,8 +16549,7 @@ include::{generated}/api/version-notes/clRemapCommandBufferKHR.asciidoc[]
 * _errcode_ret_ returns an appropriate error code.
 If _errcode_ret_ is `NULL`, no error code is returned.
 
-The returned command-buffer has the same state as the input command-buffer,
-unless the input command-buffer is in the <<pending, Pending>> state, in
+The returned command-buffer has the same state as the input command-buffer.
 which case the returned command-buffer has state <<executable, Executable>>.
 
 // refError
@@ -16682,10 +16677,6 @@ one of the errors below is returned:
 * {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
 required by the OpenCL implementation on the host.
 
-Using this function when _command_buffer_ is in the <<pending, pending>>
-state and not created with the {CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR} flag
-causes undefined behavior.
-
 [NOTE]
 ====
 Performant usage is to call {clUpdateMutableCommandsKHR} only when the
@@ -16903,18 +16894,10 @@ include::{generated}/api/version-notes/CL_COMMAND_BUFFER_STATE_KHR.asciidoc[]
 include::{generated}/api/version-notes/CL_COMMAND_BUFFER_STATE_RECORDING_KHR.asciidoc[]
 
 {CL_COMMAND_BUFFER_STATE_EXECUTABLE_KHR_anchor} is returned when
-_command_buffer_ has been finalized and there is not a <<pending,
-Pending>> instance of _command_buffer_ awaiting completion on a
-command_queue.
+_command_buffer_ has been finalized.
 
 include::{generated}/api/version-notes/CL_COMMAND_BUFFER_STATE_EXECUTABLE_KHR.asciidoc[]
 
-{CL_COMMAND_BUFFER_STATE_PENDING_KHR_anchor} is returned when an
-instance of _command_buffer_ has been enqueued for execution but not
-yet completed.
-
-include::{generated}/api/version-notes/CL_COMMAND_BUFFER_STATE_PENDING_KHR.asciidoc[]
-
 | {CL_COMMAND_BUFFER_PROPERTIES_ARRAY_KHR_anchor}
 
 include::{generated}/api/version-notes/CL_COMMAND_BUFFER_PROPERTIES_ARRAY_KHR.asciidoc[]

images/commandbuffer_lifecycle.png

-17.4 KB

xml/cl.xml

Lines changed: 1 addition & 3 deletions
@@ -1360,7 +1360,6 @@ server's OpenCL/api-docs repository.
 <enums name="cl_command_buffer_state_khr" vendor="Khronos">
 <enum value="0" name="CL_COMMAND_BUFFER_STATE_RECORDING_KHR"/>
 <enum value="1" name="CL_COMMAND_BUFFER_STATE_EXECUTABLE_KHR"/>
-<enum value="2" name="CL_COMMAND_BUFFER_STATE_PENDING_KHR"/>
 </enums>
 <enums name="cl_mutable_dispatch_fields_khr" vendor="Khronos" type="bitmask">
 <enum bitpos="0" name="CL_MUTABLE_DISPATCH_GLOBAL_OFFSET_KHR"/>
@@ -7284,7 +7283,7 @@ server's OpenCL/api-docs repository.
 <enum name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
 </require>
 </extension>
-<extension name="cl_khr_command_buffer" revision="0.9.7" supported="opencl" depends="CL_VERSION_1_2" ratified="opencl" experimental="true">
+<extension name="cl_khr_command_buffer" revision="0.9.8" supported="opencl" depends="CL_VERSION_1_2" ratified="opencl" experimental="true">
 <require>
 <type name="CL/cl.h"/>
 </require>
@@ -7331,7 +7330,6 @@ server's OpenCL/api-docs repository.
 <require comment="cl_command_buffer_state_khr">
 <enum name="CL_COMMAND_BUFFER_STATE_RECORDING_KHR"/>
 <enum name="CL_COMMAND_BUFFER_STATE_EXECUTABLE_KHR"/>
-<enum name="CL_COMMAND_BUFFER_STATE_PENDING_KHR"/>
 </require>
 <require comment="cl_command_type">
 <enum name="CL_COMMAND_COMMAND_BUFFER_KHR"/>
