
Commit d205cd5

clarify async copies and wait group events must be convergent (#1015)
1 parent 69c4a0b commit d205cd5

1 file changed

Lines changed: 17 additions & 19 deletions

OpenCL_C.txt

@@ -6382,6 +6382,23 @@ The OpenCL C programming language implements the <<table-builtin-async-copy,
 following functions>> that provide asynchronous copies between `global` and
 local memory and a prefetch from `global` memory.
 
+The async copy and wait group events functions are performed by all work-items
+in a work-group and therefore must be encountered by all work-items in a
+work-group executing the kernel with the same argument values, otherwise the
+results are undefined.
+This rule applies to ND-ranges implemented with uniform and non-uniform
+work-groups.
+
+If an async copy or wait group events function is inside a conditional statement
+then all work-items in the work-group must enter the conditional if any
+work-item in the work-group enters the conditional statement and executes the
+async copy or wait group events function.
+
+If an async copy or wait group events function is inside a loop then all
+work-items in the work-group must execute the async copy or wait group events
+function on each iteration of the loop if any work-item executes the async copy
+or wait group events function on that iteration.
+
 We use the generic type name `gentype` to indicate the built-in data types `char`,
 `char__n__`, `uchar`, `uchar__n__`, `short`, `short__n__`,
 `ushort`, `ushort__n__`, `int`, `int__n__`, `uint`,
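
The convergence rules added in this hunk can be illustrated with a small host-side model. The sketch below is plain C, not OpenCL C, and every name in it (`trace_t`, `record`, `is_convergent`, the `CALL_*` codes and the toy kernel bodies) is hypothetical. Each emulated work-item records the async copy and wait group events calls it reaches, together with one argument value, and a work-group is treated as satisfying the rules only if every work-item records an identical sequence. The toy bodies contrast a branch that depends only on the group id with a branch on the local id, a loop whose trip count varies per work-item, and a call whose argument varies per work-item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not an OpenCL API: a trace is the ordered list of
 * async copy / wait group events calls one work-item reaches. */
enum { CALL_ASYNC_COPY = 1, CALL_WAIT_EVENTS = 2, MAX_CALLS = 16 };

typedef struct { int fn; size_t arg; } call_t;
typedef struct { size_t len; call_t calls[MAX_CALLS]; } trace_t;

static void record(trace_t *t, int fn, size_t arg) {
    assert(t->len < MAX_CALLS);
    t->calls[t->len].fn = fn;
    t->calls[t->len].arg = arg;
    t->len++;
}

/* A toy "kernel body": lid is the local id, gid the group id. */
typedef void (*kernel_fn)(size_t lid, size_t gid, trace_t *t);

/* Allowed: the condition depends only on the group id. */
static void uniform_branch(size_t lid, size_t gid, trace_t *t) {
    (void)lid;
    if (gid % 2 == 0) record(t, CALL_ASYNC_COPY, 64);
    record(t, CALL_WAIT_EVENTS, 1);
}

/* Undefined: only work-item 0 enters the conditional. */
static void divergent_branch(size_t lid, size_t gid, trace_t *t) {
    (void)gid;
    if (lid == 0) record(t, CALL_ASYNC_COPY, 64);
    record(t, CALL_WAIT_EVENTS, 1);
}

/* Undefined: odd work-items run one extra iteration that issues a copy. */
static void divergent_loop(size_t lid, size_t gid, trace_t *t) {
    (void)gid;
    for (size_t i = 0; i < 1 + lid % 2; ++i) record(t, CALL_ASYNC_COPY, 64);
    record(t, CALL_WAIT_EVENTS, 1);
}

/* Undefined: every work-item reaches the copy, but with a different size. */
static void divergent_args(size_t lid, size_t gid, trace_t *t) {
    (void)gid;
    record(t, CALL_ASYNC_COPY, 64 + lid);
    record(t, CALL_WAIT_EVENTS, 1);
}

static bool same_trace(const trace_t *a, const trace_t *b) {
    if (a->len != b->len) return false;
    for (size_t i = 0; i < a->len; ++i)
        if (a->calls[i].fn != b->calls[i].fn ||
            a->calls[i].arg != b->calls[i].arg)
            return false;
    return true;
}

/* Runs the toy kernel for every work-item of one work-group and reports
 * whether all of them produced the same trace as work-item 0. */
static bool is_convergent(kernel_fn k, size_t gid, size_t local_size) {
    trace_t first = {0};
    k(0, gid, &first);
    for (size_t lid = 1; lid < local_size; ++lid) {
        trace_t t = {0};
        k(lid, gid, &t);
        if (!same_trace(&first, &t)) return false;
    }
    return true;
}
```

For example, `is_convergent(uniform_branch, 3, 64)` holds while `is_convergent(divergent_loop, 0, 64)` does not. Because the rule also covers non-uniform work-groups, such a check would apply equally to the smaller local size of a remainder work-group.
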
@@ -6402,13 +6419,6 @@ _n_ is 2, 3 footnote:[{fn-vec3-async-copy}], 4, 8, or 16.
 const {local} gentype *_src_, size_t _num_gentypes_, event_t _event_)
 | Perform an async copy of _num_gentypes_ gentype elements from _src_ to
 _dst_.
-The async copy is performed by all work-items in a work-group and this
-built-in function must therefore be encountered by all work-items in a
-work-group executing the kernel with the same argument values;
-otherwise the results are undefined.
-This rule applies to ND-ranges implemented with uniform and
-non-uniform work-groups.
-
 Returns an event object that can be used by *wait_group_events* to
 wait for the async copy to finish.
 The _event_ argument can also be used to associate the
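
The wording removed in this hunk said the async copy "is performed by all work-items in a work-group". One way to see why a work-item that skips the call leaves the result undefined is a purely hypothetical software implementation in which each work-item copies an interleaved share of the elements. The C sketch below assumes that split; `copy_share` and `emulate_group_copy` are illustrative names, and a real implementation may use a DMA engine or any other strategy. Under this assumption, a work-item that never reaches the call leaves its share of `dst` unwritten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed split, for illustration only: work-item lid copies elements
 * lid, lid + local_size, lid + 2 * local_size, ... */
static void copy_share(int *dst, const int *src, size_t num_gentypes,
                       size_t lid, size_t local_size) {
    assert(local_size > 0); /* a zero step would never advance */
    for (size_t i = lid; i < num_gentypes; i += local_size)
        dst[i] = src[i];
}

/* Emulates one work-group; participates[lid] says whether work-item lid
 * reached the copy. Returns true when every element of dst matches src
 * afterwards (assuming dst did not already hold the source values). */
static bool emulate_group_copy(int *dst, const int *src, size_t num_gentypes,
                               size_t local_size, const bool *participates) {
    for (size_t lid = 0; lid < local_size; ++lid)
        if (participates[lid])
            copy_share(dst, src, num_gentypes, lid, local_size);
    for (size_t i = 0; i < num_gentypes; ++i)
        if (dst[i] != src[i])
            return false;
    return true;
}
```

With 10 elements and 4 work-items, skipping work-item 1 leaves elements 1, 5 and 9 of `dst` untouched in this model.
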
@@ -6436,12 +6446,6 @@ _n_ is 2, 3 footnote:[{fn-vec3-async-copy}], 4, 8, or 16.
 element read from _src_.
 The _dst_stride_ is the stride in elements for each `gentype` element
 written to _dst_.
-The async gather is performed by all work-items in a work-group.
-This built-in function must therefore be encountered by all work-items
-in a work-group executing the kernel with the same argument values;
-otherwise the results are undefined.
-This rule applies to ND-ranges implemented with uniform and
-non-uniform work-groups
 
 Returns an event object that can be used by *wait_group_events* to
 wait for the async copy to finish.
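
The context lines of this hunk describe the strides used by the strided async copies. A minimal C model of the element mapping only, assuming `int` elements and ignoring both asynchrony and the split across work-items: with a source stride, element `i` of `dst` is read from `src[i * src_stride]` (a gather), and with a destination stride, element `i` of `src` is written to `dst[i * dst_stride]` (a scatter). The function names are hypothetical, and this model simply rejects a zero stride. Under the convergence rule, every work-item would also have to pass the same stride value.

```c
#include <assert.h>
#include <stddef.h>

/* Gather: every src_stride-th element of src goes to consecutive
 * elements of dst. */
static void strided_gather(int *dst, const int *src, size_t num_gentypes,
                           size_t src_stride) {
    assert(src_stride > 0);
    for (size_t i = 0; i < num_gentypes; ++i)
        dst[i] = src[i * src_stride];
}

/* Scatter: consecutive elements of src go to every dst_stride-th
 * element of dst; the elements in between are left untouched. */
static void strided_scatter(int *dst, const int *src, size_t num_gentypes,
                            size_t dst_stride) {
    assert(dst_stride > 0);
    for (size_t i = 0; i < num_gentypes; ++i)
        dst[i * dst_stride] = src[i];
}
```

For a 12-element source holding 10 through 21, gathering 4 elements with stride 3 yields 10, 13, 16 and 19.
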
@@ -6470,12 +6474,6 @@ _n_ is 2, 3 footnote:[{fn-vec3-async-copy}], 4, 8, or 16.
 to complete.
 The event objects specified in _event_list_ will be released after the
 wait is performed.
-
-This function must be encountered by all work-items in a work-group
-executing the kernel with the same _num_events_ and event objects
-specified in _event_list_; otherwise the results are undefined.
-This rule applies to ND-ranges implemented with uniform and
-non-uniform work-groups
 | |
 | void **prefetch**(const {global} gentype *_p_, size_t _num_gentypes_)
 | Prefetch `_num_gentypes_ * sizeof(gentype)` bytes into the global
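
The wording removed in this hunk required every work-item to call *wait_group_events* with the same _num_events_ and the same event objects in _event_list_, a case the new general rule covers as "the same argument values". A hypothetical C check of that condition follows, with `event_t` modeled as an integer handle; `model_event_t`, `wait_call_t` and `wait_args_uniform` are illustrative names only. The sketch conservatively also requires the events to appear in the same order, which the text shown here does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: an event is modeled as an integer handle. */
typedef int model_event_t;

/* One work-item's recorded wait_group_events arguments. */
typedef struct {
    size_t num_events;
    const model_event_t *event_list;
} wait_call_t;

/* True when every work-item passed the same num_events and the same
 * event handles, position by position, as work-item 0. */
static bool wait_args_uniform(const wait_call_t *calls, size_t local_size) {
    assert(local_size > 0);
    for (size_t lid = 1; lid < local_size; ++lid) {
        if (calls[lid].num_events != calls[0].num_events)
            return false;
        for (size_t e = 0; e < calls[0].num_events; ++e)
            if (calls[lid].event_list[e] != calls[0].event_list[e])
                return false;
    }
    return true;
}
```
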
