@@ -6382,6 +6382,23 @@ The OpenCL C programming language implements the <<table-builtin-async-copy,
 following functions>> that provide asynchronous copies between `global` and
 local memory and a prefetch from `global` memory.

+ The async copy and wait group events functions are performed by all work-items
+ in a work-group and therefore must be encountered by all work-items in a
+ work-group executing the kernel with the same argument values, otherwise the
+ results are undefined.
+ This rule applies to ND-ranges implemented with uniform and non-uniform
+ work-groups.
+
+ If an async copy or wait group events function is inside a conditional statement
+ then all work-items in the work-group must enter the conditional if any
+ work-item in the work-group enters the conditional statement and executes the
+ async copy or wait group events function.
+
+ If an async copy or wait group events function is inside a loop then all
+ work-items in the work-group must execute the async copy or wait group events
+ function on each iteration of the loop if any work-item executes the async copy
+ or wait group events function on that iteration.
+
 We use the generic type name `gentype` to indicate the built-in data types `char`,
 `char__n__`, `uchar`, `uchar__n__`, `short`, `short__n__`,
 `ushort`, `ushort__n__`, `int`, `int__n__`, `uint`,
@@ -6402,13 +6419,6 @@ _n_ is 2, 3 footnote:[{fn-vec3-async-copy}], 4, 8, or 16.
 const {local} gentype *_src_, size_t _num_gentypes_, event_t _event_)
 | Perform an async copy of _num_gentypes_ gentype elements from _src_ to
 _dst_.
- The async copy is performed by all work-items in a work-group and this
- built-in function must therefore be encountered by all work-items in a
- work-group executing the kernel with the same argument values;
- otherwise the results are undefined.
- This rule applies to ND-ranges implemented with uniform and
- non-uniform work-groups.
-
 Returns an event object that can be used by *wait_group_events* to
 wait for the async copy to finish.
 The _event_ argument can also be used to associate the
@@ -6436,12 +6446,6 @@ _n_ is 2, 3 footnote:[{fn-vec3-async-copy}], 4, 8, or 16.
 element read from _src_.
 The _dst_stride_ is the stride in elements for each `gentype` element
 written to _dst_.
- The async gather is performed by all work-items in a work-group.
- This built-in function must therefore be encountered by all work-items
- in a work-group executing the kernel with the same argument values;
- otherwise the results are undefined.
- This rule applies to ND-ranges implemented with uniform and
- non-uniform work-groups

 Returns an event object that can be used by *wait_group_events* to
 wait for the async copy to finish.
@@ -6470,12 +6474,6 @@ _n_ is 2, 3 footnote:[{fn-vec3-async-copy}], 4, 8, or 16.
 to complete.
 The event objects specified in _event_list_ will be released after the
 wait is performed.
-
- This function must be encountered by all work-items in a work-group
- executing the kernel with the same _num_events_ and event objects
- specified in _event_list_; otherwise the results are undefined.
- This rule applies to ND-ranges implemented with uniform and
- non-uniform work-groups
 | |
 | void **prefetch**(const {global} gentype *_p_, size_t _num_gentypes_)
 | Prefetch `_num_gentypes_ * sizeof(gentype)` bytes into the global
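To make the consolidated rule concrete, here is a minimal sketch of a kernel that satisfies it. It is an illustration under stated assumptions and not text proposed for the specification: the kernel name `stage_tiles`, its parameters, and the host-side shim under `#ifndef __OPENCL_VERSION__` are all hypothetical. The shim only exists so the same source can be compiled as plain C and exercised as a single work-item; it does not model real asynchronous behavior.

```c
/* Illustrative sketch only. stage_tiles and the shim below are hypothetical. */
#ifndef __OPENCL_VERSION__
/* Host-side stand-ins so the sketch also compiles as plain C.
 * They model one work-item and copy synchronously. */
#include <stddef.h>
#include <string.h>
#define __kernel
#define __global
#define __local
typedef int event_t;
static event_t async_work_group_copy(float *dst, const float *src,
                                     size_t n, event_t ev)
{
    memcpy(dst, src, n * sizeof(float));
    return ev;
}
static void wait_group_events(int num_events, event_t *event_list)
{
    (void)num_events;
    (void)event_list;
}
#endif

__kernel void stage_tiles(__global const float *in, __local float *tile,
                          int num_tiles, int tile_len)
{
    /* Conforming: num_tiles and tile_len are kernel arguments, so every
     * work-item in the work-group runs the same number of iterations and
     * reaches both calls with identical argument values on each one. */
    for (int t = 0; t < num_tiles; ++t) {
        event_t ev = async_work_group_copy(
            tile, in + (size_t)t * (size_t)tile_len, (size_t)tile_len, 0);
        wait_group_events(1, &ev);
        /* ... work-items would consume tile here ... */
    }

    /* Non-conforming (undefined results): guarding either call with a
     * per-work-item condition such as `if (get_local_id(0) == 0)`, or
     * looping with a trip count that differs between work-items. */
}
```

The loop bound and the copy arguments depend only on values that are uniform across the work-group, which is what the new paragraphs on conditionals and loops require.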