Commit 5211e88

clarify atomic operation descriptions (KhronosGroup#1500)
1 parent 8908648 commit 5211e88

2 files changed

Lines changed: 55 additions & 59 deletions

api/footnotes.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ To create an image object from another image object that share the data store be
 ]

 :fn-image-mem-fence: pass:n[ \
-This value for memory_scope can only be used with atomic_work_item_fence with flags set to `CLK_IMAGE_MEM_FENCE`. \
+This value for *memory_scope* can only be used with *atomic_work_item_fence* with flags set to `CLK_IMAGE_MEM_FENCE`. \
 ]

 :fn-int64-performance: pass:n[ \

api/opencl_architecture.asciidoc

Lines changed: 54 additions & 58 deletions
@@ -1148,9 +1148,10 @@ enforced at a synchronization point.
 IMPORTANT: This memory consistency model is <<unified-spec, missing
 before>> version 2.0.

-The OpenCL 2.x memory model tells programmers what they can expect from an
-OpenCL 2.x implementation; which memory operations are guaranteed to happen in
-which order and which memory values each read operation will return.
+The OpenCL 2.x memory consistency model tells programmers what they can expect
+from an OpenCL 2.x or newer implementation; which memory operations are
+guaranteed to happen in which order and which memory values each read operation
+will return.
 The memory model tells compiler writers which restrictions they must follow
 when implementing compiler optimizations; which variables they can cache in
 registers and when they can move reads or writes around a barrier or atomic
@@ -1159,7 +1160,7 @@ The memory model also tells hardware designers about limitations on hardware
 optimizations; for example, when they must flush or invalidate hardware
 caches.

-The memory consistency model in OpenCL 2.x is based on the memory model from
+The OpenCL 2.x memory consistency model is based on the memory model from
 the ISO C11 programming language.
 To help make the presentation more precise and self-contained, we include
 modified paragraphs taken verbatim from the ISO C11 international standard.
@@ -1175,24 +1176,24 @@ Each access to a memory location sees the last assignment to that location
 in that interleaving.
 While sequential consistency is relatively straightforward for a programmer
 to reason about, implementing sequential consistency is expensive.
-Therefore, OpenCL 2.x implements a relaxed memory consistency model; i.e. it is
-possible to write programs where the loads from memory violate sequential
-consistency.
+Therefore, the OpenCL 2.x memory consistency model is a relaxed memory
+consistency model; i.e. it is possible to write programs where the loads from
+memory violate sequential consistency.
 Fortunately, if a program does not contain any races and if the program only
 uses atomic operations that utilize the sequentially consistent memory order
-(the default memory ordering for OpenCL 2.x), OpenCL programs appear to execute
-with sequential consistency.
+(the default memory ordering for OpenCL C 2.x), OpenCL programs appear to
+execute with sequential consistency.

 Programmers can to some degree control how the memory model is relaxed by
 choosing the memory order for synchronization operations.
 The precise semantics of synchronization and the memory orders are formally
 defined in <<memory-ordering-rules, Memory Ordering Rules>>.
 Here, we give a high level description of how these memory orders apply to
 atomic operations on atomic objects shared between units of execution.
-OpenCL 2.x memory_order choices are based on those from the ISO C11 standard
+The OpenCL 2.x memory orders are based on those from the ISO C11 standard
 memory model.
-They are specified in certain OpenCL functions through the following
-enumeration constants:
+They are specified in certain OpenCL C functions through the following
+*memory_order* enumeration constants:

 * *memory_order_relaxed*: implies no order constraints.
 This memory order can be used safely to increment counters that are
@@ -1230,13 +1231,13 @@ detailed rules for when synchronisation must occur.
 loads and stores from different units of execution appear to be simply
 interleaved.

-Regardless of which memory_order is specified, resolving constraints on
+Regardless of which memory order is specified, resolving constraints on
 memory operations across a heterogeneous platform adds considerable overhead
 to the execution of a program.
 An OpenCL platform may be able to optimize certain operations that depend on
 the features of the memory consistency model by restricting the scope of the
 memory operations.
-Distinct memory scopes are defined by the values of the memory_scope
+Distinct memory scopes are defined by the values of the *memory_scope*
 enumeration constant:

 * *memory_scope_work_item*: memory-ordering constraints only apply within
@@ -1299,8 +1300,8 @@ detailed rules behind the relaxed memory models and go directly to

 === Overview of Atomic and Fence Operations

-OpenCL 2.x has a number of _synchronization operations_ that are used to define
-memory order constraints in a program.
+OpenCL C 2.x has a number of _synchronization operations_ that are used to
+define memory order constraints in a program.
 They play a special role in controlling how memory operations in one unit of
 execution (such as work-items or, when using SVM a host thread) are made
 visible to another.
@@ -1310,17 +1311,13 @@ operations_ and _fences_.
 Atomic operations are indivisible.
 They either occur completely or not at all.
 These operations are used to order memory operations between units of
-execution and hence they are parameterized with the memory_order and
-memory_scope parameters defined by the OpenCL memory consistency model.
+execution and hence they are parameterized with the memory order and
+memory scope parameters defined by the OpenCL memory consistency model.
 The atomic operations for OpenCL kernel languages are similar to the
 corresponding operations defined by the C11 standard.

-The OpenCL 2.x atomic operations apply to variables of an atomic type (a
-subset of those in the C11 standard) including atomic versions of the int,
-uint, long, ulong, float, double, half, intptr_t, uintptr_t, size_t, and
-ptrdiff_t types.
-However, support for some of these atomic types depends on support for the
-corresponding regular types.
+The OpenCL C 2.x atomic operations apply to variables of an atomic type (a
+subset of those in the C11 standard).

 An atomic operation on one or more memory locations is either an acquire
 operation, a release operation, or both an acquire and release operation.
@@ -1336,40 +1333,41 @@ The orders *memory_order_acquire* (used for reads), *memory_order_release*
 (used for writes), and *memory_order_acq_rel* (used for read-modify-write
 operations) are used for simple communication between units of execution
 using shared variables.
-Informally, executing a *memory_order_release* on an atomic object A makes
+Informally, executing a *memory_order_release* on an atomic object *A* makes
 all previous side effects visible to any unit of execution that later
-executes a *memory_order_acquire* on A.
+executes a *memory_order_acquire* on *A*.
 The orders *memory_order_acquire*, *memory_order_release*, and
 *memory_order_acq_rel* do not provide sequential consistency for race-free
 programs because they will not ensure that atomic stores followed by atomic
 loads become visible to other threads in that order.

 [[atomic-fence-orders]]
-The fence operation is atomic_work_item_fence, which includes a memory_order
-argument as well as the memory_scope and cl_mem_fence_flags arguments.
-Depending on the memory_order argument, this operation:
-
-* has no effects, if *memory_order_relaxed*;
-* is an acquire fence, if *memory_order_acquire*;
-* is a release fence, if *memory_order_release*;
-* is both an acquire fence and a release fence, if *memory_order_acq_rel*;
+The fence operation is *atomic_work_item_fence*, which includes a memory order
+argument as well as memory scope and memory flag arguments.
+Depending on the memory order argument, this operation:
+
+* has no effects, if the memory order is *memory_order_relaxed*;
+* is an acquire fence, if the memory order is *memory_order_acquire*;
+* is a release fence, if the memory order is *memory_order_release*;
+* is both an acquire fence and a release fence, if the memory order is
+*memory_order_acq_rel*;
 * is a sequentially-consistent fence with both acquire and release
-semantics, if *memory_order_seq_cst*.
+semantics, if the memory order is *memory_order_seq_cst*.

 If specified, the cl_mem_fence_flags argument must be `CLK_IMAGE_MEM_FENCE`,
 `CLK_GLOBAL_MEM_FENCE`, `CLK_LOCAL_MEM_FENCE`, or `CLK_GLOBAL_MEM_FENCE |
 CLK_LOCAL_MEM_FENCE`.

-The `atomic_work_item_fence(CLK_IMAGE_MEM_FENCE, ...)` built-in function must be
-used to make sure that sampler-less writes are visible to later reads by the
-same work-item.
-Without use of the atomic_work_item_fence function, write-read coherence on
+The *atomic_work_item_fence* built-in function must be used with
+`CLK_IMAGE_MEM_FENCE` to make sure that sampler-less writes are visible to later
+reads by the same work-item.
+Without use of the *atomic_work_item_fence* function, write-read coherence on
 image objects is not guaranteed: if a work-item reads from an image to which
-it has previously written without an intervening atomic_work_item_fence, it
+it has previously written without an intervening *atomic_work_item_fence*, it
 is not guaranteed that those previous writes are visible to the work-item.

-The synchronization operations in OpenCL 2.x can be parameterized by a
-memory_scope.
+The synchronization operations in OpenCL C 2.x can be parameterized by a
+memory scope.
 Memory scopes control the extent that an atomic operation or fence is
 visible with respect to the memory model.
 These memory scopes may be used when performing atomic operations and fences
@@ -1500,9 +1498,9 @@ A local memory action *A* local-happens-before a local memory action *B* if
 * For some local memory action *C*, *A* local-happens-before *C* and *C*
 local-happens-before *B*.

-An OpenCL 2.x implementation shall ensure that no program execution
-demonstrates a cycle in either the local-happens-before relation or the
-global-happens-before relation.
+An implementation of the OpenCL 2.x memory consistency model shall ensure that
+no program execution demonstrates a cycle in either the local-happens-before
+relation or the global-happens-before relation.

 NOTE: The global- and local-happens-before relations are critical to
 defining what values are read and when data races occur.
@@ -1584,12 +1582,12 @@ This requirement is known as write-read coherence.
 This and following sections describe how different program actions in kernel
 C code and the host program contribute to the local- and
 global-happens-before relations.
-This section discusses ordering rules for OpenCL 2.x atomic operations.
+This section discusses ordering rules for OpenCL C 2.x atomic operations.

-<<device-side-enqueue, Device-side enqueue>> defines the enumerated type
-memory_order.
+The <<memory-consistency-model>> section defines the enumerated type
+*memory_order*.

-* For *memory_order_relaxed*, no operation orders memory.
+* For *memory_order_relaxed*, there is no memory ordering.
 * For *memory_order_release*, *memory_order_acq_rel*, and
 *memory_order_seq_cst*, a store operation performs a release operation
 on the affected memory location.
@@ -1714,16 +1712,16 @@ reasonable amount of time.
 <<iso-c11,[C11 standard, Section 7.17.3, paragraph 16.]>>

 As long as the following conditions are met, a host program sharing SVM memory
-with a kernel executing on one or more OpenCL 2.x devices may use atomic and
-synchronization operations to ensure that its assignments, and those of the
-kernel, are visible to each other:
+with a kernel executing on one or more OpenCL 2.x or newer devices may use
+atomic and synchronization operations to ensure that its assignments, and those
+of the kernel, are visible to each other:

 . Either fine-grained buffer or fine-grained system SVM must be used to
 share memory.
 While coarse-grained buffer SVM allocations may support atomic
 operations, visibility on these allocations is not guaranteed except at
 map and unmap operations.
-. The optional OpenCL 2.x SVM atomic-controlled visibility specified by
+. The optional OpenCL SVM atomic-controlled visibility specified by
 provision of the {CL_MEM_SVM_ATOMICS} flag must be supported by the device
 and the flag provided to the SVM buffer on allocation.
 . The host atomic and synchronization operations must be compatible with
@@ -1739,11 +1737,11 @@ all_svm_devices scope.
 [[memory-ordering-fence]]
 ==== Fence Operations

-This section describes how the OpenCL 2.x fence operations contribute to the
+This section describes how the OpenCL C 2.x fence operations contribute to the
 local- and global-happens-before relations.

 Earlier, we introduced synchronization primitives called fences.
-Fences can utilize the acquire memory_order, release memory_order, or both.
+Fences can utilize the acquire memory order, release memory order, or both.
 A fence with acquire semantics is called an acquire fence; a fence with
 release semantics is called a release fence. The <<atomic-fence-orders,
 overview of atomic and fence operations>> section describes the memory orders
@@ -1981,8 +1979,6 @@ following:
 command-queue, then the event (including the event implied between *C*
 and *C1* due to the in-order queue) signaling *C*'s completion
 global-synchronizes-with *C1*.
-Note that in OpenCL 2.x, only a host command-queue can be configured as
-an in-order queue.
 . If an API call enqueues a marker command *C* with an empty list of
 events upon which *C* should wait, then the events of all commands
 enqueued prior to *C* in the command-queue global-synchronize-with *C*.
@@ -2044,7 +2040,7 @@ In this situation:
 enqueued from the host to a single device is guaranteed under the memory
 ordering rules described earlier in this section.

-If fine-grain SVM is used but without support for the OpenCL 2.x atomic
+If fine-grain SVM is used but without support for SVM atomic
 operations, then the host and devices can concurrently read the same memory
 locations and can concurrently update non-overlapping memory regions, but
 attempts to update the same memory locations are undefined.
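
Reviewer aid, not part of the commit: the changed text describes how a *memory_order_release* on an atomic object *A* makes earlier side effects visible to a unit of execution that later performs a *memory_order_acquire* on *A*. Since the spec states that its memory orders are based on ISO C11, the pattern can be sketched with host-side C11 atomics. This is a minimal illustration only; the `channel` type and the `channel_init`, `publish`, and `try_consume` helpers are hypothetical names invented here, and OpenCL C kernels would additionally take a memory scope argument that plain C11 does not have.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message channel: a plain payload guarded by an atomic flag.
 * The flag plays the role of the atomic object A in the spec text. */
typedef struct {
    int payload;        /* ordinary, non-atomic data */
    atomic_bool ready;  /* atomic object used for synchronization */
} channel;

static void channel_init(channel *c) {
    assert(c != NULL);
    c->payload = 0;
    atomic_init(&c->ready, false);
}

/* Producer side: write the payload, then perform a release store on the flag.
 * The release store orders the payload write before the flag becomes true. */
static void publish(channel *c, int value) {
    assert(c != NULL);
    c->payload = value;
    atomic_store_explicit(&c->ready, true, memory_order_release);
}

/* Consumer side: perform an acquire load on the flag. If it observes true,
 * the earlier payload write is guaranteed to be visible. Returns false if
 * nothing has been published yet. */
static bool try_consume(channel *c, int *out) {
    assert(c != NULL && out != NULL);
    if (!atomic_load_explicit(&c->ready, memory_order_acquire)) {
        return false;
    }
    *out = c->payload;
    return true;
}
```

In a real program `publish` and `try_consume` would run in different threads (or, with fine-grained SVM and SVM atomics, on host and device); calling them in sequence from one thread only exercises the API shape, not the cross-thread ordering guarantee.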