Commit eeb3d80

clarify atomic operation descriptions
1 parent d1eb455 commit eeb3d80

2 files changed

Lines changed: 54 additions & 58 deletions

api/footnotes.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ To create an image object from another image object that share the data store be
 ]

 :fn-image-mem-fence: pass:n[ \
-This value for memory_scope can only be used with atomic_work_item_fence with flags set to `CLK_IMAGE_MEM_FENCE`. \
+This value for *memory_scope* can only be used with *atomic_work_item_fence* with flags set to `CLK_IMAGE_MEM_FENCE`. \
 ]

 :fn-int64-performance: pass:n[ \

api/opencl_architecture.asciidoc

Lines changed: 53 additions & 57 deletions
@@ -1157,9 +1157,10 @@ enforced at a synchronization point.
 IMPORTANT: This memory consistency model is <<unified-spec, missing
 before>> version 2.0.

-The OpenCL 2.x memory model tells programmers what they can expect from an
-OpenCL 2.x implementation; which memory operations are guaranteed to happen in
-which order and which memory values each read operation will return.
+The OpenCL 2.x memory consistency model tells programmers what they can expect
+from an OpenCL 2.x or newer implementation; which memory operations are
+guaranteed to happen in which order and which memory values each read operation
+will return.
 The memory model tells compiler writers which restrictions they must follow
 when implementing compiler optimizations; which variables they can cache in
 registers and when they can move reads or writes around a barrier or atomic
@@ -1168,7 +1169,7 @@ The memory model also tells hardware designers about limitations on hardware
 optimizations; for example, when they must flush or invalidate hardware
 caches.

-The memory consistency model in OpenCL 2.x is based on the memory model from
+The OpenCL 2.x memory consistency model is based on the memory model from
 the ISO C11 programming language.
 To help make the presentation more precise and self-contained, we include
 modified paragraphs taken verbatim from the ISO C11 international standard.
@@ -1184,24 +1185,24 @@ Each access to a memory location sees the last assignment to that location
 in that interleaving.
 While sequential consistency is relatively straightforward for a programmer
 to reason about, implementing sequential consistency is expensive.
-Therefore, OpenCL 2.x implements a relaxed memory consistency model; i.e. it is
-possible to write programs where the loads from memory violate sequential
-consistency.
+Therefore, the OpenCL 2.x memory consistency model is a relaxed memory
+consistency model; i.e. it is possible to write programs where the loads from
+memory violate sequential consistency.
 Fortunately, if a program does not contain any races and if the program only
 uses atomic operations that utilize the sequentially consistent memory order
-(the default memory ordering for OpenCL 2.x), OpenCL programs appear to execute
-with sequential consistency.
+(the default memory ordering for OpenCL C 2.x), OpenCL programs appear to
+execute with sequential consistency.

 Programmers can to some degree control how the memory model is relaxed by
 choosing the memory order for synchronization operations.
 The precise semantics of synchronization and the memory orders are formally
 defined in <<memory-ordering-rules, Memory Ordering Rules>>.
 Here, we give a high level description of how these memory orders apply to
 atomic operations on atomic objects shared between units of execution.
-OpenCL 2.x memory_order choices are based on those from the ISO C11 standard
+The OpenCL C 2.x memory orders are based on those from the ISO C11 standard
 memory model.
 They are specified in certain OpenCL functions through the following
-enumeration constants:
+*memory_order* enumeration constants:

 * *memory_order_relaxed*: implies no order constraints.
 This memory order can be used safely to increment counters that are
@@ -1239,13 +1240,13 @@ detailed rules for when synchronisation must occur.
 loads and stores from different units of execution appear to be simply
 interleaved.

-Regardless of which memory_order is specified, resolving constraints on
+Regardless of which memory order is specified, resolving constraints on
 memory operations across a heterogeneous platform adds considerable overhead
 to the execution of a program.
 An OpenCL platform may be able to optimize certain operations that depend on
 the features of the memory consistency model by restricting the scope of the
 memory operations.
-Distinct memory scopes are defined by the values of the memory_scope
+Distinct memory scopes are defined by the values of the *memory_scope*
 enumeration constant:

 * *memory_scope_work_item*: memory-ordering constraints only apply within
@@ -1308,8 +1309,8 @@ detailed rules behind the relaxed memory models and go directly to

 === Overview of Atomic and Fence Operations

-OpenCL 2.x has a number of _synchronization operations_ that are used to define
-memory order constraints in a program.
+OpenCL C 2.x has a number of _synchronization operations_ that are used to
+define memory order constraints in a program.
 They play a special role in controlling how memory operations in one unit of
 execution (such as work-items or, when using SVM a host thread) are made
 visible to another.
@@ -1319,17 +1320,13 @@ operations_ and _fences_.
 Atomic operations are indivisible.
 They either occur completely or not at all.
 These operations are used to order memory operations between units of
-execution and hence they are parameterized with the memory_order and
-memory_scope parameters defined by the OpenCL memory consistency model.
+execution and hence they are parameterized with the memory order and
+memory scope parameters defined by the OpenCL memory consistency model.
 The atomic operations for OpenCL kernel languages are similar to the
 corresponding operations defined by the C11 standard.

-The OpenCL 2.x atomic operations apply to variables of an atomic type (a
-subset of those in the C11 standard) including atomic versions of the int,
-uint, long, ulong, float, double, half, intptr_t, uintptr_t, size_t, and
-ptrdiff_t types.
-However, support for some of these atomic types depends on support for the
-corresponding regular types.
+The OpenCL C 2.x atomic operations apply to variables of an atomic type (a
+subset of those in the C11 standard).

 An atomic operation on one or more memory locations is either an acquire
 operation, a release operation, or both an acquire and release operation.
@@ -1345,40 +1342,41 @@ The orders *memory_order_acquire* (used for reads), *memory_order_release*
 (used for writes), and *memory_order_acq_rel* (used for read-modify-write
 operations) are used for simple communication between units of execution
 using shared variables.
-Informally, executing a *memory_order_release* on an atomic object A makes
+Informally, executing a *memory_order_release* on an atomic object *A* makes
 all previous side effects visible to any unit of execution that later
-executes a *memory_order_acquire* on A.
+executes a *memory_order_acquire* on *A*.
 The orders *memory_order_acquire*, *memory_order_release*, and
 *memory_order_acq_rel* do not provide sequential consistency for race-free
 programs because they will not ensure that atomic stores followed by atomic
 loads become visible to other threads in that order.

 [[atomic-fence-orders]]
-The fence operation is atomic_work_item_fence, which includes a memory_order
-argument as well as the memory_scope and cl_mem_fence_flags arguments.
-Depending on the memory_order argument, this operation:
-
-* has no effects, if *memory_order_relaxed*;
-* is an acquire fence, if *memory_order_acquire*;
-* is a release fence, if *memory_order_release*;
-* is both an acquire fence and a release fence, if *memory_order_acq_rel*;
+The fence operation is *atomic_work_item_fence*, which includes a memory order
+argument as well as memory scope and memory flag arguments.
+Depending on the memory order argument, this operation:
+
+* has no effects, if the memory order is *memory_order_relaxed*;
+* is an acquire fence, if the memory order is *memory_order_acquire*;
+* is a release fence, if the memory order is *memory_order_release*;
+* is both an acquire fence and a release fence, if the memory order is
+*memory_order_acq_rel*;
 * is a sequentially-consistent fence with both acquire and release
-semantics, if *memory_order_seq_cst*.
+semantics, if the memory order is *memory_order_seq_cst*.

 If specified, the cl_mem_fence_flags argument must be `CLK_IMAGE_MEM_FENCE`,
 `CLK_GLOBAL_MEM_FENCE`, `CLK_LOCAL_MEM_FENCE`, or `CLK_GLOBAL_MEM_FENCE |
 CLK_LOCAL_MEM_FENCE`.

-The `atomic_work_item_fence(CLK_IMAGE_MEM_FENCE, ...)` built-in function must be
-used to make sure that sampler-less writes are visible to later reads by the
-same work-item.
-Without use of the atomic_work_item_fence function, write-read coherence on
+The *atomic_work_item_fence* built-in function must be used with
+`CLK_IMAGE_MEM_FENCE` to make sure that sampler-less writes are visible to later
+reads by the same work-item.
+Without use of the *atomic_work_item_fence* function, write-read coherence on
 image objects is not guaranteed: if a work-item reads from an image to which
-it has previously written without an intervening atomic_work_item_fence, it
+it has previously written without an intervening *atomic_work_item_fence*, it
 is not guaranteed that those previous writes are visible to the work-item.

-The synchronization operations in OpenCL 2.x can be parameterized by a
-memory_scope.
+The synchronization operations in OpenCL C 2.x can be parameterized by a
+memory scope.
 Memory scopes control the extent that an atomic operation or fence is
 visible with respect to the memory model.
 These memory scopes may be used when performing atomic operations and fences
@@ -1509,9 +1507,9 @@ A local memory action *A* local-happens-before a local memory action *B* if
 * For some local memory action *C*, *A* local-happens-before *C* and *C*
 local-happens-before *B*.

-An OpenCL 2.x implementation shall ensure that no program execution
-demonstrates a cycle in either the local-happens-before relation or the
-global-happens-before relation.
+An implementation of the OpenCL 2.x memory consistency model shall ensure that
+no program execution demonstrates a cycle in either the local-happens-before
+relation or the global-happens-before relation.

 NOTE: The global- and local-happens-before relations are critical to
 defining what values are read and when data races occur.
@@ -1593,12 +1591,12 @@ This requirement is known as write-read coherence.
 This and following sections describe how different program actions in kernel
 C code and the host program contribute to the local- and
 global-happens-before relations.
-This section discusses ordering rules for OpenCL 2.x atomic operations.
+This section discusses ordering rules for OpenCL C 2.x atomic operations.

-<<device-side-enqueue, Device-side enqueue>> defines the enumerated type
-memory_order.
+The <<memory-consistency-model>> section defines the enumerated type
+*memory_order*.

-* For *memory_order_relaxed*, no operation orders memory.
+* For *memory_order_relaxed*, there is no memory ordering.
 * For *memory_order_release*, *memory_order_acq_rel*, and
 *memory_order_seq_cst*, a store operation performs a release operation
 on the affected memory location.
@@ -1723,16 +1721,16 @@ reasonable amount of time.
 <<iso-c11,[C11 standard, Section 7.17.3, paragraph 16.]>>

 As long as the following conditions are met, a host program sharing SVM memory
-with a kernel executing on one or more OpenCL 2.x devices may use atomic and
-synchronization operations to ensure that its assignments, and those of the
-kernel, are visible to each other:
+with a kernel executing on one or more OpenCL 2.x or newer devices may use
+atomic and synchronization operations to ensure that its assignments, and those
+of the kernel, are visible to each other:

 . Either fine-grained buffer or fine-grained system SVM must be used to
 share memory.
 While coarse-grained buffer SVM allocations may support atomic
 operations, visibility on these allocations is not guaranteed except at
 map and unmap operations.
-. The optional OpenCL 2.x SVM atomic-controlled visibility specified by
+. The optional OpenCL SVM atomic-controlled visibility specified by
 provision of the {CL_MEM_SVM_ATOMICS} flag must be supported by the device
 and the flag provided to the SVM buffer on allocation.
 . The host atomic and synchronization operations must be compatible with
@@ -1748,11 +1746,11 @@ all_svm_devices scope.
 [[memory-ordering-fence]]
 ==== Fence Operations

-This section describes how the OpenCL 2.x fence operations contribute to the
+This section describes how the OpenCL C 2.x fence operations contribute to the
 local- and global-happens-before relations.

 Earlier, we introduced synchronization primitives called fences.
-Fences can utilize the acquire memory_order, release memory_order, or both.
+Fences can utilize the acquire memory order, release memory order, or both.
 A fence with acquire semantics is called an acquire fence; a fence with
 release semantics is called a release fence. The <<atomic-fence-orders,
 overview of atomic and fence operations>> section describes the memory orders
@@ -1990,8 +1988,6 @@ following:
 command-queue, then the event (including the event implied between *C*
 and *C1* due to the in-order queue) signaling *C*'s completion
 global-synchronizes-with *C1*.
-Note that in OpenCL 2.x, only a host command-queue can be configured as
-an in-order queue.
 . If an API call enqueues a marker command *C* with an empty list of
 events upon which *C* should wait, then the events of all commands
 enqueued prior to *C* in the command-queue global-synchronize-with *C*.
@@ -2053,7 +2049,7 @@ In this situation:
 enqueued from the host to a single device is guaranteed under the memory
 ordering rules described earlier in this section.

-If fine-grain SVM is used but without support for the OpenCL 2.x atomic
+If fine-grain SVM is used but without support for SVM atomic
 operations, then the host and devices can concurrently read the same memory
 locations and can concurrently update non-overlapping memory regions, but
 attempts to update the same memory locations are undefined.
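
Reviewer note: the informal release/acquire wording touched by this commit ("executing a *memory_order_release* on an atomic object *A* makes all previous side effects visible to any unit of execution that later executes a *memory_order_acquire* on *A*") may be easier to check against a concrete case. The following is a minimal sketch in ISO C11, the model the spec text says the OpenCL memory orders are based on. It is not OpenCL C and is not taken from the specification; `struct channel`, `channel_init`, `publish`, and `try_consume` are hypothetical names chosen for illustration, and memory scopes are not modelled because ISO C11 has no equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-slot channel (illustrative only, not an OpenCL API). */
struct channel {
    int payload;        /* ordinary, non-atomic data */
    atomic_bool ready;  /* plays the role of atomic object "A" */
};

void channel_init(struct channel *ch)
{
    assert(ch != NULL);
    ch->payload = 0;
    atomic_init(&ch->ready, false);
}

/* Producer side: write the payload first, then release-store the flag.
   The release store orders the payload write before the flag write. */
void publish(struct channel *ch, int value)
{
    assert(ch != NULL);
    ch->payload = value;
    atomic_store_explicit(&ch->ready, true, memory_order_release);
}

/* Consumer side: acquire-load the flag. If the load observes true, the
   payload write made before the matching release store is visible. */
bool try_consume(struct channel *ch, int *out)
{
    assert(ch != NULL && out != NULL);
    if (!atomic_load_explicit(&ch->ready, memory_order_acquire))
        return false;
    *out = ch->payload;
    return true;
}
```

In a real program `publish` and `try_consume` would run in different units of execution. The sketch only shows where the release and acquire operations sit relative to the non-atomic access they order.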
