@@ -1157,9 +1157,10 @@ enforced at a synchronization point.
11571157IMPORTANT: This memory consistency model is <<unified-spec, missing
11581158before>> version 2.0.
11591159
1160- The OpenCL 2.x memory model tells programmers what they can expect from an
1161- OpenCL 2.x implementation; which memory operations are guaranteed to happen in
1162- which order and which memory values each read operation will return.
1160+ The OpenCL 2.x memory consistency model tells programmers what they can expect
1161+ from an OpenCL 2.x or newer implementation; which memory operations are
1162+ guaranteed to happen in which order and which memory values each read operation
1163+ will return.
11631164The memory model tells compiler writers which restrictions they must follow
11641165when implementing compiler optimizations; which variables they can cache in
11651166registers and when they can move reads or writes around a barrier or atomic
@@ -1168,7 +1169,7 @@ The memory model also tells hardware designers about limitations on hardware
11681169optimizations; for example, when they must flush or invalidate hardware
11691170caches.
11701171
1171- The memory consistency model in OpenCL 2.x is based on the memory model from
1172+ The OpenCL 2.x memory consistency model is based on the memory model from
11721173the ISO C11 programming language.
11731174To help make the presentation more precise and self-contained, we include
11741175modified paragraphs taken verbatim from the ISO C11 international standard.
@@ -1184,24 +1185,24 @@ Each access to a memory location sees the last assignment to that location
11841185in that interleaving.
11851186While sequential consistency is relatively straightforward for a programmer
11861187to reason about, implementing sequential consistency is expensive.
1187- Therefore, OpenCL 2.x implements a relaxed memory consistency model; i.e. it is
1188- possible to write programs where the loads from memory violate sequential
1189- consistency.
1188+ Therefore, the OpenCL 2.x memory consistency model is a relaxed memory
1189+ consistency model; i.e. it is possible to write programs where the loads from
1190+ memory violate sequential consistency.
11901191Fortunately, if a program does not contain any races and if the program only
11911192uses atomic operations that utilize the sequentially consistent memory order
1192- (the default memory ordering for OpenCL 2.x), OpenCL programs appear to execute
1193- with sequential consistency.
1193+ (the default memory ordering for OpenCL C 2.x), OpenCL programs appear to
1194+ execute with sequential consistency.
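
As an illustration only, the following host-side C11 sketch shows the guarantee for race-free programs that use only sequentially consistent atomics. It uses C11 `<stdatomic.h>` and POSIX threads as a stand-in for units of execution; the names `sb_x`, `sb_y`, and `count_forbidden_outcomes` are hypothetical and do not appear in OpenCL. Each thread stores to one variable and then loads the other. Under sequential consistency, at least one thread must observe the other's store.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int sb_x, sb_y;   /* shared atomic flags */
static int sb_r1, sb_r2;        /* results, read only after the threads are joined */

static void *sb_thread_a(void *arg)
{
    (void)arg;
    atomic_store(&sb_x, 1);        /* memory_order_seq_cst by default */
    sb_r1 = atomic_load(&sb_y);    /* memory_order_seq_cst by default */
    return NULL;
}

static void *sb_thread_b(void *arg)
{
    (void)arg;
    atomic_store(&sb_y, 1);
    sb_r2 = atomic_load(&sb_x);
    return NULL;
}

/* Runs the two threads repeatedly and counts runs in which both loads
 * returned 0, an outcome that no interleaving of the four operations allows. */
int count_forbidden_outcomes(int iterations)
{
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        pthread_t a, b;
        atomic_store(&sb_x, 0);
        atomic_store(&sb_y, 0);
        pthread_create(&a, NULL, sb_thread_a, NULL);
        pthread_create(&b, NULL, sb_thread_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (sb_r1 == 0 && sb_r2 == 0) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

With weaker memory orders on the stores and loads, the outcome where both loads return 0 would be permitted.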
11941195
11951196Programmers can to some degree control how the memory model is relaxed by
11961197choosing the memory order for synchronization operations.
11971198The precise semantics of synchronization and the memory orders are formally
11981199defined in <<memory-ordering-rules, Memory Ordering Rules>>.
11991200Here, we give a high level description of how these memory orders apply to
12001201atomic operations on atomic objects shared between units of execution.
1201- OpenCL 2.x memory_order choices are based on those from the ISO C11 standard
1202+ The OpenCL 2.x memory orders are based on those from the ISO C11 standard
12021203memory model.
1203- They are specified in certain OpenCL functions through the following
1204- enumeration constants:
1204+ They are specified in certain OpenCL C functions through the following
1205+ *memory_order* enumeration constants:
12051206
12061207 * *memory_order_relaxed*: implies no order constraints.
12071208 This memory order can be used safely to increment counters that are
@@ -1239,13 +1240,13 @@ detailed rules for when synchronisation must occur.
12391240 loads and stores from different units of execution appear to be simply
12401241 interleaved.
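
The counter use case mentioned for *memory_order_relaxed* can be sketched as follows. This is a host-side C11 approximation using `<stdatomic.h>` and POSIX threads, not OpenCL C kernel code; the names `relaxed_counter`, `count_worker`, and `run_relaxed_counter` are hypothetical. Because each increment is atomic, no updates are lost even though no ordering with other memory operations is requested.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 10000

static atomic_int relaxed_counter;

static void *count_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; ++i) {
        /* Atomic read-modify-write with no ordering constraints. */
        atomic_fetch_add_explicit(&relaxed_counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Returns the final counter value after all workers have been joined. */
int run_relaxed_counter(void)
{
    pthread_t threads[NUM_THREADS];

    atomic_store_explicit(&relaxed_counter, 0, memory_order_relaxed);
    for (int t = 0; t < NUM_THREADS; ++t) {
        pthread_create(&threads[t], NULL, count_worker, NULL);
    }
    for (int t = 0; t < NUM_THREADS; ++t) {
        pthread_join(threads[t], NULL);
    }
    return atomic_load_explicit(&relaxed_counter, memory_order_relaxed);
}
```

The final value is only read after the threads are joined; a relaxed counter by itself says nothing about the visibility of other data written by the workers.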
12411242
1242- Regardless of which memory_order is specified, resolving constraints on
1243+ Regardless of which memory order is specified, resolving constraints on
12431244memory operations across a heterogeneous platform adds considerable overhead
12441245to the execution of a program.
12451246An OpenCL platform may be able to optimize certain operations that depend on
12461247the features of the memory consistency model by restricting the scope of the
12471248memory operations.
1248- Distinct memory scopes are defined by the values of the memory_scope
1249+ Distinct memory scopes are defined by the values of the *memory_scope*
12491250enumeration constant:
12501251
12511252 * *memory_scope_work_item*: memory-ordering constraints only apply within
@@ -1308,8 +1309,8 @@ detailed rules behind the relaxed memory models and go directly to
13081309
13091310=== Overview of Atomic and Fence Operations
13101311
1311- OpenCL 2.x has a number of _synchronization operations_ that are used to define
1312- memory order constraints in a program.
1312+ OpenCL C 2.x has a number of _synchronization operations_ that are used to
1313+ define memory order constraints in a program.
13131314They play a special role in controlling how memory operations in one unit of
13141315execution (such as work-items or, when using SVM, a host thread) are made
13151316visible to another.
@@ -1319,17 +1320,13 @@ operations_ and _fences_.
13191320Atomic operations are indivisible.
13201321They either occur completely or not at all.
13211322These operations are used to order memory operations between units of
1322- execution and hence they are parameterized with the memory_order and
1323- memory_scope parameters defined by the OpenCL memory consistency model.
1323+ execution and hence they are parameterized with the memory order and
1324+ memory scope parameters defined by the OpenCL memory consistency model.
13241325The atomic operations for OpenCL kernel languages are similar to the
13251326corresponding operations defined by the C11 standard.
13261327
1327- The OpenCL 2.x atomic operations apply to variables of an atomic type (a
1328- subset of those in the C11 standard) including atomic versions of the int,
1329- uint, long, ulong, float, double, half, intptr_t, uintptr_t, size_t, and
1330- ptrdiff_t types.
1331- However, support for some of these atomic types depends on support for the
1332- corresponding regular types.
1328+ The OpenCL C 2.x atomic operations apply to variables of an atomic type (a
1329+ subset of those in the C11 standard).
13331330
13341331An atomic operation on one or more memory locations is either an acquire
13351332operation, a release operation, or both an acquire and release operation.
@@ -1345,40 +1342,41 @@ The orders *memory_order_acquire* (used for reads), *memory_order_release*
13451342(used for writes), and *memory_order_acq_rel* (used for read-modify-write
13461343operations) are used for simple communication between units of execution
13471344using shared variables.
1348- Informally, executing a *memory_order_release* on an atomic object A makes
1345+ Informally, executing a *memory_order_release* on an atomic object *A* makes
13491346all previous side effects visible to any unit of execution that later
1350- executes a *memory_order_acquire* on A.
1347+ executes a *memory_order_acquire* on *A*.
13511348The orders *memory_order_acquire*, *memory_order_release*, and
13521349*memory_order_acq_rel* do not provide sequential consistency for race-free
13531350programs because they will not ensure that atomic stores followed by atomic
13541351loads become visible to other threads in that order.
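
The release/acquire communication pattern described above can be sketched with C11 atomics, on which these orders are based. This is a host-side illustration using POSIX threads rather than OpenCL C work-items, and C11 has no memory scope argument; the names `mp_payload`, `mp_ready`, and `run_message_passing` are hypothetical. The atomic object `mp_ready` plays the role of *A*.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int mp_payload;        /* ordinary, non-atomic data */
static atomic_int mp_ready;   /* the atomic object used to publish the data */

static void *mp_producer(void *arg)
{
    (void)arg;
    mp_payload = 42;   /* side effect to be made visible */
    atomic_store_explicit(&mp_ready, 1, memory_order_release);
    return NULL;
}

static void *mp_consumer(void *arg)
{
    int *seen = arg;
    /* Spin until the release store is observed by an acquire load. */
    while (atomic_load_explicit(&mp_ready, memory_order_acquire) == 0) {
    }
    *seen = mp_payload;   /* ordered after the producer's write */
    return NULL;
}

/* Returns the payload value observed by the consumer. */
int run_message_passing(void)
{
    pthread_t producer, consumer;
    int seen = -1;

    mp_payload = 0;
    atomic_store_explicit(&mp_ready, 0, memory_order_relaxed);
    pthread_create(&consumer, NULL, mp_consumer, &seen);
    pthread_create(&producer, NULL, mp_producer, NULL);
    pthread_join(producer, NULL);
    pthread_join(consumer, NULL);
    return seen;
}
```

If both operations on `mp_ready` used *memory_order_relaxed* instead, the read of `mp_payload` would race with the producer's write.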
13551352
13561353[[atomic-fence-orders]]
1357- The fence operation is atomic_work_item_fence, which includes a memory_order
1358- argument as well as the memory_scope and cl_mem_fence_flags arguments.
1359- Depending on the memory_order argument, this operation:
1360-
1361- * has no effects, if *memory_order_relaxed*;
1362- * is an acquire fence, if *memory_order_acquire*;
1363- * is a release fence, if *memory_order_release*;
1364- * is both an acquire fence and a release fence, if *memory_order_acq_rel*;
1354+ The fence operation is *atomic_work_item_fence*, which includes a memory order
1355+ argument as well as memory scope and memory flag arguments.
1356+ Depending on the memory order argument, this operation:
1357+
1358+ * has no effects, if the memory order is *memory_order_relaxed*;
1359+ * is an acquire fence, if the memory order is *memory_order_acquire*;
1360+ * is a release fence, if the memory order is *memory_order_release*;
1361+ * is both an acquire fence and a release fence, if the memory order is
1362+ *memory_order_acq_rel*;
13651363 * is a sequentially-consistent fence with both acquire and release
1366- semantics, if *memory_order_seq_cst*.
1364+ semantics, if the memory order is *memory_order_seq_cst*.
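
The effect of acquire and release fences can be approximated on the host with the C11 `atomic_thread_fence` function. The sketch below is an analogy under that assumption, not OpenCL C code: *atomic_work_item_fence* additionally takes memory flag and memory scope arguments, which C11 does not have. The names `fence_payload`, `fence_flag`, and `run_fence_example` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int fence_payload;       /* ordinary, non-atomic data */
static atomic_int fence_flag;   /* accessed only with relaxed atomics */

static void *fence_writer(void *arg)
{
    (void)arg;
    fence_payload = 7;
    /* Release fence: orders the write above before the relaxed store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&fence_flag, 1, memory_order_relaxed);
    return NULL;
}

static void *fence_reader(void *arg)
{
    int *seen = arg;
    while (atomic_load_explicit(&fence_flag, memory_order_relaxed) == 0) {
    }
    /* Acquire fence: orders the relaxed load above before the read below. */
    atomic_thread_fence(memory_order_acquire);
    *seen = fence_payload;
    return NULL;
}

/* Returns the payload value observed by the reader. */
int run_fence_example(void)
{
    pthread_t writer, reader;
    int seen = -1;

    fence_payload = 0;
    atomic_store_explicit(&fence_flag, 0, memory_order_relaxed);
    pthread_create(&reader, NULL, fence_reader, &seen);
    pthread_create(&writer, NULL, fence_writer, NULL);
    pthread_join(writer, NULL);
    pthread_join(reader, NULL);
    return seen;
}
```

The atomic accesses themselves are relaxed; the ordering comes entirely from the pair of fences.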
13671365
13681366If specified, the cl_mem_fence_flags argument must be `CLK_IMAGE_MEM_FENCE`,
13691367`CLK_GLOBAL_MEM_FENCE`, `CLK_LOCAL_MEM_FENCE`, or `CLK_GLOBAL_MEM_FENCE |
13701368CLK_LOCAL_MEM_FENCE`.
13711369
1372- The `atomic_work_item_fence(CLK_IMAGE_MEM_FENCE, ...)` built-in function must be
1373- used to make sure that sampler-less writes are visible to later reads by the
1374- same work-item.
1375- Without use of the atomic_work_item_fence function, write-read coherence on
1370+ The *atomic_work_item_fence* built-in function must be used with
1371+ `CLK_IMAGE_MEM_FENCE` to make sure that sampler-less writes are visible to later
1372+ reads by the same work-item.
1373+ Without use of the *atomic_work_item_fence* function, write-read coherence on
13761374image objects is not guaranteed: if a work-item reads from an image to which
1377- it has previously written without an intervening atomic_work_item_fence, it
1375+ it has previously written without an intervening *atomic_work_item_fence*, it
13781376is not guaranteed that those previous writes are visible to the work-item.
13791377
1380- The synchronization operations in OpenCL 2.x can be parameterized by a
1381- memory_scope.
1378+ The synchronization operations in OpenCL C 2.x can be parameterized by a
1379+ memory scope.
13821380Memory scopes control the extent that an atomic operation or fence is
13831381visible with respect to the memory model.
13841382These memory scopes may be used when performing atomic operations and fences
@@ -1509,9 +1507,9 @@ A local memory action *A* local-happens-before a local memory action *B* if
15091507 * For some local memory action *C*, *A* local-happens-before *C* and *C*
15101508 local-happens-before *B*.
15111509
1512- An OpenCL 2.x implementation shall ensure that no program execution
1513- demonstrates a cycle in either the local-happens-before relation or the
1514- global-happens-before relation.
1510+ An implementation of the OpenCL 2.x memory consistency model shall ensure that
1511+ no program execution demonstrates a cycle in either the local-happens-before
1512+ relation or the global-happens-before relation.
15151513
15161514NOTE: The global- and local-happens-before relations are critical to
15171515defining what values are read and when data races occur.
@@ -1593,12 +1591,12 @@ This requirement is known as write-read coherence.
15931591This and following sections describe how different program actions in kernel
15941592C code and the host program contribute to the local- and
15951593global-happens-before relations.
1596- This section discusses ordering rules for OpenCL 2.x atomic operations.
1594+ This section discusses ordering rules for OpenCL C 2.x atomic operations.
15971595
1598- <<device-side-enqueue, Device-side enqueue>> defines the enumerated type
1599- memory_order.
1596+ The <<memory-consistency-model>> section defines the enumerated type
1597+ *memory_order*.
16001598
1601- * For *memory_order_relaxed*, no operation orders memory.
1599+ * For *memory_order_relaxed*, there is no memory ordering.
16021600 * For *memory_order_release*, *memory_order_acq_rel*, and
16031601 *memory_order_seq_cst*, a store operation performs a release operation
16041602 on the affected memory location.
@@ -1723,16 +1721,16 @@ reasonable amount of time.
17231721<<iso-c11,[C11 standard, Section 7.17.3, paragraph 16.]>>
17241722
17251723As long as the following conditions are met, a host program sharing SVM memory
1726- with a kernel executing on one or more OpenCL 2.x devices may use atomic and
1727- synchronization operations to ensure that its assignments, and those of the
1728- kernel, are visible to each other:
1724+ with a kernel executing on one or more OpenCL 2.x or newer devices may use
1725+ atomic and synchronization operations to ensure that its assignments, and those
1726+ of the kernel, are visible to each other:
17291727
17301728 . Either fine-grained buffer or fine-grained system SVM must be used to
17311729 share memory.
17321730 While coarse-grained buffer SVM allocations may support atomic
17331731 operations, visibility on these allocations is not guaranteed except at
17341732 map and unmap operations.
1735- . The optional OpenCL 2.x SVM atomic-controlled visibility specified by
1733+ . The optional OpenCL SVM atomic-controlled visibility specified by
17361734 provision of the {CL_MEM_SVM_ATOMICS} flag must be supported by the device
17371735 and the flag provided to the SVM buffer on allocation.
17381736 . The host atomic and synchronization operations must be compatible with
@@ -1748,11 +1746,11 @@ all_svm_devices scope.
17481746[[memory-ordering-fence]]
17491747==== Fence Operations
17501748
1751- This section describes how the OpenCL 2.x fence operations contribute to the
1749+ This section describes how the OpenCL C 2.x fence operations contribute to the
17521750local- and global-happens-before relations.
17531751
17541752Earlier, we introduced synchronization primitives called fences.
1755- Fences can utilize the acquire memory_order, release memory_order, or both.
1753+ Fences can utilize the acquire memory order, release memory order, or both.
17561754A fence with acquire semantics is called an acquire fence; a fence with
17571755release semantics is called a release fence. The <<atomic-fence-orders,
17581756overview of atomic and fence operations>> section describes the memory orders
@@ -1990,8 +1988,6 @@ following:
19901988 command-queue, then the event (including the event implied between *C*
19911989 and *C1* due to the in-order queue) signaling *C*'s completion
19921990 global-synchronizes-with *C1*.
1993- Note that in OpenCL 2.x, only a host command-queue can be configured as
1994- an in-order queue.
19951991 . If an API call enqueues a marker command *C* with an empty list of
19961992 events upon which *C* should wait, then the events of all commands
19971993 enqueued prior to *C* in the command-queue global-synchronize-with *C*.
@@ -2053,7 +2049,7 @@ In this situation:
20532049 enqueued from the host to a single device is guaranteed under the memory
20542050 ordering rules described earlier in this section.
20552051
2056- If fine-grain SVM is used but without support for the OpenCL 2.x atomic
2052+ If fine-grain SVM is used but without support for SVM atomic
20572053operations, then the host and devices can concurrently read the same memory
20582054locations and can concurrently update non-overlapping memory regions, but
20592055attempts to update the same memory locations are undefined.