@@ -1148,9 +1148,10 @@ enforced at a synchronization point.
11481148IMPORTANT: This memory consistency model is <<unified-spec, missing
11491149before>> version 2.0.
11501150
1151- The OpenCL 2.x memory model tells programmers what they can expect from an
1152- OpenCL 2.x implementation; which memory operations are guaranteed to happen in
1153- which order and which memory values each read operation will return.
1151+ The OpenCL 2.x memory consistency model tells programmers what they can expect
1152+ from an OpenCL 2.x or newer implementation; which memory operations are
1153+ guaranteed to happen in which order and which memory values each read operation
1154+ will return.
11541155The memory model tells compiler writers which restrictions they must follow
11551156when implementing compiler optimizations; which variables they can cache in
11561157registers and when they can move reads or writes around a barrier or atomic
@@ -1159,7 +1160,7 @@ The memory model also tells hardware designers about limitations on hardware
11591160optimizations; for example, when they must flush or invalidate hardware
11601161caches.
11611162
1162- The memory consistency model in OpenCL 2.x is based on the memory model from
1163+ The OpenCL 2.x memory consistency model is based on the memory model from
11631164the ISO C11 programming language.
11641165To help make the presentation more precise and self-contained, we include
11651166modified paragraphs taken verbatim from the ISO C11 international standard.
@@ -1175,24 +1176,24 @@ Each access to a memory location sees the last assignment to that location
11751176in that interleaving.
11761177While sequential consistency is relatively straightforward for a programmer
11771178to reason about, implementing sequential consistency is expensive.
1178- Therefore, OpenCL 2.x implements a relaxed memory consistency model; i.e. it is
1179- possible to write programs where the loads from memory violate sequential
1180- consistency.
1179+ Therefore, the OpenCL 2.x memory consistency model is a relaxed memory
1180+ consistency model; i.e. it is possible to write programs where the loads from
1181+ memory violate sequential consistency.
11811182Fortunately, if a program does not contain any races and if the program only
11821183uses atomic operations that utilize the sequentially consistent memory order
1183- (the default memory ordering for OpenCL 2.x), OpenCL programs appear to execute
1184- with sequential consistency.
1184+ (the default memory ordering for OpenCL C 2.x), OpenCL programs appear to
1185+ execute with sequential consistency.
11851186
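Illustration only, not part of the patch: a minimal sketch of the sequentially consistent default described above, written in ISO C11 because the OpenCL memory orders are based on C11. The names `x`, `y`, `unit_a`, and `unit_b` are hypothetical, and the C11 `atomic_store` / `atomic_load` calls are assumed here as stand-ins for the corresponding OpenCL C built-ins, which additionally take an implied memory scope.

```c
#include <stdatomic.h>

/* Hypothetical shared atomic objects; static storage starts them at 0. */
static atomic_int x;
static atomic_int y;

/* Unit of execution A: store to x, then load y. */
int unit_a(void)
{
    atomic_store(&x, 1);     /* no explicit order: memory_order_seq_cst */
    return atomic_load(&y);  /* no explicit order: memory_order_seq_cst */
}

/* Unit of execution B: store to y, then load x. */
int unit_b(void)
{
    atomic_store(&y, 1);
    return atomic_load(&x);
}
```

If `unit_a` and `unit_b` run concurrently, sequential consistency rules out the outcome in which both return 0, because some interleaving of the four operations must explain the result. With weaker memory orders that outcome becomes possible.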
11861187Programmers can to some degree control how the memory model is relaxed by
11871188choosing the memory order for synchronization operations.
11881189The precise semantics of synchronization and the memory orders are formally
11891190defined in <<memory-ordering-rules, Memory Ordering Rules>>.
11901191Here, we give a high level description of how these memory orders apply to
11911192atomic operations on atomic objects shared between units of execution.
1192- OpenCL 2.x memory_order choices are based on those from the ISO C11 standard
1193+ The OpenCL 2.x memory orders are based on those from the ISO C11 standard
11931194memory model.
1194- They are specified in certain OpenCL functions through the following
1195- enumeration constants:
1195+ They are specified in certain OpenCL C functions through the following
1196+ *memory_order* enumeration constants:
11961197
11971198 * *memory_order_relaxed*: implies no order constraints.
11981199 This memory order can be used safely to increment counters that are
@@ -1230,13 +1231,13 @@ detailed rules for when synchronisation must occur.
12301231 loads and stores from different units of execution appear to be simply
12311232 interleaved.
12321233
1233- Regardless of which memory_order is specified, resolving constraints on
1234+ Regardless of which memory order is specified, resolving constraints on
12341235memory operations across a heterogeneous platform adds considerable overhead
12351236to the execution of a program.
12361237An OpenCL platform may be able to optimize certain operations that depend on
12371238the features of the memory consistency model by restricting the scope of the
12381239memory operations.
1239- Distinct memory scopes are defined by the values of the memory_scope
1240+ Distinct memory scopes are defined by the values of the *memory_scope*
12401241enumeration constant:
12411242
12421243 * *memory_scope_work_item*: memory-ordering constraints only apply within
@@ -1299,8 +1300,8 @@ detailed rules behind the relaxed memory models and go directly to
12991300
13001301=== Overview of Atomic and Fence Operations
13011302
1302- OpenCL 2.x has a number of _synchronization operations_ that are used to define
1303- memory order constraints in a program.
1303+ OpenCL C 2.x has a number of _synchronization operations_ that are used to
1304+ define memory order constraints in a program.
13041305They play a special role in controlling how memory operations in one unit of
13051306execution (such as work-items or, when using SVM a host thread) are made
13061307visible to another.
@@ -1310,17 +1311,13 @@ operations_ and _fences_.
13101311Atomic operations are indivisible.
13111312They either occur completely or not at all.
13121313These operations are used to order memory operations between units of
1313- execution and hence they are parameterized with the memory_order and
1314- memory_scope parameters defined by the OpenCL memory consistency model.
1314+ execution and hence they are parameterized with the memory order and
1315+ memory scope parameters defined by the OpenCL memory consistency model.
13151316The atomic operations for OpenCL kernel languages are similar to the
13161317corresponding operations defined by the C11 standard.
13171318
1318- The OpenCL 2.x atomic operations apply to variables of an atomic type (a
1319- subset of those in the C11 standard) including atomic versions of the int,
1320- uint, long, ulong, float, double, half, intptr_t, uintptr_t, size_t, and
1321- ptrdiff_t types.
1322- However, support for some of these atomic types depends on support for the
1323- corresponding regular types.
1319+ The OpenCL C 2.x atomic operations apply to variables of an atomic type (a
1320+ subset of those in the C11 standard).
13241321
13251322An atomic operation on one or more memory locations is either an acquire
13261323operation, a release operation, or both an acquire and release operation.
@@ -1336,40 +1333,41 @@ The orders *memory_order_acquire* (used for reads), *memory_order_release*
13361333(used for writes), and *memory_order_acq_rel* (used for read-modify-write
13371334operations) are used for simple communication between units of execution
13381335using shared variables.
1339- Informally, executing a *memory_order_release* on an atomic object A makes
1336+ Informally, executing a *memory_order_release* on an atomic object *A* makes
13401337all previous side effects visible to any unit of execution that later
1341- executes a *memory_order_acquire* on A.
1338+ executes a *memory_order_acquire* on *A*.
13421339The orders *memory_order_acquire*, *memory_order_release*, and
13431340*memory_order_acq_rel* do not provide sequential consistency for race-free
13441341programs because they will not ensure that atomic stores followed by atomic
13451342loads become visible to other threads in that order.
13461343
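Illustration only, not part of the patch: a minimal sketch of the release/acquire communication pattern described above, in ISO C11. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical; `ready` plays the role of the atomic object *A*, and the C11 `*_explicit` functions are assumed as stand-ins for the OpenCL C built-ins, which also accept a memory scope.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_bool ready;  /* atomic object "A"; starts as false */

/* Producer: write the data, then release-store the flag. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: acquire-load the flag; read the data only if it is set. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;      /* nothing published yet */
    *out = payload;        /* ordered after the producer's write */
    return true;
}
```

A consumer that observes `ready == true` through the acquire load is guaranteed to see the earlier write to `payload`, even though `payload` itself is not atomic.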
13471344[[atomic-fence-orders]]
1348- The fence operation is atomic_work_item_fence, which includes a memory_order
1349- argument as well as the memory_scope and cl_mem_fence_flags arguments.
1350- Depending on the memory_order argument, this operation:
1351-
1352- * has no effects, if *memory_order_relaxed*;
1353- * is an acquire fence, if *memory_order_acquire*;
1354- * is a release fence, if *memory_order_release*;
1355- * is both an acquire fence and a release fence, if *memory_order_acq_rel*;
1345+ The fence operation is *atomic_work_item_fence*, which includes a memory order
1346+ argument as well as memory scope and memory flag arguments.
1347+ Depending on the memory order argument, this operation:
1348+
1349+ * has no effects, if the memory order is *memory_order_relaxed*;
1350+ * is an acquire fence, if the memory order is *memory_order_acquire*;
1351+ * is a release fence, if the memory order is *memory_order_release*;
1352+ * is both an acquire fence and a release fence, if the memory order is
1353+ *memory_order_acq_rel*;
13561354 * is a sequentially-consistent fence with both acquire and release
1357- semantics, if *memory_order_seq_cst*.
1355+ semantics, if the memory order is *memory_order_seq_cst*.
13581356
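Illustration only, not part of the patch: a minimal sketch of release and acquire fences paired with relaxed atomic operations, in ISO C11. It assumes `atomic_thread_fence` as the nearest C11 analogue of *atomic_work_item_fence*; the C11 function takes only a memory order, with no memory scope or flags argument. The names `data`, `flag`, `writer`, and `reader` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int data;          /* ordinary, non-atomic data */
static atomic_int flag;   /* starts at 0 */

/* Writer: release fence before a relaxed store of the flag. */
void writer(int value)
{
    data = value;
    atomic_thread_fence(memory_order_release);   /* release fence */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Reader: relaxed load of the flag, then an acquire fence. */
bool reader(int *out)
{
    if (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        return false;                            /* flag not set yet */
    atomic_thread_fence(memory_order_acquire);   /* acquire fence */
    *out = data;
    return true;
}
```

The atomic operations themselves are relaxed here; the ordering comes from the two fences. Replacing either fence with a `memory_order_relaxed` fence would remove that ordering, consistent with the first bullet above.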
13591357If specified, the cl_mem_fence_flags argument must be `CLK_IMAGE_MEM_FENCE`,
13601358`CLK_GLOBAL_MEM_FENCE`, `CLK_LOCAL_MEM_FENCE`, or `CLK_GLOBAL_MEM_FENCE |
13611359CLK_LOCAL_MEM_FENCE`.
13621360
1363- The `atomic_work_item_fence(CLK_IMAGE_MEM_FENCE, ...)` built-in function must be
1364- used to make sure that sampler-less writes are visible to later reads by the
1365- same work-item.
1366- Without use of the atomic_work_item_fence function, write-read coherence on
1361+ The *atomic_work_item_fence* built-in function must be used with
1362+ `CLK_IMAGE_MEM_FENCE` to make sure that sampler-less writes are visible to later
1363+ reads by the same work-item.
1364+ Without use of the *atomic_work_item_fence* function, write-read coherence on
13671365image objects is not guaranteed: if a work-item reads from an image to which
1368- it has previously written without an intervening atomic_work_item_fence, it
1366+ it has previously written without an intervening *atomic_work_item_fence*, it
13691367is not guaranteed that those previous writes are visible to the work-item.
13701368
1371- The synchronization operations in OpenCL 2.x can be parameterized by a
1372- memory_scope.
1369+ The synchronization operations in OpenCL C 2.x can be parameterized by a
1370+ memory scope.
13731371Memory scopes control the extent that an atomic operation or fence is
13741372visible with respect to the memory model.
13751373These memory scopes may be used when performing atomic operations and fences
@@ -1500,9 +1498,9 @@ A local memory action *A* local-happens-before a local memory action *B* if
15001498 * For some local memory action *C*, *A* local-happens-before *C* and *C*
15011499 local-happens-before *B*.
15021500
1503- An OpenCL 2.x implementation shall ensure that no program execution
1504- demonstrates a cycle in either the local-happens-before relation or the
1505- global-happens-before relation.
1501+ An implementation of the OpenCL 2.x memory consistency model shall ensure that
1502+ no program execution demonstrates a cycle in either the local-happens-before
1503+ relation or the global-happens-before relation.
15061504
15071505NOTE: The global- and local-happens-before relations are critical to
15081506defining what values are read and when data races occur.
@@ -1584,12 +1582,12 @@ This requirement is known as write-read coherence.
15841582This and following sections describe how different program actions in kernel
15851583C code and the host program contribute to the local- and
15861584global-happens-before relations.
1587- This section discusses ordering rules for OpenCL 2.x atomic operations.
1585+ This section discusses ordering rules for OpenCL C 2.x atomic operations.
15881586
1589- <<device-side-enqueue, Device-side enqueue>> defines the enumerated type
1590- memory_order.
1587+ The <<memory-consistency-model>> section defines the enumerated type
1588+ *memory_order*.
15911589
1592- * For *memory_order_relaxed*, no operation orders memory.
1590+ * For *memory_order_relaxed*, there is no memory ordering.
15931591 * For *memory_order_release*, *memory_order_acq_rel*, and
15941592 *memory_order_seq_cst*, a store operation performs a release operation
15951593 on the affected memory location.
@@ -1714,16 +1712,16 @@ reasonable amount of time.
17141712<<iso-c11,[C11 standard, Section 7.17.3, paragraph 16.]>>
17151713
17161714As long as the following conditions are met, a host program sharing SVM memory
1717- with a kernel executing on one or more OpenCL 2.x devices may use atomic and
1718- synchronization operations to ensure that its assignments, and those of the
1719- kernel, are visible to each other:
1715+ with a kernel executing on one or more OpenCL 2.x or newer devices may use
1716+ atomic and synchronization operations to ensure that its assignments, and those
1717+ of the kernel, are visible to each other:
17201718
17211719 . Either fine-grained buffer or fine-grained system SVM must be used to
17221720 share memory.
17231721 While coarse-grained buffer SVM allocations may support atomic
17241722 operations, visibility on these allocations is not guaranteed except at
17251723 map and unmap operations.
1726- . The optional OpenCL 2.x SVM atomic-controlled visibility specified by
1724+ . The optional OpenCL SVM atomic-controlled visibility specified by
17271725 provision of the {CL_MEM_SVM_ATOMICS} flag must be supported by the device
17281726 and the flag provided to the SVM buffer on allocation.
17291727 . The host atomic and synchronization operations must be compatible with
@@ -1739,11 +1737,11 @@ all_svm_devices scope.
17391737[[memory-ordering-fence]]
17401738==== Fence Operations
17411739
1742- This section describes how the OpenCL 2.x fence operations contribute to the
1740+ This section describes how the OpenCL C 2.x fence operations contribute to the
17431741local- and global-happens-before relations.
17441742
17451743Earlier, we introduced synchronization primitives called fences.
1746- Fences can utilize the acquire memory_order, release memory_order, or both.
1744+ Fences can utilize the acquire memory order, release memory order, or both.
17471745A fence with acquire semantics is called an acquire fence; a fence with
17481746release semantics is called a release fence. The <<atomic-fence-orders,
17491747overview of atomic and fence operations>> section describes the memory orders
@@ -1981,8 +1979,6 @@ following:
19811979 command-queue, then the event (including the event implied between *C*
19821980 and *C1* due to the in-order queue) signaling *C*'s completion
19831981 global-synchronizes-with *C1*.
1984- Note that in OpenCL 2.x, only a host command-queue can be configured as
1985- an in-order queue.
19861982 . If an API call enqueues a marker command *C* with an empty list of
19871983 events upon which *C* should wait, then the events of all commands
19881984 enqueued prior to *C* in the command-queue global-synchronize-with *C*.
@@ -2044,7 +2040,7 @@ In this situation:
20442040 enqueued from the host to a single device is guaranteed under the memory
20452041 ordering rules described earlier in this section.
20462042
2047- If fine-grain SVM is used but without support for the OpenCL 2.x atomic
2043+ If fine-grain SVM is used but without support for SVM atomic
20482044operations, then the host and devices can concurrently read the same memory
20492045locations and can concurrently update non-overlapping memory regions, but
20502046attempts to update the same memory locations are undefined.