4646== Version
4747
4848Built On: {docdate} +
49- Revision: 0.2 .0
49+ Revision: 0.3 .0
5050
5151== Dependencies
5252
@@ -59,8 +59,6 @@ This extension requires OpenCL 1.0 or later.
5959The basic cl_mem buffer API doesn't enable access to the underlying raw
6060pointers in the device memory, preventing its use in host side
6161data structures that need pointer references to objects.
62- This API adds a minimal increment on top of cl_mem that provides such
63- capabilities.
6462
6563Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
6664that enables raw device side pointers in the OpenCL standard. Its coarse-grain
@@ -69,17 +67,18 @@ coherency requirements, but it requires mapping the buffer's address range
6967to the host virtual address space although it might not be needed by the
7068application. This is not an issue in systems which can provide virtual memory
7169across the platform, but might provide implementation challenges in cases
72- where the device presents a global memory with its disjoint address space
70+ where the device presents a global memory with a disjoint address space
7371(that can also be a physical memory address space) or, for example, when
7472a barebone embedded system lacks virtual memory support altogether.
7573
7674Various higher-level APIs present a memory allocation routine which can
7775allocate device-only memory and provide raw pointers to it without guarentees
78- of system-wide uniqueness: Minimal implementations of OpenMP's omp_target_alloc() and
79- CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared
76+ of system-wide uniqueness: For example, minimal implementations of OpenMP's
77+ omp_target_alloc() and CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared
8078address space between the host and the device. This extension is meant to
81- provide a minimal set of features to implement such APIs without requiring
82- a shared virtual address space between the host and the device.
79+ provide a minimal set of features to implement such APIs using the cl_mem
80+ buffers without requiring a shared virtual address space between the host and
81+ the device.
8382
8483=== New API Function
8584
@@ -92,8 +91,8 @@ Enums for enabling device pointer properties when creating a buffer
9291
9392[source]
9493----
95- #define CL_MEM_DEVICE_ADDRESS_EXT (1ul << 31)
96- #define CL_MEM_DEVICE_PRIVATE_EXT (1ul << 30)
94+ #define CL_MEM_DEVICE_SHARED_ADDRESS_EXT (1ul << 31)
95+ #define CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT (1ul << 30)
9796----
9897
9998Enums for querying the device pointer from the cl_mem <<clGetMemObjectInfo, the list of supported param_names table>>:
@@ -142,9 +141,9 @@ Add new allocation flags <<clCreateBuffer, List of supported memory flag values
142141[width="100%",cols="<50%,<50%",options="header"]
143142|====
144143| Memory Flags | Description
145- | {CL_MEM_DEVICE_ADDRESS_EXT_anchor }
144+ | {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor }
146145
147- include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT .asciidoc[]
146+ include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT .asciidoc[]
148147 | This flag specifies that the buffer must have a single fixed address
149148 for its lifetime and the address should be unique at least across the devices
150149 of the context, but not necessarily within the host (virtual) memory.
@@ -161,22 +160,22 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
161160 client code. If all of the devices in the context do not support
162161 this type of allocations, an error (CL_INVALID_VALUE) is returned.
163162
164- The device addresses of sub-buffers derived from CL_MEM_DEVICE_ADDRESS_EXT
163+ The device addresses of sub-buffers derived from CL_MEM_DEVICE_SHARED_ADDRESS_EXT
165164 allocated buffers can be computed by adding the sub-buffer origin to the
166165 start address.
167166
168- | {CL_MEM_DEVICE_PRIVATE_EXT_anchor }
167+ | {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor }
169168
170- include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_EXT .asciidoc[]
171- | If this flag is combined with CL_MEM_DEVICE_ADDRESS_EXT, each device in
172- the context can have their own (fixed) device-side address and a copy of
173- the created buffer which are synchronized implicitly by the runtime.
174- The main difference to a default cl_mem allocation in that case is then
175- that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT and the
176- per-device address is guaranteed to be the same for the entire lifetime
169+ include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT .asciidoc[]
170+ | This flag specifies that the buffer must have a single fixed address
171+ for its lifetime. Each device in the context can have their own (fixed)
172+ device-side address and a copy of the created buffer which are synchronized
173+ implicitly by the runtime. The main difference to a default cl_mem allocation
174+ in that case is that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT
175+ and the per-device address is guaranteed to be the same for the entire lifetime
177176 of the cl_mem.
178177
179- The device addresses of sub-buffers derived from CL_MEM_DEVICE_PRIVATE_EXT
178+ The device addresses of sub-buffers derived from CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT
180179 allocated buffers can be computed by adding the sub-buffer origin to the
181180 device-specific start address.
182181
@@ -196,7 +195,7 @@ Add a new information type <<clGetMemObjectInfo, List of supported param_names t
196195include::{generated}/api/version-notes/CL_MEM_DEVICE_PTR_EXT.asciidoc[]
197196 | {cl_mem_device_address_EXT_TYPE}
198197 | Returns the device address for a buffer allocated with
199- CL_MEM_DEVICE_ADDRESS_EXT . If the buffer was not created with the flag
198+ CL_MEM_DEVICE_SHARED_ADDRESS_EXT . If the buffer was not created with the flag
200199 or there are multiple devices in the context and the buffer address is
201200 not the same for all of them, it returns CL_INVALID_MEM_OBJECT.
202201
@@ -205,7 +204,7 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_PTRS_EXT.asciidoc[]
205204 | {cl_mem_device_address_pair_EXT_TYPE}
206205 | Returns the device-address pairs for all devices in the context.
207206 The per-device addresses might differ when the buffer was allocated
208- with the CL_MEM_DEVICE_PRIVATE_EXT enabled.
207+ with the CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT enabled.
209208|====
210209
211210
@@ -231,7 +230,7 @@ include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]
231230 value is changed by a call to {clSetKernelArgDevicePointer} for _kernel_.
232231 The device pointer can only be used for arguments that are declared to be a
233232 pointer to `global` memory allocated with clCreateBuffer() with the
234- CL_MEM_DEVICE_ADDRESS_EXT flag. The pointer value specified as the argument value
233+ CL_MEM_DEVICE_SHARED_ADDRESS_EXT flag. The pointer value specified as the argument value
235234 can be the pointer to the beginning of the buffer or be a pointer offset into
236235 the buffer region. The device pointer value must be naturally aligned according to
237236 the argument's type.
@@ -299,6 +298,12 @@ None.
299298[options="header"]
300299|====
301300| *Version* | *Date* | *Author* | *Changes*
301+ | 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst |
302+ Made the allocation flags independent from each other and
303+ renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and
304+ CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the
305+ same address across all devices in the context, whereas the latter
306+ allows per-device addresses.
302307| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst |
303308 Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device
304309 cases "all", not "any", covering a case where not all devices
0 commit comments