|
| 1 | += cl_ext_buffer_device_address |
| 2 | + |
| 3 | +// This section needs to be after the document title. |
| 4 | +:doctype: book |
| 5 | +:toc2: |
| 6 | +:toc: left |
| 7 | +:encoding: utf-8 |
| 8 | +:lang: en |
| 9 | + |
| 10 | +:blank: pass:[ +] |
| 11 | + |
| 12 | +// Set the default source code type in this document to C, |
| 13 | +// for syntax highlighting purposes. |
| 14 | +:language: c |
| 15 | + |
| 16 | +// This is what is needed for C++, since docbook uses c++ |
| 17 | +// and everything else uses cpp. This doesn't work when |
| 18 | +// source blocks are in table cells, though, so don't use |
| 19 | +// C++ unless it is required. |
| 20 | +//:language: {basebackend@docbook:c++:cpp} |
| 21 | + |
| 22 | +== Name Strings |
| 23 | + |
| 24 | +`cl_ext_buffer_device_address` |
| 25 | + |
| 26 | +== Contact |
| 27 | + |
| 28 | +Pekka Jääskeläinen, Intel (pekka 'dot' jaaskelainen 'at' intel 'dot' com) |
| 29 | + |
| 30 | +== Contributors |
| 31 | + |
| 32 | +// spell-checker: disable |
| 33 | +Pekka Jääskeläinen, Intel + |
| 34 | +Karol Herbst, Red Hat + |
| 35 | +Henry Linjamäki, Intel + |
| 36 | +// spell-checker: enable |
| 37 | + |
| 38 | +== Notice |
| 39 | + |
| 40 | +Copyright (c) 2024 Intel Corporation. All rights reserved. |
| 41 | + |
| 42 | +== Status |
| 43 | + |
| 44 | +Draft. |
| 45 | + |
| 46 | +== Version |
| 47 | + |
| 48 | +Built On: {docdate} + |
| 49 | +Revision: 0.1.0 |
| 50 | + |
| 51 | +== Dependencies |
| 52 | + |
| 53 | +This extension is written against the OpenCL Specification version 3.0.16. |
| 54 | + |
| 55 | +This extension requires OpenCL 1.0 or later. |
| 56 | + |
| 57 | +== Overview |
| 58 | + |
| 59 | +The basic cl_mem buffer API doesn't enable access to the underlying raw |
| 60 | +pointers in the device memory, preventing its use in host side |
| 61 | +data structures that need pointer references to objects. |
| 62 | +This API adds a minimal increment on top of cl_mem that provides such |
| 63 | +capabilities. |
| 64 | + |
| 65 | +Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature |
| 66 | +that enables raw device side pointers in the OpenCL standard. Its coarse-grain |
| 67 | +variant is relatively simple to implement on various platforms in terms of |
| 68 | +coherency requirements, but it requires mapping the buffer's address range |
| 69 | +to the host virtual address space although it might not be needed by the |
| 70 | +application. This is not an issue in systems which can provide virtual memory |
| 71 | +across the platform, but might provide implementation challenges in cases |
| 72 | +where the device presents a global memory with its disjoint address space |
| 73 | +(that can also be a physical memory address space) or, for example, when |
| 74 | +a barebone embedded system lacks virtual memory support altogether. |
| 75 | + |
| 76 | +Various higher-level APIs present a memory allocation routine which can |
| 77 | +allocate device-only memory and provide raw pointers to it without guarentees |
| 78 | +of system-wide uniqueness: Minimal implementations of OpenMP's omp_target_alloc() and |
| 79 | +CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared |
| 80 | +address space between the host and the device. This extension is meant to |
| 81 | +provide a minimal set of features to implement such APIs without requiring |
| 82 | +a shared virtual address space between the host and the device. |
| 83 | + |
| 84 | +=== New API Function |
| 85 | + |
| 86 | +include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[] |
| 87 | + |
| 88 | +=== New API Enums |
| 89 | + |
| 90 | +Enums for enabling device pointer properties when creating a buffer |
| 91 | +{clCreateBuffer}, see <<clCreateBuffer, the list of supported memory flag values table>>: |
| 92 | + |
| 93 | +[source] |
| 94 | +---- |
| 95 | +#define CL_MEM_DEVICE_ADDRESS_EXT (1ul << 31) |
| 96 | +#define CL_MEM_DEVICE_PRIVATE_EXT (1ul << 30) |
| 97 | +---- |
| 98 | + |
| 99 | +Enums for querying the device pointer from the cl_mem <<clGetMemObjectInfo, the list of supported param_names table>>: |
| 100 | + |
| 101 | +[source] |
| 102 | +---- |
| 103 | +#define CL_MEM_DEVICE_PTR_EXT 0xff01 |
| 104 | +---- |
| 105 | + |
| 106 | +Enums for setting information of indirect device pointer accesses to kernels <<clSetKernelExecInfo, the list of supported param_names table>>. This is for OpenCL 2.0 and above. When implementing the |
| 107 | +extension on an older OpenCL version, indirect device pointer access is not supported. |
| 108 | + |
| 109 | +[source] |
| 110 | +---- |
| 111 | +#define CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT 0x11B8 |
| 112 | +---- |
| 113 | + |
| 114 | +== New API Types |
| 115 | + |
| 116 | +Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTR_EXT`. |
| 117 | + |
| 118 | +[source] |
| 119 | +---- |
| 120 | +typedef cl_ulong cl_mem_device_address_EXT; |
| 121 | +---- |
| 122 | + |
| 123 | +Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTRS_EXT`. |
| 124 | + |
| 125 | +[source] |
| 126 | +---- |
| 127 | +typedef struct _cl_mem_device_address_pair_EXT |
| 128 | +{ |
| 129 | + cl_device_id device; |
| 130 | + cl_mem_device_address_EXT address; |
| 131 | +} cl_mem_device_address_pair_EXT; |
| 132 | +---- |
| 133 | + |
| 134 | +== Modifications to the OpenCL API Specification |
| 135 | + |
| 136 | +=== Section 5.2.1 - Creating Buffer Objects: |
| 137 | + |
| 138 | +Add new allocation flags <<clCreateBuffer, List of supported memory flag values table>>: |
| 139 | + |
| 140 | +[[list-of-supported-memory-flag-values-adds]] |
| 141 | +.List of supported memory flags by {clCreateBuffer} |
| 142 | +[width="100%",cols="<50%,<50%",options="header"] |
| 143 | +|==== |
| 144 | +| Memory Flags | Description |
| 145 | +| {CL_MEM_DEVICE_ADDRESS_EXT_anchor} |
| 146 | + |
| 147 | +include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[] |
| 148 | + | This flag specifies that the buffer must have a single fixed address |
| 149 | + for its lifetime and the address should be unique at least across the devices |
| 150 | + of the context, but not necessarily withing the host (virtual) memory. |
| 151 | + |
| 152 | + The flag might imply that the buffer will be "pinned" permanently to |
| 153 | + a device's memory, but might not be necessarily so, as long as the address |
| 154 | + range of the buffer remains constant. |
| 155 | + |
| 156 | + The address is guaranteed to remain the same until the buffer is freed, and |
| 157 | + the address can be queried via {clGetMemObjectInfo}. |
| 158 | + |
| 159 | + The device-specific buffer content updates are still performed by |
| 160 | + implicit or explicit buffer migrations performed by the runtime or the |
| 161 | + client code. If any of the devices in the context does not support |
| 162 | + this type of allocations, an error (CL_INVALID_VALUE) is returned. |
| 163 | +| {CL_MEM_DEVICE_PRIVATE_EXT_anchor} |
| 164 | + |
| 165 | +include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_EXT.asciidoc[] |
| 166 | + | If this flag is combined with CL_MEM_DEVICE_ADDRESS_EXT, each device in |
| 167 | + the context can have their own (fixed) device-side address and copy of |
| 168 | + the created buffer which are synchronized implicitly by the runtime. |
| 169 | + The main difference to a default cl_mem allocation in that case is then |
| 170 | + that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT and the |
| 171 | + per-device address is guaranteed to be the same for the entire lifetime |
| 172 | + of the cl_mem. |
| 173 | +|==== |
| 174 | + |
| 175 | +// refError |
| 176 | + |
| 177 | +=== Section 5.5.6 - Memory Object Queries |
| 178 | + |
| 179 | +Add a new information type <<clGetMemObjectInfo, List of supported param_names table>>: |
| 180 | + |
| 181 | +[width="100%",cols="<33%,<17%,<50%",options="header"] |
| 182 | +|==== |
| 183 | +| Memory Object Info | Return type | Description |
| 184 | +| {CL_MEM_DEVICE_PTR_EXT_anchor} |
| 185 | + |
| 186 | +include::{generated}/api/version-notes/CL_MEM_DEVICE_PTR_EXT.asciidoc[] |
| 187 | + | {cl_mem_device_address_EXT_TYPE} |
| 188 | + | Returns the device address for a buffer allocated with |
| 189 | + CL_MEM_DEVICE_ADDRESS_EXT. If the buffer was not created with the flag |
| 190 | + or there are multiple devices in the context and the buffer address is |
| 191 | + not the same for all of them, it returns CL_INVALID_MEM_OBJECT. |
| 192 | + |
| 193 | +| {CL_MEM_DEVICE_PTRS_EXT_anchor} |
| 194 | +include::{generated}/api/version-notes/CL_MEM_DEVICE_PTRS_EXT.asciidoc[] |
| 195 | + | {cl_mem_device_address_pair_EXT_TYPE} |
| 196 | + | Returns the device-address pairs for all devices in the context. |
| 197 | + The per-device addresses might differ when the buffer was allocated |
| 198 | + with the CL_MEM_DEVICE_PRIVATE_EXT enabled. |
| 199 | +|==== |
| 200 | + |
| 201 | + |
| 202 | +=== Section 5.9.2 - Setting Kernel Arguments |
| 203 | + |
| 204 | +Add a new kernel argument setter for device pointers <<setting-kernel-arguments, Section 5.9.2>>: |
| 205 | + |
| 206 | +To set a device pointer as the argument value for a specific argument of a |
| 207 | +kernel, call the function |
| 208 | + |
| 209 | +include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[] |
| 210 | +include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[] |
| 211 | + |
| 212 | + * _kernel_ is a valid kernel object. |
| 213 | + * _arg_index_ is the argument index. |
| 214 | + Arguments to the kernel are referred by indices that go from 0 for the |
| 215 | + leftmost argument to _n_ - 1, where _n_ is the total number of arguments |
| 216 | + declared by a kernel. |
| 217 | + * _arg_value_ is the device pointer that should be used as the argument value for |
| 218 | + argument specified by _arg_index_. |
| 219 | + The device pointer specified is the value used by all API calls that enqueue |
| 220 | + _kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument |
| 221 | + value is changed by a call to {clSetKernelArgSVMPointer} for _kernel_. |
| 222 | + The device pointer can only be used for arguments that are declared to be a |
| 223 | + pointer to `global` memory allocated with clCreateBuffer() with the |
| 224 | + CL_MEM_DEVICE_ADDRESS_EXT flag. The pointer value specified as the argument value |
| 225 | + can be the pointer to the beginning of the buffer or be a pointer offset into |
| 226 | + the buffer region. The device pointer value must be naturally aligned according to |
| 227 | + the argument's type. |
| 228 | + |
| 229 | +// refError |
| 230 | + |
| 231 | +{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the function was executed |
| 232 | +successfully. Otherwise, it returns one of the following errors: |
| 233 | + |
| 234 | + * {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object. |
| 235 | + * {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support |
| 236 | + the device pointer. |
| 237 | + * {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index. |
| 238 | + * {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value. |
| 239 | + * {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required |
| 240 | + by the OpenCL implementation on the device. |
| 241 | + * {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources |
| 242 | + required by the OpenCL implementation on the host. |
| 243 | + |
| 244 | +Add a new flag to clSetKernelExecInfo for setting indirect device pointer access info <<clSetKernelExecInfo, List of supported param_name stable>>: |
| 245 | + |
| 246 | +[width="100%",cols="<33%,<17%,<50%",options="header"] |
| 247 | +|==== |
| 248 | +| Kernel Exec Info | Type | Description |
| 249 | +| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor} |
| 250 | + |
| 251 | +include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[] |
| 252 | + | {cl_mem_device_address_EXT_TYPE} |
| 253 | + | Device pointers must reference locations contained entirely within |
| 254 | + buffers that are passed to kernel as arguments, or that are passed |
| 255 | + through the execution information. |
| 256 | + |
| 257 | + Non-argument device pointers accessed by the kernel must be specified |
| 258 | + by passing pointers to those buffers via {clSetKernelExecInfo}. |
| 259 | +|==== |
| 260 | + |
| 261 | +// refError |
| 262 | + |
| 263 | + |
| 264 | +== Interactions with Other Extensions |
| 265 | + |
| 266 | +This extension is targeted to complement the OpenCL SVM extension and/or the |
| 267 | +Intel Unified Shared Memory extension by providing an additional lower-end |
| 268 | +step in the spectrum of type of pointers/buffers OpenCL can allocate. The |
| 269 | +extension can be seen as a simplification of the USM Device allocation type |
| 270 | +which drops the need to map the device buffer's address range to the same |
| 271 | +position in the host memory or to implement platform-wide VM. |
| 272 | + |
| 273 | +== Issues |
| 274 | + |
| 275 | +None. |
| 276 | + |
| 277 | +== Version History |
| 278 | + |
| 279 | +[cols="5,15,15,70"] |
| 280 | +[grid="rows"] |
| 281 | +[options="header"] |
| 282 | +|==== |
| 283 | +| *Version* | *Date* | *Author* | *Changes* |
| 284 | +| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback. |
| 285 | + This version describes the first API version that was prototyped |
| 286 | + in PoCL and RustiCL using temporary placeholder flag/enum values. |
| 287 | + The PoCL implementation and initial discussion on the extension |
| 288 | + can be found https://github.com/pocl/pocl/pull/1441[in this PR]. |
| 289 | +|==== |
0 commit comments