Skip to content

Commit 6c3323d

Browse files
author
Pekka Jääskeläinen
committed
cl_ext_buffer_device_address
The basic cl_mem buffer API doesn't enable access to the underlying raw pointers in the device memory, preventing its use in host side data structures that need pointer references to objects. This API adds a minimal increment on top of cl_mem that provides such capabilities. The version 0.1.0 is implemented in PoCL and rusticl for prototyping, but everything's still up for discussion. chipStar is the first client that uses the API.
1 parent 49083c2 commit 6c3323d

3 files changed

Lines changed: 323 additions & 4 deletions

File tree

Lines changed: 289 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,289 @@
1+
= cl_ext_buffer_device_address
2+
3+
// This section needs to be after the document title.
4+
:doctype: book
5+
:toc2:
6+
:toc: left
7+
:encoding: utf-8
8+
:lang: en
9+
10+
:blank: pass:[ +]
11+
12+
// Set the default source code type in this document to C,
13+
// for syntax highlighting purposes.
14+
:language: c
15+
16+
// This is what is needed for C++, since docbook uses c++
17+
// and everything else uses cpp. This doesn't work when
18+
// source blocks are in table cells, though, so don't use
19+
// C++ unless it is required.
20+
//:language: {basebackend@docbook:c++:cpp}
21+
22+
== Name Strings
23+
24+
`cl_ext_buffer_device_address`
25+
26+
== Contact
27+
28+
Pekka Jääskeläinen, Intel (pekka 'dot' jaaskelainen 'at' intel 'dot' com)
29+
30+
== Contributors
31+
32+
// spell-checker: disable
33+
Pekka Jääskeläinen, Intel +
34+
Karol Herbst, Red Hat +
35+
Henry Linjamäki, Intel +
36+
// spell-checker: enable
37+
38+
== Notice
39+
40+
Copyright (c) 2024 Intel Corporation. All rights reserved.
41+
42+
== Status
43+
44+
Draft.
45+
46+
== Version
47+
48+
Built On: {docdate} +
49+
Revision: 0.1.0
50+
51+
== Dependencies
52+
53+
This extension is written against the OpenCL Specification version 3.0.16.
54+
55+
This extension requires OpenCL 1.0 or later.
56+
57+
== Overview
58+
59+
The basic cl_mem buffer API doesn't enable access to the underlying raw
60+
pointers in the device memory, preventing its use in host side
61+
data structures that need pointer references to objects.
62+
This API adds a minimal increment on top of cl_mem that provides such
63+
capabilities.
64+
65+
Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
66+
that enables raw device side pointers in the OpenCL standard. Its coarse-grain
67+
variant is relatively simple to implement on various platforms in terms of
68+
coherency requirements, but it requires mapping the buffer's address range
69+
to the host virtual address space although it might not be needed by the
70+
application. This is not an issue in systems which can provide virtual memory
71+
across the platform, but might provide implementation challenges in cases
72+
where the device presents a global memory with its disjoint address space
73+
(that can also be a physical memory address space) or, for example, when
74+
a barebone embedded system lacks virtual memory support altogether.
75+
76+
Various higher-level APIs present a memory allocation routine which can
77+
allocate device-only memory and provide raw pointers to it without guarentees
78+
of system-wide uniqueness: Minimal implementations of OpenMP's omp_target_alloc() and
79+
CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared
80+
address space between the host and the device. This extension is meant to
81+
provide a minimal set of features to implement such APIs without requiring
82+
a shared virtual address space between the host and the device.
83+
84+
=== New API Function
85+
86+
include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]
87+
88+
=== New API Enums
89+
90+
Enums for enabling device pointer properties when creating a buffer
91+
{clCreateBuffer}, see <<clCreateBuffer, the list of supported memory flag values table>>:
92+
93+
[source]
94+
----
95+
#define CL_MEM_DEVICE_ADDRESS_EXT (1ul << 31)
96+
#define CL_MEM_DEVICE_PRIVATE_EXT (1ul << 30)
97+
----
98+
99+
Enums for querying the device pointer from the cl_mem <<clGetMemObjectInfo, the list of supported param_names table>>:
100+
101+
[source]
102+
----
103+
#define CL_MEM_DEVICE_PTR_EXT 0xff01
104+
----
105+
106+
Enums for setting information of indirect device pointer accesses to kernels <<clSetKernelExecInfo, the list of supported param_names table>>. This is for OpenCL 2.0 and above. When implementing the
107+
extension on an older OpenCL version, indirect device pointer access is not supported.
108+
109+
[source]
110+
----
111+
#define CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT 0x11B8
112+
----
113+
114+
== New API Types
115+
116+
Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTR_EXT`.
117+
118+
[source]
119+
----
120+
typedef cl_ulong cl_mem_device_address_EXT;
121+
----
122+
123+
Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTRS_EXT`.
124+
125+
[source]
126+
----
127+
typedef struct _cl_mem_device_address_pair_EXT
128+
{
129+
cl_device_id device;
130+
cl_mem_device_address_EXT address;
131+
} cl_mem_device_address_pair_EXT;
132+
----
133+
134+
== Modifications to the OpenCL API Specification
135+
136+
=== Section 5.2.1 - Creating Buffer Objects:
137+
138+
Add new allocation flags <<clCreateBuffer, List of supported memory flag values table>>:
139+
140+
[[list-of-supported-memory-flag-values-adds]]
141+
.List of supported memory flags by {clCreateBuffer}
142+
[width="100%",cols="<50%,<50%",options="header"]
143+
|====
144+
| Memory Flags | Description
145+
| {CL_MEM_DEVICE_ADDRESS_EXT_anchor}
146+
147+
include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
148+
| This flag specifies that the buffer must have a single fixed address
149+
for its lifetime and the address should be unique at least across the devices
150+
of the context, but not necessarily withing the host (virtual) memory.
151+
152+
The flag might imply that the buffer will be "pinned" permanently to
153+
a device's memory, but might not be necessarily so, as long as the address
154+
range of the buffer remains constant.
155+
156+
The address is guaranteed to remain the same until the buffer is freed, and
157+
the address can be queried via {clGetMemObjectInfo}.
158+
159+
The device-specific buffer content updates are still performed by
160+
implicit or explicit buffer migrations performed by the runtime or the
161+
client code. If any of the devices in the context does not support
162+
this type of allocations, an error (CL_INVALID_VALUE) is returned.
163+
| {CL_MEM_DEVICE_PRIVATE_EXT_anchor}
164+
165+
include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_EXT.asciidoc[]
166+
| If this flag is combined with CL_MEM_DEVICE_ADDRESS_EXT, each device in
167+
the context can have their own (fixed) device-side address and copy of
168+
the created buffer which are synchronized implicitly by the runtime.
169+
The main difference to a default cl_mem allocation in that case is then
170+
that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT and the
171+
per-device address is guaranteed to be the same for the entire lifetime
172+
of the cl_mem.
173+
|====
174+
175+
// refError
176+
177+
=== Section 5.5.6 - Memory Object Queries
178+
179+
Add a new information type <<clGetMemObjectInfo, List of supported param_names table>>:
180+
181+
[width="100%",cols="<33%,<17%,<50%",options="header"]
182+
|====
183+
| Memory Object Info | Return type | Description
184+
| {CL_MEM_DEVICE_PTR_EXT_anchor}
185+
186+
include::{generated}/api/version-notes/CL_MEM_DEVICE_PTR_EXT.asciidoc[]
187+
| {cl_mem_device_address_EXT_TYPE}
188+
| Returns the device address for a buffer allocated with
189+
CL_MEM_DEVICE_ADDRESS_EXT. If the buffer was not created with the flag
190+
or there are multiple devices in the context and the buffer address is
191+
not the same for all of them, it returns CL_INVALID_MEM_OBJECT.
192+
193+
| {CL_MEM_DEVICE_PTRS_EXT_anchor}
194+
include::{generated}/api/version-notes/CL_MEM_DEVICE_PTRS_EXT.asciidoc[]
195+
| {cl_mem_device_address_pair_EXT_TYPE}
196+
| Returns the device-address pairs for all devices in the context.
197+
The per-device addresses might differ when the buffer was allocated
198+
with the CL_MEM_DEVICE_PRIVATE_EXT enabled.
199+
|====
200+
201+
202+
=== Section 5.9.2 - Setting Kernel Arguments
203+
204+
Add a new kernel argument setter for device pointers <<setting-kernel-arguments, Section 5.9.2>>:
205+
206+
To set a device pointer as the argument value for a specific argument of a
207+
kernel, call the function
208+
209+
include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]
210+
include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]
211+
212+
* _kernel_ is a valid kernel object.
213+
* _arg_index_ is the argument index.
214+
Arguments to the kernel are referred by indices that go from 0 for the
215+
leftmost argument to _n_ - 1, where _n_ is the total number of arguments
216+
declared by a kernel.
217+
* _arg_value_ is the device pointer that should be used as the argument value for
218+
argument specified by _arg_index_.
219+
The device pointer specified is the value used by all API calls that enqueue
220+
_kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument
221+
value is changed by a call to {clSetKernelArgSVMPointer} for _kernel_.
222+
The device pointer can only be used for arguments that are declared to be a
223+
pointer to `global` memory allocated with clCreateBuffer() with the
224+
CL_MEM_DEVICE_ADDRESS_EXT flag. The pointer value specified as the argument value
225+
can be the pointer to the beginning of the buffer or be a pointer offset into
226+
the buffer region. The device pointer value must be naturally aligned according to
227+
the argument's type.
228+
229+
// refError
230+
231+
{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the function was executed
232+
successfully. Otherwise, it returns one of the following errors:
233+
234+
* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
235+
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support
236+
the device pointer.
237+
* {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index.
238+
* {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value.
239+
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
240+
by the OpenCL implementation on the device.
241+
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
242+
required by the OpenCL implementation on the host.
243+
244+
Add a new flag to clSetKernelExecInfo for setting indirect device pointer access info <<clSetKernelExecInfo, List of supported param_name stable>>:
245+
246+
[width="100%",cols="<33%,<17%,<50%",options="header"]
247+
|====
248+
| Kernel Exec Info | Type | Description
249+
| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor}
250+
251+
include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[]
252+
| {cl_mem_device_address_EXT_TYPE}
253+
| Device pointers must reference locations contained entirely within
254+
buffers that are passed to kernel as arguments, or that are passed
255+
through the execution information.
256+
257+
Non-argument device pointers accessed by the kernel must be specified
258+
by passing pointers to those buffers via {clSetKernelExecInfo}.
259+
|====
260+
261+
// refError
262+
263+
264+
== Interactions with Other Extensions
265+
266+
This extension is targeted to complement the OpenCL SVM extension and/or the
267+
Intel Unified Shared Memory extension by providing an additional lower-end
268+
step in the spectrum of type of pointers/buffers OpenCL can allocate. The
269+
extension can be seen as a simplification of the USM Device allocation type
270+
which drops the need to map the device buffer's address range to the same
271+
position in the host memory or to implement platform-wide VM.
272+
273+
== Issues
274+
275+
None.
276+
277+
== Version History
278+
279+
[cols="5,15,15,70"]
280+
[grid="rows"]
281+
[options="header"]
282+
|====
283+
| *Version* | *Date* | *Author* | *Changes*
284+
| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback.
285+
This version describes the first API version that was prototyped
286+
in PoCL and RustiCL using temporary placeholder flag/enum values.
287+
The PoCL implementation and initial discussion on the extension
288+
can be found https://github.com/pocl/pocl/pull/1441[in this PR].
289+
|====

extensions/extensions.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,8 @@ include::cl_ext_image_from_buffer.asciidoc[]
4141
include::cl_ext_image_raw10_raw12.asciidoc[]
4242
<<<
4343
include::cl_ext_image_requirements_info.asciidoc[]
44+
<<<
45+
include::cl_ext_buffer_device_address.asciidoc[]
4446

4547
// Vendor Extensions
4648
:leveloffset: 0

xml/cl.xml

Lines changed: 32 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -255,6 +255,8 @@ server's OpenCL/api-docs repository.
255255
<type category="define">typedef <type>cl_bitfield</type> <name>cl_platform_command_buffer_capabilities_khr</name>;</type>
256256
<type category="define">typedef <type>cl_bitfield</type> <name>cl_mutable_dispatch_asserts_khr</name></type>
257257
<type category="define">typedef <type>cl_bitfield</type> <name>cl_device_kernel_clock_capabilities_khr</name>;</type>
258+
<type category="define">typedef <type>cl_ulong</type> <name>cl_mem_device_address_EXT</name>;</type>
259+
<type category="define">typedef struct _cl_mem_device_address_pair_EXT* <name>cl_mem_device_address_pair_EXT</name>;</type>
258260

259261
<comment>Structure types</comment>
260262
<type category="struct" name="cl_dx9_surface_info_khr">
@@ -304,6 +306,10 @@ server's OpenCL/api-docs repository.
304306
<member><type>size_t</type> <name>origin</name></member>
305307
<member><type>size_t</type> <name>size</name></member>
306308
</type>
309+
<type category="struct" name="cl_mem_device_address_pair_EXT">
310+
<member><type>cl_device_id</type> <name>device</name></member>
311+
<member><type>cl_mem_device_address_EXT</type> <name>address</name></member>
312+
</type>
307313
<type category="struct" name="cl_name_version">
308314
<member><type>cl_version</type> <name>version</name></member>
309315
<member><type>char</type> <name>name</name>[<enum>CL_NAME_VERSION_MAX_NAME_SIZE</enum>]</member>
@@ -911,7 +917,9 @@ server's OpenCL/api-docs repository.
911917
<enum bitpos="38" name="CL_MEM_RESERVED1_QCOM"/>
912918
<enum bitpos="39" name="CL_MEM_RESERVED2_QCOM"/>
913919
<enum bitpos="40" name="CL_MEM_RESERVED3_QCOM"/>
914-
<unused start="41" end="63"/>
920+
<enum bitpos="41" name="CL_MEM_DEVICE_ADDRESS_EXT"/>
921+
<enum bitpos="42" name="CL_MEM_DEVICE_PRIVATE_EXT"/>
922+
<unused start="43" end="63"/>
915923
</enums>
916924

917925
<enums name="cl_map_flags" vendor="Khronos" type="bitmask">
@@ -1630,7 +1638,9 @@ server's OpenCL/api-docs repository.
16301638
<enum value="0x1108" name="CL_MEM_OFFSET"/>
16311639
<enum value="0x1109" name="CL_MEM_USES_SVM_POINTER"/>
16321640
<enum value="0x110A" name="CL_MEM_PROPERTIES"/>
1633-
<unused start="0x110B" end="0x110F" comment="Reserved for cl_mem_info"/>
1641+
<enum value="0x110B" name="CL_MEM_DEVICE_PTR_EXT"/>
1642+
<enum value="0x110C" name="CL_MEM_DEVICE_PTRS_EXT"/>
1643+
<unused start="0x110D" end="0x110F" comment="Reserved for cl_mem_info"/>
16341644
<enum value="0x1110" name="CL_IMAGE_FORMAT"/>
16351645
<enum value="0x1111" name="CL_IMAGE_ELEMENT_SIZE"/>
16361646
<enum value="0x1112" name="CL_IMAGE_ROW_PITCH"/>
@@ -1723,7 +1733,8 @@ server's OpenCL/api-docs repository.
17231733
<enum value="0x11B8" name="CL_KERNEL_LOCAL_SIZE_FOR_SUB_GROUP_COUNT"/>
17241734
<enum value="0x11B9" name="CL_KERNEL_MAX_NUM_SUB_GROUPS"/>
17251735
<enum value="0x11BA" name="CL_KERNEL_COMPILE_NUM_SUB_GROUPS"/>
1726-
<unused start="0x11BB" end="0x11CF" comment="Reserved for cl_kernel_info / cl_kernel_work_group_info / cl_kernel_exec_info / cl_kernel_sub_group_info"/>
1736+
<enum value="0x11BB" name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
1737+
<unused start="0x11BC" end="0x11CF" comment="Reserved for cl_kernel_info / cl_kernel_work_group_info / cl_kernel_exec_info / cl_kernel_sub_group_info"/>
17271738
<enum value="0x11D0" name="CL_EVENT_COMMAND_QUEUE"/>
17281739
<enum value="0x11D1" name="CL_EVENT_COMMAND_TYPE"/>
17291740
<enum value="0x11D2" name="CL_EVENT_REFERENCE_COUNT"/>
@@ -3725,6 +3736,12 @@ server's OpenCL/api-docs repository.
37253736
<param><type>cl_uint</type> <name>arg_index</name></param>
37263737
<param>const <type>void</type>* <name>arg_value</name></param>
37273738
</command>
3739+
<command suffix="CL_API_SUFFIX__VERSION_1_0">
3740+
<proto><type>cl_int</type> <name>clSetKernelArgDevicePointerEXT</name></proto>
3741+
<param><type>cl_kernel</type> <name>kernel</name></param>
3742+
<param><type>cl_uint</type> <name>arg_index</name></param>
3743+
<param>const <type>void</type>* <name>arg_value</name></param>
3744+
</command>
37283745
<command suffix="CL_API_SUFFIX__VERSION_2_0">
37293746
<proto><type>cl_int</type> <name>clSetKernelExecInfo</name></proto>
37303747
<param><type>cl_kernel</type> <name>kernel</name></param>
@@ -7186,7 +7203,18 @@ server's OpenCL/api-docs repository.
71867203
<command name="clSetContentSizeBufferPoCL"/>
71877204
</require>
71887205
</extension>
7189-
<extension name="cl_khr_command_buffer" revision="0.9.5" supported="opencl" depends="CL_VERSION_1_2" ratified="opencl" provisional="true">
7206+
<extension name="cl_ext_buffer_device_address" supported="opencl">
7207+
<require>
7208+
<command name="clSetKernelArgDevicePointerEXT"/>
7209+
<enum name="CL_MEM_DEVICE_ADDRESS_EXT"/>
7210+
<enum name="CL_MEM_DEVICE_PRIVATE_EXT"/>
7211+
<enum name="CL_MEM_DEVICE_PTR_EXT"/>
7212+
<enum name="CL_MEM_DEVICE_PTRS_EXT"/>
7213+
<enum name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
7214+
<type name="cl_mem_device_address_EXT"/>
7215+
<type name="cl_mem_device_address_pair_EXT"/>
7216+
</require>
7217+
</extension>
71907218
<require>
71917219
<type name="CL/cl.h"/>
71927220
</require>

0 commit comments

Comments
 (0)