Skip to content

Commit 92385e5

Browse files
author
Pekka Jääskeläinen
committed
cl_ext_buffer_device_address
The basic cl_mem buffer API doesn't enable access to the underlying raw pointers in the device memory, preventing its use in host side data structures that need pointer references to objects. This API adds a minimal increment on top of cl_mem that provides such capabilities.
1 parent 532b6ea commit 92385e5

4 files changed

Lines changed: 256 additions & 4 deletions

File tree

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
// Copyright 2024 The Khronos Group Inc.
2+
// SPDX-License-Identifier: CC-BY-4.0
3+
4+
include::{generated}/meta/{refprefix}cl_ext_buffer_device_address.txt[]
5+
6+
=== Other Extension Metadata
7+
8+
*Last Modified Date*::
9+
2024-12-06
10+
*IP Status*::
11+
No known IP claims.
12+
*Contributors*::
13+
- Pekka Jääskeläinen, Intel +
14+
- Karol Herbst, Red Hat +
15+
- Henry Linjamäki, Intel +
16+
- Kevin Petit, Arm +
17+
18+
=== Description
19+
20+
This extension provides access to raw device pointers for cl_mem buffers
21+
without requiring a shared virtual address space between the host and
22+
the device.
23+
24+
==== Background
25+
26+
Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
27+
that enables raw pointers in the OpenCL standard. Its coarse-grain
28+
variant is relatively simple to implement on various platforms in terms of
29+
coherency requirements, but it requires mapping the buffer's address range
30+
to the host virtual address space.
31+
However, various higher-level heterogeneous APIs present a memory allocation
32+
routine which can allocate device-only memory and provide raw addresses to
33+
it without guarentees of system-wide uniqueness. For example, minimal
34+
implementations of OpenMP's omp_target_alloc() and CUDA/HIP's
35+
cudaMalloc()/hipMalloc() do not require a shared address space between the host and the device.
36+
37+
Host-device unified addressing might not be a major implementation issue in
38+
systems which can provide virtual memory across the platform, but might
39+
bring challenges in cases where the device presents a global memory with
40+
a disjoint address space (that can also be a physical memory address space) or,
41+
for example, when a barebone embedded system lacks virtual memory support altogether.
42+
This extension is targeted to complement the OpenCL SVM extension by providing
43+
an additional lower-end step in the spectrum of type of pointers/buffers OpenCL
44+
can allocate.
45+
46+
=== New Command
47+
48+
* {clSetKernelArgDevicePointerEXT}
49+
50+
=== New Types
51+
52+
* {cl_mem_device_address_EXT}
53+
54+
=== New Enums
55+
56+
* {cl_mem_properties_TYPE}
57+
** {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
58+
** {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}
59+
* {cl_mem_info_TYPE}
60+
** {CL_MEM_DEVICE_ADDRESS_EXT}
61+
* {cl_kernel_exec_info_TYPE}
62+
** {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT}
63+
64+
=== Version History
65+
66+
[cols="5,15,15,70"]
67+
[grid="rows"]
68+
[options="header"]
69+
|====
70+
| *Version* | *Date* | *Author* | *Changes*
71+
| 0.9.0 | 2024-12-06 | Pekka Jääskeläinen, Kevin Petit |
72+
Integrated to the main unified specification.
73+
Moved the functionality to clCreateBufferWithProperties,
74+
thus requiring 3.0+. Single memobj query for fetching the
75+
address(es). Also other smaller improvements pointed by Kevin.
76+
Candidate for final 1.0.0.
77+
| 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst |
78+
Made the allocation flags independent from each other and
79+
renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and
80+
CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the
81+
same address across all devices in the context, whereas the latter
82+
allows per-device addresses.
83+
| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst |
84+
Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device
85+
cases "all", not "any", covering a case where not all devices
86+
can ensure the same address across the context. In that case
87+
CL_INVALID_VALUE can be returned. Defined sub-buffer address
88+
computation to be 'base_addr + origin'. Added error conditions
89+
for clSetKernelExecInfo when the device doesn't support
90+
device pointers.
91+
| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback.
92+
This version describes the first API version that was prototyped
93+
in PoCL and RustiCL using temporary placeholder flag/enum values.
94+
The PoCL implementation and initial discussion on the extension
95+
can be found https://github.com/pocl/pocl/pull/1441[in this PR].
96+
|====

api/opencl_runtime_layer.asciidoc

Lines changed: 126 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -595,6 +595,39 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_HANDLE_LIST_KHR.asciidoc[]
595595
{CL_MEM_DEVICE_HANDLE_LIST_END_KHR_anchor}) to associate with the
596596
external memory handle.
597597
endif::cl_khr_external_memory[]
598+
599+
ifdef::cl_ext_buffer_device_address[]
600+
601+
| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor}
602+
603+
include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[]
604+
| {cl_bool_TYPE}
605+
| When set to CL_TRUE, specifies that the buffer must have a single fixed
606+
device-side address for its lifetime, and the address can be queried via {clGetMemObjectInfo}.
607+
608+
Each device in the context can have their own (fixed) device-side address and
609+
a copy of the created buffer which are synchronized
610+
implicitly by the runtime.
611+
612+
The flag might imply that the buffer will be "pinned" permanently to
613+
a device's memory, but might not be necessarily so, as long as the address
614+
range of the buffer remains constant.
615+
616+
The device addresses of sub-buffers derived from {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
617+
allocated buffers can be computed by adding the sub-buffer origin to the
618+
device-specific start address.
619+
620+
| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor}
621+
622+
include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[]
623+
| {cl_bool_TYPE}
624+
| When set to CL_TRUE, the buffer has otherwise the same properties as
625+
when allocated using the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} flag,
626+
but with an additional property that the buffer's address is the same across
627+
all the devices in the context.
628+
629+
endif::cl_ext_buffer_device_address[]
630+
598631
|====
599632

600633
ifdef::cl_khr_external_memory[]
@@ -662,6 +695,15 @@ ifdef::cl_khr_external_memory[]
662695
{CL_MEM_DEVICE_HANDLE_LIST_KHR} is specified as part of _properties_.
663696
** if _properties_ includes more than one external memory handle.
664697
endif::cl_khr_external_memory[]
698+
ifdef::cl_ext_buffer_device_address[]
699+
* {CL_INVALID_DEVICE}
700+
** If _properties_ includes either {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} or
701+
{CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there is at least one device in
702+
the context that doesn't support such allocation.
703+
* {CL_INVALID_VALUE}
704+
** If _properties_ includes both {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} and
705+
{CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} at the same time.
706+
endif::cl_ext_buffer_device_address[]
665707

666708
[[memory-flags-table]]
667709
.List of supported memory flag values
@@ -6374,6 +6416,21 @@ include::{generated}/api/version-notes/CL_MEM_D3D11_RESOURCE_KHR.asciidoc[]
63746416
returns the _resource_ argument specified when _memobj_ was created.
63756417
endif::cl_khr_d3d11_sharing[]
63766418

6419+
ifdef::cl_ext_buffer_device_address[]
6420+
| {CL_MEM_DEVICE_ADDRESS_EXT_anchor}
6421+
6422+
include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
6423+
| {cl_mem_device_address_EXT_TYPE}[]
6424+
| If _memobj_ was created using {clCreateBufferWithProperties} with
6425+
the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property set to CL_TRUE,
6426+
returns a list of device addresses for the buffer, one for each
6427+
device in the context. If the buffer was allocated
6428+
with the {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} property,
6429+
only one device address is returned.
6430+
6431+
endif::cl_ext_buffer_device_address[]
6432+
6433+
63776434
|====
63786435

63796436
// refError
@@ -6388,6 +6445,12 @@ Otherwise, it returns one of the following errors:
63886445
the return type specified in the
63896446
<<mem-info-table, Memory Object Queries>> table
63906447
and _param_value_ is not `NULL`.
6448+
ifdef::cl_ext_buffer_device_address[]
6449+
** Returned for the {CL_MEM_DEVICE_ADDRESS_EXT} query if
6450+
the {cl_ext_buffer_device_address_EXT} is not supported or if the
6451+
buffer was not allocated with neither {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or
6452+
{CL_MEM_DEVICE_SHARED_ADDRESS_EXT}.
6453+
endif::cl_ext_buffer_device_address[]
63916454
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
63926455
by the OpenCL implementation on the device.
63936456
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
@@ -10689,6 +10752,48 @@ Otherwise, it returns one of the following errors:
1068910752
required by the OpenCL implementation on the host.
1069010753
--
1069110754

10755+
ifdef::cl_ext_buffer_device_address[]
10756+
[open,refpage='clSetKernelArgDevicePointerEXT',desc='Set a device pointer as the argument value for a specific argument of a kernel.',type='protos']
10757+
--
10758+
To set a device pointer as the argument value for a specific argument of a
10759+
kernel, call the function
10760+
10761+
include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]
10762+
include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]
10763+
10764+
* _kernel_ is a valid kernel object.
10765+
* _arg_index_ is the argument index.
10766+
Arguments to the kernel are referred by indices that go from 0 for the
10767+
leftmost argument to _n_ - 1, where _n_ is the total number of arguments
10768+
declared by a kernel.
10769+
* _arg_value_ is the device pointer that should be used as the argument value for
10770+
argument specified by _arg_index_.
10771+
The device pointer specified is the value used by all API calls that enqueue
10772+
_kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument
10773+
value is changed by a call to {clSetKernelArgDevicePointerEXT} for _kernel_.
10774+
The device pointer can only be used for arguments that are declared to be a
10775+
pointer to `global` memory allocated with {clCreateBufferWithProperties} with
10776+
either the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}
10777+
property. The pointer value specified as the argument value
10778+
can be the pointer to the beginning of the buffer or any offset into
10779+
the buffer region. The device pointer value must be naturally aligned according to
10780+
the argument's type.
10781+
10782+
{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the argument was set
10783+
successfully. Otherwise, it returns one of the following errors:
10784+
10785+
* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
10786+
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support
10787+
the device pointer.
10788+
* {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index.
10789+
* {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value.
10790+
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
10791+
by the OpenCL implementation on the device.
10792+
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
10793+
required by the OpenCL implementation on the host.
10794+
--
10795+
endif::cl_ext_buffer_device_address[]
10796+
1069210797
[open,refpage='clSetKernelExecInfo',desc='Set additional execution information for a kernel.',type='protos']
1069310798
--
1069410799
To set additional execution information for a kernel, call the function
@@ -10755,6 +10860,19 @@ include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM
1075510860
If {clSetKernelExecInfo} has not been called with a value for
1075610861
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM}, the default value is
1075710862
{CL_TRUE}.
10863+
10864+
ifdef::cl_ext_buffer_device_address[]
10865+
| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor}
10866+
10867+
include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[]
10868+
| {cl_mem_device_address_EXT_TYPE}[]
10869+
| Device pointers must reference locations contained entirely within
10870+
buffers that are passed to kernel as arguments, or that are passed
10871+
through the execution information. Non-argument device pointers accessed
10872+
by the kernel must be specified by passing pointers to those buffers
10873+
via this {clSetKernelExecInfo} option.
10874+
endif::cl_ext_buffer_device_address[]
10875+
1075810876
|====
1075910877

1076010878
// refError
@@ -10764,7 +10882,14 @@ successfully.
1076410882
Otherwise, it returns one of the following errors:
1076510883

1076610884
* {CL_INVALID_KERNEL} if _kernel_ is a not a valid kernel object.
10767-
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support SVM.
10885+
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_SVM_PTRS} if no devices in
10886+
the context associated with _kernel_ support SVM.
10887+
ifdef::cl_ext_buffer_device_address[]
10888+
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} if no
10889+
device in the context associated with _kernel_ support device pointers.
10890+
endif::cl_ext_buffer_device_address[]
10891+
* {CL_INVALID_VALUE} if _param_name_ is not valid, if _param_value_ is
10892+
`NULL` or if the size specified by _param_value_size_ is not valid.
1076810893
* {CL_INVALID_OPERATION} if _param_name_ is
1076910894
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} and _param_value_ is {CL_TRUE}
1077010895
and no devices in the context associated with _kernel_ support fine-grain

extensions/extensions.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,8 @@ include::cl_ext_float_atomics.asciidoc[]
3939
include::cl_ext_image_from_buffer.asciidoc[]
4040
<<<
4141
include::cl_ext_image_raw10_raw12.asciidoc[]
42+
<<<
43+
include::cl_ext_buffer_device_address.asciidoc[]
4244

4345
// Vendor Extensions
4446
:leveloffset: 0

xml/cl.xml

Lines changed: 32 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -255,6 +255,7 @@ server's OpenCL/api-docs repository.
255255
<type category="define">typedef <type>cl_bitfield</type> <name>cl_platform_command_buffer_capabilities_khr</name>;</type>
256256
<type category="define">typedef <type>cl_bitfield</type> <name>cl_mutable_dispatch_asserts_khr</name></type>
257257
<type category="define">typedef <type>cl_bitfield</type> <name>cl_device_kernel_clock_capabilities_khr</name>;</type>
258+
<type category="define">typedef <type>cl_ulong</type> <name>cl_mem_device_address_ext</name>;</type>
258259

259260
<comment>Structure types</comment>
260261
<type category="struct" name="cl_dx9_surface_info_khr">
@@ -719,6 +720,8 @@ server's OpenCL/api-docs repository.
719720
<enum value="((cl_device_partition_property_ext)0 - 1)" name="CL_PARTITION_BY_NAMES_LIST_END_EXT"/>
720721
<enum value="-1" name="CL_PARTITION_BY_NAMES_LIST_END_INTEL"/>
721722
<enum value="0" name="CL_MEM_DEVICE_HANDLE_LIST_END_KHR"/>
723+
<enum value="1" name="CL_MEM_DEVICE_SHARED_ADDRESS_EXT"/>
724+
<enum value="2" name="CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT"/>
722725
<enum value="0" name="CL_SEMAPHORE_DEVICE_HANDLE_LIST_END_KHR"/>
723726
<enum value="0" name="CL_SEMAPHORE_EXPORT_HANDLE_TYPES_LIST_END_KHR"/>
724727
</enums>
@@ -911,7 +914,7 @@ server's OpenCL/api-docs repository.
911914
<enum bitpos="38" name="CL_MEM_RESERVED1_QCOM"/>
912915
<enum bitpos="39" name="CL_MEM_RESERVED2_QCOM"/>
913916
<enum bitpos="40" name="CL_MEM_RESERVED3_QCOM"/>
914-
<unused start="41" end="63"/>
917+
<unused start="43" end="63"/>
915918
</enums>
916919

917920
<enums name="cl_map_flags" vendor="Khronos" type="bitmask">
@@ -1630,7 +1633,8 @@ server's OpenCL/api-docs repository.
16301633
<enum value="0x1108" name="CL_MEM_OFFSET"/>
16311634
<enum value="0x1109" name="CL_MEM_USES_SVM_POINTER"/>
16321635
<enum value="0x110A" name="CL_MEM_PROPERTIES"/>
1633-
<unused start="0x110B" end="0x110F" comment="Reserved for cl_mem_info"/>
1636+
<enum value="0x110B" name="CL_MEM_DEVICE_ADDRESS_EXT"/>
1637+
<unused start="0x110D" end="0x110F" comment="Reserved for cl_mem_info"/>
16341638
<enum value="0x1110" name="CL_IMAGE_FORMAT"/>
16351639
<enum value="0x1111" name="CL_IMAGE_ELEMENT_SIZE"/>
16361640
<enum value="0x1112" name="CL_IMAGE_ROW_PITCH"/>
@@ -1723,7 +1727,8 @@ server's OpenCL/api-docs repository.
17231727
<enum value="0x11B8" name="CL_KERNEL_LOCAL_SIZE_FOR_SUB_GROUP_COUNT"/>
17241728
<enum value="0x11B9" name="CL_KERNEL_MAX_NUM_SUB_GROUPS"/>
17251729
<enum value="0x11BA" name="CL_KERNEL_COMPILE_NUM_SUB_GROUPS"/>
1726-
<unused start="0x11BB" end="0x11CF" comment="Reserved for cl_kernel_info / cl_kernel_work_group_info / cl_kernel_exec_info / cl_kernel_sub_group_info"/>
1730+
<enum value="0x11BB" name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
1731+
<unused start="0x11BC" end="0x11CF" comment="Reserved for cl_kernel_info / cl_kernel_work_group_info / cl_kernel_exec_info / cl_kernel_sub_group_info"/>
17271732
<enum value="0x11D0" name="CL_EVENT_COMMAND_QUEUE"/>
17281733
<enum value="0x11D1" name="CL_EVENT_COMMAND_TYPE"/>
17291734
<enum value="0x11D2" name="CL_EVENT_REFERENCE_COUNT"/>
@@ -3730,6 +3735,12 @@ server's OpenCL/api-docs repository.
37303735
<param><type>cl_uint</type> <name>arg_index</name></param>
37313736
<param>const <type>void</type>* <name>arg_value</name></param>
37323737
</command>
3738+
<command suffix="CL_API_SUFFIX__VERSION_3_0">
3739+
<proto><type>cl_int</type> <name>clSetKernelArgDevicePointerEXT</name></proto>
3740+
<param><type>cl_kernel</type> <name>kernel</name></param>
3741+
<param><type>cl_uint</type> <name>arg_index</name></param>
3742+
<param>const <type>void</type>* <name>arg_value</name></param>
3743+
</command>
37333744
<command suffix="CL_API_SUFFIX__VERSION_2_0">
37343745
<proto><type>cl_int</type> <name>clSetKernelExecInfo</name></proto>
37353746
<param><type>cl_kernel</type> <name>kernel</name></param>
@@ -7191,6 +7202,24 @@ server's OpenCL/api-docs repository.
71917202
<command name="clSetContentSizeBufferPoCL"/>
71927203
</require>
71937204
</extension>
7205+
<extension name="cl_ext_buffer_device_address" revision="0.9.0" supported="opencl" depends="CL_VERSION_3_0" provisional="true">
7206+
<require>
7207+
<command name="clSetKernelArgDevicePointerEXT"/>
7208+
</require>
7209+
<require comment="cl_mem_properties">
7210+
<enum name="CL_MEM_DEVICE_SHARED_ADDRESS_EXT"/>
7211+
<enum name="CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT"/>
7212+
</require>
7213+
<require comment="cl_mem_info">
7214+
<enum name="CL_MEM_DEVICE_ADDRESS_EXT"/>
7215+
</require>
7216+
<require comment="cl_kernel_exec_info">
7217+
<enum name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
7218+
</require>
7219+
<require>
7220+
<type name="cl_mem_device_address_ext"/>
7221+
</require>
7222+
</extension>
71947223
<extension name="cl_khr_command_buffer" revision="0.9.6" supported="opencl" depends="CL_VERSION_1_2" ratified="opencl" provisional="true">
71957224
<require>
71967225
<type name="CL/cl.h"/>

0 commit comments

Comments
 (0)