Skip to content

Commit 9b18682

Browse files
author
Pekka Jääskeläinen
committed
cl_ext_buffer_device_address
The basic cl_mem buffer API doesn't enable access to the underlying raw pointers in the device memory, preventing its use in host side data structures that need pointer references to objects. This API adds a minimal increment on top of cl_mem that provides such capabilities.
1 parent 6662fc3 commit 9b18682

4 files changed

Lines changed: 256 additions & 4 deletions

File tree

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
// Copyright 2024 The Khronos Group Inc.
2+
// SPDX-License-Identifier: CC-BY-4.0
3+
4+
include::{generated}/meta/{refprefix}cl_ext_buffer_device_address.txt[]
5+
6+
=== Other Extension Metadata
7+
8+
*Last Modified Date*::
9+
2024-12-06
10+
*IP Status*::
11+
No known IP claims.
12+
*Contributors*::
13+
- Pekka Jääskeläinen, Intel +
14+
- Karol Herbst, Red Hat +
15+
- Henry Linjamäki, Intel +
16+
- Kevin Petit, Arm +
17+
18+
=== Description
19+
20+
This extension provides access to raw device pointers for cl_mem buffers
21+
without requiring a shared virtual address space between the host and
22+
the device.
23+
24+
==== Background
25+
26+
Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
27+
that enables raw pointers in the OpenCL standard. Its coarse-grain
28+
variant is relatively simple to implement on various platforms in terms of
29+
coherency requirements, but it requires mapping the buffer's address range
30+
to the host virtual address space.
31+
However, various higher-level heterogeneous APIs present a memory allocation
32+
routine which can allocate device-only memory and provide raw addresses to
33+
it without guarentees of system-wide uniqueness. For example, minimal
34+
implementations of OpenMP's omp_target_alloc() and CUDA/HIP's
35+
cudaMalloc()/hipMalloc() do not require a shared address space between the host and the device.
36+
37+
Host-device unified addressing might not be a major implementation issue in
38+
systems which can provide virtual memory across the platform, but might
39+
bring challenges in cases where the device presents a global memory with
40+
a disjoint address space (that can also be a physical memory address space) or,
41+
for example, when a barebone embedded system lacks virtual memory support altogether.
42+
This extension is targeted to complement the OpenCL SVM extension by providing
43+
an additional lower-end step in the spectrum of type of pointers/buffers OpenCL
44+
can allocate.
45+
46+
=== New Command
47+
48+
* {clSetKernelArgDevicePointerEXT}
49+
50+
=== New Types
51+
52+
* {cl_mem_device_address_EXT}
53+
54+
=== New Enums
55+
56+
* {cl_mem_properties_TYPE}
57+
** {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
58+
** {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}
59+
* {cl_mem_info_TYPE}
60+
** {CL_MEM_DEVICE_ADDRESS_EXT}
61+
* {cl_kernel_exec_info_TYPE}
62+
** {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT}
63+
64+
=== Version History
65+
66+
[cols="5,15,15,70"]
67+
[grid="rows"]
68+
[options="header"]
69+
|====
70+
| *Version* | *Date* | *Author* | *Changes*
71+
| 0.9.0 | 2024-12-06 | Pekka Jääskeläinen, Kevin Petit |
72+
Integrated to the main unified specification.
73+
Moved the functionality to clCreateBufferWithProperties,
74+
thus requiring 3.0+. Single memobj query for fetching the
75+
address(es). Also other smaller improvements pointed by Kevin.
76+
Candidate for final 1.0.0.
77+
| 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst |
78+
Made the allocation flags independent from each other and
79+
renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and
80+
CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the
81+
same address across all devices in the context, whereas the latter
82+
allows per-device addresses.
83+
| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst |
84+
Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device
85+
cases "all", not "any", covering a case where not all devices
86+
can ensure the same address across the context. In that case
87+
CL_INVALID_VALUE can be returned. Defined sub-buffer address
88+
computation to be 'base_addr + origin'. Added error conditions
89+
for clSetKernelExecInfo when the device doesn't support
90+
device pointers.
91+
| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback.
92+
This version describes the first API version that was prototyped
93+
in PoCL and RustiCL using temporary placeholder flag/enum values.
94+
The PoCL implementation and initial discussion on the extension
95+
can be found https://github.com/pocl/pocl/pull/1441[in this PR].
96+
|====

api/opencl_runtime_layer.asciidoc

Lines changed: 126 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -595,6 +595,39 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_HANDLE_LIST_KHR.asciidoc[]
595595
{CL_MEM_DEVICE_HANDLE_LIST_END_KHR_anchor}) to associate with the
596596
external memory handle.
597597
endif::cl_khr_external_memory[]
598+
599+
ifdef::cl_ext_buffer_device_address[]
600+
601+
| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor}
602+
603+
include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[]
604+
| {cl_bool_TYPE}
605+
| When set to CL_TRUE, specifies that the buffer must have a single fixed
606+
device-side address for its lifetime, and the address can be queried via {clGetMemObjectInfo}.
607+
608+
Each device in the context can have their own (fixed) device-side address and
609+
a copy of the created buffer which are synchronized
610+
implicitly by the runtime.
611+
612+
The flag might imply that the buffer will be "pinned" permanently to
613+
a device's memory, but might not be necessarily so, as long as the address
614+
range of the buffer remains constant.
615+
616+
The device addresses of sub-buffers derived from {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
617+
allocated buffers can be computed by adding the sub-buffer origin to the
618+
device-specific start address.
619+
620+
| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor}
621+
622+
include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[]
623+
| {cl_bool_TYPE}
624+
| When set to CL_TRUE, the buffer has otherwise the same properties as
625+
when allocated using the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} flag,
626+
but with an additional property that the buffer's address is the same across
627+
all the devices in the context.
628+
629+
endif::cl_ext_buffer_device_address[]
630+
598631
|====
599632

600633
ifdef::cl_khr_external_memory[]
@@ -662,6 +695,15 @@ ifdef::cl_khr_external_memory[]
662695
{CL_MEM_DEVICE_HANDLE_LIST_KHR} is specified as part of _properties_.
663696
** if _properties_ includes more than one external memory handle.
664697
endif::cl_khr_external_memory[]
698+
ifdef::cl_ext_buffer_device_address[]
699+
* {CL_INVALID_DEVICE}
700+
** If _properties_ includes either {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} or
701+
{CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there is at least one device in
702+
the context that doesn't support such allocation.
703+
* {CL_INVALID_VALUE}
704+
** If _properties_ includes both {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} and
705+
{CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} at the same time.
706+
endif::cl_ext_buffer_device_address[]
665707

666708
[[memory-flags-table]]
667709
.List of supported memory flag values
@@ -6463,6 +6505,21 @@ include::{generated}/api/version-notes/CL_MEM_D3D11_RESOURCE_KHR.asciidoc[]
64636505
returns the _resource_ argument specified when _memobj_ was created.
64646506
endif::cl_khr_d3d11_sharing[]
64656507

6508+
ifdef::cl_ext_buffer_device_address[]
6509+
| {CL_MEM_DEVICE_ADDRESS_EXT_anchor}
6510+
6511+
include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
6512+
| {cl_mem_device_address_EXT_TYPE}[]
6513+
| If _memobj_ was created using {clCreateBufferWithProperties} with
6514+
the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property set to CL_TRUE,
6515+
returns a list of device addresses for the buffer, one for each
6516+
device in the context. If the buffer was allocated
6517+
with the {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} property,
6518+
only one device address is returned.
6519+
6520+
endif::cl_ext_buffer_device_address[]
6521+
6522+
64666523
|====
64676524

64686525
// refError
@@ -6477,6 +6534,12 @@ Otherwise, it returns one of the following errors:
64776534
the return type specified in the
64786535
<<mem-info-table, Memory Object Queries>> table
64796536
and _param_value_ is not `NULL`.
6537+
ifdef::cl_ext_buffer_device_address[]
6538+
** Returned for the {CL_MEM_DEVICE_ADDRESS_EXT} query if
6539+
the {cl_ext_buffer_device_address_EXT} is not supported or if the
6540+
buffer was not allocated with neither {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or
6541+
{CL_MEM_DEVICE_SHARED_ADDRESS_EXT}.
6542+
endif::cl_ext_buffer_device_address[]
64806543
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
64816544
by the OpenCL implementation on the device.
64826545
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
@@ -10778,6 +10841,48 @@ Otherwise, it returns one of the following errors:
1077810841
required by the OpenCL implementation on the host.
1077910842
--
1078010843

10844+
ifdef::cl_ext_buffer_device_address[]
10845+
[open,refpage='clSetKernelArgDevicePointerEXT',desc='Set a device pointer as the argument value for a specific argument of a kernel.',type='protos']
10846+
--
10847+
To set a device pointer as the argument value for a specific argument of a
10848+
kernel, call the function
10849+
10850+
include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]
10851+
include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]
10852+
10853+
* _kernel_ is a valid kernel object.
10854+
* _arg_index_ is the argument index.
10855+
Arguments to the kernel are referred by indices that go from 0 for the
10856+
leftmost argument to _n_ - 1, where _n_ is the total number of arguments
10857+
declared by a kernel.
10858+
* _arg_value_ is the device pointer that should be used as the argument value for
10859+
argument specified by _arg_index_.
10860+
The device pointer specified is the value used by all API calls that enqueue
10861+
_kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument
10862+
value is changed by a call to {clSetKernelArgDevicePointerEXT} for _kernel_.
10863+
The device pointer can only be used for arguments that are declared to be a
10864+
pointer to `global` memory allocated with {clCreateBufferWithProperties} with
10865+
either the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}
10866+
property. The pointer value specified as the argument value
10867+
can be the pointer to the beginning of the buffer or any offset into
10868+
the buffer region. The device pointer value must be naturally aligned according to
10869+
the argument's type.
10870+
10871+
{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the argument was set
10872+
successfully. Otherwise, it returns one of the following errors:
10873+
10874+
* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
10875+
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support
10876+
the device pointer.
10877+
* {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index.
10878+
* {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value.
10879+
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
10880+
by the OpenCL implementation on the device.
10881+
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
10882+
required by the OpenCL implementation on the host.
10883+
--
10884+
endif::cl_ext_buffer_device_address[]
10885+
1078110886
[open,refpage='clSetKernelExecInfo',desc='Set additional execution information for a kernel.',type='protos']
1078210887
--
1078310888
To set additional execution information for a kernel, call the function
@@ -10844,6 +10949,19 @@ include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM
1084410949
If {clSetKernelExecInfo} has not been called with a value for
1084510950
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM}, the default value is
1084610951
{CL_TRUE}.
10952+
10953+
ifdef::cl_ext_buffer_device_address[]
10954+
| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor}
10955+
10956+
include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[]
10957+
| {cl_mem_device_address_EXT_TYPE}[]
10958+
| Device pointers must reference locations contained entirely within
10959+
buffers that are passed to kernel as arguments, or that are passed
10960+
through the execution information. Non-argument device pointers accessed
10961+
by the kernel must be specified by passing pointers to those buffers
10962+
via this {clSetKernelExecInfo} option.
10963+
endif::cl_ext_buffer_device_address[]
10964+
1084710965
|====
1084810966

1084910967
// refError
@@ -10853,7 +10971,14 @@ successfully.
1085310971
Otherwise, it returns one of the following errors:
1085410972

1085510973
* {CL_INVALID_KERNEL} if _kernel_ is a not a valid kernel object.
10856-
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support SVM.
10974+
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_SVM_PTRS} if no devices in
10975+
the context associated with _kernel_ support SVM.
10976+
ifdef::cl_ext_buffer_device_address[]
10977+
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} if no
10978+
device in the context associated with _kernel_ support device pointers.
10979+
endif::cl_ext_buffer_device_address[]
10980+
* {CL_INVALID_VALUE} if _param_name_ is not valid, if _param_value_ is
10981+
`NULL` or if the size specified by _param_value_size_ is not valid.
1085710982
* {CL_INVALID_OPERATION} if _param_name_ is
1085810983
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} and _param_value_ is {CL_TRUE}
1085910984
and no devices in the context associated with _kernel_ support fine-grain

extensions/extensions.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,8 @@ Khronos{R} OpenCL Working Group
3737
include::cl_ext_float_atomics.asciidoc[]
3838
<<<
3939
include::cl_ext_image_raw10_raw12.asciidoc[]
40+
<<<
41+
include::cl_ext_buffer_device_address.asciidoc[]
4042

4143
// Vendor Extensions
4244
:leveloffset: 0

xml/cl.xml

Lines changed: 32 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -255,6 +255,7 @@ server's OpenCL/api-docs repository.
255255
<type category="define">typedef <type>cl_bitfield</type> <name>cl_platform_command_buffer_capabilities_khr</name>;</type>
256256
<type category="define">typedef <type>cl_bitfield</type> <name>cl_mutable_dispatch_asserts_khr</name></type>
257257
<type category="define">typedef <type>cl_bitfield</type> <name>cl_device_kernel_clock_capabilities_khr</name>;</type>
258+
<type category="define">typedef <type>cl_ulong</type> <name>cl_mem_device_address_ext</name>;</type>
258259

259260
<comment>Structure types</comment>
260261
<type category="struct" name="cl_dx9_surface_info_khr">
@@ -719,6 +720,8 @@ server's OpenCL/api-docs repository.
719720
<enum value="((cl_device_partition_property_ext)0 - 1)" name="CL_PARTITION_BY_NAMES_LIST_END_EXT"/>
720721
<enum value="-1" name="CL_PARTITION_BY_NAMES_LIST_END_INTEL"/>
721722
<enum value="0" name="CL_MEM_DEVICE_HANDLE_LIST_END_KHR"/>
723+
<enum value="1" name="CL_MEM_DEVICE_SHARED_ADDRESS_EXT"/>
724+
<enum value="2" name="CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT"/>
722725
<enum value="0" name="CL_SEMAPHORE_DEVICE_HANDLE_LIST_END_KHR"/>
723726
<enum value="0" name="CL_SEMAPHORE_EXPORT_HANDLE_TYPES_LIST_END_KHR"/>
724727
</enums>
@@ -911,7 +914,7 @@ server's OpenCL/api-docs repository.
911914
<enum bitpos="38" name="CL_MEM_RESERVED1_QCOM"/>
912915
<enum bitpos="39" name="CL_MEM_RESERVED2_QCOM"/>
913916
<enum bitpos="40" name="CL_MEM_RESERVED3_QCOM"/>
914-
<unused start="41" end="63"/>
917+
<unused start="43" end="63"/>
915918
</enums>
916919

917920
<enums name="cl_map_flags" vendor="Khronos" type="bitmask">
@@ -1630,7 +1633,8 @@ server's OpenCL/api-docs repository.
16301633
<enum value="0x1108" name="CL_MEM_OFFSET"/>
16311634
<enum value="0x1109" name="CL_MEM_USES_SVM_POINTER"/>
16321635
<enum value="0x110A" name="CL_MEM_PROPERTIES"/>
1633-
<unused start="0x110B" end="0x110F" comment="Reserved for cl_mem_info"/>
1636+
<enum value="0x110B" name="CL_MEM_DEVICE_ADDRESS_EXT"/>
1637+
<unused start="0x110D" end="0x110F" comment="Reserved for cl_mem_info"/>
16341638
<enum value="0x1110" name="CL_IMAGE_FORMAT"/>
16351639
<enum value="0x1111" name="CL_IMAGE_ELEMENT_SIZE"/>
16361640
<enum value="0x1112" name="CL_IMAGE_ROW_PITCH"/>
@@ -1723,7 +1727,8 @@ server's OpenCL/api-docs repository.
17231727
<enum value="0x11B8" name="CL_KERNEL_LOCAL_SIZE_FOR_SUB_GROUP_COUNT"/>
17241728
<enum value="0x11B9" name="CL_KERNEL_MAX_NUM_SUB_GROUPS"/>
17251729
<enum value="0x11BA" name="CL_KERNEL_COMPILE_NUM_SUB_GROUPS"/>
1726-
<unused start="0x11BB" end="0x11CF" comment="Reserved for cl_kernel_info / cl_kernel_work_group_info / cl_kernel_exec_info / cl_kernel_sub_group_info"/>
1730+
<enum value="0x11BB" name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
1731+
<unused start="0x11BC" end="0x11CF" comment="Reserved for cl_kernel_info / cl_kernel_work_group_info / cl_kernel_exec_info / cl_kernel_sub_group_info"/>
17271732
<enum value="0x11D0" name="CL_EVENT_COMMAND_QUEUE"/>
17281733
<enum value="0x11D1" name="CL_EVENT_COMMAND_TYPE"/>
17291734
<enum value="0x11D2" name="CL_EVENT_REFERENCE_COUNT"/>
@@ -3730,6 +3735,12 @@ server's OpenCL/api-docs repository.
37303735
<param><type>cl_uint</type> <name>arg_index</name></param>
37313736
<param>const <type>void</type>* <name>arg_value</name></param>
37323737
</command>
3738+
<command suffix="CL_API_SUFFIX__VERSION_3_0">
3739+
<proto><type>cl_int</type> <name>clSetKernelArgDevicePointerEXT</name></proto>
3740+
<param><type>cl_kernel</type> <name>kernel</name></param>
3741+
<param><type>cl_uint</type> <name>arg_index</name></param>
3742+
<param>const <type>void</type>* <name>arg_value</name></param>
3743+
</command>
37333744
<command suffix="CL_API_SUFFIX__VERSION_2_0">
37343745
<proto><type>cl_int</type> <name>clSetKernelExecInfo</name></proto>
37353746
<param><type>cl_kernel</type> <name>kernel</name></param>
@@ -7191,6 +7202,24 @@ server's OpenCL/api-docs repository.
71917202
<command name="clSetContentSizeBufferPoCL"/>
71927203
</require>
71937204
</extension>
7205+
<extension name="cl_ext_buffer_device_address" revision="0.9.0" supported="opencl" depends="CL_VERSION_3_0" provisional="true">
7206+
<require>
7207+
<command name="clSetKernelArgDevicePointerEXT"/>
7208+
</require>
7209+
<require comment="cl_mem_properties">
7210+
<enum name="CL_MEM_DEVICE_SHARED_ADDRESS_EXT"/>
7211+
<enum name="CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT"/>
7212+
</require>
7213+
<require comment="cl_mem_info">
7214+
<enum name="CL_MEM_DEVICE_ADDRESS_EXT"/>
7215+
</require>
7216+
<require comment="cl_kernel_exec_info">
7217+
<enum name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
7218+
</require>
7219+
<require>
7220+
<type name="cl_mem_device_address_ext"/>
7221+
</require>
7222+
</extension>
71947223
<extension name="cl_khr_command_buffer" revision="0.9.7" supported="opencl" depends="CL_VERSION_1_2" ratified="opencl" provisional="true">
71957224
<require>
71967225
<type name="CL/cl.h"/>

0 commit comments

Comments
 (0)