Skip to content

Commit 859b53b

Browse files
Pekka Jääskeläinenaharon-abramson
authored andcommitted
cl_ext_buffer_device_address (#1159)
* cl_ext_buffer_device_address The basic cl_mem buffer API doesn't enable access to the underlying raw pointers in the device memory, preventing its use in host side data structures that need pointer references to objects. This API adds a minimal increment on top of cl_mem that provides such capabilities. * BDA: Removed CL_MEM_DEVICE_SHARED_ADDRESS_EXT as unneeded. Also made the enums globally unique. * cl_ext_buffer_device_address to 1.0.0 The only content addition since the previous version is "If the device supports SVM and {clCreateBufferWithProperties} is called with a pointer returned by {clSVMAlloc} as its _host_ptr_ argument, and {CL_MEM_USE_HOST_PTR} is set in its _flags_ argument, the device-side address is guaranteed to match the _host_ptr." * cl_ext_buffer_device_address: Revision 1.0.1 * Made it explicit that passing illegal pointers is legal as long as they are not referenced. * Removed CL_INVALID_ARG_VALUE as a possible error in clSetKernelArgDevicePointerEXT() as there are no illegal pointer cases when calling this function. Return CL_INVALID_OPERATION for clGetMemObjectInfo() if the pointer is not a buffer device pointer. * clSetKernelExecInfo() and clSetKernelArgDevicePointerEXT() now only error out if no devices in the context associated with kernel support device pointers. * cl_ext_buffer_device_address: Revision 1.0.2 Converted the clSetKernelArgDevicePointerEXT() address parameter to a value instead of a pointer to the value.
1 parent 73971af commit 859b53b

3 files changed

Lines changed: 229 additions & 1 deletion

File tree

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
// Copyright 2024 The Khronos Group Inc.
2+
// SPDX-License-Identifier: CC-BY-4.0
3+
4+
include::{generated}/meta/{refprefix}cl_ext_buffer_device_address.txt[]
5+
6+
=== Other Extension Metadata
7+
8+
*Last Modified Date*::
9+
2025-02-04
10+
*IP Status*::
11+
No known IP claims.
12+
*Contributors*::
13+
- Pekka Jääskeläinen, Intel +
14+
- Karol Herbst, Red Hat +
15+
- Ben Ashbaugh, Intel +
16+
- Kevin Petit, Arm +
17+
- Henry Linjamäki, Intel +
18+
19+
=== Description
20+
21+
This extension provides access to raw device pointers for cl_mem buffers
22+
without requiring a shared virtual address space between the host and
23+
the device.
24+
25+
==== Background
26+
27+
Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
28+
that enables raw pointers in the OpenCL standard. Its coarse-grain
29+
variant is relatively simple to implement on various platforms in terms of
30+
coherency requirements, but it requires mapping the buffer's address range
31+
to the host virtual address space.
32+
However, various higher-level heterogeneous APIs present a memory allocation
33+
routine which can allocate device-only memory and provide raw addresses to
34+
it without guarentees of system-wide uniqueness. For example, minimal
35+
implementations of OpenMP's omp_target_alloc() and CUDA/HIP's
36+
cudaMalloc()/hipMalloc() do not require a shared address space between the host and the device.
37+
38+
Host-device unified addressing might not be a major implementation issue in
39+
systems which can provide virtual memory across the platform, but might
40+
bring challenges in cases where the device presents a global memory with
41+
a disjoint address space (that can also be a physical memory address space) or,
42+
for example, when a barebone embedded system lacks virtual memory support altogether.
43+
This extension is targeted to complement the OpenCL SVM extension by providing
44+
an additional lower-end step in the spectrum of type of pointers/buffers OpenCL
45+
can allocate.
46+
47+
=== New Command
48+
49+
* {clSetKernelArgDevicePointerEXT}
50+
51+
=== New Types
52+
53+
* {cl_mem_device_address_EXT}
54+
55+
=== New Enums
56+
57+
* {cl_mem_properties_TYPE}
58+
** {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
59+
* {cl_mem_info_TYPE}
60+
** {CL_MEM_DEVICE_ADDRESS_EXT}
61+
* {cl_kernel_exec_info_TYPE}
62+
** {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT}
63+
64+
=== Version History
65+
66+
* Revision 1.0.0, 2025-01-15
67+
** Initial version for detailed review.
68+
* Revision 1.0.1, 2025-01-28
69+
** Made it explicit that passing illegal pointers is legal as long as they are
70+
not referenced. Removed CL_INVALID_ARG_VALUE as a possible error in
71+
clSetKernelArgDevicePointerEXT() as there are no illegal pointer
72+
cases when calling this function. Return CL_INVALID_OPERATION for
73+
clGetMemObjectInfo() if the pointer is not a buffer device pointer.
74+
clSetKernelExecInfo() and clSetKernelArgDevicePointerEXT() now only
75+
error out if no devices in the context associated with kernel support
76+
device pointers.
77+
* Revision 1.0.2, 2025-02-04
78+
** Converted the clSetKernelArgDevicePointerEXT() address parameter to
79+
a value instead of a pointer to the value.
80+

api/opencl_runtime_layer.asciidoc

Lines changed: 119 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -595,6 +595,35 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_HANDLE_LIST_KHR.asciidoc[]
595595
{CL_MEM_DEVICE_HANDLE_LIST_END_KHR_anchor}) to associate with the
596596
external memory handle.
597597
endif::cl_khr_external_memory[]
598+
599+
ifdef::cl_ext_buffer_device_address[]
600+
601+
| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor}
602+
603+
include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[]
604+
| {cl_bool_TYPE}
605+
| When set to {CL_TRUE}, specifies that the buffer must have a single fixed
606+
device-side address for its lifetime, and the address can be queried via {clGetMemObjectInfo}.
607+
608+
Each device in the context can have their own (fixed) device-side address and
609+
a copy of the created buffer which are synchronized
610+
implicitly by the runtime.
611+
612+
The flag might imply that the buffer will be "pinned" permanently to
613+
a device's memory, but might not be necessarily so, as long as the address
614+
range of the buffer remains constant.
615+
616+
The device addresses of sub-buffers derived from {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
617+
allocated buffers can be computed by adding the sub-buffer origin to the
618+
device-specific start address.
619+
620+
If the device supports SVM and {clCreateBufferWithProperties} is called with a pointer
621+
returned by {clSVMAlloc} as its _host_ptr_ argument, and {CL_MEM_USE_HOST_PTR} is
622+
set in its _flags_ argument, the device-side address is guaranteed to match
623+
the _host_ptr_.
624+
625+
endif::cl_ext_buffer_device_address[]
626+
598627
|====
599628

600629
ifdef::cl_khr_external_memory[]
@@ -662,6 +691,12 @@ ifdef::cl_khr_external_memory[]
662691
{CL_MEM_DEVICE_HANDLE_LIST_KHR} is specified as part of _properties_.
663692
** if _properties_ includes more than one external memory handle.
664693
endif::cl_khr_external_memory[]
694+
ifdef::cl_ext_buffer_device_address[]
695+
* {CL_INVALID_OPERATION}
696+
** If _properties_ includes {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there
697+
are no devices in the context that support the {cl_ext_buffer_device_address_EXT}
698+
extension.
699+
endif::cl_ext_buffer_device_address[]
665700

666701
[[memory-flags-table]]
667702
.List of supported memory flag values
@@ -6463,6 +6498,20 @@ include::{generated}/api/version-notes/CL_MEM_D3D11_RESOURCE_KHR.asciidoc[]
64636498
returns the _resource_ argument specified when _memobj_ was created.
64646499
endif::cl_khr_d3d11_sharing[]
64656500

6501+
ifdef::cl_ext_buffer_device_address[]
6502+
| {CL_MEM_DEVICE_ADDRESS_EXT_anchor}
6503+
6504+
include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
6505+
| {cl_mem_device_address_EXT_TYPE}[]
6506+
| If _memobj_ was created using {clCreateBufferWithProperties} with
6507+
the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property set to {CL_TRUE},
6508+
returns a list of device addresses for the buffer, one for each
6509+
device in the context in the same order as the list of devices
6510+
passed to {clCreateContext}.
6511+
6512+
endif::cl_ext_buffer_device_address[]
6513+
6514+
64666515
|====
64676516

64686517
// refError
@@ -6472,6 +6521,11 @@ successfully.
64726521
Otherwise, it returns one of the following errors:
64736522

64746523
* {CL_INVALID_MEM_OBJECT} if _memobj_ is a not a valid memory object.
6524+
ifdef::cl_ext_buffer_device_address[]
6525+
* {CL_INVALID_OPERATION} is returned for the {CL_MEM_DEVICE_ADDRESS_EXT} query if
6526+
the {cl_ext_buffer_device_address_EXT} is not supported or if the
6527+
buffer was not allocated with {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}.
6528+
endif::cl_ext_buffer_device_address[]
64756529
* {CL_INVALID_VALUE} if _param_name_ is not one of the supported values, or
64766530
if the size in bytes specified by _param_value_size_ is less than size of
64776531
the return type specified in the
@@ -10778,6 +10832,48 @@ Otherwise, it returns one of the following errors:
1077810832
required by the OpenCL implementation on the host.
1077910833
--
1078010834

10835+
ifdef::cl_ext_buffer_device_address[]
10836+
[open,refpage='clSetKernelArgDevicePointerEXT',desc='Set a device pointer as the argument value for a specific argument of a kernel.',type='protos']
10837+
--
10838+
To set a device pointer as the argument value for a specific argument of a
10839+
kernel, call the function
10840+
10841+
include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]
10842+
include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]
10843+
10844+
* _kernel_ is a valid kernel object.
10845+
* _arg_index_ is the argument index.
10846+
Arguments to the kernel are referred by indices that go from 0 for the
10847+
leftmost argument to _n_ - 1, where _n_ is the total number of arguments
10848+
declared by a kernel.
10849+
* _arg_value_ is the device pointer that should be used as the argument value for
10850+
argument specified by _arg_index_.
10851+
The device pointer specified is the value used by all API calls that enqueue
10852+
_kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument
10853+
value is changed by a call to {clSetKernelArgDevicePointerEXT} for _kernel_.
10854+
The device pointer can only be used for arguments that are declared to be a
10855+
pointer to `global` memory allocated with {clCreateBufferWithProperties} with
10856+
the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property. The pointer value specified as
10857+
the argument value can be the pointer to the beginning of the buffer or any offset into
10858+
the buffer region. The device pointer value must be naturally aligned according to
10859+
the argument's type. It should be noted that it's legal to pass invalid
10860+
pointers as the value (similarly to C/C++ function calls with pointer arguments) as
10861+
long as the kernel doesn't dereference the pointer.
10862+
10863+
{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the argument was set
10864+
successfully. Otherwise, it returns one of the following errors:
10865+
10866+
* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
10867+
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support
10868+
the {cl_ext_buffer_device_address_EXT} extension.
10869+
* {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index.
10870+
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
10871+
by the OpenCL implementation on the device.
10872+
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
10873+
required by the OpenCL implementation on the host.
10874+
--
10875+
endif::cl_ext_buffer_device_address[]
10876+
1078110877
[open,refpage='clSetKernelExecInfo',desc='Set additional execution information for a kernel.',type='protos']
1078210878
--
1078310879
To set additional execution information for a kernel, call the function
@@ -10844,6 +10940,19 @@ include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM
1084410940
If {clSetKernelExecInfo} has not been called with a value for
1084510941
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM}, the default value is
1084610942
{CL_TRUE}.
10943+
10944+
ifdef::cl_ext_buffer_device_address[]
10945+
| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor}
10946+
10947+
include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[]
10948+
| {cl_mem_device_address_EXT_TYPE}[]
10949+
| Device pointers must reference locations contained entirely within
10950+
buffers that are passed to kernel as arguments, or that are passed
10951+
through the execution information. Non-argument device pointers accessed
10952+
by the kernel must be specified by passing pointers to those buffers
10953+
via this {clSetKernelExecInfo} option.
10954+
endif::cl_ext_buffer_device_address[]
10955+
1084710956
|====
1084810957

1084910958
// refError
@@ -10853,7 +10962,16 @@ successfully.
1085310962
Otherwise, it returns one of the following errors:
1085410963

1085510964
* {CL_INVALID_KERNEL} if _kernel_ is a not a valid kernel object.
10856-
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support SVM.
10965+
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_SVM_PTRS} and
10966+
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} if no devices in
10967+
the context associated with _kernel_ support SVM.
10968+
ifdef::cl_ext_buffer_device_address[]
10969+
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} if no
10970+
device in the context associated with _kernel_ support the {cl_ext_buffer_device_address_EXT}
10971+
extension.
10972+
endif::cl_ext_buffer_device_address[]
10973+
* {CL_INVALID_VALUE} if _param_name_ is not valid, if _param_value_ is
10974+
`NULL` or if the size specified by _param_value_size_ is not valid.
1085710975
* {CL_INVALID_OPERATION} if _param_name_ is
1085810976
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} and _param_value_ is {CL_TRUE}
1085910977
and no devices in the context associated with _kernel_ support fine-grain

xml/cl.xml

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -255,6 +255,7 @@ server's OpenCL/api-docs repository.
255255
<type category="define">typedef <type>cl_bitfield</type> <name>cl_platform_command_buffer_capabilities_khr</name>;</type>
256256
<type category="define">typedef <type>cl_bitfield</type> <name>cl_mutable_dispatch_asserts_khr</name></type>
257257
<type category="define">typedef <type>cl_bitfield</type> <name>cl_device_kernel_clock_capabilities_khr</name>;</type>
258+
<type category="define">typedef <type>cl_ulong</type> <name>cl_mem_device_address_ext</name>;</type>
258259

259260
<comment>Structure types</comment>
260261
<type category="struct" name="cl_dx9_surface_info_khr">
@@ -2315,6 +2316,12 @@ server's OpenCL/api-docs repository.
23152316
<unused start="0x42B0" end="0x4FFF"/>
23162317
</enums>
23172318

2319+
<enums start="0x5000" end="0x500F" name="enums.5000" comment="For cl_ext_buffer_device_address">
2320+
<enum value="0x5000" name="CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT"/>
2321+
<enum value="0x5001" name="CL_MEM_DEVICE_ADDRESS_EXT"/>
2322+
<enum value="0x5002" name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
2323+
</enums>
2324+
23182325
<enums start="0x10000" end="0x10FFF" name="enums.10000" vendor="Khronos" comment="Experimental range for internal development only. Do not allocate.">
23192326
<!-- Khronos will never assign values in this range, and vendors
23202327
should never ship using values in this range. It is intended
@@ -3730,6 +3737,12 @@ server's OpenCL/api-docs repository.
37303737
<param><type>cl_uint</type> <name>arg_index</name></param>
37313738
<param>const <type>void</type>* <name>arg_value</name></param>
37323739
</command>
3740+
<command suffix="CL_API_SUFFIX__VERSION_3_0">
3741+
<proto><type>cl_int</type> <name>clSetKernelArgDevicePointerEXT</name></proto>
3742+
<param><type>cl_kernel</type> <name>kernel</name></param>
3743+
<param><type>cl_uint</type> <name>arg_index</name></param>
3744+
<param><type>cl_mem_device_address_ext</type> <name>arg_value</name></param>
3745+
</command>
37333746
<command suffix="CL_API_SUFFIX__VERSION_2_0">
37343747
<proto><type>cl_int</type> <name>clSetKernelExecInfo</name></proto>
37353748
<param><type>cl_kernel</type> <name>kernel</name></param>
@@ -7191,6 +7204,23 @@ server's OpenCL/api-docs repository.
71917204
<command name="clSetContentSizeBufferPoCL"/>
71927205
</require>
71937206
</extension>
7207+
<extension name="cl_ext_buffer_device_address" revision="1.0.2" supported="opencl" depends="CL_VERSION_3_0">
7208+
<require>
7209+
<type name="cl_mem_device_address_ext"/>
7210+
</require>
7211+
<require>
7212+
<command name="clSetKernelArgDevicePointerEXT"/>
7213+
</require>
7214+
<require comment="cl_mem_properties">
7215+
<enum name="CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT"/>
7216+
</require>
7217+
<require comment="cl_mem_info">
7218+
<enum name="CL_MEM_DEVICE_ADDRESS_EXT"/>
7219+
</require>
7220+
<require comment="cl_kernel_exec_info">
7221+
<enum name="CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT"/>
7222+
</require>
7223+
</extension>
71947224
<extension name="cl_khr_command_buffer" revision="0.9.7" supported="opencl" depends="CL_VERSION_1_2" ratified="opencl" provisional="true">
71957225
<require>
71967226
<type name="CL/cl.h"/>

0 commit comments

Comments
 (0)