Skip to content

Commit edcff73

Browse files
author
Pekka Jääskeläinen
committed
Integrated to the main unified specification and other updates.
* Moved the functionality to clCreateBufferWithProperties, thus now requiring 3.0+. * Single memobj query for fetching the address(es). * Also other smaller improvements pointed by Kevin. * Candidate for 1.0.0.
1 parent 8f39997 commit edcff73

5 files changed

Lines changed: 233 additions & 337 deletions

File tree

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
// Copyright 2024 The Khronos Group Inc.
2+
// SPDX-License-Identifier: CC-BY-4.0
3+
4+
include::{generated}/meta/{refprefix}cl_ext_buffer_device_address.txt[]
5+
6+
=== Other Extension Metadata
7+
8+
*Last Modified Date*::
9+
2024-12-06
10+
*IP Status*::
11+
No known IP claims.
12+
*Contributors*::
13+
- Pekka Jääskeläinen, Intel +
14+
- Karol Herbst, Red Hat +
15+
- Henry Linjamäki, Intel +
16+
- Kevin Petit, Arm +
17+
18+
=== Description
19+
20+
This extension provides access to raw device pointers for cl_mem buffers
21+
without requiring a shared virtual address space between the host and
22+
the device.
23+
24+
==== Background
25+
26+
Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
27+
that enables raw pointers in the OpenCL standard. Its coarse-grain
28+
variant is relatively simple to implement on various platforms in terms of
29+
coherency requirements, but it requires mapping the buffer's address range
30+
to the host virtual address space.
31+
However, various higher-level heterogeneous APIs present a memory allocation
32+
routine which can allocate device-only memory and provide raw addresses to
33+
it without guarentees of system-wide uniqueness. For example, minimal
34+
implementations of OpenMP's omp_target_alloc() and CUDA/HIP's
35+
cudaMalloc()/hipMalloc() do not require a shared address space between the host and the device.
36+
37+
Host-device unified addressing might not be a major implementation issue in
38+
systems which can provide virtual memory across the platform, but might
39+
bring challenges in cases where the device presents a global memory with
40+
a disjoint address space (that can also be a physical memory address space) or,
41+
for example, when a barebone embedded system lacks virtual memory support altogether.
42+
This extension is targeted to complement the OpenCL SVM extension by providing
43+
an additional lower-end step in the spectrum of type of pointers/buffers OpenCL
44+
can allocate.
45+
46+
=== New Command
47+
48+
* {clSetKernelArgDevicePointerEXT}
49+
50+
=== New Types
51+
52+
* {cl_mem_device_address_EXT}
53+
54+
=== New Enums
55+
56+
* {cl_mem_properties_TYPE}
57+
** {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
58+
** {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}
59+
* {cl_mem_info_TYPE}
60+
** {CL_MEM_DEVICE_ADDRESS_EXT}
61+
* {cl_kernel_exec_info_TYPE}
62+
** {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT}
63+
64+
=== Version History
65+
66+
[cols="5,15,15,70"]
67+
[grid="rows"]
68+
[options="header"]
69+
|====
70+
| *Version* | *Date* | *Author* | *Changes*
71+
| 0.9.0 | 2024-12-06 | Pekka Jääskeläinen, Kevin Petit |
72+
Integrated to the main unified specification.
73+
Moved the functionality to clCreateBufferWithProperties,
74+
thus requiring 3.0+. Single memobj query for fetching the
75+
address(es). Also other smaller improvements pointed by Kevin.
76+
Candidate for final 1.0.0.
77+
| 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst |
78+
Made the allocation flags independent from each other and
79+
renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and
80+
CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the
81+
same address across all devices in the context, whereas the latter
82+
allows per-device addresses.
83+
| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst |
84+
Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device
85+
cases "all", not "any", covering a case where not all devices
86+
can ensure the same address across the context. In that case
87+
CL_INVALID_VALUE can be returned. Defined sub-buffer address
88+
computation to be 'base_addr + origin'. Added error conditions
89+
for clSetKernelExecInfo when the device doesn't support
90+
device pointers.
91+
| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback.
92+
This version describes the first API version that was prototyped
93+
in PoCL and RustiCL using temporary placeholder flag/enum values.
94+
The PoCL implementation and initial discussion on the extension
95+
can be found https://github.com/pocl/pocl/pull/1441[in this PR].
96+
|====

api/opencl_runtime_layer.asciidoc

Lines changed: 126 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -595,6 +595,39 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_HANDLE_LIST_KHR.asciidoc[]
595595
{CL_MEM_DEVICE_HANDLE_LIST_END_KHR_anchor}) to associate with the
596596
external memory handle.
597597
endif::cl_khr_external_memory[]
598+
599+
ifdef::cl_ext_buffer_device_address[]
600+
601+
| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor}
602+
603+
include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[]
604+
| {cl_bool_TYPE}
605+
| When set to CL_TRUE, specifies that the buffer must have a single fixed
606+
device-side address for its lifetime, and the address can be queried via {clGetMemObjectInfo}.
607+
608+
Each device in the context can have their own (fixed) device-side address and
609+
a copy of the created buffer which are synchronized
610+
implicitly by the runtime.
611+
612+
The flag might imply that the buffer will be "pinned" permanently to
613+
a device's memory, but might not be necessarily so, as long as the address
614+
range of the buffer remains constant.
615+
616+
The device addresses of sub-buffers derived from {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
617+
allocated buffers can be computed by adding the sub-buffer origin to the
618+
device-specific start address.
619+
620+
| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor}
621+
622+
include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[]
623+
| {cl_bool_TYPE}
624+
| When set to CL_TRUE, the buffer has otherwise the same properties as
625+
when allocated using the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} flag,
626+
but with an additional property that the buffer's address is the same across
627+
all the devices in the context.
628+
629+
endif::cl_ext_buffer_device_address[]
630+
598631
|====
599632

600633
ifdef::cl_khr_external_memory[]
@@ -662,6 +695,15 @@ ifdef::cl_khr_external_memory[]
662695
{CL_MEM_DEVICE_HANDLE_LIST_KHR} is specified as part of _properties_.
663696
** if _properties_ includes more than one external memory handle.
664697
endif::cl_khr_external_memory[]
698+
ifdef::cl_ext_buffer_device_address[]
699+
* {CL_INVALID_DEVICE}
700+
** If _properties_ includes either {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} or
701+
{CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there is at least one device in
702+
the context that doesn't support such allocation.
703+
* {CL_INVALID_VALUE}
704+
** If _properties_ includes both {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} and
705+
{CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} at the same time.
706+
endif::cl_ext_buffer_device_address[]
665707

666708
[[memory-flags-table]]
667709
.List of supported memory flag values
@@ -6185,6 +6227,21 @@ include::{generated}/api/version-notes/CL_MEM_D3D11_RESOURCE_KHR.asciidoc[]
61856227
returns the _resource_ argument specified when _memobj_ was created.
61866228
endif::cl_khr_d3d11_sharing[]
61876229

6230+
ifdef::cl_ext_buffer_device_address[]
6231+
| {CL_MEM_DEVICE_ADDRESS_EXT_anchor}
6232+
6233+
include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
6234+
| {cl_mem_device_address_EXT_TYPE}[]
6235+
| If _memobj_ was created using {clCreateBufferWithProperties} with
6236+
the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property set to CL_TRUE,
6237+
returns a list of device addresses for the buffer, one for each
6238+
device in the context. If the buffer was allocated
6239+
with the {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} property,
6240+
only one device address is returned.
6241+
6242+
endif::cl_ext_buffer_device_address[]
6243+
6244+
61886245
|====
61896246

61906247
// refError
@@ -6199,6 +6256,12 @@ Otherwise, it returns one of the following errors:
61996256
the return type specified in the
62006257
<<mem-info-table, Memory Object Queries>> table
62016258
and _param_value_ is not `NULL`.
6259+
ifdef::cl_ext_buffer_device_address[]
6260+
** Returned for the {CL_MEM_DEVICE_ADDRESS_EXT} query if
6261+
the {cl_ext_buffer_device_address_EXT} is not supported or if the
6262+
buffer was not allocated with neither {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or
6263+
{CL_MEM_DEVICE_SHARED_ADDRESS_EXT}.
6264+
endif::cl_ext_buffer_device_address[]
62026265
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
62036266
by the OpenCL implementation on the device.
62046267
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
@@ -10500,6 +10563,48 @@ Otherwise, it returns one of the following errors:
1050010563
required by the OpenCL implementation on the host.
1050110564
--
1050210565

10566+
ifdef::cl_ext_buffer_device_address[]
10567+
[open,refpage='clSetKernelArgDevicePointerEXT',desc='Set a device pointer as the argument value for a specific argument of a kernel.',type='protos']
10568+
--
10569+
To set a device pointer as the argument value for a specific argument of a
10570+
kernel, call the function
10571+
10572+
include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]
10573+
include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]
10574+
10575+
* _kernel_ is a valid kernel object.
10576+
* _arg_index_ is the argument index.
10577+
Arguments to the kernel are referred by indices that go from 0 for the
10578+
leftmost argument to _n_ - 1, where _n_ is the total number of arguments
10579+
declared by a kernel.
10580+
* _arg_value_ is the device pointer that should be used as the argument value for
10581+
argument specified by _arg_index_.
10582+
The device pointer specified is the value used by all API calls that enqueue
10583+
_kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument
10584+
value is changed by a call to {clSetKernelArgDevicePointerEXT} for _kernel_.
10585+
The device pointer can only be used for arguments that are declared to be a
10586+
pointer to `global` memory allocated with {clCreateBufferWithProperties} with
10587+
either the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}
10588+
property. The pointer value specified as the argument value
10589+
can be the pointer to the beginning of the buffer or any offset into
10590+
the buffer region. The device pointer value must be naturally aligned according to
10591+
the argument's type.
10592+
10593+
{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the argument was set
10594+
successfully. Otherwise, it returns one of the following errors:
10595+
10596+
* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
10597+
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support
10598+
the device pointer.
10599+
* {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index.
10600+
* {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value.
10601+
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
10602+
by the OpenCL implementation on the device.
10603+
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
10604+
required by the OpenCL implementation on the host.
10605+
--
10606+
endif::cl_ext_buffer_device_address[]
10607+
1050310608
[open,refpage='clSetKernelExecInfo',desc='Set additional execution information for a kernel.',type='protos']
1050410609
--
1050510610
To set additional execution information for a kernel, call the function
@@ -10566,6 +10671,19 @@ include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM
1056610671
If {clSetKernelExecInfo} has not been called with a value for
1056710672
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM}, the default value is
1056810673
{CL_TRUE}.
10674+
10675+
ifdef::cl_ext_buffer_device_address[]
10676+
| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor}
10677+
10678+
include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[]
10679+
| {cl_mem_device_address_EXT_TYPE}[]
10680+
| Device pointers must reference locations contained entirely within
10681+
buffers that are passed to kernel as arguments, or that are passed
10682+
through the execution information. Non-argument device pointers accessed
10683+
by the kernel must be specified by passing pointers to those buffers
10684+
via this {clSetKernelExecInfo} option.
10685+
endif::cl_ext_buffer_device_address[]
10686+
1056910687
|====
1057010688

1057110689
// refError
@@ -10575,7 +10693,14 @@ successfully.
1057510693
Otherwise, it returns one of the following errors:
1057610694

1057710695
* {CL_INVALID_KERNEL} if _kernel_ is a not a valid kernel object.
10578-
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support SVM.
10696+
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_SVM_PTRS} if no devices in
10697+
the context associated with _kernel_ support SVM.
10698+
ifdef::cl_ext_buffer_device_address[]
10699+
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} if no
10700+
device in the context associated with _kernel_ support device pointers.
10701+
endif::cl_ext_buffer_device_address[]
10702+
* {CL_INVALID_VALUE} if _param_name_ is not valid, if _param_value_ is
10703+
`NULL` or if the size specified by _param_value_size_ is not valid.
1057910704
* {CL_INVALID_OPERATION} if _param_name_ is
1058010705
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} and _param_value_ is {CL_TRUE}
1058110706
and no devices in the context associated with _kernel_ support fine-grain

0 commit comments

Comments
 (0)