Skip to content

Commit ceb9e19

Browse files
authored
[KHRGA-80] Updated DA Host allocation and memory sharing (#104)
* Updating DA Host alloc and mem sharing 1 Issue: KHRGA-80
1 parent ccdb0ea commit ceb9e19

3 files changed

Lines changed: 253 additions & 2 deletions

File tree

source/conf.py

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -201,6 +201,12 @@ def make_ref(ref_str, ref_view, ref_sufix):
201201
"#sec:accessor.host.buffer.conversions",
202202
)
203203
+ make_ref("SYCL_SYNC_PRIMITIVES", "Section 4.7.5", "#subsec:mutex")
204+
+ make_ref("SYCL_SPEC_HOST_ALLOC", "Section 4.7.1", "#_host_allocation")
205+
+ make_ref(
206+
"SYCL_SPEC_HOST_MEM_SHARING",
207+
"Section 4.7.4",
208+
"#sec:sharing-host-memory-with-dm",
209+
)
204210
+ make_ref(
205211
"SYCL_SYNC_PARALLEL_FOR_HIERARCHICAL",
206212
"Section 4.7.5",

source/iface/host-allocation.rst

Lines changed: 245 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,8 +2,251 @@
22
Copyright 2023 The Khronos Group Inc.
33
SPDX-License-Identifier: CC-BY-4.0
44
5+
***********************************
6+
Host allocations and memory sharing
7+
***********************************
8+
59
.. _host-allocations:
610

7-
****************
11+
================
812
Host allocations
9-
****************
13+
================
14+
15+
A SYCL runtime may need to allocate temporary objects
16+
on the host to handle some operations (such as copying
17+
data from one context to another).
18+
19+
Allocation on the host is managed using an allocator
20+
object, following the standard C++ allocator class
21+
definition.
22+
23+
The default allocator for memory objects is
24+
implementation-defined, but the user can supply
25+
their own allocator class by passing it as
26+
template parameter. For example, this is how to provide
27+
user-defined allocator class to :ref:`buffer`:
28+
29+
::
30+
31+
{
32+
sycl::buffer<int, 1, UserDefinedAllocator<int>> b(d);
33+
}
34+
35+
When an allocator returns a ``nullptr``, the runtime
36+
cannot allocate data on the host. Note that in this case
37+
the runtime will raise an error if it requires host memory
38+
but it is not available (e.g when moving data across SYCL
39+
backend contexts).
40+
41+
In some cases, the implementation may retain a copy of
42+
the allocator object even after the buffer is destroyed.
43+
For example, this can happen when the buffer object is
44+
destroyed before commands using accessors to the buffer
45+
have completed. Therefore, the application must be
46+
prepared for calls to the allocator even after the buffer
47+
is destroyed.
48+
49+
.. note::
50+
51+
If the application needs to know when the implementation
52+
has destroyed all copies of the allocator, it can maintain
53+
a reference count within the allocator.
54+
55+
The definition of allocators extends the current functionality
56+
of SYCL, ensuring that users can define allocator functions for
57+
specific hardware or certain complex shared memory mechanisms,
58+
and improves interoperability with STL-based libraries.
59+
60+
.. seealso:: |SYCL_SPEC_HOST_ALLOC|
61+
62+
Default allocators
63+
==================
64+
65+
A default allocator is always defined by the implementation.
66+
For allocations greater than size zero, when successful
67+
it is guaranteed to return non-``nullptr`` and new memory
68+
positions every call.
69+
70+
The default allocator for ``const`` buffers will remove the
71+
constantness of the type (therefore, the default allocator
72+
for a buffer of type ``const int`` will be an ``Allocator<int>``).
73+
This implies that host accessors will not synchronize with the
74+
pointer given by the user in the :ref:`buffer`/:ref:`image <iface-images>`
75+
constructor, but will use the memory returned by the ``Allocator``
76+
itself for that purpose.
77+
78+
The user can implement an allocator that returns the same address
79+
as the one passed in the buffer constructor, but it is the
80+
responsibility of the user to handle the potential race conditions.
81+
82+
The list of SYCL default allocators:
83+
84+
.. list-table::
85+
:header-rows: 1
86+
87+
* - Allocator
88+
- Description
89+
* - ``template <class T> sycl::buffer_allocator``
90+
- It is the default :ref:`buffer` allocator used by the runtime,
91+
when no allocator is defined by the user.
92+
93+
Meets the C++ named requirement Allocator.
94+
95+
A buffer of data type ``const T`` uses ``buffer_allocator<T>``
96+
by default.
97+
* - ``sycl::image_allocator``
98+
- It is the default allocator used by the runtime for the SYCL :ref:`unsampled_image`
99+
and :ref:`sampled_image` classes when no allocator is provided by the user.
100+
101+
The ``sycl::image_allocator`` is required to allocate in elements of ``std::byte``.
102+
103+
104+
.. _host_memory_sharing:
105+
106+
=========================================================
107+
Sharing host memory with the SYCL data management classes
108+
=========================================================
109+
110+
In order to allow the SYCL runtime to do memory management
111+
and allow for data dependencies, there are :ref:`buffer`
112+
and :ref:`image <iface-images>` classes defined.
113+
114+
The default behavior for them is that if a “raw” pointer
115+
is given during the construction of the data management
116+
class, then full ownership to use it is given to the SYCL
117+
runtime until the destruction of the SYCL object.
118+
119+
Below you can find details on sharing or explicitly not
120+
sharing host memory with the SYCL data classes. The same
121+
rules apply to :ref:`images <iface-images>` as well.
122+
123+
.. seealso:: |SYCL_SPEC_HOST_MEM_SHARING|
124+
125+
126+
Default behavior
127+
================
128+
129+
When using a :ref:`buffer`, the ownership of a pointer
130+
passed to the constructor of the class is, by default,
131+
passed to SYCL runtime, and that pointer cannot be used
132+
on the host side until the :ref:`buffer` or
133+
:ref:`image <iface-images>` is destroyed.
134+
135+
A SYCL application can access the contents of the memory
136+
managed by a SYCL buffer by using a :ref:`host_accessor`.
137+
However, there is no guarantee that the
138+
host accessor synchronizes with the original host
139+
address used in its constructor.
140+
141+
The pointer passed in is the one used to copy data back
142+
to the host, if needed, before buffer destruction.
143+
The memory pointed to by the host pointer will not be
144+
deallocated by the runtime, and the data is copied back
145+
from the device to the host through the host pointer
146+
if there is a need for it.
147+
148+
149+
SYCL ownership of the host memory
150+
=================================
151+
152+
In the case where there is host memory to be used for
153+
initialization of data but there is no intention of using
154+
that host memory after the :ref:`buffer` is destroyed,
155+
then the :ref:`buffer` can take full ownership of that
156+
host memory.
157+
158+
When a :ref:`buffer` owns the host pointer there is no copy back,
159+
by default. To create this situation, the SYCL application may pass
160+
a unique pointer to the host data, which will be then used by the
161+
runtime internally to initialize the data in the device.
162+
163+
For example, the following could be used:
164+
165+
::
166+
167+
{
168+
auto ptr = std::make_unique<int>(-1234);
169+
buffer<int, 1> b { std::move(ptr), range { 1 } };
170+
// ptr is not valid anymore.
171+
// There is nowhere to copy data back
172+
}
173+
174+
However, optionally the ``sycl::buffer::set_final_data()`` can be
175+
set to an output iterator (including a “raw” pointer) or to a
176+
``std::weak_ptr`` to enable copying data back, to another host memory
177+
address that is going to be valid after :ref:`buffer` destruction.
178+
179+
::
180+
181+
{
182+
auto ptr = std::make_unique<int>(-42);
183+
buffer<int, 1> b { std::move(ptr), range { 1 } };
184+
// ptr is not valid anymore.
185+
// There is nowhere to copy data back.
186+
// To get copy back, a location can be specified:
187+
b.set_final_data(std::weak_ptr<int> { .... })
188+
}
189+
190+
191+
Shared SYCL ownership of the host memory
192+
========================================
193+
194+
When an instance of ``std::shared_ptr`` is passed to the
195+
:ref:`buffer` constructor, then the :ref:`buffer` object
196+
and the developer's application share the memory region.
197+
198+
Rules of shared ownership:
199+
200+
1. If the ``std::shared_ptr`` is not empty, the contents of the
201+
referenced memory are used to initialize the :ref:`buffer`.
202+
203+
If the ``std::shared_ptr`` is empty, then the :ref:`buffer`
204+
is created with uninitialized memory.
205+
206+
2. If the ``std::shared_ptr`` is still used on the application's
207+
side then the data will be copied back from the :ref:`buffer`
208+
or :ref:`image <iface-images>` and will be available to the
209+
application after the :ref:`buffer` or
210+
:ref:`image <iface-images>` object is destroyed.
211+
212+
3. When the :ref:`buffer` is destroyed and the data have
213+
potentially been updated, if the number of copies of
214+
the ``std::shared_ptr`` outside the runtime is 0,
215+
there is no user-side shared pointer to read the data.
216+
217+
Therefore the data is not copied out, and the :ref:`buffer`
218+
destructor does not need to wait for the data processes
219+
to be finished, as the outcome is not needed on the
220+
application's side.
221+
222+
Example of such behavior:
223+
224+
::
225+
226+
{
227+
std::shared_ptr<int> ptr { data };
228+
{
229+
buffer<int, 1> b { ptr, range<2>{ 10, 10 } };
230+
// update the data
231+
[...]
232+
} // Data is copied back because there is an user side shared_ptr
233+
}
234+
235+
::
236+
237+
{
238+
std::shared_ptr<int> ptr { data };
239+
{
240+
buffer<int, 1> b { ptr, range<2>{ 10, 10 } };
241+
// update the data
242+
[...]
243+
ptr.reset();
244+
} // Data is not copied back, there is no user side shared_ptr.
245+
}
246+
247+
This behavior can be overridden using the
248+
``sycl::buffer::set_final_data()`` member function of the
249+
:ref:`buffer` class, which will by any means force the
250+
:ref:`buffer` destructor to wait until the data is copied to
251+
wherever the ``set_final_data()`` member function has put the
252+
data (or not wait nor copy if set final data is ``nullptr``).

source/spelling_wordlist.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,4 +48,6 @@ linearizes
4848
copyable
4949
mutex
5050
constantness
51+
allocators
52+
STL
5153
conformant

0 commit comments

Comments
 (0)