Commit 862a280

tidy up and resolve a few open issues
1 parent 6fb1716 commit 862a280

1 file changed

Lines changed: 35 additions & 24 deletions

extensions/cl_khr_unified_svm.asciidoc

@@ -821,27 +821,30 @@ The initial version of this extension will only support allocating host memory.
 +
 --
 `RESOLVED`: The behavior is defined for all queries for this case.
+
+This behavior is tested by the CTS test `unified_svm_api_query_defaults`.
 --

 . Do we want separate "memset" APIs to set to different sized "value", such as 8-bits, 16-bits?, 32-bits, or others? Do we want to go back to a "fill" API?
 +
 --
-`RESOLVED`: We are reusing the "fill" API.
+`RESOLVED`: We are reusing the "fill" API {clEnqueueSVMMemFill}.
 --

-. What are the restrictions for the _dst_ptr_ values that can be passed to the "fill" API?
+. What are the restrictions for the _dst_ptr_ values that can be passed to {clEnqueueSVMMemFill}?
 +
 --
 `RESOLVED`:
+However, we should still tidy up the spec text for these cases.

 * Can a device "fill" another device's allocation? (Recommendation: Yes, but finalize as part of multi-device support.)
 * Can a device "fill" arbitrary host memory? (No, undefined behavior unless system SVM is supported.)
 * Can a device "fill" a USM allocation from another context? (No, undefined behavior.)

-Note, there are no existing CTS tests that pass an arbitrary host allocation to {clEnqueueSVMMemFill}.
+Note, there are not any existing CTS tests that pass an arbitrary host allocation to {clEnqueueSVMMemFill}.
 --

-. What are the restrictions for the _src_ptr_ and _dst_ptr_ values that can be passed to the "memcpy" API?
+. What are the restrictions for the _src_ptr_ and _dst_ptr_ values that can be passed to {clEnqueueSVMMemcpy}?
 +
 --
 `RESOLVED`:
@@ -853,6 +856,8 @@ Note, there are no existing CTS tests that pass an arbitrary host allocation to
 * Can a device "memcpy" from arbitrary host memory? (Yes, we already have tests.)
 * Can a device "memcpy" from arbitrary host memory to arbitrary host memory? (Yes, we already have tests.)
 * Can the memory region to copy to overlap the memory region to copy from? (No, already an error.)
+
+The valid cases are tested by the CTS tests `unified_svm_memcpy` and `unified_svm_corner_case_memcpy`.
 --

 . Do we want to support migrating to devices other than the device associated with _command_queue_?
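
The hunk above records that a copy whose destination region overlaps its source region is already an error for {clEnqueueSVMMemcpy}. As a rough illustration of what such a check involves, here is a minimal sketch in plain C. The helper name `svm_regions_overlap` is hypothetical and is not part of OpenCL or of this extension; how an implementation actually detects and reports the overlap is up to the implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an OpenCL API: returns true when the byte ranges
 * [dst, dst + size) and [src, src + size) share at least one byte. */
static bool svm_regions_overlap(const void *dst, const void *src, size_t size)
{
    /* Compare addresses as integers, because relational comparison of
     * pointers into unrelated objects is undefined behavior in C. */
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (size == 0) {
        return false; /* an empty range cannot overlap anything */
    }

    /* Two equally sized ranges overlap exactly when the distance between
     * their start addresses is smaller than the size. */
    return (d > s ? d - s : s - d) < size;
}
```

A caller would run a check like this before issuing the copy and fall back to a staged copy through a temporary buffer when it returns true.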
@@ -877,7 +882,7 @@ The initial version of this extension will not extend {clEnqueueSVMMigrateMem},
 . Should we allow querying the associated device for a USM allocation using {clGetSVMPointerInfoKHR}?
 +
 --
-`RESOLVED`: Yes, we should.
+`RESOLVED`: Yes, we should, supported by {CL_SVM_INFO_ASSOCIATED_DEVICE_HANDLE_KHR}.
 --

 . Should we add explicit mem alloc flags for `CACHED` and `UNCACHED`?
@@ -889,7 +894,7 @@ In a layered extension, we recommend adding cacheability properties instead of c
 The layered extension could add coarse `CACHED` and `UNCACHED` properties, or separate properties for host vs. device, or even separate properties for specific cache levels.
 --

-. At least for HOST and SHARED allocations, should we have separate mem alloc flags for the host and the device?
+. At least for `HOST` and `SHARED` allocations, should we have separate mem alloc flags for the host and the device?
 +
 --
 `RESOLVED`: We removed the _flags_ argument entirely.
@@ -901,9 +906,7 @@ Specifically, is `NULL` a valid value for `ptr`?
 Is `size` equal to zero valid?
 +
 --
-*UNRESOLVED*:
-
-Tentative resolution:
+`RESOLVED`:

 .. A `size` equal to zero is valid.
 When `size` is zero, the call to {clEnqueueSVMMigrateMem}, {clEnqueueSVMMemFill}, and {clEnqueueSVMMemcpy} trivially succeeds, similar to an enqueued marker.
@@ -923,10 +926,12 @@ For reference, the full set of options we considered were:

 .. A `size` equal to zero is valid. This appears to be the specified behavior for the C `memcpy` and `memset` functions.
 .. [.line-through]#A `size` equal to zero is undefined behavior.#
-.. A `size` equal to zero is an error.
+.. [.line-through]#A `size` equal to zero is an error.#
 .. A `ptr` equal to `NULL` is valid if and only if `size` is equal to zero, otherwise it is an error.
 .. [.line-through]#A `ptr` equal to `NULL` is undefined behavior. This appears to be the specified behavior for the C `memcpy` and `memset` functions.#
-.. A `ptr` equal to `NULL` is an error.
+.. [.line-through]#A `ptr` equal to `NULL` is an error.#
+
+These cases are tested by the CTS test `unified_svm_corner_case_migrate_mem`.
 --

 . Should we add a device query for a maximum supported SVM alignment, or should the maximum supported alignment implicitly be defined by the size of the largest data type supported by the device?
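
The two options left standing in the hunk above (a zero `size` is valid, and a `NULL` `ptr` is valid if and only if `size` is zero) combine into a small decision table. The sketch below expresses it in plain C. The enum and function names are hypothetical, chosen only for illustration, and the actual error code returned for the invalid case is whatever the extension specification assigns.

```c
#include <stddef.h>

/* Hypothetical outcomes for illustration only; these are not OpenCL codes. */
enum svm_check {
    SVM_CHECK_NOOP,    /* size == 0: trivially succeeds, like a marker */
    SVM_CHECK_PROCEED, /* non-NULL ptr with nonzero size: do the work   */
    SVM_CHECK_ERROR    /* NULL ptr with nonzero size: report an error   */
};

/* Hypothetical helper that applies the resolved ptr/size rules. */
static enum svm_check svm_check_ptr_size(const void *ptr, size_t size)
{
    if (size == 0) {
        /* A zero size is valid whether or not ptr is NULL. */
        return SVM_CHECK_NOOP;
    }
    if (ptr == NULL) {
        /* NULL is only valid together with a zero size. */
        return SVM_CHECK_ERROR;
    }
    return SVM_CHECK_PROCEED;
}
```

Checking `size` first keeps the zero-size case independent of the pointer value, which matches the C `memcpy` and `memset` precedent cited in the option list.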
@@ -947,7 +952,8 @@ See internal merge request 198.
 +
 --
 `RESOLVED`:
-The initial version of this extension will not support larger fill patterns.
+The initial version of this extension will not support larger fill patterns, therefore the maximum supported fill pattern size will implicitly be defined by the size of the largest data type supported by the device.
+Supporting larger fill patterns could be added as a layered extension.
 --

 . Can a pointer to a device, host, or shared SVM allocation be used to create a {cl_mem_TYPE} using {CL_MEM_USE_HOST_PTR}?
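
The resolution in the hunk above bounds the fill pattern size by the size of the largest data type the device supports. A minimal sketch of that bound in plain C follows. It assumes the existing {clEnqueueSVMMemFill} convention that a pattern size must be a nonzero power of two; the diff itself does not restate that rule. The helper name and the `largest_type_size` parameter are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: accepts a pattern size only when it is a nonzero
 * power of two (assumed from the existing fill API) and no larger than the
 * largest data type size the caller reports for the device. */
static bool fill_pattern_size_ok(size_t pattern_size, size_t largest_type_size)
{
    /* A power of two has exactly one bit set, so clearing the lowest set
     * bit must leave zero. */
    bool power_of_two =
        pattern_size != 0 && (pattern_size & (pattern_size - 1)) == 0;

    return power_of_two && pattern_size <= largest_type_size;
}
```

For a device whose largest type is a 16-element vector of 8-byte elements, a caller would pass 128 as `largest_type_size`.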
@@ -1009,17 +1015,20 @@ One particular enhancement we may want to consider, though, is whether calling {
 In the current specification, this is explicitly not a synchronization point.
 However, in other APIs, querying the event status and observing that the event is complete is a synchronization point.
 Should we adopt this behavior also, or do we want users to call {clWaitForEvents} to define a synchronization point?
+See internal issue 373.
 --

 . Should it be an error to set an unknown pointer as a kernel argument using {clSetKernelArgSVMPointer} if no devices support shared system allocations?
 +
 --
-*UNRESOLVED*:
-Returning an error for an unknown pointer is helpful to identify and diagnose possible programming errors sooner, but passing a pointer to arbitrary memory to a function on the host is not an error until the pointer is dereferenced.
+`RESOLVED`:
+It is not an error to set an unknown pointer as a kernel argument using {clSetKernelArgSVMPointer}.
+This behavior matches passing a pointer to arbitrary memory to a function on the host, where it is not an error until the pointer is dereferenced.
+Similarly, it is not an error to pass an unknown pointer via {clSetKernelExecInfo}({CL_KERNEL_EXEC_INFO_SVM_PTRS}).

-If we relax the error condition for {clSetKernelArgSVMPointer} then we could also consider relaxing the error condition for {clSetKernelExecInfo}({CL_KERNEL_EXEC_INFO_SVM_PTRS}) similarly.
+Note that we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer].

-Note that if the error condition is removed we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer].
+These cases are tested by the CTS tests `unified_svm_corner_case_set_kernel_arg` and `unified_svm_corner_case_set_kernel_exec_info`.
 --

 . Should we support a "rect" or "2D" memcpy similar to {clEnqueueCopyBufferRect}?
@@ -1046,22 +1055,22 @@ However, for host allocations, some implementations are able to support larger a

 Possible resolutions:

-* Add a new query representing the maximum host memory allocation size supported by the device, e.g. `CL_DEVICE_MAX_HOST_MEM_ALLOC_SIZE_KHR`.
-For some devices, this query will return the same value as {CL_DEVICE_MAX_MEM_ALLOC_SIZE}, but for other devices this query will return a larger value.
+* Add a new query representing the maximum device-owned and host-owned memory allocation sizes supported by the device, e.g. `CL_DEVICE_MAX_DEVICE_OWNED_MEM_ALLOC_SIZE_KHR` and `CL_DEVICE_MAX_HOST_OWNED_MEM_ALLOC_SIZE_KHR`.
+For some devices, these queries will return the same value as {CL_DEVICE_MAX_MEM_ALLOC_SIZE}, but for other devices the queries will return a larger value.
+For SVM memory types that are not device-owned or host-owned, the existing limits will continue to apply.
 * Relax the error behavior so implementations may return {CL_INVALID_BUFFER_SIZE}, but they would not be required to return an error if they support larger allocation sizes.
 * Do nothing and keep the existing error behavior.
 --

 . Should it be an error to allocate zero bytes?
 +
 --
-*UNRESOLVED*:
-
-Tentative resolution: Allow zero-sized allocations and require returning a `NULL` pointer.
+`RESOLVED`:
+We will allow zero-sized allocations and require returning a `NULL` pointer.
 This is considered a successful operation and no error will be returned.

 We evaluated many scenarios and determined that there is no clearly correct behavior.
-The scenarios we evaluated were:
+For reference, the scenarios we evaluated were:

 * For OpenCL 2.0 SVM, {clSVMAlloc} with a size of zero is specified to return a `NULL` pointer.
 Because {clSVMAlloc} has no mechanism to return an error code, it is unspecified whether this is considered an error.
@@ -1072,15 +1081,17 @@ If a `NULL` pointer is returned then `errno` may be set to an implementation-def
 If a unique non-null pointer is returned then it cannot be dereferenced.
 * Allocating an array of zero elements using `new` must return a non-null pointer, though dereferencing the pointer is undefined.

-For reference, the full set of options we considered were:
+Also for reference, the full set of options we considered were:

 .. [.line-through]#Allow zero-sized allocations and require returning a non-null pointer that must be freed.#
 .. Allow zero-sized allocations and require returning a `NULL` pointer.
 No error will be generated.
 Note, it is not currently an error to free a `NULL` pointer.
 .. [.line-through]#Allow zero-sized allocations but allow returning a `NULL` pointer. No error would be generated, even if a `NULL` pointer is returned.#
 .. [.line-through]#Specify that this case is implementation-defined.#
-.. Specify that this case is an error.
+.. [.line-through]#Specify that this case is an error.#
+
+This case is tested by the CTS test `unified_svm_corner_case_alloc_free`.
 --

 Note: The following issues were added to the KHR USM extension:
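
The zero-byte allocation resolution in the last two hunks (return a `NULL` pointer, report success, and allow freeing `NULL`) can be mimicked with a small host-only mock. This is a sketch under stated assumptions: `sketch_svm_alloc` and `sketch_svm_free` are hypothetical stand-ins built on `malloc` and `free`, the value 0 plays the role of a success code, and -1 is a placeholder failure code. None of these are the extension's actual entry points or error values.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocation entry point with an error-code
 * out parameter. A zero size yields NULL and a success code. */
static void *sketch_svm_alloc(size_t size, int *errcode_ret)
{
    void *ptr = NULL;
    int err = 0; /* 0 plays the role of a success code in this sketch */

    if (size != 0) {
        ptr = malloc(size);
        if (ptr == NULL) {
            err = -1; /* placeholder failure code, not an OpenCL value */
        }
    }
    if (errcode_ret != NULL) {
        *errcode_ret = err;
    }
    return ptr;
}

/* Hypothetical stand-in for the matching free; NULL is accepted. */
static int sketch_svm_free(void *ptr)
{
    free(ptr); /* free(NULL) is defined to do nothing in C */
    return 0;
}
```

Because success is signalled through the error code rather than the pointer, a caller has to test the code, not the returned pointer, to tell a zero-sized allocation apart from a failed one.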
