extensions/cl_khr_unified_svm.asciidoc
Lines changed: 35 additions & 24 deletions
Original file line number
Diff line number
Diff line change
@@ -821,27 +821,30 @@ The initial version of this extension will only support allocating host memory.
821
821
+
822
822
--
823
823
`RESOLVED`: The behavior is defined for all queries for this case.
824
+
825
+
This behavior is tested by the CTS test `unified_svm_api_query_defaults`.
824
826
--
825
827
826
828
. Do we want separate "memset" APIs to set to different sized "value", such as 8-bits, 16-bits, 32-bits, or others? Do we want to go back to a "fill" API?
827
829
+
828
830
--
829
-
`RESOLVED`: We are reusing the "fill" API.
831
+
`RESOLVED`: We are reusing the "fill" API {clEnqueueSVMMemFill}.
830
832
--
831
833
832
-
. What are the restrictions for the _dst_ptr_ values that can be passed to the "fill" API?
834
+
. What are the restrictions for the _dst_ptr_ values that can be passed to {clEnqueueSVMMemFill}?
833
835
+
834
836
--
835
837
`RESOLVED`:
838
+
However, we should still tidy up the spec text for these cases.
836
839
837
840
* Can a device "fill" another device's allocation? (Recommendation: Yes, but finalize as part of multi-device support.)
838
841
* Can a device "fill" arbitrary host memory? (No, undefined behavior unless system SVM is supported.)
839
842
* Can a device "fill" a USM allocation from another context? (No, undefined behavior.)
840
843
841
-
Note, there are no existing CTS tests that pass an arbitrary host allocation to {clEnqueueSVMMemFill}.
844
+
Note, there are not any existing CTS tests that pass an arbitrary host allocation to {clEnqueueSVMMemFill}.
842
845
--
843
846
844
-
. What are the restrictions for the _src_ptr_ and _dst_ptr_ values that can be passed to the "memcpy" API?
847
+
. What are the restrictions for the _src_ptr_ and _dst_ptr_ values that can be passed to {clEnqueueSVMMemcpy}?
845
848
+
846
849
--
847
850
`RESOLVED`:
@@ -853,6 +856,8 @@ Note, there are no existing CTS tests that pass an arbitrary host allocation to
853
856
* Can a device "memcpy" from arbitrary host memory? (Yes, we already have tests.)
854
857
* Can a device "memcpy" from arbitrary host memory to arbitrary host memory? (Yes, we already have tests.)
855
858
* Can the memory region to copy to overlap the memory region to copy from? (No, already an error.)
859
+
860
+
The valid cases are tested by the CTS tests `unified_svm_memcpy` and `unified_svm_corner_case_memcpy`.
856
861
--
857
862
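The overlap restriction in the resolution above can be illustrated with a minimal sketch in plain C. The helper name `sketch_regions_overlap` is hypothetical and is not part of the extension; it only models the condition under which the source and destination regions of a copy share at least one byte, which is the case the resolution describes as already being an error for {clEnqueueSVMMemcpy}.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the extension: returns 1 when the byte
 * ranges [dst, dst + size) and [src, src + size) share at least one byte,
 * and 0 otherwise. A zero-sized copy never overlaps. */
static int sketch_regions_overlap(const void *dst, const void *src, size_t size)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (size == 0) {
        return 0;
    }
    /* The ranges overlap when the distance between the two start
     * addresses is smaller than the number of bytes copied. */
    return (d <= s) ? (s - d < size) : (d - s < size);
}
```

A validation layer could call a check like this before forwarding a copy and report the overlap to the user; which error code an implementation returns is defined by the specification text, not by this sketch.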
858
863
. Do we want to support migrating to devices other than the device associated with _command_queue_?
@@ -877,7 +882,7 @@ The initial version of this extension will not extend {clEnqueueSVMMigrateMem},
877
882
. Should we allow querying the associated device for a USM allocation using {clGetSVMPointerInfoKHR}?
878
883
+
879
884
--
880
-
`RESOLVED`: Yes, we should.
885
+
`RESOLVED`: Yes, we should, supported by {CL_SVM_INFO_ASSOCIATED_DEVICE_HANDLE_KHR}.
881
886
--
882
887
883
888
. Should we add explicit mem alloc flags for `CACHED` and `UNCACHED`?
@@ -889,7 +894,7 @@ In a layered extension, we recommend adding cacheability properties instead of c
889
894
The layered extension could add coarse `CACHED` and `UNCACHED` properties, or separate properties for host vs. device, or even separate properties for specific cache levels.
890
895
--
891
896
892
-
. At least for HOST and SHARED allocations, should we have separate mem alloc flags for the host and the device?
897
+
. At least for `HOST` and `SHARED` allocations, should we have separate mem alloc flags for the host and the device?
893
898
+
894
899
--
895
900
`RESOLVED`: We removed the _flags_ argument entirely.
@@ -901,9 +906,7 @@ Specifically, is `NULL` a valid value for `ptr`?
901
906
Is `size` equal to zero valid?
902
907
+
903
908
--
904
-
*UNRESOLVED*:
905
-
906
-
Tentative resolution:
909
+
`RESOLVED`:
907
910
908
911
.. A `size` equal to zero is valid.
909
912
When `size` is zero, the call to {clEnqueueSVMMigrateMem}, {clEnqueueSVMMemFill}, and {clEnqueueSVMMemcpy} trivially succeeds, similar to an enqueued marker.
@@ -923,10 +926,12 @@ For reference, the full set of options we considered were:
923
926
924
927
.. A `size` equal to zero is valid. This appears to be the specified behavior for the C `memcpy` and `memset` functions.
925
928
.. [.line-through]#A `size` equal to zero is undefined behavior.#
926
-
.. A `size` equal to zero is an error.
929
+
.. [.line-through]#A `size` equal to zero is an error.#
927
930
.. A `ptr` equal to `NULL` is valid if and only if `size` is equal to zero, otherwise it is an error.
928
931
.. [.line-through]#A `ptr` equal to `NULL` is undefined behavior. This appears to be the specified behavior for the C `memcpy` and `memset` functions.#
929
-
.. A `ptr` equal to `NULL` is an error.
932
+
.. [.line-through]#A `ptr` equal to `NULL` is an error.#
933
+
934
+
These cases are tested by the CTS test `unified_svm_corner_case_migrate_mem`.
930
935
--
931
936
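The options that were not struck through above (a `size` of zero is valid, and a `NULL` `ptr` is valid if and only if `size` is zero) can be summarized by the following minimal sketch in plain C. The status values and the function name `sketch_check_ptr_and_size` are assumptions made for illustration; a real implementation would report whatever OpenCL error code the specification assigns.

```c
#include <stddef.h>

/* Hypothetical status values for this sketch only. */
enum sketch_status {
    SKETCH_OK      = 0,  /* arguments are valid, the command does real work   */
    SKETCH_NOOP    = 1,  /* size is zero: trivially succeeds, like a marker   */
    SKETCH_INVALID = -1  /* NULL pointer combined with a non-zero size        */
};

/* Models the resolved pointer and size rule for the migrate, fill, and
 * memcpy commands. */
static enum sketch_status sketch_check_ptr_and_size(const void *ptr, size_t size)
{
    if (size == 0) {
        /* Valid regardless of ptr, including ptr == NULL. */
        return SKETCH_NOOP;
    }
    if (ptr == NULL) {
        /* NULL is only valid together with a zero size. */
        return SKETCH_INVALID;
    }
    return SKETCH_OK;
}
```

Checking `size` first is what makes the "if and only if" relationship hold: a `NULL` pointer is accepted only on the zero-size path.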
932
937
. Should we add a device query for a maximum supported SVM alignment, or should the maximum supported alignment implicitly be defined by the size of the largest data type supported by the device?
@@ -947,7 +952,8 @@ See internal merge request 198.
947
952
+
948
953
--
949
954
`RESOLVED`:
950
-
The initial version of this extension will not support larger fill patterns.
955
+
The initial version of this extension will not support larger fill patterns; therefore, the maximum supported fill pattern size will implicitly be defined by the size of the largest data type supported by the device.
956
+
Supporting larger fill patterns could be added as a layered extension.
951
957
--
952
958
953
959
. Can a pointer to a device, host, or shared SVM allocation be used to create a {cl_mem_TYPE} using {CL_MEM_USE_HOST_PTR}?
@@ -1009,17 +1015,20 @@ One particular enhancement we may want to consider, though, is whether calling {
1009
1015
In the current specification, this is explicitly not a synchronization point.
1010
1016
However, in other APIs, querying the event status and observing that the event is complete is a synchronization point.
1011
1017
Should we adopt this behavior also, or do we want users to call {clWaitForEvents} to define a synchronization point?
1018
+
See internal issue 373.
1012
1019
--
1013
1020
1014
1021
. Should it be an error to set an unknown pointer as a kernel argument using {clSetKernelArgSVMPointer} if no devices support shared system allocations?
1015
1022
+
1016
1023
--
1017
-
*UNRESOLVED*:
1018
-
Returning an error for an unknown pointer is helpful to identify and diagnose possible programming errors sooner, but passing a pointer to arbitrary memory to a function on the host is not an error until the pointer is dereferenced.
1024
+
`RESOLVED`:
1025
+
It is not an error to set an unknown pointer as a kernel argument using {clSetKernelArgSVMPointer}.
1026
+
This behavior matches passing a pointer to arbitrary memory to a function on the host, where it is not an error until the pointer is dereferenced.
1027
+
Similarly, it is not an error to pass an unknown pointer via {clSetKernelExecInfo}({CL_KERNEL_EXEC_INFO_SVM_PTRS}).
1019
1028
1020
-
If we relax the error condition for {clSetKernelArgSVMPointer} then we could also consider relaxing the error condition for {clSetKernelExecInfo}({CL_KERNEL_EXEC_INFO_SVM_PTRS}) similarly.
1029
+
Note that we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer].
1021
1030
1022
-
Note that if the error condition is removed we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer].
1031
+
These cases are tested by the CTS tests `unified_svm_corner_case_set_kernel_arg` and `unified_svm_corner_case_set_kernel_exec_info`.
1023
1032
--
1024
1033
1025
1034
. Should we support a "rect" or "2D" memcpy similar to {clEnqueueCopyBufferRect}?
@@ -1046,22 +1055,22 @@ However, for host allocations, some implementations are able to support larger a
1046
1055
1047
1056
Possible resolutions:
1048
1057
1049
-
* Add a new query representing the maximum host memory allocation size supported by the device, e.g. `CL_DEVICE_MAX_HOST_MEM_ALLOC_SIZE_KHR`.
1050
-
For some devices, this query will return the same value as {CL_DEVICE_MAX_MEM_ALLOC_SIZE}, but for other devices this query will return a larger value.
1058
+
* Add a new query representing the maximum device-owned and host-owned memory allocation sizes supported by the device, e.g. `CL_DEVICE_MAX_DEVICE_OWNED_MEM_ALLOC_SIZE_KHR` and `CL_DEVICE_MAX_HOST_OWNED_MEM_ALLOC_SIZE_KHR`.
1059
+
For some devices, these queries will return the same value as {CL_DEVICE_MAX_MEM_ALLOC_SIZE}, but for other devices the queries will return a larger value.
1060
+
For SVM memory types that are not device-owned or host-owned, the existing limits will continue to apply.
1051
1061
* Relax the error behavior so implementations may return {CL_INVALID_BUFFER_SIZE}, but they would not be required to return an error if they support larger allocation sizes.
1052
1062
* Do nothing and keep the existing error behavior.
1053
1063
--
1054
1064
1055
1065
. Should it be an error to allocate zero bytes?
1056
1066
+
1057
1067
--
1058
-
*UNRESOLVED*:
1059
-
1060
-
Tentative resolution: Allow zero-sized allocations and require returning a `NULL` pointer.
1068
+
`RESOLVED`:
1069
+
We will allow zero-sized allocations and require returning a `NULL` pointer.
1061
1070
This is considered a successful operation and no error will be returned.
1062
1071
1063
1072
We evaluated many scenarios and determined that there is no clearly correct behavior.
1064
-
The scenarios we evaluated were:
1073
+
For reference, the scenarios we evaluated were:
1065
1074
1066
1075
* For OpenCL 2.0 SVM, {clSVMAlloc} with a size of zero is specified to return a `NULL` pointer.
1067
1076
Because {clSVMAlloc} has no mechanism to return an error code, it is unspecified whether this is considered an error.
@@ -1072,15 +1081,17 @@ If a `NULL` pointer is returned then `errno` may be set to an implementation-def
1072
1081
If a unique non-null pointer is returned then it cannot be dereferenced.
1073
1082
* Allocating an array of zero elements using `new` must return a non-null pointer, though dereferencing the pointer is undefined.
1074
1083
1075
-
For reference, the full set of options we considered were:
1084
+
Also for reference, the full set of options we considered were:
1076
1085
1077
1086
.. [.line-through]#Allow zero-sized allocations and require returning a non-null pointer that must be freed.#
1078
1087
.. Allow zero-sized allocations and require returning a `NULL` pointer.
1079
1088
No error will be generated.
1080
1089
Note, it is not currently an error to free a `NULL` pointer.
1081
1090
.. [.line-through]#Allow zero-sized allocations but allow returning a `NULL` pointer. No error would be generated, even if a `NULL` pointer is returned.#
1082
1091
.. [.line-through]#Specify that this case is implementation-defined.#
1083
-
.. Specify that this case is an error.
1092
+
.. [.line-through]#Specify that this case is an error.#
1093
+
1094
+
This case is tested by the CTS test `unified_svm_corner_case_alloc_free`.
1084
1095
--
1085
1096
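The resolved zero-byte allocation behavior above (return a `NULL` pointer, report success, and allow the `NULL` pointer to be freed) can be modeled with a minimal sketch in plain C. The names `sketch_alloc` and `sketch_free` are hypothetical stand-ins built on `malloc` and `free`, and the integer codes are assumptions; they are not the extension's allocation APIs or error codes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SVM allocation entry point.
 * Writes 0 to *errcode_ret on success and -1 on failure. */
static void *sketch_alloc(size_t size, int *errcode_ret)
{
    void *ptr;

    if (size == 0) {
        /* Zero-sized allocation: succeed and return NULL. */
        if (errcode_ret != NULL) {
            *errcode_ret = 0;
        }
        return NULL;
    }

    ptr = malloc(size);
    if (errcode_ret != NULL) {
        *errcode_ret = (ptr != NULL) ? 0 : -1;
    }
    return ptr;
}

/* Hypothetical stand-in for the matching free entry point.
 * Freeing a NULL pointer is not an error and does nothing. */
static void sketch_free(void *ptr)
{
    free(ptr);
}
```

Because a `NULL` return can mean either "zero-sized success" or "failure" in this model, a caller needs to inspect the returned error code rather than only the pointer.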
1086
1097
Note: The following issues were added to the KHR USM extension: