Specification Version
SYCL 2020 (Revision 11)
Section Number(s)
Section 4.9.1.2. "nd_range class"
Section 4.9.4.2. "SYCL functions for invoking kernels"
Issue Description
The specification of nd_range::get_group_range says:
Return a range representing the number of groups in each dimension. This range would result from globalSize/localSize as provided on construction.
However, nothing prevents localSize from being zero. I think we should simply say that the behavior of this function is undefined when localSize is zero.
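To make the hazard concrete, here is a minimal sketch in plain C++ (no SYCL headers) of the `globalSize/localSize` computation. `NdRange2` and `group_range` are hypothetical names used only for illustration, not the real `sycl::nd_range` API. The sketch returns an empty optional where a real implementation following the quoted wording would perform an integer division by zero, which is undefined behavior in C++.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for a two-dimensional sycl::nd_range; not the real class.
struct NdRange2 {
    std::array<std::size_t, 2> global;
    std::array<std::size_t, 2> local;
};

// Computes globalSize / localSize per dimension, as the quoted wording describes.
// Returns std::nullopt when any local extent is zero, because integer division
// by zero is undefined behavior in C++.
std::optional<std::array<std::size_t, 2>> group_range(const NdRange2& r) {
    std::array<std::size_t, 2> groups{};
    for (std::size_t d = 0; d < 2; ++d) {
        if (r.local[d] == 0) {
            return std::nullopt;  // the case the specification does not address
        }
        groups[d] = r.global[d] / r.local[d];
    }
    return groups;
}
```

With a global range of {8, 4}, a local range of {2, 2} gives {4, 2} groups, while a local range of {0, 2} has no meaningful result.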
The specification of the parallel_for overloads that take an nd_range is a little vague about whether the localSize of the nd_range can be zero. It does say:
Throws an exception with the errc::nd_range error code if the global size defined in the associated executionRange defines a non-zero index space which is not evenly divisible by the local size in each dimension.
Is a non-zero value "evenly divisible" by zero? If not, then the statement above indicates that parallel_for should throw an exception when the local size is zero. If this is our intent, though, it would be better to say this explicitly.
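One way to read the rule with the zero case made explicit is sketched below in plain C++, for a single dimension. `nd_range_error`, `validate_launch`, and `launch_is_valid` are hypothetical names; `nd_range_error` stands in for a `sycl::exception` carrying `errc::nd_range`. Treating an empty global range as always valid is an assumption of this sketch, based on the quoted rule covering only a non-zero index space.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for sycl::exception with the errc::nd_range error code.
struct nd_range_error : std::invalid_argument {
    using std::invalid_argument::invalid_argument;
};

// One-dimensional model of the launch-time check, with local == 0 handled
// explicitly instead of being left to the meaning of "evenly divisible".
void validate_launch(std::size_t global, std::size_t local) {
    if (global == 0) {
        return;  // assumption: the quoted rule only covers a non-zero index space
    }
    if (local == 0) {
        throw nd_range_error("local size is zero");
    }
    if (global % local != 0) {
        throw nd_range_error("global size is not evenly divisible by local size");
    }
}

// Convenience wrapper that reports the outcome as a bool.
bool launch_is_valid(std::size_t global, std::size_t local) {
    try {
        validate_launch(global, local);
        return true;
    } catch (const nd_range_error&) {
        return false;
    }
}
```

The explicit zero check has to come before the modulo, since `global % 0` is itself undefined behavior in C++.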
Somewhat related: OpenCL has a way to launch an nd-range kernel with no specified local range, in which case the driver picks a default local range. DPC++ also has an extension for this purpose, sycl_ext_oneapi_auto_local_range. We could change the SYCL spec to say that specifying a local size of zero means that the implementation must choose a default local size. However, I think it might be better to introduce a new type for this instead, as the DPC++ extension does.
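As a rough illustration of the "new type" option, the sketch below uses a tag type rather than a zero sentinel to request a default local size. Everything here is hypothetical (`auto_local`, `local_spec`, `pick_default_local`, `resolve_local`) and is not the API of the DPC++ extension. The default-selection policy is an arbitrary placeholder, since a real implementation would choose based on the device.

```cpp
#include <cstddef>
#include <variant>

// Hypothetical tag type meaning "let the implementation pick the local size".
struct auto_local {};

// Hypothetical local-size argument: either an explicit extent or the tag.
using local_spec = std::variant<std::size_t, auto_local>;

// Placeholder policy (an assumption of this sketch): the largest divisor of
// `global` that does not exceed 64.
std::size_t pick_default_local(std::size_t global) {
    for (std::size_t candidate = 64; candidate > 1; --candidate) {
        if (global % candidate == 0) {
            return candidate;
        }
    }
    return 1;
}

// Resolves the requested local size. An explicit value is passed through
// unchanged, so an explicit zero stays distinguishable from "auto" and can
// still be rejected by a separate validity check.
std::size_t resolve_local(std::size_t global, const local_spec& spec) {
    if (std::holds_alternative<auto_local>(spec)) {
        return pick_default_local(global);
    }
    return std::get<std::size_t>(spec);
}
```

Because the request for a default is carried in the type, a local size of zero keeps a single meaning and can remain an error.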
Code Example (Optional)
No response