Extension Type
New Feature
Feature Description
Hi,
With this issue, I wanted to suggest the addition of a standardized extension to query where the group, sub-group, or work item is executed on the device. For now I will refer to it as "core" since it is not sure which terminology would be correct and apply to all vendors. Here is a very rough idea:
namespace sycl::khr {
enum class aspect : /*unspecified*/ {
khr_execution_location_lane,
khr_execution_location_core,
};
enum class execution_locations : /* unspecified */ {
core_lane, // lane id with the unit execution the work-group/sub-group
device_core // core id executing the current workgroup
};
template <execution_locations location> uint32_t execution_location_id();
}
It is a feature already in use and tested within Shamrock: Github link.
My first questions would be to know if there is already a standard definition we can relate to and if there already a standard way to split the levels ? For the levels we could attach to the SYCL terminology wit'h something like execution_location_id<work_item>, execution_location_id<sub_group>, ...
In terms of constraints, there are no hard ones to enforce on the implementation since this extension is mostly informative. Besides that, the range of values that can be returned could be queried through an API (device property?). Also, similar instructions exist to query the number of lanes/SMs live within the kernel, which could very well be included in the extension.
I forgot to mention that there is a OpenGL extension for this even though it is a Nvidia one and thus has Nvidia specific terminology https://registry.khronos.org/OpenGL/extensions/NV/NV_shader_thread_group.txt
Related Functionality in C++
Not really c++ but available in c++ nevertheless 😅
Local APIC ID in the returned data from the CPUID x86 opcode
Related Functionality in Other Languages
For the current "core" executing the call.
%smid in PTX source
__smid() in HIP source
intel_get_eu_id() source
- Local APIC ID in the CPUID x86 opcode
For the current lane within the core:
%laneid in PTX source
__laneid() in HIP source
intel_get_eu_thread_id() ? source
- 0 for CPU since there is one level unless the second level is local to the numa and the first one not
Related SYCL Extensions
I do not know of an existing similar extension in SYCL (in both DPC++ and AdaptiveCpp).
Extension Type
New Feature
Feature Description
Hi,
With this issue, I wanted to suggest the addition of a standardized extension to query where the group, sub-group, or work item is executed on the device. For now I will refer to it as "core" since it is not sure which terminology would be correct and apply to all vendors. Here is a very rough idea:
It is a feature already in use and tested within Shamrock: Github link.
My first questions would be to know if there is already a standard definition we can relate to and if there already a standard way to split the levels ? For the levels we could attach to the SYCL terminology wit'h something like
execution_location_id<work_item>,execution_location_id<sub_group>, ...In terms of constraints, there are no hard ones to enforce on the implementation since this extension is mostly informative. Besides that, the range of values that can be returned could be queried through an API (device property?). Also, similar instructions exist to query the number of lanes/SMs live within the kernel, which could very well be included in the extension.
I forgot to mention that there is a OpenGL extension for this even though it is a Nvidia one and thus has Nvidia specific terminology https://registry.khronos.org/OpenGL/extensions/NV/NV_shader_thread_group.txt
Related Functionality in C++
Not really c++ but available in c++ nevertheless 😅
Local APIC IDin the returned data from theCPUIDx86 opcodeRelated Functionality in Other Languages
For the current "core" executing the call.
%smidin PTX source__smid()in HIP sourceintel_get_eu_id()sourceFor the current lane within the core:
%laneidin PTX source__laneid()in HIP sourceintel_get_eu_thread_id()? sourceRelated SYCL Extensions
I do not know of an existing similar extension in SYCL (in both DPC++ and AdaptiveCpp).