Skip to content

[KHR Ext?] Execution location queries #966

@tdavidcl

Description

@tdavidcl

Extension Type

New Feature

Feature Description

Hi,

With this issue, I wanted to suggest the addition of a standardized extension to query where the group, sub-group, or work item is executed on the device. For now I will refer to it as "core" since it is not sure which terminology would be correct and apply to all vendors. Here is a very rough idea:

namespace sycl::khr {

     enum class aspect : /*unspecified*/ {
       khr_execution_location_lane,
       khr_execution_location_core,
     };

     enum class execution_locations : /* unspecified */ {
         core_lane, // lane id with the unit execution the work-group/sub-group
         device_core // core id executing the current workgroup
     };

     template <execution_locations location> uint32_t execution_location_id();
}

It is a feature already in use and tested within Shamrock: Github link.

My first questions would be to know if there is already a standard definition we can relate to and if there already a standard way to split the levels ? For the levels we could attach to the SYCL terminology wit'h something like execution_location_id<work_item>, execution_location_id<sub_group>, ...

In terms of constraints, there are no hard ones to enforce on the implementation since this extension is mostly informative. Besides that, the range of values that can be returned could be queried through an API (device property?). Also, similar instructions exist to query the number of lanes/SMs live within the kernel, which could very well be included in the extension.

I forgot to mention that there is a OpenGL extension for this even though it is a Nvidia one and thus has Nvidia specific terminology https://registry.khronos.org/OpenGL/extensions/NV/NV_shader_thread_group.txt

Related Functionality in C++

Not really c++ but available in c++ nevertheless 😅
Local APIC ID in the returned data from the CPUID x86 opcode

Related Functionality in Other Languages

For the current "core" executing the call.

  • %smid in PTX source
  • __smid() in HIP source
  • intel_get_eu_id() source
  • Local APIC ID in the CPUID x86 opcode

For the current lane within the core:

  • %laneid in PTX source
  • __laneid() in HIP source
  • intel_get_eu_thread_id() ? source
  • 0 for CPU since there is one level unless the second level is local to the numa and the first one not

Related SYCL Extensions

I do not know of an existing similar extension in SYCL (in both DPC++ and AdaptiveCpp).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions