Skip to content

Latest commit

 

History

History
327 lines (282 loc) · 13.3 KB

File metadata and controls

327 lines (282 loc) · 13.3 KB

OpenCL Embedded Profile

The OpenCL specification describes the feature requirements for desktop platforms. This section describes the OpenCL embedded profile, which defines a subset of the OpenCL specification for handheld and embedded platforms. Optional OpenCL extensions may apply to both profiles.

The OpenCL embedded profile has the following restrictions until version 2.0 (i.e. the optionality described below was deprecated by version 2.0):

  1. Support for 3D images is optional.

    If {CL_DEVICE_IMAGE3D_MAX_WIDTH}, {CL_DEVICE_IMAGE3D_MAX_HEIGHT} and {CL_DEVICE_IMAGE3D_MAX_DEPTH} are zero, calls to {clCreateImage} or {clCreateImageWithProperties} will fail to create the 3D image, and the errcode_ret argument will return {CL_INVALID_OPERATION}. Declaring arguments of type image3d_t in a kernel will result in a compilation error.

    If {CL_DEVICE_IMAGE3D_MAX_WIDTH}, {CL_DEVICE_IMAGE3D_MAX_HEIGHT} and {CL_DEVICE_IMAGE3D_MAX_DEPTH} are greater than zero 0, calls to {clCreateImage} and {clCreateImageWithProperties} will behave as described for full profile implementations, and the image3d_t data type can be used in a kernel.

  2. Support for 2D image array writes is optional. If the cles_khr_2d_image_array_writes extension is supported by the embedded profile, writes to 2D image arrays are supported.

  3. Image and image arrays created with an image_channel_data_type value of {CL_FLOAT} or {CL_HALF_FLOAT} can only be used with samplers that use a filter mode of {CL_FILTER_NEAREST}. The values returned by read_imagef [1] for 2D and 3D images if image_channel_data_type value is {CL_FLOAT} or {CL_HALF_FLOAT} and sampler with filter_mode = {CL_FILTER_LINEAR} are undefined.

Furthermore, the OpenCL embedded profile has the following restrictions for all versions:

  1. 64 bit integers i.e. long, ulong including the appropriate vector data types and operations on 64-bit integers are optional. The cles_khr_int64 [2] extension string will be reported if the embedded profile implementation supports 64-bit integers. If double precision is supported i.e. {CL_DEVICE_DOUBLE_FP_CONFIG} is not zero, then cles_khr_int64 must also be supported.

  2. The mandated minimum single precision floating-point capability given by {CL_DEVICE_SINGLE_FP_CONFIG} is {CL_FP_ROUND_TO_ZERO} or {CL_FP_ROUND_TO_NEAREST}. If {CL_FP_ROUND_TO_NEAREST} is supported, the default rounding mode will be round to nearest even; otherwise the default rounding mode will be round to zero.

  3. The single precision floating-point operations (addition, subtraction and multiplication) shall be correctly rounded. Zero results may always be positive 0.0. The accuracy of division and sqrt are given in the OpenCL C and OpenCL SPIR-V Environment specifications.

    If {CL_FP_INF_NAN} is not set in {CL_DEVICE_SINGLE_FP_CONFIG}, and one of the operands or the result of addition, subtraction, multiplication or division would signal the overflow or invalid exception (see IEEE 754 specification), the value of the result is implementation-defined. Likewise, single precision comparison operators (<, >, <=, >=, ==, !=) return implementation-defined values when one or more operands is a NaN.

    In all cases, conversions (see the OpenCL C and OpenCL SPIR-V Environment specifications) shall be correctly rounded as described for the FULL_PROFILE, including those that consume or produce an INF or NaN. The built-in math functions shall behave as described for the FULL_PROFILE, including edge case behavior, but with slightly different accuracy rules. Edge case behavior and accuracy rules are described in the OpenCL C and OpenCL SPIR-V Environment specifications.

    Note

    If addition, subtraction and multiplication have default round to zero rounding mode, then fract, fma and fdim shall produce the correctly rounded result for round to zero rounding mode.

    This relaxation of the requirement to adhere to IEEE 754 requirements for basic floating-point operations, though extremely undesirable, is to provide flexibility for embedded devices that have lot stricter requirements on hardware area budgets.

  4. Denormalized numbers for the half data type which may be generated when converting a float to a half using variants of the vstore_half function or when converting from a half to a float using variants of the vload_half function can be flushed to zero. The OpenCL SPIR-V Environment Specification for details.

  5. The precision of conversions from {CL_UNORM_INT8}, {CL_SNORM_INT8}, {CL_UNORM_INT16}, {CL_SNORM_INT16}, {CL_UNORM_INT_101010}, and {CL_UNORM_INT_101010_2} to float is {leq} 2 ulp for the embedded profile instead of {leq} 1.5 ulp as defined in the full profile. The exception cases described in the full profile and given below apply to the embedded profile.

    For {CL_UNORM_INT8}

    • 0 must convert to 0.0f and

    • 255 must convert to 1.0f

    For {CL_UNORM_INT16}

    • 0 must convert to 0.0f and

    • 65535 must convert to 1.0f

    For {CL_SNORM_INT8}

    • -128 and -127 must convert to -1.0f,

    • 0 must convert to 0.0f and

    • 127 must convert to 1.0f

    For {CL_SNORM_INT16}

    • -32768 and -32767 must convert to -1.0f,

    • 0 must convert to 0.0f and

    • 32767 must convert to 1.0f

    For {CL_UNORM_INT_101010}

    • 0 must convert to 0.0f and

    • 1023 must convert to 1.0f

    For {CL_UNORM_INT_101010_2}

    • 0 must convert to 0.0f and

    • 1023 must convert to 1.0f (for RGB)

    • 3 must convert to 1.0f (for A)

{CL_PLATFORM_PROFILE} defined in the OpenCL Platform Queries table will return the string EMBEDDED_PROFILE if the OpenCL implementation supports the embedded profile only.

The minimum maximum values specified in the OpenCL Device Queries table that have been modified for the OpenCL embedded profile are listed in the OpenCL Embedded Device Queries table.

Table 1. List of supported param_names by {clGetDeviceInfo} for embedded profile
Device Info Return Type Description

{CL_DEVICE_MAX_READ_IMAGE_ARGS}

{cl_uint_TYPE}

Max number of image objects arguments of a kernel declared with the read_only qualifier. The minimum value is 8 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_MAX_WRITE_IMAGE_ARGS}

{cl_uint_TYPE}

Max number of image objects arguments of a kernel declared with the write_only qualifier. The minimum value is 8 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS}

{cl_uint_TYPE}

Max number of image objects arguments of a kernel declared with the write_only or read_write qualifier. The minimum value is 8 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_IMAGE2D_MAX_WIDTH}

{size_t_TYPE}

Max width of 2D image in pixels. The minimum value is 2048 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_IMAGE2D_MAX_HEIGHT}

{size_t_TYPE}

Max height of 2D image in pixels. The minimum value is 2048 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_IMAGE3D_MAX_WIDTH}

{size_t_TYPE}

Max width of 3D image in pixels. The minimum value is 2048 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_IMAGE3D_MAX_HEIGHT}

{size_t_TYPE}

Max height of 3D image in pixels. The minimum value is 2048 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_IMAGE3D_MAX_DEPTH}

{size_t_TYPE}

Max depth of 3D image in pixels. The minimum value is 2048 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_IMAGE_MAX_BUFFER_SIZE}

{size_t_TYPE}

Max number of pixels for a 1D image created from a buffer object.

The minimum value is 2048 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_IMAGE_MAX_ARRAY_SIZE}

{size_t_TYPE}

Max number of images in a 1D or 2D image array.

The minimum value is 256 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_MAX_SAMPLERS}

{cl_uint_TYPE}

Maximum number of samplers that can be used in a kernel.

The minimum value is 8 if {CL_DEVICE_IMAGE_SUPPORT} is {CL_TRUE}, the value is 0 otherwise.

{CL_DEVICE_MAX_PARAMETER_SIZE}

{size_t_TYPE}

Max size in bytes of all arguments that can be passed to a kernel. The minimum value is 256 bytes.

A maximum of 255 arguments can be passed to a kernel.

{CL_DEVICE_SINGLE_FP_CONFIG}

{cl_device_fp_config_TYPE}

Describes single precision floating-point capability of the device. This is a bit-field that describes one or more of the following values:

{CL_FP_DENORM} - denorms are supported

{CL_FP_INF_NAN} - INF and quiet NaNs are supported.

{CL_FP_ROUND_TO_NEAREST} - round to nearest even rounding mode supported

{CL_FP_ROUND_TO_ZERO} - round to zero rounding mode supported

{CL_FP_ROUND_TO_INF} - round to positive and negative infinity rounding modes supported

{CL_FP_FMA} - the fma kernel builtin is implemented using hardware-accelerated fused multiply-add as defined in IEEE754-2008.

{CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT} - divide and sqrt are correctly rounded as defined by the IEEE754 specification.

{CL_FP_SOFT_FLOAT} - Basic floating-point operations (such as addition, subtraction, multiplication) are implemented in software.

The mandated minimum floating-point capability is: {CL_FP_ROUND_TO_ZERO} or {CL_FP_ROUND_TO_NEAREST}.

{CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE}

{cl_ulong_TYPE}

Max size in bytes of a constant buffer allocation. The minimum value is 1 KB.

{CL_DEVICE_MAX_CONSTANT_ARGS}

{cl_uint_TYPE}

Max number of arguments declared with the __constant qualifier in a kernel. The minimum value is 4.

{CL_DEVICE_LOCAL_MEM_SIZE}

{cl_ulong_TYPE}

Size of local memory arena in bytes. The minimum value is 1 KB.

{CL_DEVICE_COMPILER_AVAILABLE}

{cl_bool_TYPE}

Is {CL_TRUE} if a compiler is available to compile programs created from source or IL, or {CL_FALSE} if the implementation does not have a compiler available.

This can be {CL_FALSE} for devices supporting the embedded profile.

When an online compiler is not available, OpenCL programs may still be created from binaries or built-in kernels supported by the device.

{CL_DEVICE_LINKER_AVAILABLE}

{cl_bool_TYPE}

Is {CL_TRUE} if a linker is available to link compiled programs, or {CL_FALSE} if the implementation does not have a linker available.

This can be {CL_FALSE} for devices supporting the embedded profile, but must be {CL_TRUE} if {CL_DEVICE_COMPILER_AVAILABLE} is {CL_TRUE}.

{CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE}

{cl_uint_TYPE}

The max. size of the device queue in bytes. The minimum value is 64 KB for the embedded profile

{CL_DEVICE_PRINTF_BUFFER_SIZE}

{size_t_TYPE}

Maximum size in bytes of the internal buffer that holds the output of printf calls from a kernel. The minimum value for the EMBEDDED profile is 1 KB.

If {CL_DEVICE_IMAGE_SUPPORT} specified in the OpenCL Device Queries table is {CL_TRUE}, the values assigned to {CL_DEVICE_MAX_READ_IMAGE_ARGS}, {CL_DEVICE_MAX_WRITE_IMAGE_ARGS}, {CL_DEVICE_IMAGE2D_MAX_WIDTH}, {CL_DEVICE_IMAGE2D_MAX_HEIGHT}, {CL_DEVICE_IMAGE3D_MAX_WIDTH}, {CL_DEVICE_IMAGE3D_MAX_HEIGHT}, {CL_DEVICE_IMAGE3D_MAX_DEPTH}, and {CL_DEVICE_MAX_SAMPLERS} by the implementation must be greater than or equal to the minimum values specified in the OpenCL Embedded Device Queries table.

If {CL_DEVICE_IMAGE_SUPPORT} specified in the OpenCL Device Queries table is {CL_TRUE}, the minimum list of supported image formats for either reading or writing in a kernel for embedded profile devices is:

Table 2. Minimum list of supported image formats for reading or writing (embedded profile)
num_channels channel_order channel_data_type

4

{CL_RGBA}

{CL_UNORM_INT8}
{CL_UNORM_INT16}
{CL_SIGNED_INT8}
{CL_SIGNED_INT16}
{CL_SIGNED_INT32}
{CL_UNSIGNED_INT8}
{CL_UNSIGNED_INT16}
{CL_UNSIGNED_INT32}
{CL_HALF_FLOAT}
{CL_FLOAT}

For embedded profiles devices that support reading from and writing to the same image object from the same kernel instance (see {CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS}) there is no required minimum list of supported image formats.


1. {fn-readimageh}
2. {fn-int64-performance}