a| `img_dot_interleaved` performs the dual dot product operation.
The input vectors of the first dot product are `a` and the vector containing the even-indexed elements of `b`. The result is stored into the first element of the output vector.
The input vectors of the second dot product are `a` and the vector containing the odd-indexed elements of `b`. The result is stored into the second element of the output vector.

For example, given:

----
a = [a0 a1]
b = [b0 b1 b2 b3]
----

the output vector is:

----
[res0 res1] = [a0 a1] x [b0 b1]
                        [b2 b3]

res0 = a0b0 + a1b2
res1 = a0b1 + a1b3
----

Requires that the `__opencl_img_dot_interleaved` feature macro is defined.
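
The arithmetic above can be checked on the host with a plain C model such as the following sketch. It is not the OpenCL C built-in: the name `ref_dot_interleaved`, its array-based signature, and the generalization to an `n`-element `a` with a `2n`-element `b` (suggested by the `float4` and `float8` types in the coding sample) are assumptions made for illustration.

[source,c]
----
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of img_dot_interleaved (not the OpenCL C built-in).
 * Assumption: a has n elements and b has 2n elements. res[0] pairs a with the
 * even-indexed elements of b, res[1] pairs a with the odd-indexed elements. */
void ref_dot_interleaved(const float *a, const float *b, size_t n, float res[2])
{
    assert(a != NULL && b != NULL && res != NULL);
    float even = 0.0f;
    float odd = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        even += a[i] * b[2 * i];     /* b0, b2, b4, ... */
        odd += a[i] * b[2 * i + 1];  /* b1, b3, b5, ... */
    }
    res[0] = even;
    res[1] = odd;
}
----

With `a = [1 2]` and `b = [3 4 5 6]` the model gives `res0 = 3 + 10 = 13` and `res1 = 4 + 12 = 16`. It accumulates strictly left to right, so it does not reproduce whatever rounding or accumulation order the built-in uses.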
a| `img_dot_interleaved_acc` performs the dual dot product operation with the accumulator `acc`.
The input vectors of the first dot product are `a` and the vector containing the even-indexed elements of `b`. The result, plus the first element of `acc`, is stored into the first element of the output vector.
The input vectors of the second dot product are `a` and the vector containing the odd-indexed elements of `b`. The result, plus the second element of `acc`, is stored into the second element of the output vector.

For example, given:

----
a = [a0 a1]
b = [b0 b1 b2 b3]
acc = [acc0 acc1]
----

the output vector is:

----
[res0 res1] = [a0 a1] x [b0 b1] + [acc0 acc1]
                        [b2 b3]

res0 = a0b0 + a1b2 + acc0
res1 = a0b1 + a1b3 + acc1
----

Requires that the `__opencl_img_dot_interleaved` feature macro is defined.
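
Another way to read the accumulating form is to split `b` into its even- and odd-indexed elements first and then take two ordinary dot products. The sketch below does exactly that on the host; the function names, the `REF_MAX_N` limit, and the array-based signature are hypothetical, and C `float` arithmetic only approximates the built-in's rounding.

[source,c]
----
#include <assert.h>
#include <stddef.h>

#define REF_MAX_N 16 /* hypothetical upper bound on the length of a */

/* Ordinary dot product of two n-element vectors. */
static float ref_dot(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical model of img_dot_interleaved_acc: deinterleave b, take two dot
 * products with a, then add the matching element of acc. */
void ref_dot_interleaved_acc(const float *a, const float *b, size_t n,
                             const float acc[2], float res[2])
{
    float b_even[REF_MAX_N];
    float b_odd[REF_MAX_N];
    assert(n <= REF_MAX_N);
    for (size_t i = 0; i < n; ++i) {
        b_even[i] = b[2 * i];      /* b0, b2, ... */
        b_odd[i] = b[2 * i + 1];   /* b1, b3, ... */
    }
    res[0] = ref_dot(a, b_even, n) + acc[0];
    res[1] = ref_dot(a, b_odd, n) + acc[1];
}
----

With `a = [1 2]`, `b = [3 4 5 6]`, and `acc = [10 20]`, the model returns `[23 36]`.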
a| `img_matmul_float_acc_1x2_2x2` performs the dual dot product operation with the accumulator `acc`.
The input vectors of the first dot product are `a` and the vector containing the even-indexed elements of `b`. The result, plus the first element of `acc`, is stored into the first element of the output vector.
The input vectors of the second dot product are `a` and the vector containing the odd-indexed elements of `b`. The result, plus the second element of `acc`, is stored into the second element of the output vector.

For example, given:

----
a = [a0 a1]
b = [b0 b1 b2 b3]
acc = [acc0 acc1]
----

the output vector is:

----
[res0 res1] = [a0 a1] x [b0 b1] + [acc0 acc1]
                        [b2 b3]

res0 = a0b0 + a1b2 + acc0
res1 = a0b1 + a1b3 + acc1
----

Requires that the `__opencl_img_matmul_1x2_2x2` feature macro is defined.
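
The function name describes the same arithmetic as a 1x2 row vector multiplied by a 2x2 matrix whose rows are `[b0 b1]` and `[b2 b3]`, that is, `b` read as a row-major 2x2 matrix. The sketch below computes it in that form on the host; the name `ref_matmul_1x2_2x2_acc` and its signature are hypothetical.

[source,c]
----
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of img_matmul_float_acc_1x2_2x2 as a matrix product.
 * b is read as a row-major 2x2 matrix B, so B[k][j] == b[2 * k + j]:
 * row 0 is [b0 b1] and row 1 is [b2 b3]. */
void ref_matmul_1x2_2x2_acc(const float a[2], const float b[4],
                            const float acc[2], float res[2])
{
    assert(a != NULL && b != NULL && acc != NULL && res != NULL);
    for (int j = 0; j < 2; ++j) {        /* output column */
        float sum = 0.0f;
        for (int k = 0; k < 2; ++k)      /* shared dimension */
            sum += a[k] * b[2 * k + j];
        res[j] = sum + acc[j];           /* res0 = a0b0 + a1b2 + acc0, ... */
    }
}
----

Column `j` of that matrix holds the even-indexed (`j = 0`) or odd-indexed (`j = 1`) elements of `b`, which is why the matrix view and the even/odd description give the same result.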
a| `img_matmul_half2_acc_1x2_2x2f` and `img_matmul_half2_acc_1x2_2x2h` perform the dual dot product operation with the accumulator `acc`.
The input vectors of the first dot product are `a` and the vector containing the even-indexed *32-bit elements* of `b`. The result is stored into the first element of the output vector.
The input vectors of the second dot product are `a` and the vector containing the odd-indexed *32-bit elements* of `b`. The result is stored into the second element of the output vector.
Note: The parentheses only help the reader see that the dot computation is a [1x2] x [2x2] matrix product with half2 elements; they do not indicate the accumulation order.
----

Requires that the `__opencl_img_matmul_1x2_2x2` feature macro is defined.
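
Because the indexing is by *32-bit elements*, each element of `b` here is a whole `half2` rather than a single half. The sketch below illustrates that indexing under assumptions not spelled out in this section: `a` holds two `half2` values, `b` holds four, each pairwise product is a two-component dot product, and `acc` is added last. Standard C has no `half` type, so `float` stands in for it; neither the `f`/`h` result precision nor the accumulation order is modeled, and the type and function names are hypothetical.

[source,c]
----
#include <assert.h>
#include <stddef.h>

/* Stand-in for half2: two components stored as float (C has no half type). */
typedef struct { float x, y; } pair2;

/* Two-component dot product of one pair of half2 values. */
static float dot2(pair2 p, pair2 q)
{
    return p.x * q.x + p.y * q.y;
}

/* Hypothetical model: a = [a0 a1] and b = [b0 b1 b2 b3], each element a 32-bit
 * half2. The even-indexed 32-bit elements of b are b0 and b2, the odd-indexed
 * ones are b1 and b3. */
void ref_matmul_half2_acc_1x2_2x2(const pair2 a[2], const pair2 b[4],
                                  const float acc[2], float res[2])
{
    assert(a != NULL && b != NULL && acc != NULL && res != NULL);
    res[0] = dot2(a[0], b[0]) + dot2(a[1], b[2]) + acc[0];
    res[1] = dot2(a[0], b[1]) + dot2(a[1], b[3]) + acc[1];
}
----

Indexing individual halves instead would pair `a` with `b0.x`, `b1.x`, and so on, which is a different computation.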
a| `img_matmul_uchar4_acc_1x2_2x2` and `img_matmul_char4_acc_1x2_2x2` perform the dual dot product operation with the accumulator `acc`.
The input vectors of the first dot product are `a` and the vector containing the even-indexed *32-bit elements* of `b`. The result is stored into the first element of the output vector.
The input vectors of the second dot product are `a` and the vector containing the odd-indexed *32-bit elements* of `b`. The result is stored into the second element of the output vector.
Note: The parentheses only help the reader see that the dot computation is a [1x2] x [2x2] matrix product with char4/uchar4 elements; they do not indicate the accumulation order.
----

Requires that the `__opencl_img_matmul_1x2_2x2` feature macro is defined.
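
For the 8-bit variants each 32-bit element packs four lanes, so every pairwise product becomes a four-lane integer dot product. The sketch below models the signed form under these assumptions, none of which this section states: `a` holds two `char4` values, `b` holds four, lane products are summed in 32-bit integers, and `acc` is a pair of 32-bit integers. Wrap-around behavior on overflow is not modeled (the sketch asserts that the sum fits), and the `uchar4` form would use unsigned types in the same way. All names are hypothetical.

[source,c]
----
#include <assert.h>
#include <stdint.h>

/* A char4 value modeled as four signed 8-bit lanes (one 32-bit element). */
typedef struct { int8_t v[4]; } char4_t;

/* Four-lane dot product; the magnitude is at most 4 * 128 * 128, so it fits. */
static int32_t dot4(char4_t p, char4_t q)
{
    int32_t s = 0;
    for (int i = 0; i < 4; ++i)
        s += (int32_t)p.v[i] * q.v[i];
    return s;
}

/* Hypothetical model of img_matmul_char4_acc_1x2_2x2: the even-indexed 32-bit
 * elements of b are b[0] and b[2], the odd-indexed ones are b[1] and b[3]. */
void ref_matmul_char4_acc_1x2_2x2(const char4_t a[2], const char4_t b[4],
                                  const int32_t acc[2], int32_t res[2])
{
    for (int j = 0; j < 2; ++j) {
        int64_t sum = (int64_t)dot4(a[0], b[j]) + dot4(a[1], b[j + 2]) + acc[j];
        assert(sum >= INT32_MIN && sum <= INT32_MAX); /* overflow not modeled */
        res[j] = (int32_t)sum;
    }
}
----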
a| `img_matmul_uchar4_acc_1x2_2x2_sat` and `img_matmul_char4_acc_1x2_2x2_sat` perform the dual dot product operation, add the accumulator `acc`, and saturate the result.
The input vectors of the first dot product are `a` and the vector containing the even-indexed *32-bit elements* of `b`. The result is saturated and stored into the first element of the output vector.
The input vectors of the second dot product are `a` and the vector containing the odd-indexed *32-bit elements* of `b`. The result is saturated and stored into the second element of the output vector.
Note: The parentheses only help the reader see that the dot computation is a [1x2] x [2x2] matrix product with char4/uchar4 elements; they do not indicate the accumulation order.
----

Requires that the `__opencl_img_matmul_1x2_2x2` feature macro is defined.
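
Saturation clamps the final value instead of letting it wrap. The sketch below assumes, without confirmation from this section, that the signed variant saturates to the 32-bit signed integer range and that the dot products and `acc` are summed exactly before a single clamp at the end. Each of `a0`, `a1`, and `b0` to `b3` is modeled as four signed 8-bit lanes; the function names are hypothetical.

[source,c]
----
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an exact 64-bit sum into the int32_t range. */
static int32_t sat_int32(int64_t x)
{
    if (x > INT32_MAX)
        return INT32_MAX;
    if (x < INT32_MIN)
        return INT32_MIN;
    return (int32_t)x;
}

/* Hypothetical model of img_matmul_char4_acc_1x2_2x2_sat. a holds a0 and a1, b holds
 * b0..b3, each four signed 8-bit lanes. The sum, including acc, is formed exactly
 * in 64 bits and only then saturated. */
void ref_matmul_char4_acc_1x2_2x2_sat(const int8_t a[2][4], const int8_t b[4][4],
                                      const int32_t acc[2], int32_t res[2])
{
    assert(a != NULL && b != NULL && acc != NULL && res != NULL);
    for (int j = 0; j < 2; ++j) {        /* j = 0: b0 and b2, j = 1: b1 and b3 */
        int64_t sum = acc[j];
        for (int i = 0; i < 4; ++i)
            sum += (int64_t)a[0][i] * b[j][i] + (int64_t)a[1][i] * b[j + 2][i];
        res[j] = sat_int32(sum);
    }
}
----

Summing in 64 bits keeps the intermediate value exact, so the only place a value is clipped is the final saturation step.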
a| `img_matmul_2x4_4x4f` and `img_matmul_2x4_4x4h` compute the product A x B of the 2x4 matrix A and the 4x4 matrix B, where `a0` is the first row and `a1` is the second row of A.

the output vector is:

----
[res0 res1 res2 res3] = A x B
[res4 res5 res6 res7]
----

Requires that the `__opencl_img_matmul_2x4_4x4` feature macro is defined.
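
A host-side model of the product makes the result layout concrete. The sketch assumes that B is stored row-major as b0..b15 (the layout implied by the BT example for the transposed variants) and that res0..res7 hold the 2x4 result row by row; the name `ref_matmul_2x4_4x4` and its signature are hypothetical, and only the `float` form is modeled.

[source,c]
----
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of img_matmul_2x4_4x4f: A is 2x4 with rows a0 and a1, B is 4x4
 * stored row-major as b0..b15, and res0..res7 hold the 2x4 result row by row. */
void ref_matmul_2x4_4x4(const float a0[4], const float a1[4],
                        const float b[16], float res[8])
{
    assert(a0 != NULL && a1 != NULL && b != NULL && res != NULL);
    const float *rows[2] = { a0, a1 };
    for (int i = 0; i < 2; ++i) {          /* row of A and of the result */
        for (int j = 0; j < 4; ++j) {      /* column of B and of the result */
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += rows[i][k] * b[4 * k + j];
            res[4 * i + j] = sum;
        }
    }
}
----

The inner loop is the dot product of row `i` of A with column `j` of B; an accumulating variant would add the element of C at the same position.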

the output vector is:

----
[res0 res1 res2 res3] = A x B + C
[res4 res5 res6 res7]
----

Requires that the `__opencl_img_matmul_2x4_4x4` feature macro is defined.

the output vector is:

----
[res0 res1 res2 res3] = A x BT
[res4 res5 res6 res7]
----

Requires that the `__opencl_img_matmul_2x4_4x4` feature macro is defined.

BT = [b0 b4 b8 b12]
     [b1 b5 b9 b13]
     [b2 b6 b10 b14]
     [b3 b7 b11 b15]
C = [acc00 acc01 acc02 acc03]
    [acc10 acc11 acc12 acc13]
----

the output vector is:

----
[res0 res1 res2 res3] = A x BT + C
[res4 res5 res6 res7]
----

Requires that the `__opencl_img_matmul_2x4_4x4` feature macro is defined.
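
Since BT has rows `[b0 b4 b8 b12]`, `[b1 b5 b9 b13]`, and so on, the elements b0..b15 still describe B in row-major order, so element (k, j) of BT is element `4 * j + k` of that list. Each result element is therefore the dot product of a row of A with a row of B. The sketch below models the accumulating form on the host; the name `ref_matmul_2x4_4x4_bt_acc`, the row-major layout of `acc` (acc00..acc03, then acc10..acc13), and the array signature are assumptions. Passing an all-zero `acc` gives the non-accumulating A x BT form.

[source,c]
----
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the A x BT + C variant: b holds B row-major (b0..b15), so
 * BT[k][j] == b[4 * j + k]; acc holds C row-major as acc00..acc03, acc10..acc13. */
void ref_matmul_2x4_4x4_bt_acc(const float a0[4], const float a1[4],
                               const float b[16], const float acc[8], float res[8])
{
    assert(a0 != NULL && a1 != NULL && b != NULL && acc != NULL && res != NULL);
    const float *rows[2] = { a0, a1 };
    for (int i = 0; i < 2; ++i) {          /* row of A, C and the result */
        for (int j = 0; j < 4; ++j) {      /* row j of B, i.e. column j of BT */
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += rows[i][k] * b[4 * j + k];
            res[4 * i + j] = sum + acc[4 * i + j];
        }
    }
}
----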

== Coding Sample

This coding sample shows how to initialize the input vectors, use the *img_dot_interleaved_acc* function, and access the output vector:

[source,c]
----
float4 a = (float4) (1.0f, 1.0f, 1.0f, 1.0f);
__local float8 b;
float2 res = img_dot_interleaved_acc(a, &b, acc);
printf("res = [ %f %f ]\n", res.s0, res.s1);
----

This coding sample shows how to use the *img_matmul_float_acc_1x2_2x2* function: