Both LoopTool and LoopNest fail to generate code for the copy input operation.
An example and a reproducer follow.
I was able to save a partially optimized loop_tree, so the agent just needs to label it properly.
This is what it looks like:
for m_1677 in 64 : L0
 for n_1679 in 16 : L1
  for k_1678 in 256 : L2
   for m_1677' in 4 : L3
    %5[m_1677, k_1678] <- copy(%0)
   for n_1679' in 4 : L5
    for n_1679'' in 4 : L6
     %6[k_1678, n_1679] <- copy(%1)
   for m_1677' in 4 : L8
    for n_1679' in 4 : L9
     for n_1679'' in 4 : L10
      %2[m_1677, k_1678, n_1679] <- multiply(%5, %6)
      %3[m_1677, n_1679] <- add(%2)
 for m_1677' in 4 : L13
  for n_1679 in 64 : L14
   for n_1679' in 4 : L15
    %4[m_1677, n_1679] <- write(%3)
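As a sanity check on the schedule itself, the dump above can be parsed to confirm that every statement is still enclosed by loops whose extents multiply out to the full size of each output dimension. If that holds, the failure is more likely in code generation than in an invalid split. The sketch below is plain Python and does not call loop_tool. The indentation in `TREE` (one space per nesting level) is an assumption inferred from the loop variables and extents, and `check_extents` is a hypothetical helper, not part of the library.

```python
import re
from math import prod

# Loop tree from the report. The one-space-per-level indentation is an
# assumption inferred from the loop variables and extents.
TREE = """\
for m_1677 in 64 : L0
 for n_1679 in 16 : L1
  for k_1678 in 256 : L2
   for m_1677' in 4 : L3
    %5[m_1677, k_1678] <- copy(%0)
   for n_1679' in 4 : L5
    for n_1679'' in 4 : L6
     %6[k_1678, n_1679] <- copy(%1)
   for m_1677' in 4 : L8
    for n_1679' in 4 : L9
     for n_1679'' in 4 : L10
      %2[m_1677, k_1678, n_1679] <- multiply(%5, %6)
      %3[m_1677, n_1679] <- add(%2)
 for m_1677' in 4 : L13
  for n_1679 in 64 : L14
   for n_1679' in 4 : L15
    %4[m_1677, n_1679] <- write(%3)
"""

# "for <var>[primes] in <extent> : L<id>"
LOOP_RE = re.compile(r"for (\w+)'* in (\d+) : L\d+")
# "%<id>[<dims>] <- <op>("
STMT_RE = re.compile(r"(%\d+)\[([^\]]*)\] <- \w+\(")


def check_extents(tree):
    """Map each statement to the covered extent of each of its output dims."""
    stack = []    # enclosing loops as (depth, variable, extent)
    covered = {}
    for line in tree.splitlines():
        text = line.strip()
        if not text:
            continue
        depth = len(line) - len(line.lstrip(" "))
        # Drop loops that do not enclose the current line.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        loop = LOOP_RE.match(text)
        if loop:
            stack.append((depth, loop.group(1), int(loop.group(2))))
            continue
        stmt = STMT_RE.match(text)
        if stmt:
            dims = [d.strip() for d in stmt.group(2).split(",")]
            # Multiply the extents of all enclosing loops over the same variable.
            covered[stmt.group(1)] = {
                d: prod(e for _, v, e in stack if v == d) for d in dims
            }
    return covered


for node, dims in check_extents(TREE).items():
    print(node, dims)
```

Under the nesting assumed here, every output dimension of every statement should come out as 256, matching `m, n, k = 256, 256, 256` in the reproducer.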
Reproducer:
import loop_tool as lt
import numpy as np
import pdb


def mm(A, B):
    # Matrix multiply written as a broadcast multiply followed by a sum over k.
    s = lt.SymbolGenerator()
    C = A.to(s.m, s.k) * B.to(s.k, s.n)
    return C.sum(s.k)


m, n, k = 256, 256, 256
A = lt.Tensor(m, k).set(np.random.randn(m, k))
B = lt.Tensor(k, n).set(np.random.randn(k, n))

s = lt.SymbolGenerator()
C = mm(A, B)

# Sequence of loop-tree transformations that produces the failing tree.
loop_tree = C.loop_tree.split(0, 4)\
    .swap_loops(1, 2)\
    .swap_loops(2, 3)\
    .swap_loops(2, 1)\
    .split(1, 16)\
    .swap_loops(2, 3)\
    .swap_loops(3, 4)\
    .copy_input(5, 0)\
    .try_swap(5, 4)\
    .split(5, 4)\
    .copy_input(7, 1)\
    .decrease_reuse(7)\
    .decrease_reuse(7)\
    .decrease_reuse(7)\
    .split(14, 4)

C.set(loop_tree)

# Save the serialized IR (the data/ directory must already exist).
with open("data/mm256.txt", "w") as f:
    f.write(C.ir.serialize())

# Drop into the debugger to inspect C and loop_tree interactively.
pdb.set_trace()
** Loop Nest
Fails on:
with lt.Backend("loop_nest"):
    mean_runtime = self.tensor.loop_tree.eval()
RuntimeError: assertion: fma_nest failed @ /home/dejang/loop_tool/src/backends/cpu/loop_nest.cpp:30
** Loop Tool
Fails on:
mean_runtime = self.tensor.loop_tree.eval()
error assertion: 0 failed @ /Users/dejang/Desktop/work/loop_tool/src/backends/cpu/cpp.cpp:228 can't emit code for copy
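For reference, the behaviour a backend would have to reproduce for the two copy nodes can be sketched in plain Python: each `copy` stages a tile of an input into a temporary buffer, and the multiply reads from those buffers. This is only an illustration of the semantics, not loop_tool's generated code. The loop nesting is inferred from the loop variables and extents in the dump, the extents are scaled down so the interpreted loops finish quickly, and `run_schedule`, `reference`, `a_buf`, `b_buf` and `acc` are hypothetical names.

```python
import random

# Scaled-down extents so the pure-Python loops finish quickly.
M0, M1 = 2, 2          # stand-ins for L0 (64) and m' (4)
N0, N1, N2 = 2, 2, 2   # stand-ins for L1 (16), n' (4), n'' (4)
K = 3                  # stand-in for L2 (256)
M, N = M0 * M1, N0 * N1 * N2


def run_schedule(A, B):
    """Follow the loop order of the reported tree (nesting is an assumption)."""
    C = [[0] * N for _ in range(M)]
    for m0 in range(M0):                                  # L0
        acc = [[0] * N for _ in range(M1)]                # %3 rows for this m0
        for n0 in range(N0):                              # L1
            for k in range(K):                            # L2
                a_buf = [0] * M1                          # %5 <- copy(%0)
                for m1 in range(M1):                      # L3
                    a_buf[m1] = A[m0 * M1 + m1][k]
                b_buf = [[0] * N2 for _ in range(N1)]     # %6 <- copy(%1)
                for n1 in range(N1):                      # L5
                    for n2 in range(N2):                  # L6
                        b_buf[n1][n2] = B[k][(n0 * N1 + n1) * N2 + n2]
                for m1 in range(M1):                      # L8
                    for n1 in range(N1):                  # L9
                        for n2 in range(N2):              # L10
                            n = (n0 * N1 + n1) * N2 + n2
                            # %2 multiply, then %3 add (reduction over k)
                            acc[m1][n] += a_buf[m1] * b_buf[n1][n2]
        for m1 in range(M1):                              # L13
            for nw0 in range(N // N2):                    # L14
                for nw1 in range(N2):                     # L15
                    n = nw0 * N2 + nw1
                    C[m0 * M1 + m1][n] = acc[m1][n]       # %4 <- write(%3)
    return C


def reference(A, B):
    """Naive matrix multiply used to check the schedule."""
    return [[sum(A[m][k] * B[k][n] for k in range(K)) for n in range(N)]
            for m in range(M)]


random.seed(0)
A = [[random.randint(-5, 5) for _ in range(K)] for _ in range(M)]
B = [[random.randint(-5, 5) for _ in range(N)] for _ in range(K)]
print(run_schedule(A, B) == reference(A, B))
```

Integer inputs are used so the comparison with the naive product is exact. Under this reading, the copy into `a_buf` is recomputed for every `n0` iteration, which is a property of the schedule and does not affect the result.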