A turbulent month.
vlmax
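Maximum vector length: the largest number of elements that can be processed.
It is derived as: VLMAX = LMUL * VLEN / SEW
For example:
With VLEN = 256 bits, SEW = 8 bits, and LMUL = 4, VLMAX = LMUL * VLEN / SEW = 4 * 256 / 8 = 128.
That is, at most 128 elements with SEW = 8 can be processed.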
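The arithmetic above can be sketched as a small helper. This is only an illustration: `vlmax` is a hypothetical function name (it comes from neither spike nor RIF), and it assumes an integer LMUL, so fractional LMUL settings are not covered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of spike or RIF).
// Computes VLMAX = LMUL * VLEN / SEW, with VLEN and SEW given in bits.
// Assumption: LMUL is an integer (1, 2, 4, 8); fractional LMUL is not handled.
inline std::uint64_t vlmax(std::uint64_t vlen_bits,
                           std::uint64_t sew_bits,
                           std::uint64_t lmul) {
    assert(sew_bits != 0 && "SEW must be non-zero");
    return lmul * vlen_bits / sew_bits;
}
```

Calling `vlmax(256, 8, 4)` reproduces the worked example and yields 128.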
RIF does not have any LMUL-related variable yet, and VLEN is assumed to be 256 bits. With that, RIF is finally wired up to spike.
spike
Locating a specific element in spike
// vector element for various SEW
template<typename T> T& elt(reg_t vReg, reg_t n, bool is_write = false) {
  assert(vsew != 0);
  assert((VLEN >> 3)/sizeof(T) > 0);
  reg_t elts_per_reg = (VLEN >> 3) / (sizeof(T));
  vReg += n / elts_per_reg;
  n = n % elts_per_reg;
#ifdef WORDS_BIGENDIAN
  // "V" spec 0.7.1 requires lower indices to map to lower significant
  // bits when changing SEW, thus we need to index from the end on BE.
  n ^= elts_per_reg - 1;
#endif
  if (is_write)
    log_elt_write_if_needed(vReg);
  T *regStart = (T*)((char*)reg_file + vReg * (VLEN >> 3));
  return regStart[n];
}
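elts_per_reg: VLEN >> 3 gives vlenb (the register width in bytes) and sizeof(T) gives the SEW in bytes, so their quotient is the number of elements that fit in one register.
vReg: the index of the starting vector register. It specifies which vector register (e.g., v0, v1, …, v31) to access or start from when reading or writing vector elements. If n is larger than elts_per_reg, the lookup spills over into the following registers.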
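The same index arithmetic can be tried outside spike with a toy register file. The sketch below is not spike's class: `ToyVRegFile`, `offset`, `read`, and `write` are hypothetical names, VLEN is fixed at 256 bits as assumed earlier in these notes, and a little-endian host is assumed (the WORDS_BIGENDIAN adjustment is left out).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a vector register file: 32 registers of VLEN bits
// stored back to back as raw bytes. Assumes VLEN = 256 and a little-endian host.
struct ToyVRegFile {
    static constexpr std::size_t kVlenBits = 256;
    static constexpr std::size_t kVlenBytes = kVlenBits >> 3;  // vlenb
    std::uint8_t bytes[32 * kVlenBytes] = {};

    // Byte offset of element n (of type T), counted from register v_reg.
    template <typename T>
    std::size_t offset(std::size_t v_reg, std::size_t n) const {
        const std::size_t elts_per_reg = kVlenBytes / sizeof(T);
        assert(elts_per_reg > 0);
        v_reg += n / elts_per_reg;  // spill over into the following registers
        n %= elts_per_reg;          // index inside the selected register
        return v_reg * kVlenBytes + n * sizeof(T);
    }

    // Read element n as type T (memcpy avoids aliasing problems).
    template <typename T>
    T read(std::size_t v_reg, std::size_t n) const {
        T value;
        std::memcpy(&value, bytes + offset<T>(v_reg, n), sizeof(T));
        return value;
    }

    // Write element n as type T.
    template <typename T>
    void write(std::size_t v_reg, std::size_t n, T value) {
        std::memcpy(bytes + offset<T>(v_reg, n), &value, sizeof(T));
    }
};
```

With 16-bit elements there are 16 elements per register, so element 17 counted from v0 is element 1 of v1. Writing the bytes 0xdc and 0x26 into a register and reading them back as one 16-bit element gives 0x26dc on a little-endian host, which is the relationship between the b and s views of v1 in the gdb dump in these notes.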
RIF
Configure multiple nodes and view the generated graph: dot -Tpng output.dot > output.png
log:
rif:converter.f16 = inf, intrinsic:converter2.f16 = nan
rif:converter.u16 = 31744, intrinsic:converter2.u16 = 65535
rif:converter.u16 = 7c00, intrinsic:converter2.u16 = ffff
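The three log lines show the same pair of values in different forms: 31744 is 0x7c00 and 65535 is 0xffff. In IEEE 754 binary16 (1 sign bit, 5 exponent bits, 10 fraction bits), 0x7c00 encodes +inf and 0xffff encodes a NaN. A minimal sketch for checking such bit patterns follows; `f16_is_inf` and `f16_is_nan` are hypothetical helper names, not functions from RIF or from the intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for IEEE 754 binary16 bit patterns.
// Layout: bit 15 = sign, bits 14..10 = exponent, bits 9..0 = fraction.

// Infinity: exponent all ones and fraction zero (either sign).
inline bool f16_is_inf(std::uint16_t bits) {
    return (bits & 0x7fffu) == 0x7c00u;
}

// NaN: exponent all ones and fraction non-zero.
inline bool f16_is_nan(std::uint16_t bits) {
    return (bits & 0x7c00u) == 0x7c00u && (bits & 0x03ffu) != 0;
}
```

Applied to the log, `f16_is_inf(0x7c00)` and `f16_is_nan(0xffff)` both hold, so the two converters disagree on inf versus NaN rather than on the decimal or hex formatting.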
gdb:
pc 0x10fc4 0x10fc4 <vrgather_vv_operator_0+142>
(gdb) info vector
v0 {q = {0xfffffffffffffffffffffffffffffff2, 0xffffffffffffffffffffffffffffffff},
l = {0xfffffffffffffff2, 0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff},
w = {0xfffffff2, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff},
s = {0xfff2, 0xffff <repeats 15 times>},
b = {0xf2, 0xff <repeats 31 times>}}
v1 {q = {0x18810fc510a810c20e33092d21eb26dc, 0x114513ce13ef1544027b0d6a245e1431},
l = {0xe33092d21eb26dc, 0x18810fc510a810c2, 0x27b0d6a245e1431, 0x114513ce13ef1544},
w = {0x21eb26dc, 0xe33092d, 0x10a810c2, 0x18810fc5, 0x245e1431, 0x27b0d6a, 0x13ef1544, 0x114513ce},
s = {0x26dc, 0x21eb, 0x92d, 0xe33, 0x10c2, 0x10a8, 0xfc5, 0x1881, 0x1431, 0x245e, 0xd6a, 0x27b, 0x1544, 0x13ef, 0x13ce, 0x1145},
b = {0xdc, 0x26, 0xeb, 0x21, 0x2d, 0x9, 0x33, 0xe, 0xc2, 0x10, 0xa8, 0x10, 0xc5, 0xf, 0x81, 0x18, 0x31, 0x14, 0x5e, 0x24, 0x6a, 0xd, 0x7b, 0x2, 0x44, 0x15, 0xef, 0x13, 0xce, 0x13, 0x45, 0x11}}
IEEE754_2008
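Since everything I run into these days is a floating-point problem anyway, I might as well read IEEE 754 from start to finish. To be updated.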