A turbulent month.

vlmax

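Maximum vector length: the largest number of elements that a single vector instruction can process.
It is derived as: VLMAX = LMUL * VLEN / SEW
For example:
With VLEN = 256 bits, SEW = 8 bits, and LMUL = 4, VLMAX = LMUL * VLEN / SEW = 4 * 256 / 8 = 128, meaning that at most 128 elements of SEW = 8 can be processed.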
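
The formula can be checked at compile time with a small sketch. The helper name `vlmax` is hypothetical (it is not taken from spike or RIF), and only integer LMUL values are handled; fractional LMUL is left out on purpose.

```cpp
#include <cstdint>

// Hypothetical helper, not part of spike or RIF: VLMAX = LMUL * VLEN / SEW.
// Assumes an integer LMUL (1, 2, 4, 8); fractional LMUL is not modelled.
constexpr std::uint64_t vlmax(std::uint64_t vlen_bits,
                              std::uint64_t sew_bits,
                              std::uint64_t lmul) {
  return lmul * vlen_bits / sew_bits;
}

// The example from the text: VLEN = 256, SEW = 8, LMUL = 4.
static_assert(vlmax(256, 8, 4) == 128, "4 * 256 / 8 = 128");
// Same VLEN with SEW = 32 and LMUL = 1: 256 / 32 = 8 elements.
static_assert(vlmax(256, 32, 1) == 8, "1 * 256 / 32 = 8");
```

If the file compiles, both assertions hold.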

RIF does not have any LMUL-related variable yet, so VLEN is assumed to be 256 bits. With that, it finally lines up with spike.

spike

Finding a given element in spike

  // vector element for various SEW
  template<typename T> T& elt(reg_t vReg, reg_t n, bool is_write = false) {
    assert(vsew != 0);
    assert((VLEN >> 3)/sizeof(T) > 0);
    reg_t elts_per_reg = (VLEN >> 3) / (sizeof(T));
    vReg += n / elts_per_reg;
    n = n % elts_per_reg;
#ifdef WORDS_BIGENDIAN
    // "V" spec 0.7.1 requires lower indices to map to lower significant
    // bits when changing SEW, thus we need to index from the end on BE.
    n ^= elts_per_reg - 1;
#endif
    if (is_write)
      log_elt_write_if_needed(vReg);

    T *regStart = (T*)((char*)reg_file + vReg * (VLEN >> 3));
    return regStart[n];
  }

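elts_per_reg: VLEN >> 3 gives VLENB (the register length in bytes) and sizeof(T) gives the element width (SEW) in bytes, so their quotient is the number of elements that fit in one vector register.
vReg: The index of the starting vector register. It specifies which vector register (e.g., v0, v1, …, v31) to access or start from when reading or writing vector elements. If n is larger than elts_per_reg, vReg is advanced by n / elts_per_reg and n is reduced to the offset within that register.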
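
The index arithmetic in `elt` can be isolated into a minimal sketch. `EltPos` and `locate` are hypothetical names introduced only for illustration, and only the little-endian path is covered (the `WORDS_BIGENDIAN` flip is ignored).

```cpp
#include <cstdint>

// Hypothetical result type: which register is hit, and the offset inside it.
struct EltPos {
  std::uint64_t reg;
  std::uint64_t idx;
};

// Mirrors the arithmetic of spike's elt() on a little-endian host:
// elts_per_reg = VLENB / sizeof(T), then split n into register and offset.
constexpr EltPos locate(std::uint64_t vlen_bits, std::uint64_t elt_bytes,
                        std::uint64_t v_reg, std::uint64_t n) {
  const std::uint64_t elts_per_reg = (vlen_bits >> 3) / elt_bytes;
  return EltPos{v_reg + n / elts_per_reg, n % elts_per_reg};
}

// VLEN = 256 and 16-bit elements give 16 elements per register, so
// element 37 of a group starting at v8 is element 5 of v10.
static_assert(locate(256, 2, 8, 37).reg == 10, "8 + 37 / 16 = 10");
static_assert(locate(256, 2, 8, 37).idx == 5, "37 % 16 = 5");
```

This also shows why the elements of a register group (LMUL > 1) can be addressed with one flat index n.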

RIF

After configuring multiple nodes, view the generated graph: dot -Tpng output.dot > output.png
log:

rif:converter.f16 = inf, intrinsic:converter2.f16 = nan
rif:converter.u16 = 31744, intrinsic:converter2.u16 = 65535
rif:converter.u16 = 7c00, intrinsic:converter2.u16 = ffff
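
The two bit patterns in the log can be decoded by hand using the binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits). The sketch below is only an illustration: `F16Class` and `classify_f16` are hypothetical names, not RIF or intrinsic APIs.

```cpp
#include <cstdint>

// Hypothetical classification of an IEEE 754 binary16 bit pattern.
enum class F16Class { Zero, Subnormal, Normal, Infinity, NaN };

constexpr F16Class classify_f16(std::uint16_t bits) {
  const std::uint16_t exponent = (bits >> 10) & 0x1F;  // 5 exponent bits
  const std::uint16_t fraction = bits & 0x3FF;         // 10 fraction bits
  if (exponent == 0x1F) {
    // All-ones exponent: infinity if the fraction is zero, NaN otherwise.
    return fraction == 0 ? F16Class::Infinity : F16Class::NaN;
  }
  if (exponent == 0) {
    return fraction == 0 ? F16Class::Zero : F16Class::Subnormal;
  }
  return F16Class::Normal;
}

// The RIF result: 31744 == 0x7c00, exponent all ones, fraction zero.
static_assert(classify_f16(0x7C00) == F16Class::Infinity, "0x7c00 is +inf");
// The intrinsic result: 65535 == 0xffff, exponent all ones, fraction non-zero.
static_assert(classify_f16(0xFFFF) == F16Class::NaN, "0xffff is a NaN");
```

So the two results differ in more than printing: one side produced an infinity and the other a NaN.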

gdb:

pc             0x10fc4  0x10fc4 <vrgather_vv_operator_0+142>
(gdb) info vector
v0             {q = {0xfffffffffffffffffffffffffffffff2, 0xffffffffffffffffffffffffffffffff}, 
                l = {0xfffffffffffffff2, 0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff}, 
                w = {0xfffffff2, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff}, 
                s = {0xfff2, 0xffff <repeats 15 times>}, 
                b = {0xf2, 0xff <repeats 31 times>}}
v1             {q = {0x18810fc510a810c20e33092d21eb26dc, 0x114513ce13ef1544027b0d6a245e1431}, 
                l = {0xe33092d21eb26dc, 0x18810fc510a810c2, 0x27b0d6a245e1431, 0x114513ce13ef1544}, 
                w = {0x21eb26dc, 0xe33092d, 0x10a810c2, 0x18810fc5, 0x245e1431, 0x27b0d6a, 0x13ef1544, 0x114513ce}, 
                s = {0x26dc, 0x21eb, 0x92d, 0xe33, 0x10c2, 0x10a8, 0xfc5, 0x1881, 0x1431, 0x245e, 0xd6a, 0x27b, 0x1544, 0x13ef, 0x13ce, 0x1145},
                b = {0xdc, 0x26, 0xeb, 0x21, 0x2d, 0x9, 0x33, 0xe, 0xc2, 0x10, 0xa8, 0x10, 0xc5, 0xf, 0x81, 0x18, 0x31, 0x14, 0x5e, 0x24, 0x6a, 0xd, 0x7b, 0x2, 0x44, 0x15, 0xef, 0x13, 0xce, 0x13, 0x45, 0x11}}
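
The b, s, w and l views printed by gdb are the same register bytes regrouped in little-endian order. A minimal sketch, using the first eight bytes of v1 from the dump above (`read_le` and `kV1Bytes` are hypothetical names introduced for illustration):

```cpp
#include <cstddef>
#include <cstdint>

// First eight bytes of v1, copied from the `b` view of the gdb dump.
constexpr std::uint8_t kV1Bytes[8] = {0xdc, 0x26, 0xeb, 0x21,
                                      0x2d, 0x09, 0x33, 0x0e};

// Read element `index` of `width` bytes, assembling it little-endian so the
// result does not depend on the host byte order.
constexpr std::uint64_t read_le(const std::uint8_t* bytes, std::size_t width,
                                std::size_t index) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < width; ++i) {
    value |= static_cast<std::uint64_t>(bytes[index * width + i]) << (8 * i);
  }
  return value;
}

static_assert(read_le(kV1Bytes, 2, 0) == 0x26dc, "s[0] in the dump");
static_assert(read_le(kV1Bytes, 4, 0) == 0x21eb26dc, "w[0] in the dump");
static_assert(read_le(kV1Bytes, 4, 1) == 0x0e33092d, "w[1] in the dump");
static_assert(read_le(kV1Bytes, 8, 0) == 0x0e33092d21eb26dcULL, "l[0] in the dump");
```

This matches the lower-index-to-lower-significant-bits rule quoted in the spike comment.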

IEEE754_2008

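The problems I am running into now are all floating-point issues anyway, so I might as well read IEEE 754 from start to finish. To be updated.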