Ansor Features

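Feature value processing

For each implementation, Ansor extracts multiple rows of 164 features each. Each row corresponds to one buffer.

One-hot features derived from enum types take the value 0 or 1. Every other feature passes its raw value through the log2p function to compress the value range: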

inline float log2p(float x) { return x < 0 ? -std::log2(-x + 1) : std::log2(x + 1); }

For example, if the raw length of blockIdx.x is 64, the feature blockIdx_x_len is log2p(64), that is, \(\log_2(64+1)\approx 6.022368\).
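The compression step can be checked with a small sketch. It repeats the log2p definition from the text and adds a helper, CompressAll, which is a hypothetical name introduced only for illustration and is not part of Ansor.

```cpp
#include <cmath>
#include <vector>

// Same definition as in the text: a sign-preserving log2(|x| + 1).
inline float log2p(float x) { return x < 0 ? -std::log2(-x + 1) : std::log2(x + 1); }

// Hypothetical helper (not an Ansor function): applies log2p to every raw value.
inline std::vector<float> CompressAll(const std::vector<float>& raw) {
  std::vector<float> out;
  out.reserve(raw.size());
  for (float v : raw) out.push_back(log2p(v));
  return out;
}
```

Calling log2p(64.0f) gives about 6.022368, log2p(0.0f) gives 0, and a negative input mirrors the positive one, so log2p(-64.0f) equals -log2p(64.0f).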

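Group 1: Computation (57)

Floating-point operations
- float_mad: number of float multiply-add operations
- float_addsub: number of float addition and subtraction operations
- float_mul: number of float multiplication operations
- float_divmod: number of float division and modulo operations
- float_cmp: number of float comparison operations
- float_math_func: number of float math function calls
- float_other_func: number of other float function calls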
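Integer operations
- int_mad: number of integer multiply-add operations
- int_addsub: number of integer addition and subtraction operations
- int_mul: number of integer multiplication operations
- int_divmod: number of integer division and modulo operations
- int_cmp: number of integer comparison operations
- int_math_func: number of integer math function calls
- int_other_func: number of other integer function calls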
Boolean operations
- bool_op: number of boolean operations
Branch operations
- select_op: number of select operations
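Vectorization
- vec_num: number of vectorized iterators
- vec_prod: product of the lengths of the vectorized iterators
- vec_len: length of the innermost vectorized iterator
- vec_type: type of the vectorized iterator (enum AnnotationPosType, converted to a one-hot encoding, hence 8 features)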
Loop unrolling
- unroll_num: number of unrolled iterators
- unroll_prod: product of the lengths of the unrolled iterators
- unroll_len: length of the innermost unrolled iterator
- unroll_type: type of the unrolled iterator (enum AnnotationPosType, converted to 8 one-hot features)
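Parallelization
- parallel_num: number of parallel iterators
- parallel_prod: product of the lengths of the parallel iterators
- parallel_len: length of the innermost parallel iterator
- parallel_type: type of the parallel iterator (enum AnnotationPosType, converted to 8 one-hot features)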
GPU-related
- is_gpu: whether this is a GPU task
- blockIdx_x_len: length of blockIdx.x
- blockIdx_y_len: length of blockIdx.y
- blockIdx_z_len: length of blockIdx.z
- threadIdx_x_len: length of threadIdx.x
- threadIdx_y_len: length of threadIdx.y
- threadIdx_z_len: length of threadIdx.z
- vthread_len: length of the virtual thread
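The one-hot conversion used by vec_type, unroll_type, and parallel_type can be sketched as follows. The enum is copied from Appendix A; the OneHot helper and the constant names are hypothetical and only illustrate the encoding. The static_assert re-derives the Group 1 total from the list above.

```cpp
#include <array>
#include <cstddef>

// Copied from Appendix A.
enum class AnnotationPosType : int {
  kPosNone = 0,
  kPosInnerSpatial = 1,
  kPosMiddleSpatial = 2,
  kPosOuterSpatial = 3,
  kPosInnerReduce = 4,
  kPosMiddleReduce = 5,
  kPosOuterReduce = 6,
  kPosMixed = 7
};

constexpr std::size_t kNumAnnotationPosTypes = 8;

// Hypothetical helper: returns 8 floats with a single 1 at the enum's index.
inline std::array<float, kNumAnnotationPosTypes> OneHot(AnnotationPosType t) {
  std::array<float, kNumAnnotationPosTypes> out{};  // zero-initialized
  out[static_cast<std::size_t>(t)] = 1.0f;
  return out;
}

// Group 1 size, derived from the list above.
constexpr int kArithFeatures = 7 + 7 + 1 + 1;   // float, int, bool, select
constexpr int kAnnotationFeatures = 3 * (3 + 8); // vec, unroll, parallel: 3 scalars + 8 one-hot each
constexpr int kGpuFeatures = 8;                  // is_gpu, 3 blockIdx, 3 threadIdx, vthread
static_assert(kArithFeatures + kAnnotationFeatures + kGpuFeatures == 57,
              "Group 1 should contain 57 features");
```

For instance, OneHot(AnnotationPosType::kPosInnerReduce) sets only index 4 to 1 and leaves the other seven entries at 0.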

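Group 2: Buffer access (90)

By default, this group holds the contents of five BufferAccessFeature structs, named B0, B1, B2, B3, and B4. If a task has fewer than five buffers, all features of the missing buffers are filled with 0 (for example, if a task has only three buffers, the features for B3 and B4 are all 0). If a task has more than five buffers, only the features of the first five are kept.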
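The padding and truncation rule can be sketched as below. PackBufferFeatures and the two constants are hypothetical names, not Ansor identifiers. The sketch assumes each buffer has already been encoded into 18 floats and that the buffers arrive in the order in which they should be kept.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxBuffers = 5;          // B0 .. B4
constexpr std::size_t kFeaturesPerBuffer = 18;  // features per buffer in this group

// Hypothetical helper: per_buffer[i] holds the already-encoded features of buffer i.
// Missing buffers stay zero-filled; buffers beyond the fifth are dropped.
inline std::vector<float> PackBufferFeatures(
    const std::vector<std::vector<float>>& per_buffer) {
  std::vector<float> out(kMaxBuffers * kFeaturesPerBuffer, 0.0f);
  const std::size_t n = std::min(per_buffer.size(), kMaxBuffers);
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t m = std::min(per_buffer[i].size(), kFeaturesPerBuffer);
    std::copy(per_buffer[i].begin(),
              per_buffer[i].begin() + static_cast<std::ptrdiff_t>(m),
              out.begin() + static_cast<std::ptrdiff_t>(i * kFeaturesPerBuffer));
  }
  return out;
}
```

With three input buffers the result still has 90 entries, and entries 54 through 89 (the slots for B3 and B4) are all 0.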

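The following features are extracted for each buffer (18 in total):

- acc_type: read/write access type (enum BufferAccessType, encoded as 3 one-hot features: kRead, kWrite, kReadWrite)
- bytes: size of the memory accessed (in bytes)
- unique_bytes: size of the unique memory accessed (in bytes)
- lines: number of cache lines accessed
- unique_lines: number of unique cache lines accessed
- reuse_type: type of data reuse (enum ReuseType, encoded as 3 one-hot features)
- reuse_dis_iter: reuse distance, measured in number of iterators
- reuse_dis_bytes: reuse distance, measured in total bytes accessed
- reuse_ct: reuse ratio
- bytes_d_reuse_ct: bytes / reuse_ct
- unique_bytes_d_reuse_ct: unique_bytes / reuse_ct
- lines_d_reuse_ct: lines / reuse_ct
- unique_lines_d_reuse_ct: unique_lines / reuse_ct
- stride: access stride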
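As a rough sketch of how one buffer turns into 18 numbers, the code below flattens a BufferAccessFeature in the order of the list above. The types are copied from Appendices B, C, and D. EncodeBufferAccess is a hypothetical helper, and the output ordering is an assumption based on the list rather than on Ansor's source. A kUnknownRW access produces three zeros here, which is also an assumption.

```cpp
#include <cmath>
#include <initializer_list>
#include <string>
#include <vector>

enum class BufferAccessType : int { kRead = 0, kWrite = 1, kReadWrite = 2, kUnknownRW = 3 };
enum class ReuseType : int { kLoopMultipleRead = 0, kSerialMultipleReadWrite = 1, kNoReuse = 2 };

// Copied from Appendix B.
struct BufferAccessFeature {
  std::string buffer_name;
  BufferAccessType acc_type;
  float bytes, unique_bytes, lines, unique_lines;
  ReuseType reuse_type;
  float reuse_dis_iter, reuse_dis_bytes, reuse_ct;
  float bytes_d_reuse_ct, unique_bytes_d_reuse_ct, lines_d_reuse_ct, unique_lines_d_reuse_ct;
  float stride;
};

inline float log2p(float x) { return x < 0 ? -std::log2(-x + 1) : std::log2(x + 1); }

// Hypothetical helper: 3 one-hot + 4 sizes + 3 one-hot + 8 scalars = 18 floats.
inline std::vector<float> EncodeBufferAccess(const BufferAccessFeature& f) {
  std::vector<float> out;
  out.reserve(18);
  // acc_type as 3 one-hot features (kUnknownRW yields all zeros: an assumption).
  for (BufferAccessType t : {BufferAccessType::kRead, BufferAccessType::kWrite,
                             BufferAccessType::kReadWrite}) {
    out.push_back(f.acc_type == t ? 1.0f : 0.0f);
  }
  for (float v : {f.bytes, f.unique_bytes, f.lines, f.unique_lines}) out.push_back(log2p(v));
  // reuse_type as 3 one-hot features.
  for (ReuseType t : {ReuseType::kLoopMultipleRead, ReuseType::kSerialMultipleReadWrite,
                      ReuseType::kNoReuse}) {
    out.push_back(f.reuse_type == t ? 1.0f : 0.0f);
  }
  for (float v : {f.reuse_dis_iter, f.reuse_dis_bytes, f.reuse_ct, f.bytes_d_reuse_ct,
                  f.unique_bytes_d_reuse_ct, f.lines_d_reuse_ct, f.unique_lines_d_reuse_ct,
                  f.stride}) {
    out.push_back(log2p(v));
  }
  return out;
}
```

The one-hot entries are left as 0 or 1, while every numeric field goes through log2p, matching the rule stated in the feature value processing section.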

Group 3: Arithmetic intensity (10)

Ten points sampled from the arithmetic intensity curve.

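Group 4: Storage allocation (4)

- alloc_size: size of the allocated buffer (in bytes). For buffers with scope local, shared, and global, this corresponds to the register, shared memory, and global memory size, respectively.
- alloc_prod: the source comment says alloc_outer_prod * alloc_inner_prod, but the value actually computed is alloc_outer_prod * alloc_size / sizeof(datatype)
- alloc_outer_prod: product of the lengths of the loops outside the allocation scope
- alloc_inner_prod: product of the lengths of the loops inside the allocation scope (outer_prod / alloc_outer_prod)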
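The two formulas in this group can be worked through with a small sketch. AllocFeatures and ComputeAllocFeatures are hypothetical names; the inputs are raw values supplied by the caller, and the results are shown before the log2p step.

```cpp
#include <cstddef>

// Raw (pre-log2p) values of the four Group 4 features.
struct AllocFeatures {
  float alloc_size;
  float alloc_prod;
  float alloc_outer_prod;
  float alloc_inner_prod;
};

// Hypothetical helper following the formulas in the list above.
// elem_bytes plays the role of sizeof(datatype).
inline AllocFeatures ComputeAllocFeatures(float alloc_size_bytes, std::size_t elem_bytes,
                                          float alloc_outer_prod, float outer_prod) {
  AllocFeatures f;
  f.alloc_size = alloc_size_bytes;
  f.alloc_outer_prod = alloc_outer_prod;
  // alloc_outer_prod * alloc_size / sizeof(datatype)
  f.alloc_prod = alloc_outer_prod * (alloc_size_bytes / static_cast<float>(elem_bytes));
  // outer_prod / alloc_outer_prod
  f.alloc_inner_prod = outer_prod / alloc_outer_prod;
  return f;
}
```

For a 4096-byte allocation of 4-byte elements with alloc_outer_prod = 8 and outer_prod = 512, this gives alloc_prod = 8 * 1024 = 8192 and alloc_inner_prod = 512 / 8 = 64.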

Group 5: Outer scope (3)

- outer_prod: product of the lengths of the outer loops
- num_loops: number of outer loops
- auto_unroll_max_step: value of auto_unroll_max_step in the pragma
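The five group sizes add up to the 164 features per row mentioned at the start. The sketch below checks this at compile time and derives the offset of each group. All constant names are hypothetical, and the offsets assume the groups are concatenated in the order they are listed here.

```cpp
#include <cstddef>

// Group sizes as given in the section headings.
constexpr std::size_t kGroup1Size = 57;  // computation
constexpr std::size_t kGroup2Size = 90;  // buffer access: 5 buffers * 18 features
constexpr std::size_t kGroup3Size = 10;  // arithmetic intensity
constexpr std::size_t kGroup4Size = 4;   // storage allocation
constexpr std::size_t kGroup5Size = 3;   // outer scope

constexpr std::size_t kFeatureVectorLength =
    kGroup1Size + kGroup2Size + kGroup3Size + kGroup4Size + kGroup5Size;
static_assert(kFeatureVectorLength == 164, "the groups should sum to 164 features");

// Start offsets, assuming the groups appear in listing order.
constexpr std::size_t kGroup1Offset = 0;
constexpr std::size_t kGroup2Offset = kGroup1Offset + kGroup1Size;  // 57
constexpr std::size_t kGroup3Offset = kGroup2Offset + kGroup2Size;  // 147
constexpr std::size_t kGroup4Offset = kGroup3Offset + kGroup3Size;  // 157
constexpr std::size_t kGroup5Offset = kGroup4Offset + kGroup4Size;  // 161
```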

Appendix A - AnnotationPosType

// Annotation position encoding
enum class AnnotationPosType : int {
  kPosNone = 0,           // Does not have this kind of annotation
  kPosInnerSpatial = 1,   // The annotated iterator is the innermost spatial iterator
  kPosMiddleSpatial = 2,  // The annotated iterator is a middle spatial iterator
  kPosOuterSpatial = 3,   // The annotated iterator is the outermost spatial iterator
  kPosInnerReduce = 4,    // The annotated iterator is the innermost reduce iterator
  kPosMiddleReduce = 5,   // The annotated iterator is a middle reduce iterator
  kPosOuterReduce = 6,    // The annotated iterator is the outermost reduce iterator
  kPosMixed = 7           // The annotated iterator is a mixed space and reduce iterator
};

Appendix B - BufferAccessFeature

// Feature for an access of a buffer
struct BufferAccessFeature {
  std::string buffer_name;        // The name of the buffer
  BufferAccessType acc_type;      // The type of the access
  float bytes;                    // The touched memory in bytes
  float unique_bytes;             // The touched unique memory in bytes
  float lines;                    // The number of touched cache lines
  float unique_lines;             // The number of touched unique cache lines
  ReuseType reuse_type;           // The type of data reuse
  float reuse_dis_iter;           // The reuse distance in iterator number
  float reuse_dis_bytes;          // The reuse distance in total touched bytes
  float reuse_ct;                 // The reuse ratio
  float bytes_d_reuse_ct;         // bytes / reuse_ct
  float unique_bytes_d_reuse_ct;  // unique_bytes / reuse_ct
  float lines_d_reuse_ct;         // lines / reuse_ct
  float unique_lines_d_reuse_ct;  // unique_lines / reuse_ct
  float stride;                   // The stride in access
};

Appendix C - BufferAccessType

// Buffer access type
enum class BufferAccessType : int {
  kRead = 0,
  kWrite = 1,
  kReadWrite = 2,
  kUnknownRW = 3
};

Appendix D - ReuseType

// Data reuse type
enum class ReuseType : int {
  kLoopMultipleRead = 0,
  kSerialMultipleReadWrite = 1,
  kNoReuse = 2
};