Skip to content
  • Jiong Wang's avatar
    lib: reciprocal_div: implement the improved algorithm on the paper mentioned · 06ae4826
    Jiong Wang authored
    
    
    The new added "reciprocal_value_adv" implements the advanced version of the
    algorithm described in Figure 4.2 of the paper except when
    "divisor > (1U << 31)" whose ceil(log2(d)) result will be 32 which then
    requires u128 divide on host. The exception case could be easily handled
    before calling "reciprocal_value_adv".
    
    The advanced version requires more complex calculation to get the
    reciprocal multiplier and other control variables, but then could reduce
    the required emulation operations.
    
    It makes no sense to use this advanced version for host divide emulation,
    those extra complexities for calculating multiplier etc could completely
    waive our saving on emulation operations.
    
    However, it makes sense to use it for JIT divide code generation (for
    example eBPF JIT backends) for which we are willing to trade performance of
    JITed code with that of host. As shown by the following pseudo code, the
    required emulation operations could go down from 6 (the basic version) to 3
    or 4.
    
    To use the result of "reciprocal_value_adv", suppose we want to calculate
    n/d, the C-style pseudo code will be the following, it could be easily
    changed to real code generation for other JIT targets.
    
      struct reciprocal_value_adv rvalue;
      u8 pre_shift, exp;
    
      // handle exception case.
      if (d >= (1U << 31)) {
        result = n >= d;
        return;
      }
      rvalue = reciprocal_value_adv(d, 32)
      exp = rvalue.exp;
      if (rvalue.is_wide_m && !(d & 1)) {
        // floor(log2(d & (2^32 -d)))
        pre_shift = fls(d & -d) - 1;
        rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
      } else {
        pre_shift = 0;
      }
    
      // code generation starts.
      if (imm == 1U << exp) {
        result = n >> exp;
      } else if (rvalue.is_wide_m) {
        // pre_shift must be zero when reached here.
        t = (n * rvalue.m) >> 32;
        result = n - t;
        result >>= 1;
        result += t;
        result >>= rvalue.sh - 1;
      } else {
        if (pre_shift)
          result = n >> pre_shift;
        result = ((u64)result * rvalue.m) >> 32;
        result >>= rvalue.sh;
      }
    
    Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
    Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
    06ae4826