Skip to content
  • Akinobu Mita's avatar
    [PATCH] bitops: hweight() speedup · f9b41929
    Akinobu Mita authored
    <linux@horizon.com> wrote:
    
    This is an extremely well-known technique.  You can see a similar version that
    uses a multiply for the last few steps at
    http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel whch
    refers to "Software Optimization Guide for AMD Athlon 64 and Opteron
    Processors"
    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
    
    It's section 8.6, "Efficient Implementation of Population-Count Function in
    32-bit Mode", pages 179-180.
    
    It uses the name that I am more familiar with, "popcount" (population count),
    although "Hamming weight" also makes sense.
    
    Anyway, the proof of correctness proceeds as follows:
    
    	b = a - ((a >> 1) & 0x55555555);
    	c = (b & 0x33333333) + ((b >> 2) & 0x33333333);
    	d = (c + (c >> 4)) & 0x0f0f0f0f;
    #if SLOW_MULTIPLY
    	e = d + (d >> 8)
    	f = e + (e >> 16);
    	return f & 63;
    #else
    	/* Useful if multiply takes at most 4 cycles */
    	return (d * 0x01010101) >> 24;
    #endif
    
    The input value a can be th...
    f9b41929