1. 08 Oct, 2016 1 commit
  2. 09 May, 2016 1 commit
    • md: set MD_CHANGE_PENDING in an atomic region · 85ad1d13
      Guoqing Jiang authored
      Some code waits for a metadata update by:
      
      1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
      2. setting MD_CHANGE_PENDING and waking the management thread
      3. waiting for MD_CHANGE_PENDING to be cleared
      
      If the first two are done without locking, the code in md_update_sb()
      which checks if it needs to repeat might test if an update is needed
      before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
      in the wait returning early.
      
      So make sure all places that set MD_CHANGE_PENDING do so atomically;
      bit_clear_unless() (suggested by Neil) is introduced for this purpose.
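      
      The helper atomically clears the given bits unless any of the "test"
      bits is set, so md_update_sb() cannot clear MD_CHANGE_PENDING after a
      new request has already re-flagged an update. A minimal userspace
      sketch with C11 atomics (the kernel's macro is built on cmpxchg()
      instead; only the name and contract are taken from the patch):
      
      	#include <stdatomic.h>
      	#include <stdbool.h>
      
      	/* Clear the bits in 'clear' unless any bit in 'test' is set.
      	 * Returns true if the bits were cleared. */
      	static bool bit_clear_unless(_Atomic unsigned long *ptr,
      	                             unsigned long clear, unsigned long test)
      	{
      		unsigned long old = atomic_load(ptr);
      
      		do {
      			if (old & test)
      				return false;	/* re-flagged meanwhile */
      		} while (!atomic_compare_exchange_weak(ptr, &old, old & ~clear));
      
      		return true;
      	}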
      
      Cc: Martin Kepplinger <martink@posteo.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: <linux-kernel@vger.kernel.org>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  3. 09 Dec, 2015 1 commit
    • bitops.h: correctly handle rol32 with 0 byte shift · d7e35dfa
      Sasha Levin authored
      ROL on a 32-bit integer with a shift of 32 or more is undefined and the
      result is arch-dependent. Avoid this by handling the trivial case of
      rotating by 0 correctly.
      
      The trivial solution of checking if shift is 0 breaks gcc's detection
      of this code as a ROL instruction, which is unacceptable.
      
      This bug was reported and fixed in GCC
      (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):
      
      	The standard rotate idiom,
      
      	  (x << n) | (x >> (32 - n))
      
      	is recognized by gcc (for concreteness, I discuss only the case that x
      	is an uint32_t here).
      
      	However, this is portable C only for n in the range 0 < n < 32. For n
      	== 0, we get x >> 32 which gives undefined behaviour according to the
      	C standard (6.5.7, Bitwise shift operators). To portably support n ==
      	0, one has to write the rotate as something like
      
      	  (x << n) | (x >> ((-n) & 31))
      
      	And this is apparently not recognized by gcc.
      
      Note that older GCCs do not recognize this idiom, so on them it will
      result in a slower ROL.
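      
      A sketch of rol32() after the fix, using the masked-shift idiom quoted
      above (standalone C here; in the kernel this lives in linux/bitops.h):
      
      	#include <stdint.h>
      
      	/* (-shift) & 31 equals (32 - shift) % 32, so the right-shift
      	 * count stays in 0..31 and shift == 0 is well-defined, while
      	 * gcc still recognizes the pattern as a single ROL. */
      	static inline uint32_t rol32(uint32_t word, unsigned int shift)
      	{
      		return (word << shift) | (word >> ((-shift) & 31));
      	}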
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 07 Nov, 2015 2 commits
  5. 05 Aug, 2015 1 commit
  6. 17 Apr, 2015 1 commit
    • lib: find_*_bit reimplementation · 2c57a0e2
      Yury Norov authored
      This patchset reworks the find_bit() function family to achieve better
      performance and to decrease the text size. All of the rework is done in
      patch 1; patches 2 and 3 only move and rename code.
      
      It was boot-tested on x86_64 and MIPS (big-endian) machines.
      Performance tests were run in userspace with code like this:
      
      	/* addr[] is filled from /dev/urandom */
      	unsigned long ret = 0;
      	clock_t start, end;
      
      	start = clock();
      	while (ret < nbits)
      		ret = find_next_bit(addr, nbits, ret + 1);
      	end = clock();
      
      	printf("%ld\t", (unsigned long)(end - start));
      
      On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the measurements are as
      follows (for find_next_bit, nbits is 8M; for find_first_bit, 80K):
      
      	find_next_bit:		find_first_bit:
      	new	current		new	current
      	26932	43151		14777	14925
      	26947	43182		14521	15423
      	26507	43824		15053	14705
      	27329	43759		14473	14777
      	26895	43367		14847	15023
      	26990	43693		15103	15163
      	26775	43299		15067	15232
      	27282	42752		14544	15121
      	27504	43088		14644	14858
      	26761	43856		14699	15193
      	26692	43075		14781	14681
      	27137	42969		14451	15061
      	...			...
      
      The find_next_bit() performance gain is 35-40%; for find_first_bit()
      there is no measurable difference.
      
      On ARM machines there is an arch-specific implementation of find_bit,
      so they are not affected by this rework.
      
      Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
      helpful discussions.
      
      This patch (of 3):
      
      The new implementation takes less space in the source file (see the
      diffstat) and in the object file; for me it is 710 vs 453 bytes of
      text. It also shows better performance.
      
      The find_last_bit description is also fixed, due to an obvious typo.
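      
      The core of the rework is a straight word-at-a-time scan. A simplified,
      self-contained sketch of find_next_bit() in that style (the kernel
      version also shares code with the zero-bit variants, which this sketch
      omits):
      
      	#include <limits.h>
      
      	#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
      
      	/* Mask off the bits below 'start' in its containing word, then
      	 * walk whole words until a non-zero one turns up. */
      	static unsigned long find_next_bit(const unsigned long *addr,
      					   unsigned long nbits,
      					   unsigned long start)
      	{
      		unsigned long tmp, found;
      
      		if (start >= nbits)
      			return nbits;
      
      		tmp = addr[start / BITS_PER_LONG];
      		tmp &= ~0UL << (start % BITS_PER_LONG);
      		start -= start % BITS_PER_LONG;
      
      		while (!tmp) {
      			start += BITS_PER_LONG;
      			if (start >= nbits)
      				return nbits;
      			tmp = addr[start / BITS_PER_LONG];
      		}
      
      		found = start + __builtin_ctzl(tmp);
      		return found < nbits ? found : nbits;
      	}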
      
      [akpm@linux-foundation.org: include linux/bitmap.h, per Rasmus]
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: George Spelvin <linux@horizon.com>
      Cc: Alexey Klimov <klimov.linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 16 Nov, 2014 1 commit
  8. 13 Aug, 2014 1 commit
    • locking: Remove deprecated smp_mb__() barriers · 2e39465a
      Peter Zijlstra authored
      It's been a while and there are no in-tree users left, so remove the
      deprecated barriers.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Chen, Gong <gong.chen@linux.intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Sullivan <jsrhbz@kanargh.force9.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 18 Apr, 2014 1 commit
    • arch: Prepare for smp_mb__{before,after}_atomic() · febdbfe8
      Peter Zijlstra authored
      Since the smp_mb__{before,after}*() ops are fundamentally dependent on
      how an arch can implement atomics, it doesn't make sense to have three
      variants of them. They must all be the same.
      
      Furthermore, the three variants suggest they're only valid for those
      three atomic ops, while we have many more where they could be applied.
      
      So move away from
      smp_mb__{before,after}_{atomic,clear}_{dec,inc,bit}() and reduce the
      interface to just the two: smp_mb__{before,after}_atomic().
      
      This patch prepares the way by introducing default implementations in
      asm-generic/barrier.h that default to a full barrier and providing
      __deprecated inlines for the previous 6 barriers if they're not
      provided by the arch.
      
      This should allow for a mostly painless transition (lots of deprecation
      warnings in the interim).
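      
      The shape of the transition shims, roughly (a sketch; the exact header
      placement differs):
      
      	/* Defaults: a full barrier unless the arch provides
      	 * something cheaper. */
      	#ifndef smp_mb__before_atomic
      	#define smp_mb__before_atomic()	smp_mb()
      	#endif
      
      	#ifndef smp_mb__after_atomic
      	#define smp_mb__after_atomic()	smp_mb()
      	#endif
      
      	/* Old names become deprecated wrappers, so existing callers
      	 * keep building but warn until they are converted. */
      	static inline void __deprecated smp_mb__before_clear_bit(void)
      	{
      		smp_mb__before_atomic();
      	}
      
      	static inline void __deprecated smp_mb__after_clear_bit(void)
      	{
      		smp_mb__after_atomic();
      	}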
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/n/tip-wr59327qdyi9mbzn6x937s4e@git.kernel.org
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "Chen, Gong" <gong.chen@linux.intel.com>
      Cc: John Sullivan <jsrhbz@kanargh.force9.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 31 Mar, 2014 1 commit
  11. 21 Oct, 2013 1 commit
  12. 16 Oct, 2013 1 commit
  13. 23 Mar, 2012 3 commits
  14. 16 Feb, 2012 1 commit
  15. 15 Jan, 2012 1 commit
  16. 06 Dec, 2011 1 commit
  17. 27 May, 2011 2 commits
  18. 15 Nov, 2010 1 commit
  19. 09 Oct, 2010 2 commits
    • bitops: remove duplicated extern declarations · d852a6af
      Akinobu Mita authored
      If CONFIG_GENERIC_FIND_NEXT_BIT is enabled, find_next_bit() and
      find_next_zero_bit() are declared twice, in both
      asm-generic/bitops/find.h and linux/bitops.h.
      
      asm/bitops.h includes asm-generic/bitops/find.h if and only if the
      architecture enables CONFIG_GENERIC_FIND_NEXT_BIT, and asm/bitops.h is
      in turn included by linux/bitops.h.
      
      So we can just remove the extern declarations of find_next_bit() and
      find_next_zero_bit() in linux/bitops.h.
      
      We can also remove the now-unneeded #ifndef CONFIG_GENERIC_FIND_NEXT_BIT
      guard in asm-generic/bitops/find.h.
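      
      The include chain that makes the duplicates redundant, sketched (the
      declarations follow the usual find_next_* signatures):
      
      	/* linux/bitops.h includes asm/bitops.h, which includes
      	 * asm-generic/bitops/find.h whenever the architecture enables
      	 * CONFIG_GENERIC_FIND_NEXT_BIT, so one declaration suffices: */
      
      	/* asm-generic/bitops/find.h */
      	extern unsigned long find_next_bit(const unsigned long *addr,
      			unsigned long size, unsigned long offset);
      	extern unsigned long find_next_zero_bit(const unsigned long *addr,
      			unsigned long size, unsigned long offset);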
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • bitops: make asm-generic/bitops/find.h more generic · 708ff2a0
      Akinobu Mita authored
      asm-generic/bitops/find.h has the extern declarations of find_next_bit()
      and find_next_zero_bit() and the macro definitions of find_first_bit()
      and find_first_zero_bit(). It is only usable by architectures which
      enable CONFIG_GENERIC_FIND_NEXT_BIT and disable
      CONFIG_GENERIC_FIND_FIRST_BIT.
      
      x86 and tile enable both CONFIG_GENERIC_FIND_NEXT_BIT and
      CONFIG_GENERIC_FIND_FIRST_BIT, so these architectures cannot include
      asm-generic/bitops/find.h in their asm/bitops.h; instead, ifdefed
      extern declarations of find_first_bit() and find_first_zero_bit() are
      put in linux/bitops.h.
      
      This patch makes asm-generic/bitops/find.h usable by these
      architectures and switches them over to it (see the sketch below). The
      change is also needed for the forthcoming cleanup of the duplicated
      extern declarations.
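      
      A sketch of the guarded header after the change (assumed shape,
      following the description above):
      
      	/* asm-generic/bitops/find.h */
      	#ifdef CONFIG_GENERIC_FIND_FIRST_BIT
      	extern unsigned long find_first_bit(const unsigned long *addr,
      			unsigned long size);
      	extern unsigned long find_first_zero_bit(const unsigned long *addr,
      			unsigned long size);
      	#else
      	#define find_first_bit(addr, size) \
      		find_next_bit((addr), (size), 0)
      	#define find_first_zero_bit(addr, size) \
      		find_next_zero_bit((addr), (size), 0)
      	#endif /* CONFIG_GENERIC_FIND_FIRST_BIT */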
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: Chris Metcalf <cmetcalf@tilera.com>
  20. 04 May, 2010 1 commit
  21. 07 Apr, 2010 1 commit
  22. 06 Apr, 2010 1 commit
  23. 06 Mar, 2010 1 commit
  24. 04 Feb, 2010 1 commit
  25. 29 Jan, 2010 1 commit
  26. 23 Apr, 2009 1 commit
  27. 31 Dec, 2008 1 commit
  28. 29 Apr, 2008 2 commits
    • bitops: remove "optimizations" · fee4b19f
      Thomas Gleixner authored
      The mapsize optimizations which were moved from x86 to the generic
      code in commit 64970b68 increased the
      binary size on non-x86 architectures.
      
      Looking into the real effects of the "optimizations" it turned out
      that they are not used in find_next_bit() and find_next_zero_bit().
      
      The ones in find_first_bit() and find_first_zero_bit() are used in a
      couple of places but none of them is a real hot path.
      
      Remove the "optimizations" altogether and call the library functions
      unconditionally.
      
      Boot-tested on x86 and compile tested on every cross compiler I have.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Avoid divides in BITS_TO_LONGS · ede9c697
      Eric Dumazet authored
      BITS_PER_LONG is a signed value (32 or 64).
      
      DIV_ROUND_UP(nr, BITS_PER_LONG) therefore performs signed arithmetic
      if "nr" is signed too.
      
      Converting BITS_TO_LONGS(nr) to DIV_ROUND_UP(nr, BITS_PER_BYTE *
      sizeof(long)) makes sure the compiler can perform a right shift, even
      if "nr" is a signed value, instead of an expensive integer divide.
      
      Applying this patch saves 141 bytes on x86 when
      CONFIG_CC_OPTIMIZE_FOR_SIZE=y and speeds up bitmap operations.
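      
      The before/after, sketched (the _OLD name is hypothetical, shown only
      for contrast):
      
      	#define BITS_PER_LONG	64	/* an int constant: 32 or 64 */
      	#define BITS_PER_BYTE	8
      	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
      
      	/* Old form: with a signed 'nr' the division is signed, so the
      	 * compiler must emit a real divide. */
      	#define BITS_TO_LONGS_OLD(nr)	DIV_ROUND_UP(nr, BITS_PER_LONG)
      
      	/* New form: sizeof(long) makes the divisor a size_t, the usual
      	 * arithmetic conversions make 'nr' unsigned too, and the
      	 * division becomes a plain right shift. */
      	#define BITS_TO_LONGS(nr) \
      		DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))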
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  29. 26 Apr, 2008 3 commits
    • x86: optimize find_first_bit for small bitmaps · 3a483050
      Alexander van Heukelum authored
      Avoid a call to find_first_bit if the bitmap size is known at
      compile time and small enough to fit in a single long integer.
      Modeled after an optimization in the original x86_64-specific
      code.
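      
      The single-word reduction, sketched as a standalone helper (the helper
      name is hypothetical; the real patch selects this path inside
      find_first_bit via __builtin_constant_p):
      
      	#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
      
      	/* Valid for 0 < size <= BITS_PER_LONG: mask the one word down
      	 * to 'size' bits and take its lowest set bit; returning 'size'
      	 * means no bit is set. */
      	static inline unsigned long first_bit_small(const unsigned long *addr,
      						    unsigned long size)
      	{
      		unsigned long word = *addr & (~0UL >> (BITS_PER_LONG - size));
      
      		return word ? __builtin_ctzl(word) : size;
      	}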
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: generic versions of find_first_(zero_)bit, convert i386 · 77b9bd9c
      Alexander van Heukelum authored
      Generic versions of __find_first_bit and __find_first_zero_bit
      are introduced as simplified versions of __find_next_bit and
      __find_next_zero_bit. Their compilation and use are guarded by
      a new config variable GENERIC_FIND_FIRST_BIT.
      
      The generic versions of find_first_bit and find_first_zero_bit
      are implemented in terms of the newly introduced __find_first_bit
      and __find_first_zero_bit.
      
      This patch does not remove the i386-specific implementation,
      but it does switch i386 to use the generic functions by setting
      GENERIC_FIND_FIRST_BIT=y for X86_32.
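      
      "Simplified" here means there is no offset to mask off, so the scan is
      a plain word loop. A sketch of the zero-bit variant in that spirit (the
      set-bit one is analogous, scanning for the first non-zero word):
      
      	#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
      
      	/* Find the first word that is not all ones; the answer is that
      	 * word's lowest clear bit, capped at 'size' if none is found. */
      	static unsigned long __find_first_zero_bit(const unsigned long *addr,
      						   unsigned long size)
      	{
      		unsigned long idx, found;
      
      		for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
      			if (~addr[idx]) {
      				found = idx * BITS_PER_LONG
      					+ __builtin_ctzl(~addr[idx]);
      				return found < size ? found : size;
      			}
      		}
      		return size;	/* no clear bit */
      	}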
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps · 64970b68
      Alexander van Heukelum authored
      This moves an optimization for searching constant-sized small
      bitmaps from x86_64-specific code to generic code.
      
      On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
      changes with this applied. I have observed only four places where this
      optimization avoids a call into find_next_bit:
      
      In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
      and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
      In __next_cpu a call is avoided for a 32-bit bitmap. That's it.
      
      On x86_64, 52 locations are optimized with a minimal increase in
      code size:
      
      Current #testing defconfig:
      	146 x bsf, 27 x find_next_*bit
         text    data     bss     dec     hex filename
         5392637  846592  724424 6963653  6a41c5 vmlinux
      
      After removing the x86_64 specific optimization for find_next_*bit:
      	94 x bsf, 79 x find_next_*bit
         text    data     bss     dec     hex filename
         5392358  846592  724424 6963374  6a40ae vmlinux
      
      After this patch (making the optimization generic):
      	146 x bsf, 27 x find_next_*bit
         text    data     bss     dec     hex filename
         5392396  846592  724424 6963412  6a40d4 vmlinux
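      
      A sketch of the generic shortcut (assumed shape): a sentinel bit at
      position 'size' makes the count-trailing-zeros step return 'size' by
      itself when no bit is found, so the constant-size case is branch-free
      and inlines to the bsf instances counted above.
      
      	#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
      
      	extern unsigned long __find_next_bit(const unsigned long *addr,
      			unsigned long size, unsigned long offset);
      
      	static inline unsigned long find_next_bit(const unsigned long *addr,
      			unsigned long size, unsigned long offset)
      	{
      		/* Constant size of at most one long: inline word test. */
      		if (__builtin_constant_p(size) && size < BITS_PER_LONG) {
      			unsigned long value = *addr & (~0UL << offset);
      
      			value |= 1UL << size;	/* sentinel: "none" => size */
      			return __builtin_ctzl(value);
      		}
      		return __find_next_bit(addr, size, offset);	/* out of line */
      	}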
      
      [ tglx@linutronix.de: build fixes ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  30. 28 Mar, 2008 1 commit
  31. 19 Oct, 2007 2 commits