aboutsummaryrefslogtreecommitdiff
path: root/include/asm-x86/bitops.h
AgeCommit message (Collapse)Author
2008-06-21x86, bitops: make constant-bit set/clear_bit ops faster, gcc workaroundIngo Molnar
Jeremy Fitzhardinge reported this compiler bug: Suggestion from Linus: add "r" to the input constraint of the set_bit()/clear_bit()'s constant 'nr' branch: Blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)": CC init/main.o include2/asm/bitops.h: In function `start_kernel': include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints include2/asm/bitops.h:59: error: impossible constraint in `asm' include2/asm/bitops.h:59: error: impossible constraint in `asm' include2/asm/bitops.h:59: error: impossible constraint in `asm' Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20x86, bitops: make constant-bit set/clear_bit ops faster, adapt, clean upIngo Molnar
fix integration bug introduced by "x86: bitops take an unsigned long *" which turned "(void *) + x" into "(long *) + x". small cleanups to make it more apparent which value get propagated where. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19x86, bitops: make constant-bit set/clear_bit ops fasterLinus Torvalds
On Wed, 18 Jun 2008, Linus Torvalds wrote: > > And yes, the "lock andl" should be noticeably faster than the xchgl. I dunno. Here's a untested (!!) patch that turns constant-bit set/clear_bit ops into byte mask ops (lock orb/andb). It's not exactly pretty. The reason for using the byte versions is that a locked op is serialized in the memory pipeline anyway, so there are no forwarding issues (that could slow down things when we access things with different sizes), and the byte ops are a lot smaller than 32-bit and particularly 64-bit ops (big constants, and the 64-bit ops need the REX prefix byte too). [ Side note: I wonder if we should turn the "test_bit()" C version into a "char *" version too.. It could actually help with alias analysis, since char pointers can alias anything. So it might be the RightThing(tm) to do for multiple reasons. I dunno. It's a separate issue. ] It does actually shrink the kernel image a bit (a couple of hundred bytes on the text segment for my everything-compiled-in image), and while it's totally untested the (admittedly few) code generation points I looked at seemed sane. And "lock orb" should be noticeably faster than "lock bts". If somebody wants to play with it, go wild. I didn't do "change_bit()", because nobody sane uses that thing anyway. I guarantee nothing. And if it breaks, nobody saw me do anything. You can't prove this email wasn't sent by somebody who is good at forging smtp. This does require a gcc that is recent enough for "__builtin_constant_p()" to work in an inline function, but I suspect our kernel requirements are already higher than that. And if you do have an old gcc that is supported, the worst that would happen is that the optimization doesn't trigger. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25x86: bitops take an unsigned long *Andrew Morton
All (or most) other architectures do this. So should x86. Fix. Cc: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10x86: revert commit 709f744 ("x86: bitops asm constraint fixes")Simon Holm Thøgersen
709f744 causes my computer to freeze during the start up of X and my login manger (GDM). It gets to the point where it has shown the default X mouse cursor logo (a big X / cross) and does not respond to anything from that point on. This worked fine before 709f744, and it works fine with 709f744 reverted on top of Linus' current tree (f74d505). The revert had conflicts, as far as I can tell due to white space changes. The diff I ended up with is below. It is 100% reproducible. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting onlyJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26x86: finalize bitops unificationAlexander van Heukelum
include/asm-x86/bitops_32.h and include/asm-x86/bitops_64.h are now almost identical. The 64-bit version sets ARCH_HAS_FAST_MULTIPLIER and has an extra inline function set_bit_string. The define currently has no influence on the generated code, but it can be argued that setting it on i386 is the right thing to do anyhow. The addition of the extra inline function on i386 does not hurt either. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26x86: merge the simple bitops and move them to bitops.hAlexander van Heukelum
Some of those can be written in such a way that the same inline assembly can be used to generate both 32 bit and 64 bit code. For ffs and fls, x86_64 unconditionally used the cmov instruction and i386 unconditionally used a conditional branch over a mov instruction. In the current patch I chose to select the version based on the availability of the cmov instruction instead. A small detail here is that x86_64 did not previously set CONFIG_X86_CMOV=y. Improved comments for ffs, ffz, fls and variations. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26x86, generic: optimize find_next_(zero_)bit for small constant-size bitmapsAlexander van Heukelum
This moves an optimization for searching constant-sized small bitmaps form x86_64-specific to generic code. On an i386 defconfig (the x86#testing one), the size of vmlinux hardly changes with this applied. I have observed only four places where this optimization avoids a call into find_next_bit: In the functions return_unused_surplus_pages, alloc_fresh_huge_page, and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap. In __next_cpu a call is avoided for a 32-bit bitmap. That's it. On x86_64, 52 locations are optimized with a minimal increase in code size: Current #testing defconfig: 146 x bsf, 27 x find_next_*bit text data bss dec hex filename 5392637 846592 724424 6963653 6a41c5 vmlinux After removing the x86_64 specific optimization for find_next_*bit: 94 x bsf, 79 x find_next_*bit text data bss dec hex filename 5392358 846592 724424 6963374 6a40ae vmlinux After this patch (making the optimization generic): 146 x bsf, 27 x find_next_*bit text data bss dec hex filename 5392396 846592 724424 6963412 6a40d4 vmlinux [ tglx@linutronix.de: build fixes ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26x86: change x86 to use generic find_next_bitAlexander van Heukelum
The versions with inline assembly are in fact slower on the machines I tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid crashing the benchmark. Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size 1...512, for each possible bitmap with one bit set, for each possible offset: find the position of the first bit starting at offset. If you follow ;). Times include setup of the bitmap and checking of the results. Athlon Xeon Opteron 32/64bit x86-specific: 0m3.692s 0m2.820s 0m3.196s / 0m2.480s generic: 0m2.622s 0m1.662s 0m2.100s / 0m1.572s If the bitmap size is not a multiple of BITS_PER_LONG, and no set (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a value outside of the range [0, size]. The generic version always returns exactly size. The generic version also uses unsigned long everywhere, while the x86 versions use a mishmash of int, unsigned (int), long and unsigned long. Using the generic version does give a slightly bigger kernel, though. defconfig: text data bss dec hex filename x86-specific: 4738555 481232 626688 5846475 5935cb vmlinux (32 bit) generic: 4738621 481232 626688 5846541 59360d vmlinux (32 bit) x86-specific: 5392395 846568 724424 6963387 6a40bb vmlinux (64 bit) generic: 5392458 846568 724424 6963450 6a40fa vmlinux (64 bit) Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17include/asm-x86/bitops.h: checkpatch cleanups - formatting onlyJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: bitops asm constraint fixesJan Beulich
This (simplified) piece of code didn't behave as expected due to incorrect constraints in some of the bitops functions, when X86_FEATURE_xxx is referring to other than the first long: int test(struct cpuinfo_x86 *c) { if (cpu_has(c, X86_FEATURE_xxx)) clear_cpu_cap(c, X86_FEATURE_xxx); return cpu_has(c, X86_FEATURE_xxx); } I'd really like understand, though, what the policy of (not) having a "memory" clobber in these operations is - currently, this appears to be totally inconsistent. Also, many comments of the non-atomic functions say those may also be re-ordered - this contradicts the use of "asm volatile" in there, which again I'd like to understand. As much as all of these, using 'int' for the 'nr' parameter and 'void *' for the 'addr' one is in conflict with Documentation/atomic_ops.txt, especially because bt{,c,r,s} indeed take the bit index as signed (which hence would really need special precaution) and access the full 32 bits (if 'unsigned long' was used properly here, 64 bits for x86-64) pointed at, so invalid uses like referencing a 'char' array cannot currently be caught. Finally, the code with and without this patch relies heavily on the -fno-strict-aliasing compiler switch and I'm not certain this really is a good idea. In the light of all of this I'm sending this as RFC, as fixing the above might warrant a much bigger patch... Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: change bitwise operations to get a void parameter.Glauber de Oliveira Costa
This patch changes the bitwise operations in bitops.h to get a void pointers as a parameter. Before this patch, a lot of warnings can be seen. They're gone after it. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: partial unification of asm-x86/bitops.hJeremy Fitzhardinge
This unifies the set/clear/test bit functions of asm/bitops.h. I have not attempted to merge the bit-finding functions, since they rely on the machine word size and can't be easily restructured to work generically without a lot of #ifdefs. In particular, the 64-bit code can assume the presence of conditional move instructions, whereas 32-bit needs to be more careful. The inline assembly for the bit operations has been changed to remove explicit sizing hints on the instructions, so the assembler will pick the appropriate instruction forms depending on the architecture and the context. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-11i386/x86_64: move headers to include/asm-x86Thomas Gleixner
Move the headers to include/asm-x86 and fixup the header install make rules Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>