aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2005-09-05[PATCH] uml: fault handler micro-cleanupsPaolo 'Blaisorblade' Giarrusso
Avoid chomping low bits of address for functions doing it by themselves, fix whitespace, add a correctness checking. I did this for remap-file-pages protection support, it was useful on its own too. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] uml: fixes performance regression in activate_mm and thus exec()Paolo 'Blaisorblade' Giarrusso
Normally, activate_mm() is called from exec(), and thus it used to be a no-op because we use a completely new "MM context" on the host (for instance, a new process), and so we didn't need to flush any "TLB entries" (which for us are the set of memory mappings for the host process from the virtual "RAM" file). Kernel threads, instead, are usually handled in a different way. So, when for AIO we call use_mm(), things used to break and so Benjamin implemented activate_mm(). However, that is only needed for AIO, and could slow down exec() inside UML, so be smart: detect being called for AIO (via PF_BORROWED_MM) and do the full flush only in that situation. Comment also the caller so that people won't go breaking UML without noticing. I also rely on the caller's locks for testing current->flags. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> CC: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] uml: fix SIGWINCH handler race while waiting for signals.Bodo Stroesser
If a SIGWINCH comes in, while winch_thread() isn't waiting in wait(), winch_thread could miss signals. It isn't very probable, that anyone will see this causing trouble, as it would need a very special timing, that a missed SIGWINCH results in a wrong window size. So, this is a minor problem. But why not fix, as it can be done so easy? Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] uml: workaround GDB problems on debuggingPaolo 'Blaisorblade' Giarrusso
Apparently, GDB gets confused when we do an execvp() on ourselves. Since it's simply done to allocate further space for command line arguments (which we'll use to allow gathering the startup command line for guest processes through the host), allow the user to disable that to get a debuggable UML binary. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] uml: SYSEMU: slight cleanup and speedupPaolo 'Blaisorblade' Giarrusso
As a follow-up to "UML Support - Ptrace: adds the host SYSEMU support, for UML and general usage" (i.e. uml-support-* in current mm). Avoid unconditionally jumping to work_pending and code copying, just reuse the already existing resume_userspace path. One interesting note, from Charles P. Wright, suggested that the API is improvable with no downsides for UML (except that it will have to support yet another host API, since dropping support for the current API, for UML, is not reasonable from users' point of view). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> CC: Charles P. Wright <cwright@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] SYSEMU: fix sysaudit / singlestep interactionBodo Stroesser
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> This is simply an adjustment for "Ptrace - i386: fix Syscall Audit interaction with singlestep" to work on top of SYSEMU patches, too. On this patch, I have some doubts: I wonder why we need to alter that way ptrace_disable(). I left the patch this way because it has been extensively tested, but I don't understand the reason. The current PTRACE_DETACH handling simply clears child->ptrace; actually this is not enough because entry.S just looks at the thread_flags; actually, do_syscall_trace checks current->ptrace but I don't think depending on that is good, at least for performance, so I think the clearing is done elsewhere. For instance, on PTRACE_CONT it's done, but doing PTRACE_DETACH without PTRACE_CONT is possible (and happens when gdb crashes and one kills it manually). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> CC: Roland McGrath <roland@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] Uml support: add PTRACE_SYSEMU_SINGLESTEP option to i386Bodo Stroesser
This patch implements the new ptrace option PTRACE_SYSEMU_SINGLESTEP, which can be used by UML to singlestep a process: it will receive SINGLESTEP interceptions for normal instructions and syscalls, but syscall execution will be skipped just like with PTRACE_SYSEMU. Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] Uml support: reorganize PTRACE_SYSEMU supportBodo Stroesser
With this patch, we change the way we handle switching from PTRACE_SYSEMU to PTRACE_{SINGLESTEP,SYSCALL}, to free TIF_SYSCALL_EMU from double use as a preparation for PTRACE_SYSEMU_SINGLESTEP extension, without changing the behavior of the host kernel. Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] UML Support - Ptrace: adds the host SYSEMU support, for UML and ↵Laurent Vivier
general usage Jeff Dike <jdike@addtoit.com>, Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>, Bodo Stroesser <bstroesser@fujitsu-siemens.com> Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL except that the kernel does not execute the requested syscall; this is useful to improve performance for virtual environments, like UML, which want to run the syscall on their own. In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry and on exit, and each time you also have two context switches; with SYSEMU you avoid the 2nd stop and so save two context switches per syscall. Also, some architectures don't have support in the host for changing the syscall number via ptrace(), which is currently needed to skip syscall execution (UML turns any syscall into getpid() to avoid it being executed on the host). Fixing that is hard, while SYSEMU is easier to implement. * This version of the patch includes some suggestions of Jeff Dike to avoid adding any instructions to the syscall fast path, plus some other little changes, by myself, to make it work even when the syscall is executed with SYSENTER (but I'm unsure about them). It has been widely tested for quite a lot of time. * Various fixed were included to handle the various switches between various states, i.e. when for instance a syscall entry is traced with one of PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit. Basically, this is done by remembering which one of them was used even after the call to ptrace_notify(). * We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP to make do_syscall_trace() notice that the current syscall was started with SYSEMU on entry, so that no notification ought to be done in the exit path; this is a bit of a hack, so this problem is solved in another way in next patches. * Also, the effects of the patch: "Ptrace - i386: fix Syscall Audit interaction with singlestep" are cancelled; they are restored back in the last patch of this series. Detailed descriptions of the patches doing this kind of processing follow (but I've already summed everything up). * Fix behaviour when changing interception kind #1. In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag only after doing the debugger notification; but the debugger might have changed the status of this flag because he continued execution with PTRACE_SYSCALL, so this is wrong. This patch fixes it by saving the flag status before calling ptrace_notify(). * Fix behaviour when changing interception kind #2: avoid intercepting syscall on return when using SYSCALL again. A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL crashes. The problem is in arch/i386/kernel/entry.S. The current SYSEMU patch inhibits the syscall-handler to be called, but does not prevent do_syscall_trace() to be called after this for syscall completion interception. The appended patch fixes this. It reuses the flag TIF_SYSCALL_EMU to remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since the flag is unused in the depicted situation. * Fix behaviour when changing interception kind #3: avoid intercepting syscall on return when using SINGLESTEP. When testing 2.6.9 and the skas3.v6 patch, with my latest patch and had problems with singlestepping on UML in SKAS with SYSEMU. It looped receiving SIGTRAPs without moving forward. EIP of the traced process was the same for all SIGTRAPs. What's missing is to handle switching from PTRACE_SYSCALL_EMU to PTRACE_SINGLESTEP in a way very similar to what is done for the change from PTRACE_SYSCALL_EMU to PTRACE_SYSCALL_TRACE. I.e., after calling ptrace(PTRACE_SYSEMU), on the return path, the debugger is notified and then wake ups the process; the syscall is executed (or skipped, when do_syscall_trace() returns 0, i.e. when using PTRACE_SYSEMU), and do_syscall_trace() is called again. Since we are on the return path of a SYSEMU'd syscall, if the wake up is performed through ptrace(PTRACE_SYSCALL), we must still avoid notifying the parent of the syscall exit. Now, this behaviour is extended even to resuming with PTRACE_SINGLESTEP. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] Ptrace/i386: fix "syscall audit" interaction with singlestepBodo Stroesser
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Avoid giving two traps for singlestep instead of one, when syscall auditing is enabled. In fact no singlestep trap is sent on syscall entry, only on syscall exit, as can be seen in entry.S: # Note that in this mask _TIF_SINGLESTEP is not tested !!! <<<<<<<<<<<<<< testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),TI_flags(%ebp) jnz syscall_trace_entry ... syscall_trace_entry: ... call do_syscall_trace But auditing a SINGLESTEP'ed process causes do_syscall_trace to be called, so the tracer will get one more trap on the syscall entry path, which it shouldn't. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> CC: Roland McGrath <roland@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] uml: Rename Kconfig files to be like the other archesJeff Dike
To the extent that sub-Kconfig files exist elsewhere in the tree, they are named Kconfig.foo, rather than the Kconfig_foo that UML has. This patch brings the names in line with the rest of the tree. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] uml: remove debugging code from page fault pathJeff Dike
This eliminates the segfault info ring buffer, which added a system call to each page fault, and which hadn't been useful for debugging in ages. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] arch/cris/Kconfig.debug: use lib/Kconfig.debugAdrian Bunk
This patch converts arch/cris/Kconfig.debug to using lib/Kconfig.debug. This should fix a compile error in 2.6.13-rc4 caused by a missing CONFIG_LOG_BUF_SHIFT definition. While I was editing this file, I also converted some spaces to tabs. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Mikael Starvik <starvik@axis.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] m68k: cleanup inline mem functionsRoman Zippel
Use the builtin functions for memset/memclr/memcpy, special optimizations for page operations have dedicated functions now. Uninline memmove/memchr and move all functions into a single file and clean it up a little. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] m68k: move cache functions into separate fileRoman Zippel
Move a few cache functions into its own file and fix flush_icache_range() so it can handle both kernel and user addresses correctly (assuming context is set correctly). Turn copy_to_user_page/copy_from_user_page into inline functions and add a missing cache flush. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] m68k: sys_ptrace cleanupRoman Zippel
- create helper function singlestep_disable() - move variable definitions to the top of the function - use "out_eio" label as common error destination - don't clear failure value for PTRACE_SETREGS/PTRACE_GETREGS Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] m68k: indent sys_ptraceRoman Zippel
This reformats and properly indents sys_ptrace (only whitespace changes). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] add suspend/resume for timerShaohua Li
The timers lack .suspend/.resume methods. Because of this, jiffies got a big compensation after a S3 resume. And then softlockup watchdog reports an oops. This occured with HPET enabled, but it's also possible for other timers. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] pm: clean up /sys/power/diskPavel Machek
Clean code up a bit, and only show suspend to disk as available when it is configured in. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] pm: fix process freezingPavel Machek
If process freezing fails, some processes are frozen, and rest are left in "were asked to be frozen" state. Thats wrong, we should leave it in some consistent state. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] swsusp: fix error handling and cleanupsPavel Machek
Drop printing during normal boot (when no image exists in swap), print message when drivers fail, fix error paths and consolidate near-identical functions in disk.c (and functions with just one statement). Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] swsusp: add locking to software_resumeShaohua Li
It is trying to protect swsusp_resume_device and software_resume() from two users banging it from userspace at the same time. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] swsusup with dm-crypt mini howtoAndreas Steinmetz
The attached patch contains a mini howto for using dm-crypt together with swsusp. Signed-off-by: Andreas Steinmetz <ast@domdv.de> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] swsusp: simpler calculation of number of pages in PBE listMichal Schmidt
The function calc_nr uses an iterative algorithm to calculate the number of pages needed for the image and the pagedir. Exactly the same result can be obtained with a one-line expression. Note that this was even proved correct ;-). Signed-off-by: Michal Schmidt <xschmi00@stud.feec.vutbr.cz> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] swsusp: prevent disks from spinning down and upMichal Schmidt
Stop the disks from spinning down and up on suspend. Signed-off-by: Michal Schmidt <xschmi00@stud.feec.vutbr.cz> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] encrypt suspend data for easy wipingAndreas Steinmetz
The patch protects from leaking sensitive data after resume from suspend. During suspend a temporary key is created and this key is used to encrypt the data written to disk. When, during resume, the data was read back into memory the temporary key is destroyed which simply means that all data written to disk during suspend are then inaccessible so they can't be stolen lateron. Think of the following: you suspend while an application is running that keeps sensitive data in memory. The application itself prevents the data from being swapped out. Suspend, however, must write these data to swap to be able to resume lateron. Without suspend encryption your sensitive data are then stored in plaintext on disk. This means that after resume your sensitive data are accessible to all applications having direct access to the swap device which was used for suspend. If you don't need swap after resume these data can remain on disk virtually forever. Thus it can happen that your system gets broken in weeks later and sensitive data which you thought were encrypted and protected are retrieved and stolen from the swap device. Signed-off-by: Andreas Steinmetz <ast@domdv.de> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] fix pm_message_t stuff in -mm treePavel Machek
This should bits from -mm tree that are affected by pm_message_t conversion. [I'm not 100% sure I got all of them, but I certainly got all the errors on make allyesconfig build, and most of warnings, too. I'll go through the buildlog tommorow and fix any remaining bits]. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] swsusp: switch pm_message_t to structPavel Machek
This adds type-checking to pm_message_t, so that people can't confuse it with int or u32. It also allows us to fix "disk yoyo" during suspend (disk spinning down/up/down). [We've tried that before; since that cpufreq problems were fixed and I've tried make allyes config and fixed resulting damage.] Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Alexander Nyberg <alexn@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] swsusp: fix remaining u32 vs. pm_message_t confusionPavel Machek
Fix remaining bits of u32 vs. pm_message confusion. Should not break anything. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] suspend: update documentationPavel Machek
Update suspend documentation. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] ISA DMA suspend for x86_64Pierre Ossman
Reset the ISA DMA controller into a known state after a suspend. Primary concern was reenabling the cascading DMA channel (4). Signed-off-by: Pierre Ossman <drzeus@drzeus.cx> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] ISA DMA suspend for i386Pierre Ossman
Reset the ISA DMA controller into a known state after a suspend. Primary concern was reenabling the cascading DMA channel (4). Signed-off-by: Pierre Ossman <drzeus@drzeus.cx> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] remove busywait in refrigeratorPavel Machek
This should make refrigerator sleep properly, not busywait after the first schedule() returns. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] unify x86/x86-64 semaphore codeBenjamin LaHaise
This patch moves the common code in x86 and x86-64's semaphore.c into a single file in lib/semaphore-sleepers.c. The arch specific asm stubs are left in the arch tree (in semaphore.c for i386 and in the asm for x86-64). There should be no changes in code/functionality with this patch. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386 boottime for_each_cpu brokenZwane Mwaikambo
for_each_cpu walks through all processors in cpu_possible_map, which is defined as cpu_callout_map on i386 and isn't initialised until all processors have been booted. This breaks things which do for_each_cpu iterations early during boot. So, define cpu_possible_map as a bitmap with NR_CPUS bits populated. This was triggered by a patch i'm working on which does alloc_percpu before bringing up secondary processors. From: Alexander Nyberg <alexn@telia.com> i386-boottime-for_each_cpu-broken.patch i386-boottime-for_each_cpu-broken-fix.patch The SMP version of __alloc_percpu checks the cpu_possible_map before allocating memory for a certain cpu. With the above patches the BSP cpuid is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor machines (as soon as someone tries to dereference something allocated via __alloc_percpu, which in fact is never allocated since the cpu is not set in cpu_possible_map). Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Signed-off-by: Alexander Nyberg <alexn@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: encapsulate copying of pgd entriesZachary Amsden
Add a clone operation for pgd updates. This helps complete the encapsulation of updates to page tables (or pages about to become page tables) into accessor functions rather than using memcpy() to duplicate them. This is both generally good for consistency and also necessary for running in a hypervisor which requires explicit updates to page table entries. The new function is: clone_pgd_range(pgd_t *dst, pgd_t *src, int count); dst - pointer to pgd range anwhere on a pgd page src - "" count - the number of pgds to copy. dst and src can be on the same page, but the range must not overlap and must not cross a page boundary. Note that I ommitted using this call to copy pgd entries into the software suspend page root, since this is not technically a live paging structure, rather it is used on resume from suspend. CC'ing Pavel in case he has any feedback on this. Thanks to Chris Wright for noticing that this could be more optimal in PAE compiles by eliminating the memset. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86 NMI: better support for debuggersGeorge Anzinger
This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. We also change the nmi watchdog to carry on if die_nmi returns. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with "other" cpus held. Signed-off-by: George Anzinger<george@mvista.com> Cc: Keith Owens <kaos@ocs.com.au> Signed-off-by: George Anzinger <george@mvista.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: introduce a write acessor for updating the current LDTZachary Amsden
Introduce a write acessor for updating the current LDT. This is required for hypervisors like Xen that do not allow LDT pages to be directly written. Testing - here's a fun little LDT test that can be trivially modified to test limits as well. /* * Copyright (c) 2005, Zachary Amsden (zach@vmware.com) * This is licensed under the GPL. */ #include <stdio.h> #include <signal.h> #include <asm/ldt.h> #include <asm/segment.h> #include <sys/types.h> #include <unistd.h> #include <sys/mman.h> #define __KERNEL__ #include <asm/page.h> void main(void) { struct user_desc desc; char *code; unsigned long long tsc; code = (char *)mmap(0, 8192, PROT_EXEC|PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); desc.entry_number = 0; desc.base_addr = code; desc.limit = 1; desc.seg_32bit = 1; desc.contents = MODIFY_LDT_CONTENTS_CODE; desc.read_exec_only = 0; desc.limit_in_pages = 1; desc.seg_not_present = 0; desc.useable = 1; if (modify_ldt(1, &desc, sizeof(desc)) != 0) { perror("modify_ldt"); } printf("code base is 0x%08x\n", (unsigned)code); code[0x0ffe] = 0x0f; /* rdtsc */ code[0x0fff] = 0x31; code[0x1000] = 0xcb; /* lret */ __asm__ __volatile("lcall $7,$0xffe" : "=A" (tsc)); printf("TSC is 0x%016llx\n", tsc); } Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: remove redundant TSS clearingZachary Amsden
When reviewing GDT updates, I found the code: set_tss_desc(cpu,t); /* This just modifies memory; ... */ per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff; This second line is unnecessary, since set_tss_desc() has already cleared the busy bit. Commented disassembly, line 1: c028b8bd: 8b 0c 86 mov (%esi,%eax,4),%ecx c028b8c0: 01 cb add %ecx,%ebx c028b8c2: 8d 0c 39 lea (%ecx,%edi,1),%ecx => %ecx = per_cpu(cpu_gdt_table, cpu) c028b8c5: 8d 91 80 00 00 00 lea 0x80(%ecx),%edx => %edx = &per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS] c028b8cb: 66 c7 42 00 73 20 movw $0x2073,0x0(%edx) c028b8d1: 66 89 5a 02 mov %bx,0x2(%edx) c028b8d5: c1 cb 10 ror $0x10,%ebx c028b8d8: 88 5a 04 mov %bl,0x4(%edx) c028b8db: c6 42 05 89 movb $0x89,0x5(%edx) => ((char *)%edx)[5] = 0x89 (equivalent) ((char *)per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS])[5] = 0x89 c028b8df: c6 42 06 00 movb $0x0,0x6(%edx) c028b8e3: 88 7a 07 mov %bh,0x7(%edx) c028b8e6: c1 cb 10 ror $0x10,%ebx => other bits Commented disassembly, line 2: c028b8e9: 8b 14 86 mov (%esi,%eax,4),%edx c028b8ec: 8d 04 3a lea (%edx,%edi,1),%eax => %eax = per_cpu(cpu_gdt_table, cpu) c028b8ef: 81 a0 84 00 00 00 ff andl $0xfffffdff,0x84(%eax) => per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff; (equivalent) ((char *)per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS])[5] &= 0xfd Note that (0x89 & ~0xfd) == 0; i.e, set_tss_desc(cpu,t) has already stored the type field in the GDT with the busy bit clear. Eliminating redundant and obscure code is always a good thing; in fact, I pointed out this same optimization many moons ago in arch/i386/setup.c, back when it used to be called that. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: make IOPL explicitZachary Amsden
The pushf/popf in switch_to are ONLY used to switch IOPL. Making this explicit in C code is more clear. This pushf/popf pair was added as a bugfix for leaking IOPL to unprivileged processes when using sysenter/sysexit based system calls (sysexit does not restore flags). When requesting an IOPL change in sys_iopl(), it is just as easy to change the current flags and the flags in the stack image (in case an IRET is required), but there is no reason to force an IRET if we came in from the SYSENTER path. This change is the minimal solution for supporting a paravirtualized Linux kernel that allows user processes to run with I/O privilege. Other solutions require radical rewrites of part of the low level fault / system call handling code, or do not fully support sysenter based system calls. Unfortunately, this added one field to the thread_struct. But as a bonus, on P4, the fastest time measured for switch_to() went from 312 to 260 cycles, a win of about 17% in the fast case through this performance critical path. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: privilege cleanupZachary Amsden
Privilege checking cleanup. Originally, these diffs were much greater, but recent cleanups in Linux have already done much of the cleanup. I added some explanatory comments in places where the reasoning behind certain tests is rather subtle. Also, in traps.c, we can skip the user_mode check in handle_BUG(). The reason is, there are only two call chains - one via die_if_kernel() and one via do_page_fault(), both entering from die(). Both of these paths already ensure that a kernel mode failure has happened. Also, the original check here, if (user_mode(regs)) was insufficient anyways, since it would not rule out BUG faults from V8086 mode execution. Saving the %ss segment in show_regs() rather than assuming a fixed value also gives better information about the current kernel state in the register dump. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: more asm cleanupsZachary Amsden
Some more assembler cleanups I noticed along the way. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: fix incorrect TSS entry for LDTIngo Molnar
Noticed by Chuck Ebbert: the .ldt entry of the TSS was set up incorrectly. It never mattered since this was a leftover from old times, so remove it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: use set_pte macros in a couple places where they were missingZachary Amsden
Also, setting PDPEs in PAE mode does not require atomic operations, since the PDPEs are cached by the processor, and only reloaded on an explicit or implicit reload of CR3. Since the four PDPEs must always be present in an active root, and the kernel PDPE is never updated, we are safe even from SMIs and interrupts / NMIs using task gates (which reload CR3). Actually, much of this is moot, since the user PDPEs are never updated either, and the only usage of task gates is by the doublefault handler. It appears the only place PGDs get updated in PAE mode is in init_low_mappings() / zap_low_mapping() for initial page table creation and recovery from ACPI sleep state, and these sites are safe by inspection. Getting rid of the cmpxchg8b saves code space and 720 cycles in pgd_alloc on P4. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: load_tls() fixZachary Amsden
Subtle fix: load_TLS has been moved after saving %fs and %gs segments to avoid creating non-reversible segments. This could conceivably cause a bug if the kernel ever needed to save and restore fs/gs from the NMI handler. It currently does not, but this is the safest approach to avoiding fs/gs corruption. SMIs are safe, since SMI saves the descriptor hidden state. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: generate better code around descriptor update and access functionsZachary Amsden
GCC can generate better code around descriptor update and access functions when there is not an explicit "eax" register constraint. Testing: You won't boot if this is messed up, since the TSS descriptor will be corrupted. Verified the assembler and booted. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: inline assembler: cleanup and encapsulate descriptor and task ↵Zachary Amsden
register management i386 inline assembler cleanup. This change encapsulates descriptor and task register management. Also, it is possible to improve assembler generation in two cases; savesegment may store the value in a register instead of a memory location, which allows GCC to optimize stack variables into registers, and MOV MEM, SEG is always a 16-bit write to memory, making the casting in math-emu unnecessary. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: cleanup serialize msrZachary Amsden
i386 arch cleanup. Introduce the serialize macro to serialize processor state. Why the microcode update needs it I am not quite sure, since wrmsr() is already a serializing instruction, but it is a microcode update, so I will keep the semantic the same, since this could be a timing workaround. As far as I can tell, this has always been there since the original microcode update source. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: inline asm cleanupZachary Amsden
i386 Inline asm cleanup. Use cr/dr accessor functions. Also, a potential bugfix. Also, some CR accessors really should be volatile. Reads from CR0 (numeric state may change in an exception handler), writes to CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction re-ordering. I did not add memory clobber to CR3 / CR4 / CR0 updates, as it was not there to begin with, and in no case should kernel memory be clobbered, except when doing a TLB flush, which already has memory clobber. I noticed that page invalidation does not have a memory clobber. I can't find a bug as a result, but there is definitely a potential for a bug here: #define __flush_tlb_single(addr) \ __asm__ __volatile__("invlpg %0": :"m" (*(char *) addr)) Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: clean up vDSO alignment paddingRoland McGrath
This makes the vDSO use nops for all its padding around instructions, rather than sometimes zeros, and nop-pads the end of the area containing instructions to a 32-byte cache line, to keep text and data in separate lines. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>