aboutsummaryrefslogtreecommitdiff
path: root/arch/um/kernel/process.c
AgeCommit message (Collapse)Author
2008-07-18nohz: prevent tick stop outside of the idle loopThomas Gleixner
Jack Ren and Eric Miao tracked down the following long standing problem in the NOHZ code: scheduler switch to idle task enable interrupts Window starts here ----> interrupt happens (does not set NEED_RESCHED) irq_exit() stops the tick ----> interrupt happens (does set NEED_RESCHED) return from schedule() cpu_idle(): preempt_disable(); Window ends here The interrupts can happen at any point inside the race window. The first interrupt stops the tick, the second one causes the scheduler to rerun and switch away from idle again and we end up with the tick disabled. The fact that it needs two interrupts where the first one does not set NEED_RESCHED and the second one does made the bug obscure and extremly hard to reproduce and analyse. Kudos to Jack and Eric. Solution: Limit the NOHZ functionality to the idle loop to make sure that we can not run into such a situation ever again. cpu_idle() { preempt_disable(); while(1) { tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we are in the idle loop while (!need_resched()) halt(); tick_nohz_restart_sched_tick(); <- disables NOHZ mode preempt_enable_no_resched(); schedule(); preempt_disable(); } } In hindsight we should have done this forever, but ... /me grabs a large brown paperbag. Debugged-by: Jack Ren <jack.ren@marvell.com>, Debugged-by: eric miao <eric.y.miao@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-29proc: remove proc_root from driversAlexey Dobriyan
Remove proc_root export. Creation and removal works well if parent PDE is supplied as NULL -- it worked always that way. So, one useless export removed and consistency added, some drivers created PDEs with &proc_root as parent but removed them as NULL and so on. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-25sched: add declaration of sched_tail to sched.hHarvey Harrison
Avoids sparse warnings: kernel/sched.c:2170:17: warning: symbol 'schedule_tail' was not declared. Should it be static? Avoids the need for an external declaration in arch/um/process.c Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-08aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUTDavid Howells
Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set. Not all architectures support the A.OUT binfmt, so the ELF binfmt should not be permitted to go looking for A.OUT libraries to load in such a case. Not only that, but under such conditions A.OUT core dumps are not produced either. To make this work, this patch also does the following: (1) Makes the existence of the contents of linux/a.out.h contingent on CONFIG_ARCH_SUPPORTS_AOUT. (2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT core dumping code. (3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This is then included only where needed. This means that this bit of arch code will be stored in the appropriate A.OUT binfmt module rather than the core kernel. (4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not needed) and FRV. This patch depends on the previous patch to move STACK_TOP[_MAX] out of asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT format is available. [jdike@addtoit.com: uml: re-remove accidentally restored code] Signed-off-by: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: style fixes in arch/um/kernelJeff Dike
Joe Perches noticed some printks in smp.c that needed fixing. While I was in there, I did the usual tidying in arch/um/kernel, which should be fairly style-clean at this point: copyright updates emacs formatting comments removal include tidying style fixes Cc: Joe Perches <joe@perches.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: remove duplicate config symbol and unused file and variablesKarol Swietlicki
Fix the repetition of the NET symbol. It was once in UML specific options and once in networking. I removed the first occurrence, as it makes more sense to me to keep it only in networking. It also removes a mostly empty file which is not used anymore and some unused variables. Signed-off-by: Karol Swietlicki <magotari@gmail.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: use ptrace directly in libc codeJeff Dike
Some register accessor cleanups - userspace() was calling restore_registers and save_registers for no reason, since userspace() is on the libc side of the house, and these add no value over calling ptrace directly init_thread_registers and get_safe_registers were the same thing, so init_thread_registers is gone Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: current.h cleanupJeff Dike
Tidy current-related stuff. There was a comment in current.h saying that current_thread was obsolete, so this patch turns all instances of current_thread into current_thread_info(). There's some simplifying of the result in arch/um/sys-i386/signal.c. current.h and thread_info also get style cleanups. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: header untanglingJeff Dike
Untangle UML headers somewhat and add some includes where they were needed explicitly, but gotten accidentally via some other header. arch/um/include/um_uaccess.h loses asm/fixmap.h because it uses no fixmap stuff and gains elf.h, because it needs FIXADDR_USER_*, and archsetjmp.h, because it needs jmp_buf. pmd_alloc_one is uninlined because it needs mm_struct, and that's inconvenient to provide in asm-um/pgtable-3level.h. elf_core_copy_fpregs is also uninlined from elf-i386.h and elf-x86_64.h, which duplicated the code anyway, to arch/um/kernel/process.c, so that the reference to current_thread doesn't pull sched.h or anything related into asm/elf.h. arch/um/sys-i386/ldt.c, arch/um/kernel/tlb.c and arch/um/kernel/skas/uaccess.c got sched.h because they dereference task_structs. Its includes of linux and asm headers got turned from "" to <>. arch/um/sys-i386/bug.c gets asm/errno.h because it needs errno constants. asm/elf-i386 gets asm/user.h because it needs user_regs_struct. asm/fixmap.h gets page.h because it needs PAGE_SIZE and PAGE_MASK and system.h for BUG_ON. asm/pgtable doesn't need sched.h. asm/processor-generic.h defined mm_segment_t, but didn't use it. So, that definition is moved to uaccess.h, which defines a bunch of mm_segment_t-related stuff. thread_info.h uses mm_segment_t, and includes uaccess.h, which causes a recursion. So, the definition is placed above the include of thread_info. in uaccess.h. thread_info.h also gets page.h because it needs PAGE_SIZE. ObCheckpatchViolationJustification - I'm not adding a typedef; I'm moving mm_segment_t from one place to another. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: move um_virt_to_physJeff Dike
This patchset makes UML build and run with three-level page tables on 32-bit hosts. This is an uncommon use case, but the code here needed fixing and cleaning up, so 32-bit three-level pages tables were tested to make sure the changes are good. Patch 1 - code movement Patch 2 - header untangling Patch 3 - style fixups in files affected so far Patch 4 - clean up use of current.h Patch 5 - fix sizes of types that are different between 2 and 3-level page tables - three-level page table support should build at this point Patch 6 - tidy (i.e. eliminate much of) the code that figures out how big the address space is Patch 7 - change um_virt_to_phys into virt_to_pte, clean its interface, and clean its (so far) one caller Patch 8 - the stub pages are covered with a VMA, allowing some nasty code to be thrown out - three-level page tables now work This patch: um_virt_to_phys only has one user, so it can be moved to the same file and made static. Its declarations in pgtable.h and ksyms.c are also gone. current_cmd was another apparent user, but it itself isn't used, so it is deleted. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: remove unused variables in the context switcherKarol Swietlicki
This patch removes a variable which was not used in two functions. Yet another code cleanup, nothing really significant. Please note that I could not test this on x86_64. I don't have the hardware for it. [ jdike - Bits of tidying around the affected code. Also, it's fine on x86_64 ] Signed-off-by: Karol Swietlicki <magotari@gmail.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: const and other tidyingWANG Cong
This patch also does some improvements for uml code. Improvements include dropping unnecessary cast, killing some unnecessary code and still some constifying for pointers etc.. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: implement get_wchanJeff Dike
Implement get_wchan - the algorithm is similar to x86. It starts with the stack pointer of the process in question and looks above that for addresses that are kernel text. The second one which isn't in the scheduler is the one that's returned. The first one is ignored because that will be UML's own context switching routine. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16uml: eliminate interrupts in the idle loopJeff Dike
Now, the idle loop now longer needs SIGALRM firing - it can just sleep for the requisite amount of time and fake a timer interrupt when it finishes. Any use of ITIMER_REAL now goes away. disable_timer only turns off ITIMER_VIRTUAL. switch_timers is no longer needed, so it, and all calls, goes away. disable_timer now returns the amount of time remaining on the timer. default_idle uses this to tell idle_sleep how long to sleep. idle_sleep will call alarm_handler if nanosleep returns 0, which is the case if it didn't return early due to an interrupt. Otherwise, it just returns. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16uml: tickless supportJeff Dike
Enable tickless support. CONFIG_TICK_ONESHOT and CONFIG_NO_HZ are enabled. itimer_clockevent gets CLOCK_EVT_FEAT_ONESHOT and an implementation of .set_next_event. CONFIG_UML_REAL_TIME_CLOCK goes away because it only makes sense when there is a clock ticking away all the time. timer_handler now just calls do_IRQ once without trying to figure out how many ticks to emulate. The idle loop now needs to turn ticking on and off. Userspace ticks keep happening as usual. However, the userspace loop keep track of when the next wakeup should happen and suppresses process ticks until that happens. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16uml: fix timer switchingJeff Dike
Fix up the switching between virtual and real timers. The idle loop sleeps, so the timer at that point must be real time. At all other times, the timer must be virtual. Even when userspace is running, and the kernel is asleep, the virtual timer is correct because the process timer will be running and the process timer will be firing. The timer switch used to be in the context switch and timer handler code. This is moved to the idle loop and the signal handler, making it much more clear why it is happening. switch_timers now returns the old timer type so that it may be restored. The signal handler uses this in order to restore the previous timer type when it returns. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16uml: rename pt_regs general-purpose register fileJeff Dike
Before the removal of tt mode, access to a register on the skas-mode side of a pt_regs struct looked like pt_regs.regs.skas.regs.regs[FOO]. This was bad enough, but it became pt_regs.regs.regs.regs[FOO] with the removal of the union from the middle. To get rid of the run of three "regs", the last field is renamed to "gp". Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16uml: style fixes pass 3Jeff Dike
Formatting changes in the files which have been changed in the course of folding foo_skas functions into their callers. These include: copyright updates header file trimming style fixes adding severity to printks These changes should be entirely non-functional. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16uml: remove code made redundant by CHOOSE_MODE removalJeff Dike
This patch makes a number of simplifications enabled by the removal of CHOOSE_MODE. There were lots of functions that looked like int foo(args){ foo_skas(args); } The bodies of foo_skas are now folded into foo, and their declarations (and sometimes entire header files) are deleted. In addition, the union uml_pt_regs, which was a union between the tt and skas register formats, is now a struct, with the tt-mode arm of the union being removed. It turns out that usr2_handler was unused, so it is gone. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16uml: throw out CHOOSE_MODEJeff Dike
The next stage after removing code which depends on CONFIG_MODE_TT is removing the CHOOSE_MODE abstraction, which provided both compile-time and run-time branching to either tt-mode or skas-mode code. This patch removes choose-mode.h and all inclusions of it, and replaces all CHOOSE_MODE invocations with the skas branch. This leaves a number of trivial functions which will be dealt with in a later patch. There are some changes in the uaccess and tls support which go somewhat beyond this and eliminate some of the now-redundant functions. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16uml: throw out CONFIG_MODE_TTJeff Dike
This patchset throws out tt mode, which has been non-functional for a while. This is done in phases, interspersed with code cleanups on the affected files. The removal is done as follows: remove all code, config options, and files which depend on CONFIG_MODE_TT get rid of the CHOOSE_MODE macro, which decided whether to call tt-mode or skas-mode code, and replace invocations with their skas portions replace all now-trivial procedures with their skas equivalents There are now a bunch of now-redundant pieces of data structures, including mode-specific pieces of the thread structure, pt_regs, and mm_context. These are all replaced with their skas-specific contents. As part of the ongoing style compliance project, I made a style pass over all files that were changed. There are three such patches, one for each phase, covering the files affected by that phase but no later ones. I noticed that we weren't freeing the LDT state associated with a process when it exited, so that's fixed in one of the later patches. The last patch is a tidying patch which I've had for a while, but which caused inexplicable crashes under tt mode. Since that is no longer a problem, this can now go in. This patch: Start getting rid of tt mode support. This patch throws out CONFIG_MODE_TT and all config options, code, and files which depend on it. CONFIG_MODE_SKAS is gone and everything that depends on it is included unconditionally. The few changed lines are in re-written Kconfig help, lines which needed something skas-related removed from them, and a few more which weren't strictly deletions. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16uml: stop specially protecting kernel stacksJeff Dike
Map all of physical memory as executable to avoid having to change stack protections during fork and exit. unprotect_stack is now called only from MODE_TT code, so it is marked as such. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16uml: Eliminate kernel allocator wrappersJeff Dike
UML had two wrapper procedures for kmalloc, um_kmalloc and um_kmalloc_atomic because the flag constants weren't available in userspace code. kern_constants.h had made kernel constants available for a long time, so there is no need for these wrappers any more. Rather, userspace code calls kmalloc directly with the userspace versions of the gfp flags. kmalloc isn't a real procedure, so I had to essentially copy the inline wrapper around __kmalloc. vmalloc also had its own wrapper for no good reason. This is now gone. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07uml: kernel_thread shouldn't panicJeff Dike
kernel_thread() should just return an error value on do_fork failure, not panic. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07uml: remove page_size()Jeff Dike
userspace code used to have to call the kernelspace function page_size() in order to determine the value of the kernel's PAGE_SIZE. Since this is now available directly from kern_constants.h as UM_KERN_PAGE_SIZE, page_size() can be deleted and calls changed to use the constant. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07uml: tidy process.cJeff Dike
Clean up arch/um/kernel/process.c: - lots of return(x); -> return x; conversions - a number of the small functions are either unused, in which case they are gone, along any declarations in a header, or could be made static. - current_pid is ifdefed on CONFIG_MODE_TT and its declaration is ifdefed on both CONFIG_MODE_TT and UML_CONFIG_MODE_TT because we don't know whether it's being used in a userspace or kernel file. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07uml: remove user_util.hJeff Dike
user_util.h isn't needed any more, so delete it and remove all includes of it. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07uml: create as-layout.hJeff Dike
This patch moves all the the symbols defined in um_arch.c, which are mostly boundaries between different parts of the UML kernel address space, to a new header, as-layout.h. There are also a few things here which aren't really related to address space layout, but which don't really have a better place to go. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-10-20[PATCH] uml: split memory allocation prototypes out of user.hPaolo 'Blaisorblade' Giarrusso
user.h is too generic a header name. I've split out allocation routines from it. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27[PATCH] uml: file renamingJeff Dike
Move some foo_kern.c files to foo.c now that the old foo.c files are out of the way. Also cleaned up some whitespace and an emacs formatting comment. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] uml: move libc-dependent startup and signal codeGennady Sharapov
The serial UML OS-abstraction layer patch (um/kernel dir). This moves all systemcalls from process.c file under os-Linux dir and join process.c and process_kern.c files. Signed-off-by: Gennady Sharapov <gennady.v.sharapov@intel.com> Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28[PATCH] uml: fix TT mode by reverting "use fork instead of clone"Jeff Dike
With Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Revert the following patch, because of miscompilation problems in different environments leading to UML not working *at all* in TT mode; it was merged lately in 2.6 development cycle, a little after being written, and has caused problems to lots of people; I know it's a bit too long, but it shouldn't have been merged in first place, so I still apply for inclusion in the -stable tree. Anyone using this feature currently is either using some older kernel (some reports even used 2.6.12-rc4-mm2) or using this patch, as included in my -bs patchset. For now there's not yet a fix for this patch, so for now the best thing is to drop it (which was widely reported to give a working kernel, and as such was even merged in -stable tree). "Convert the boot-time host ptrace testing from clone to fork. They were essentially doing fork anyway. This cleans up the code a bit, and makes valgrind a bit happier about grinding it." URL: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=98fdffccea6cc3fe9dba32c0fcc310bcb5d71529 Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27[PATCH] uml: add skas0 command-line optionPaolo 'Blaisorblade' Giarrusso
This adds the "skas0" parameter to force skas0 operation on SKAS3 host and shows which operating mode has been selected. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07[PATCH] uml: skas0 - separate kernel address space on stock hostsJeff Dike
UML has had two modes of operation - an insecure, slow mode (tt mode) in which the kernel is mapped into every process address space which requires no host kernel modifications, and a secure, faster mode (skas mode) in which the UML kernel is in a separate host address space, which requires a patch to the host kernel. This patch implements something very close to skas mode for hosts which don't support skas - I'm calling this skas0. It provides the security of the skas host patch, and some of the performance gains. The two main things that are provided by the skas patch, /proc/mm and PTRACE_FAULTINFO, are implemented in a way that require no host patch. For the remote address space changing stuff (mmap, munmap, and mprotect), we set aside two pages in the process above its stack, one of which contains a little bit of code which can call mmap et al. To update the address space, the system call information (system call number and arguments) are written to the stub page above the code. The %esp is set to the beginning of the data, the %eip is set the the start of the stub, and it repeatedly pops the information into its registers and makes the system call until it sees a system call number of zero. This is to amortize the cost of the context switch across multiple address space updates. When the updates are done, it SIGSTOPs itself, and the kernel process continues what it was doing. For a PTRACE_FAULTINFO replacement, we set up a SIGSEGV handler in the child, and let it handle segfaults rather than nullifying them. The handler is in the same page as the mmap stub. The second page is used as the stack. The handler reads cr2 and err from the sigcontext, sticks them at the base of the stack in a faultinfo struct, and SIGSTOPs itself. The kernel then reads the faultinfo and handles the fault. A complication on x86_64 is that this involves resetting the registers to the segfault values when the process is inside the kill system call. This breaks on x86_64 because %rcx will contain %rip because you tell SYSRET where to return to by putting the value in %rcx. So, this corrupts $rcx on return from the segfault. To work around this, I added an arch_finish_segv, which on x86 does nothing, but which on x86_64 ptraces the child back through the sigreturn. This causes %rcx to be restored by sigreturn and avoids the corruption. Ultimately, I think I will replace this with the trick of having it send itself a blocked signal which will be unblocked by the sigreturn. This will allow it to be stopped just after the sigreturn, and PTRACE_SYSCALLed without all the back-and-forth of PTRACE_SYSCALLing it through sigreturn. This runs on a stock host, so theoretically (and hopefully), tt mode isn't needed any more. We need to make sure that this is better in every way than tt mode, though. I'm concerned about the speed of address space updates and page fault handling, since they involve extra round-trips to the child. We can amortize the round-trip cost for large address space updates by writing all of the operations to the data page and having the child execute them all at the same time. This will help fork and exec, but not page faults, since they involve only one page. I can't think of any way to help page faults, except to add something like PTRACE_FAULTINFO to the host. There is PTRACE_SIGINFO, but UML doesn't use siginfo for SIGSEGV (or anything else) because there isn't enough information in the siginfo struct to handle page faults (the faulting operation type is missing). Adding that would make PTRACE_SIGINFO a usable equivalent to PTRACE_FAULTINFO. As for the code itself: - The system call stub is in arch/um/kernel/sys-$(SUBARCH)/stub.S. It is put in its own section of the binary along with stub_segv_handler in arch/um/kernel/skas/process.c. This is manipulated with run_syscall_stub in arch/um/kernel/skas/mem_user.c. syscall_stub will execute any system call at all, but it's only used for mmap, munmap, and mprotect. - The x86_64 stub calls sigreturn by hand rather than allowing the normal sigreturn to happen, because the normal sigreturn is a SA_RESTORER in UML's address space provided by libc. Needless to say, this is not available in the child's address space. Also, it does a couple of odd pops before that which restore the stack to the state it was in at the time the signal handler was called. - There is a new field in the arch mmu_context, which is now a union. This is the pid to be manipulated rather than the /proc/mm file descriptor. Code which deals with this now checks proc_mm to see whether it should use the usual skas code or the new code. - userspace_tramp is now used to create a new host process for every UML process, rather than one per UML processor. It checks proc_mm and ptrace_faultinfo to decide whether to map in the pages above its stack. - start_userspace now makes CLONE_VM conditional on proc_mm since we need separate address spaces now. - switch_mm_skas now just sets userspace_pid[0] to the new pid rather than PTRACE_SWITCH_MM. There is an addition to userspace which updates its idea of the pid being manipulated each time around the loop. This is important on exec, when the pid will change underneath userspace(). - The stub page has a pte, but it can't be mapped in using tlb_flush because it is part of tlb_flush. This is why it's required for it to be mapped in by userspace_tramp. Other random things: - The stub section in uml.lds.S is page aligned. This page is written out to the backing vm file in setup_physmem because it is mapped from there into user processes. - There's some confusion with TASK_SIZE now that there are a couple of extra pages that the process can't use. TASK_SIZE is considered by the elf code to be the usable process memory, which is reasonable, so it is decreased by two pages. This confuses the definition of USER_PGDS_IN_LAST_PML4, making it too small because of the rounding down of the uneven division. So we round it to the nearest PGDIR_SIZE rather than the lower one. - I added a missing PT_SYSCALL_ARG6_OFFSET macro. - um_mmu.h was made into a userspace-usable file. - proc_mm and ptrace_faultinfo are globals which say whether the host supports these features. - There is a bad interaction between the mm.nr_ptes check at the end of exit_mmap, stack randomization, and skas0. exit_mmap will stop freeing pages at the PGDIR_SIZE boundary after the last vma. If the stack isn't on the last page table page, the last pte page won't be freed, as it should be since the stub ptes are there, and exit_mmap will BUG because there is an unfreed page. To get around this, TASK_SIZE is set to the next lowest PGDIR_SIZE boundary and mm->nr_ptes is decremented after the calls to init_stub_pte. This ensures that we know the process stack (and all other process mappings) will be below the top page table page, and thus we know that mm->nr_ptes will be one too many, and can be decremented. Things that need fixing: - We may need better assurrences that the stub code is PIC. - The stub pte is set up in init_new_context_skas. - alloc_pgdir is probably the right place. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-13[PATCH] uml: use fork instead of cloneJeff Dike
Convert the boot-time host ptrace testing from clone to fork. They were essentially doing fork anyway. This cleans up the code a bit, and makes valgrind a bit happier about grinding it. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-13[PATCH] uml: remove duplicate includesJeff Dike
A few files include the same header twice. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05[PATCH] uml: Fix SIGWINCH relayingJeff Dike
This makes SIGWINCH work again, and fixes a couple of SIGWINCH-associated crashes. First, the sigio thread disables SIGWINCH because all hell breaks loose if it ever gets one and tries to call the signal handling code. Second, there was a problem with deferencing tty structs after they were freed. The SIGWINCH support for a tty wasn't being turned off or freed after the tty went away. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16Linux-2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!