This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().
Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.
Eric's original description:
There are a lot of places in the kernel where we test for init
because we give it special properties. Most significantly init
must not die. This results in code all over the kernel testing
->pid == 1.
Introduce is_init to capture this case.
With multiple pid spaces, for all of the cases affected we are
looking only for the first process on the system, not some other
process that has pid == 1.
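For reference, a minimal sketch of what such a helper looks like (the real
definition lives in the scheduler headers and may differ in detail):

	static inline int is_init(struct task_struct *tsk)
	{
		return tsk->pid == 1;
	}

so a check like "if (current->pid == 1)" becomes "if (is_init(current))".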
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Make PROT_WRITE imply PROT_READ for a number of architectures which don't
support write only in hardware.
While looking at this, I noticed that some architectures which do not
support write only mappings already take the exact same approach. For
example, in arch/alpha/mm/fault.c:
"
	if (cause < 0) {
		if (!(vma->vm_flags & VM_EXEC))
			goto bad_area;
	} else if (!cause) {
		/* Allow reads even for write-only mappings */
		if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
			goto bad_area;
	} else {
		if (!(vma->vm_flags & VM_WRITE))
			goto bad_area;
	}
"
Thus, this patch brings other architectures which do not support write only
mappings in-line and consistent with the rest. I've verified the patch on
ia64, x86_64 and x86.
Additional discussion:
Several architectures, including x86, can not support write-only mappings.
The pte for x86 reserves a single bit for protection and its two states are
read only or read/write. Thus, write only is not supported in h/w.
Currently, if I 'mmap' a page write-only, the first read attempt on that page
creates a page fault and will SEGV. That check is enforced in
arch/blah/mm/fault.c. However, if I first write to that page, it will fault in
and the pte will be set to read/write. Thus, any subsequent reads of the page
will succeed. It is this inconsistency in behavior that this patch is
attempting to address. Furthermore, if the page is swapped out and then
brought back, the first read will also cause a SEGV. Thus, any arbitrary read
of a page can potentially result in a SEGV.
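A small user-space test (illustrative, not part of the patch) makes the
inconsistency concrete; on an affected architecture without this patch,
swapping the order of the write and the read below turns a working program
into one that dies with SIGSEGV:

	#include <stdio.h>
	#include <sys/mman.h>

	int main(void)
	{
		char *p = mmap(NULL, 4096, PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		p[0] = 'x';		/* write fault is allowed; pte becomes read/write */
		printf("%c\n", p[0]);	/* so this read of the same page now succeeds */
		/* Reading before writing would SEGV on a pre-patch kernel, since
		 * the fault handler rejects reads on a vma with only VM_WRITE. */
		munmap(p, 4096);
		return 0;
	}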
According to the SuSv3 spec, "if the application requests only PROT_WRITE, the
implementation may also allow read access." Also, as mentioned, some
architectures, such as alpha (shown above), already take the approach that I
am suggesting.
The counter-argument to this, raised by Arjan, is that the kernel is enforcing
the write-only mapping as best it can given the h/w limitations. This is
true; however, Alan Cox and I would argue that the inconsistency in
behavior, that is, applications sometimes working and sometimes failing, is
highly undesirable. If you read through the thread, I think people came to an
agreement on the last patch I posted, as nobody has objected to it...
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Size zones and holes in an architecture independent manner for Power.
[judith@osdl.org: build fix]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
This adds a vdso_base element to the mm_context_t for 32-bit compiles
(both for ARCH=powerpc and ARCH=ppc). This fixes the compile errors
that have been reported in arch/powerpc/kernel/signal_32.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
Fix the 44x and BookE page fault handler to correctly lock the PTE before
trying to pte_update() it; otherwise this PTE might be swapped out
after the pte_present() check but before the pte_update() call, resulting in
a corrupted PTE. This can happen with preemption enabled under low-memory
conditions.
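The shape of the fix, roughly (the helper and flag names below are the usual
ones in that path and may differ in the actual patch), is to take the PTE lock
first and only pte_update() a PTE that is still present:

	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (pte_present(*ptep))
		pte_update(ptep, 0, _PAGE_HWEXEC);	/* example update */
	pte_unmap_unlock(ptep, ptl);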
Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
32-bit CHRP machines are now supported only in arch/powerpc, as are
all 64-bit PowerPC processors. This means that we don't use
Open Firmware on any platform in arch/ppc any more.
This makes PReP support a single-platform option like every other
platform support option in arch/ppc now, thus CONFIG_PPC_MULTIPLATFORM
is gone from arch/ppc. CONFIG_PPC_PREP is the option that selects
PReP support and is generally what has replaced
CONFIG_PPC_MULTIPLATFORM within arch/ppc.
_machine is all but dead now, being #defined to 0.
Updated Makefiles, comments and Kconfig options generally to reflect
these changes.
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (78 commits)
[PATCH] powerpc: Add FSL SEC node to documentation
[PATCH] macintosh: tidy-up driver_register() return values
[PATCH] powerpc: tidy-up of_register_driver()/driver_register() return values
[PATCH] powerpc: via-pmu warning fix
[PATCH] macintosh: cleanup the use of i2c headers
[PATCH] powerpc: dont allow old RTC to be selected
[PATCH] powerpc: make powerbook_sleep_grackle static
[PATCH] powerpc: Fix warning in add_memory
[PATCH] powerpc: update mailing list addresses
[PATCH] powerpc: Remove calculation of io hole
[PATCH] powerpc: iseries: Add bootargs to /chosen
[PATCH] powerpc: iseries: Add /system-id, /model and /compatible
[PATCH] powerpc: Add strne2a() to convert a string from EBCDIC to ASCII
[PATCH] powerpc: iseries: Make more stuff static in platforms/iseries/mf.c
[PATCH] powerpc: iseries: Remove pointless iSeries_(restart|power_off|halt)
[PATCH] powerpc: iseries: mf related cleanups
[PATCH] powerpc: Replace platform_is_lpar() with a firmware feature
[PATCH] powerpc: trivial: Cleanup whitespace in cputable.h
[PATCH] powerpc: Remove unused iommu_off logic from pSeries_init_early()
[PATCH] powerpc: Unconfuse htab_bolt_mapping() callers
...
|
|
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().
This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.
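Roughly, the two new helpers look like this (a sketch; the real versions carry
additional debug assertions):

	static inline void init_page_count(struct page *page)
	{
		atomic_set(&page->_count, 1);	/* brand-new page: refcount starts at 1 */
	}

	static inline void set_page_refcounted(struct page *page)
	{
		BUG_ON(atomic_read(&page->_count));	/* must not already be referenced */
		atomic_set(&page->_count, 1);
	}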
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch removes all self-references and fixes references to files
in the now-defunct arch/ppc64 tree. I think this accomplishes
everything wanted, though there might be a few references I missed.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
The following patch generalizes PPC44x_PIN_SIZE by changing it to
PPC_PIN_SIZE, which can be defined by any sub-arch to automatically adjust
VMALLOC_START.
Define PPC_PIN_SIZE on 8xx, avoiding potential conflicts with the
pinned space.
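The mechanism, sketched (the exact expression in the pgtable headers differs
per platform), is that VMALLOC_START is pushed up past any pinned region
whenever PPC_PIN_SIZE is defined:

	#ifdef PPC_PIN_SIZE
	#define VMALLOC_START (((_ALIGN((long)high_memory, PPC_PIN_SIZE) + \
			VMALLOC_OFFSET) & ~(VMALLOC_OFFSET - 1)))
	#else
	#define VMALLOC_START ((((long)high_memory + VMALLOC_OFFSET) & \
			~(VMALLOC_OFFSET - 1)))
	#endif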
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
This makes it possible to build kernels for PReP and/or CHRP
with ARCH=ppc by removing the (non-building) powermac support.
It's now also possible to select PReP and CHRP independently.
Powermac users should now build with ARCH=powerpc instead of
ARCH=ppc. (This does mean that it is no longer possible to
build a 32-bit kernel for a G5.)
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
Currently 8xx fails to boot due to endless pagefaults.
The bug seems to be exposed by the recently introduced change which
avoids flushing the TLB when it is not necessary (i.e. when the pte has
not changed):
__handle_mm_fault():
	entry = pte_mkyoung(entry);
	if (!pte_same(old_entry, entry)) {
		ptep_set_access_flags(vma, address, pte, entry, write_access);
		update_mmu_cache(vma, address, entry);
		lazy_mmu_prot_update(entry);
	} else {
		/*
		 * This is needed only for protection faults but the arch code
		 * is not yet telling us if this is a protection fault or not.
		 * This still avoids useless tlb flushes for .text page faults
		 * with threads.
		 */
		if (write_access)
			flush_tlb_page(vma, address);
	}
The "update_mmu_cache()" call was unconditional before, which caused the TLB
to be flushed by:
	if (pfn_valid(pfn)) {
		struct page *page = pfn_to_page(pfn);
		if (!PageReserved(page)
		    && !test_bit(PG_arch_1, &page->flags)) {
			if (vma->vm_mm == current->active_mm) {
#ifdef CONFIG_8xx
			/* On 8xx, cache control instructions (particularly
			 * "dcbst" from flush_dcache_icache) fault as write
			 * operation if there is an unpopulated TLB entry
			 * for the address in question. To workaround that,
			 * we invalidate the TLB here, thus avoiding dcbst
			 * misbehaviour.
			 */
				_tlbie(address);
#endif
				__flush_dcache_icache((void *) address);
			} else
				flush_dcache_icache_page(page);
			set_bit(PG_arch_1, &page->flags);
		}
	}
Which worked due to pure luck: PG_arch_1 was always unset before, but
now it isn't.
The root of the problem lies in the changes made to the 8xx TLB handlers
during v2.6. What happens is that the TLBMiss handlers load the zeroed pte
into the TLB, causing the TLBError handler to be invoked (that's two TLB
faults per pagefault), which then jumps to the generic MM code to set up the
pte.
The bug is that the zeroed TLB is not invalidated (the same reason
for the "dcbst" misbehaviour), resulting in infinite TLBError faults.
The "two exception" approach requires a TLB flush (to nuke the zeroed TLB)
at each PTE update for correct behaviour:
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
Changed jobs and the Freescale address is no longer valid.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
First step in pushing down the page_table_lock. init_mm.page_table_lock has
been used throughout the architectures (usually for ioremap): not to serialize
kernel address space allocation (that's usually vmlist_lock), but because
pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.
Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
and drop it when allocating a new one, to check lest a racing task already
did. Similarly no page_table_lock in vmalloc's map_vm_area.
Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
user mms, which are converted only by a later patch, for now they have to lock
differently according to whether or not it's init_mm.
If sources get muddled, there's a danger that an arch source taking
init_mm.page_table_lock will be mixed with common source also taking it (or
neither take it). So break the rules and make another change, which should
break the build for such a mismatch: remove the redundant mm arg from
pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).
Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
took page_table_lock for no good reason.
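In practice the kernel-side calling convention changes like this (sketch):

	/* before: the caller took init_mm.page_table_lock and passed the mm */
	spin_lock(&init_mm.page_table_lock);
	pte = pte_alloc_kernel(&init_mm, pmd, addr);
	...
	spin_unlock(&init_mm.page_table_lock);

	/* after: no mm argument and no locking in the caller; pte_alloc_kernel
	 * takes and drops the lock itself only if it has to allocate a new
	 * page table, re-checking for a racing allocation under the lock. */
	pte = pte_alloc_kernel(pmd, addr);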
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Change the phys_mem_access_prot() function to take a pfn instead of an
address. This allows mmap64() to work on /dev/mem for addresses above 4G
on 32-bit architectures. We start with a pfn in mmap_mem(), so there's no
need to convert to an address; in fact, it's actively bad, since the
conversion can overflow when the address is above 4G.
Similarly fix the ppc32 page_is_ram() function to avoid a conversion to an
address by directly comparing to max_pfn. Working with max_pfn instead of
high_memory fixes page_is_ram() to give the right answer for highmem pages.
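The overflow in question is simply that, on a 32-bit kernel, an address
computed as pfn << PAGE_SHIFT cannot represent physical memory above 4G;
passing the pfn through avoids the conversion entirely. As a sketch of the
interface change (the "before" form is illustrative):

	/* before: overflows for pfn >= (1UL << (32 - PAGE_SHIFT)) on 32-bit */
	pgprot_t phys_mem_access_prot(struct file *file, unsigned long addr,
				      unsigned long size, pgprot_t vma_prot);

	/* after: callers such as mmap_mem() already have a pfn, so pass it through */
	pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
				      unsigned long size, pgprot_t vma_prot);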
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
... instead of exporting it in arch/*/kernel/ppc_ksyms.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
Here is a new patch that removes all notion of the pmac, prep,
chrp and openfirmware initialization sections, and then unifies
the sections.h files without those __pmac, etc., section identifiers
cluttering things up.
Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
This is a patch that I have had in my tree for ages. If init causes
an exception that raises a signal, such as a SIGSEGV, SIGILL or
SIGFPE, and it hasn't registered a handler for it, we don't deliver
the signal, since init doesn't get any signals that it doesn't have a
handler for. But that means that we just return to userland and
generate the same exception again immediately. With this patch we
print a message and kill init in this situation.
This is very useful when you have a bug in the kernel that means that
init doesn't get as far as executing its first instruction. :)
Without this patch the system hangs when it gets to starting the
userland init; with it you at least get a message giving you a clue
about what has gone wrong.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Delete obsolete parts from arch makefiles and rename to asm-offsets.h
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
|
Here is the default configuration for the Marvell EV64360BP board. It has been
tested on the board.
Signed-off-by: Lee Nicks <allinux@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
flush_dcache_icache_page() will be called on an instruction page fault. We
can't sleep in the fault handler, so use kmap_atomic() instead of just
kmap() for the Book-E case.
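A sketch of the resulting routine (the kmap slot name is taken from the ppc
code of that era and may differ):

	void flush_dcache_icache_page(struct page *page)
	{
#ifdef CONFIG_BOOKE
		/* kmap() may sleep, which the fault path cannot, so map atomically */
		void *start = kmap_atomic(page, KM_PPC_SYNC_ICACHE);
		__flush_dcache_icache(start);
		kunmap_atomic(start, KM_PPC_SYNC_ICACHE);
#else
		__flush_dcache_icache(page_address(page));
#endif
	}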
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
On 8xx, in the case where a pagefault happens for a process that is not
the owner of the vma in question (ptrace, for instance), the flush
operation is performed via the physical address.
Unfortunately, that results in a strange, unexplainable "icbi"
instruction fault, most likely due to a CPU bug (see oops below).
Avoid that by flushing the page via its kernel virtual address.
Oops: kernel access of bad area, sig: 11 [#2]
NIP: C000543C LR: C000B060 SP: C0F35DF0 REGS: c0f35d40 TRAP: 0300 Not tainted
MSR: 00009022 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 10
DAR: 00000010, DSISR: C2000000
TASK = c0ea8430[761] 'gdbserver' THREAD: c0f34000
Last syscall: 26
GPR00: 00009022 C0F35DF0 C0EA8430 00F59000 00000100 FFFFFFFF 00F58000 00000001
GPR08: C021DAEF C0270000 00009032 C0270000 22044024 10025428 01000800 00000001
GPR16: 007FFF3F 00000001 00000000 7FBC6AC0 00F61022 00000001 C0839300 C01E0000
GPR24: 00CD0889 C082F568 3000AC18 C02A7A00 C0EA15C8 00F588A9 C02ACB00 C02ACB00
NIP [c000543c] __flush_dcache_icache_phys+0x38/0x54
LR [c000b060] flush_dcache_icache_page+0x20/0x30
Call trace:
[c000b154] update_mmu_cache+0x7c/0xa4
[c005ae98] do_wp_page+0x460/0x5ec
[c005c8a0] handle_mm_fault+0x7cc/0x91c
[c005ccec] get_user_pages+0x2fc/0x65c
[c0027104] access_process_vm+0x9c/0x1d4
[c00076e0] sys_ptrace+0x240/0x4a4
[c0002bd0] ret_from_syscall+0x0/0x44
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The proposed _tlbie call at update_mmu_cache() is safe because:
Addresses for which update_mmu_cache() gets invoked are never inside the
static kernel virtual mapping, meaning that there is no risk for the
_tlbie() here to be thrashing the pinned entry, as Dan suspected.
The intermediate TLB state in which this bug can be triggered is not
visible by userspace or any other contexts, except the page fault handling
path. So there is no need to worry about userspace dcbxxx users.
The other solution to this is to avoid dcbst misbehaviour in the first
place, which involves changing in-kernel "dcbst" callers to use
8xx-specific SPRs.
Summary:
On 8xx, cache control instructions (particularly "dcbst" from
flush_dcache_icache) fault as a write operation if there is an unpopulated
TLB entry for the address in question. To work around that, we invalidate
the TLB in update_mmu_cache(), thus avoiding the dcbst misbehaviour.
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Continue the Good Fight: Limit bootmem.h include creep.
Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The e200 core is a Book-E core (similar to e500) that has a unified L1 cache
and is not cache coherent on the bus. The e200 core also adds a separate
exception level for debug exceptions. Part of this patch helps to clean up a
few cases that are common to all Freescale Book-E parts, not just e500.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch kills the whole embedded System.map mechanism and the
bootloader-passed System.map that was used to provide symbol resolution in
xmon. Instead, xmon now uses kallsyms like ppc64 does.
No hurry getting that into Linus' tree; let it be tested in -mm for a while
first to make sure it doesn't break various embedded configs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Made the number of TLB CAM entries private and converted the board
consumers to use num_tlbcam_entries, which is set up at boot time from
configuration registers. This way the only consumers of the #define
NUM_TLBCAMS are the arrays used to manage the TLB.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Remove PG_highmem to save a page flag. Use is_highmem() instead. It'll
generate a little more code, but we don't use PageHighMem() in many places.
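The replacement amounts to deriving the property from the page's zone rather
than from a dedicated flag, roughly:

	/* was: a PG_highmem bit test in page->flags */
	#define PageHighMem(page)	is_highmem(page_zone(page))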
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
On ppc32, the platform code can supply a "progress" function that is
used to show progress through the boot. These functions are usually
in an init section and so can't be called after the init pages are
freed. Now that the cpu bringup code can be called after the system
is booted (for hotplug cpu) we can get the situation where the
progress function can be called after boot. The simple fix is to set
the progress function pointer to NULL when the init pages are freed,
and that is what this patch does (note that all callers already check
whether the function pointer is NULL before trying to call it).
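The fix itself is tiny; conceptually (sketch), free_initmem() on ppc32 now
also does:

	/* the progress hook lives in __init text, so disarm it once that is freed */
	ppc_md.progress = NULL;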
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
CONFIG_PTE_64BIT & CONFIG_PHYS_64BIT are not currently consistently used in
the code base. Fixed up the usage such that CONFIG_PTE_64BIT is used when we
have a 64-bit PTE regardless of physical address width. CONFIG_PHYS_64BIT is
used if the physical address width is larger than 32-bits, regardless of PTE
size.
These changes required a few sub-arch-specific ifdefs to be fixed and the
introduction of a physical address format string.
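The distinction being enforced, together with the kind of format string added,
looks roughly like this (the format-string name is illustrative, not
necessarily what the patch uses):

	/* CONFIG_PTE_64BIT: the PTE itself is 64 bits wide, even if physical
	 * addresses still fit in 32 bits.
	 * CONFIG_PHYS_64BIT: physical addresses are wider than 32 bits
	 * (e.g. 36-bit parts), regardless of the PTE layout. */
	#ifdef CONFIG_PHYS_64BIT
	typedef unsigned long long phys_addr_t;
	#define PHYS_FMT	"%16Lx"
	#else
	typedef unsigned long phys_addr_t;
	#define PHYS_FMT	"%.8lx"
	#endif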
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!
|