path: root/include/linux
Age | Commit message | Author
2006-06-30 | Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [IPV6]: Added GSO support for TCPv6
  [NET]: Generalise TSO-specific bits from skb_setup_caps
  [IPV6]: Added GSO support for TCPv6
  [IPV6]: Remove redundant length check on input
  [NETFILTER]: SCTP conntrack: fix crash triggered by packet without chunks
  [TG3]: Update version and reldate
  [TG3]: Add TSO workaround using GSO
  [TG3]: Turn on hw fix for ASF problems
  [TG3]: Add rx BD workaround
  [TG3]: Add tg3_netif_stop() in vlan functions
  [TCP]: Reset gso_segs if packet is dodgy
2006-06-30 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial | Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  Remove obsolete #include <linux/config.h>
  remove obsolete swsusp_encrypt
  arch/arm26/Kconfig typos
  Documentation/IPMI typos
  Kconfig: Typos in net/sched/Kconfig
  v9fs: do not include linux/version.h
  Documentation/DocBook/mtdnand.tmpl: typo fixes
  typo fixes: specfic -> specific
  typo fixes in Documentation/networking/pktgen.txt
  typo fixes: occuring -> occurring
  typo fixes: infomation -> information
  typo fixes: disadvantadge -> disadvantage
  typo fixes: aquire -> acquire
  typo fixes: mecanism -> mechanism
  typo fixes: bandwith -> bandwidth
  fix a typo in the RTC_CLASS help text
  smb is no longer maintained
Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
2006-06-30 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6/ | Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6/:
  [PATCH] pcmcia: fix deadlock in pcmcia_parse_events
  [PATCH] com20020_cs: more device support
  [PATCH] au1xxx: pcmcia: fix __init called from non-init
  [PATCH] kill open-coded offsetof in cm4000_cs.c ZERO_DEV()
  [PATCH] pcmcia: convert pcmcia_cs to kthread
  [PATCH] pcmcia: fix kernel-doc function name
  [PATCH] pcmcia: hostap_cs.c - 0xc00f,0x0000 conflicts with pcnet_cs
  [PATCH] pcmcia: at91_cf suspend/resume/wakeup
  [PATCH] pcmcia: Make ide_cs work with the memory space of CF-Cards if IO space is not available
  [PATCH] pcmcia: TI PCIxx12 CardBus controller support
  [PATCH] pcmcia: warn if driver requests exclusive, but gets a shared IRQ
  [PATCH] pcmcia: expose tool in pcmcia/Documentation/pcmcia/
  [PATCH] pcmcia: another ID for serial_cs.c
  [PATCH] yenta: fix hidden PCI bus numbers
  [PATCH] yenta: do power-up only after socket is configured
2006-06-30 | Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 | Linus Torvalds
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (25 commits)
  ACPI: Kconfig: ACPI_SRAT depends on ACPI
  ACPI: drivers/acpi/scan.c: make acpi_bus_type static
  ACPI: fixup memhotplug debug message
  ACPI: ACPICA 20060623
  ACPI: C-States: only demote on current bus mastering activity
  ACPI: C-States: bm_activity improvements
  ACPI: C-States: accounting of sleep states
  ACPI: additional blacklist entry for ThinkPad R40e
  ACPI: restore comment justifying 'extra' P_LVLx access
  ACPI: fix battery on HP NX6125
  ACPIPHP: prevent duplicate slot numbers when no _SUN
  ACPI: static-ize handle_hotplug_event_func()
  ACPIPHP: use ACPI dock driver
  ACPI: dock driver
  KEVENT: add new uevent for dock
  ACPI: asus_acpi_init: propagate correct return value
  [ACPI] Print error message if remove/install notify handler fails
  ACPI: delete tracing macros from drivers/acpi/*.c
  ACPI: HW P-state coordination support
  ACPI: un-export ACPI_ERROR() -- use printk(KERN_ERR...)
  ...
2006-06-30 | [IPV6]: Added GSO support for TCPv6 | Herbert Xu
This patch adds GSO support for IPv6 and TCPv6. This is based on a patch by Ananda Raju <Ananda.Raju@neterion.com>. His original description is: This patch enables TSO over IPv6. Currently the Linux network stack restricts TSO over IPv6 by clearing the NETIF_F_TSO bit from "dev->features". This patch removes this restriction. It introduces a new flag, NETIF_F_TSO6, which is used to check whether a device supports TSO over IPv6. If the device supports TSO over IPv6, we don't clear NETIF_F_TSO, which makes the TCP layer create TSO packets. Any device supporting TSO over IPv6 will set the NETIF_F_TSO6 flag in "dev->features" along with NETIF_F_TSO. When the user disables TSO using ethtool, NETIF_F_TSO is cleared from "dev->features", so even if NETIF_F_TSO6 is set, the TCP layer does not create TSO packets. SKB_GSO_TCPV4 is renamed to SKB_GSO_TCP to make it a generic GSO packet type. SKB_GSO_UDPV4 is renamed to SKB_GSO_UDP because UFO is not an IPv4-only feature; UFO is also supported over IPv6. The following table shows a significant improvement in throughput with normal (1500) frames and in CPU usage for both normal and jumbo (9600) frames:
            |      1500       |      9600
            |  thru    CPU    |  thru    CPU
------------+-----------------+-----------------
TSO OFF     |  2.00   5.5% id |  5.66  20.0% id
TSO ON      |  2.63  78.0% id |  5.67  39.0% id
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
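The feature-flag rule in the quoted description can be modeled in a few lines. This is only a sketch: the bit values below are hypothetical (the real NETIF_F_* constants are defined in include/linux/netdevice.h), and `tcp_can_tso()` is an illustrative helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration only; the real NETIF_F_*
 * constants live in include/linux/netdevice.h. */
#define NETIF_F_TSO  (1u << 0)
#define NETIF_F_TSO6 (1u << 1)

/* Model of the described rule: TCP builds TSO packets only if the device
 * advertises NETIF_F_TSO, and for IPv6 it must also advertise
 * NETIF_F_TSO6. Clearing NETIF_F_TSO therefore disables both. */
static bool tcp_can_tso(unsigned int dev_features, bool ipv6)
{
    if (!(dev_features & NETIF_F_TSO))
        return false;
    if (ipv6 && !(dev_features & NETIF_F_TSO6))
        return false;
    return true;
}
```

Clearing NETIF_F_TSO alone is enough to turn TSO off for both address families, which matches the ethtool behavior described in the commit.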
2006-06-30 | [NET]: Generalise TSO-specific bits from skb_setup_caps | Herbert Xu
This patch generalises the TSO-specific bits from sk_setup_caps by adding the sk_gso_type member to struct sock. This makes sk_setup_caps generic so that it can be used by TCPv6 or UFO. The only catch is that whoever uses this must provide a GSO implementation for their protocol which I think is a fair deal :) For now UFO continues to live without a GSO implementation which is OK since it doesn't use the sock caps field at the moment. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-30 | [PATCH] pcmcia: TI PCIxx12 CardBus controller support | Alex Williamson
The patch below adds support for the TI PCIxx12 CardBus controllers. This seems to be sufficient to detect the cardbus bridge on an HP nc6320 and works with an orinoco wifi card. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2006-06-30 | [PATCH] knfsd: nfsd: mark rqstp to prevent use of sendfile in privacy case | J. Bruce Fields
Add a rq_sendfile_ok flag to svc_rqst which will be cleared in the privacy case so that the wrapping code will get copies of the read data instead of real page cache pages. This makes life simpler when we encrypt the response. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] rcu: Add lock annotations to RCU locking primitives | Josh Triplett
Add __acquire annotations to rcu_read_lock and rcu_read_lock_bh, and add __release annotations to rcu_read_unlock and rcu_read_unlock_bh. This allows sparse to detect improperly paired calls to these functions. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
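As a rough illustration of how such annotations work, the sketch below follows the general pattern of the kernel's compiler.h: under sparse (`__CHECKER__` defined) `__acquire()`/`__release()` adjust a lock context that sparse checks for balance, and for a normal compiler they expand to nothing. The `my_rcu_read_lock()`/`my_rcu_read_unlock()` wrappers and the `rcu_nesting` counter are hypothetical stand-ins; the real primitives disable preemption instead.

```c
#include <assert.h>

/* Under sparse, __context__() records a +1/-1 change to a named lock
 * context; sparse warns if a function exits with an unbalanced count.
 * For an ordinary compiler the annotations compile away. */
#ifdef __CHECKER__
# define __acquire(x) __context__(x, 1)
# define __release(x) __context__(x, -1)
#else
# define __acquire(x) (void)0
# define __release(x) (void)0
#endif

/* Hypothetical stand-in for preempt_disable()/preempt_enable(). */
static int rcu_nesting;

static inline void my_rcu_read_lock(void)
{
    rcu_nesting++;
    __acquire(RCU);   /* tells sparse: RCU "lock" taken */
}

static inline void my_rcu_read_unlock(void)
{
    __release(RCU);   /* tells sparse: RCU "lock" dropped */
    rcu_nesting--;
}
```

A function that calls `my_rcu_read_lock()` on some path but returns without the matching unlock is the kind of mistake sparse can then report.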
2006-06-30 | [PATCH] Correct rtc_wkalrm comments | Andrew Victor
This corrects the comments describing the 'enabled' and 'pending' flags in struct rtc_wkalrm of include/linux/rtc.h. Signed-off-by: Andrew Victor <andrew@sanpeople.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] add smp_setup_processor_id() | Andrew Morton
Presently, smp_processor_id() isn't necessarily set up until setup_arch(). But it's used in boot_cpu_init() and printk() and perhaps in other places, prior to setup_arch() being called. So provide a new smp_setup_processor_id() which is called before anything else, wire it up for Voyager (which boots on a CPU other than #0, and broke). Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] SELinux: Add security hook definition for getioprio and insert hooks | David Quigley
Add a new security hook definition for the sys_ioprio_get operation. At present, the SELinux hook function implementation for this hook is identical to the getscheduler implementation but a separate hook is introduced to allow this check to be specialized in the future if necessary. This patch also creates a helper function get_task_ioprio which handles the access check in addition to retrieving the ioprio value for the task. Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] SELinux: add security hook call to kill_proc_info_as_uid | David Quigley
This patch adds a call to the extended security_task_kill hook introduced by the prior patch to the kill_proc_info_as_uid function so that these signals can be properly mediated by security modules. It also updates the existing hook call in check_kill_permission. Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] SELinux: extend task_kill hook to handle signals sent by AIO completion | David Quigley
This patch extends the security_task_kill hook to handle signals sent by AIO completion. In this case, the secid of the task responsible for the signal needs to be obtained and saved earlier, so a security_task_getsecid() hook is added, and then this saved value is passed subsequently to the extended task_kill hook for use in checking. Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] Light weight event counters | Christoph Lameter
The remaining counters in page_state after the zoned VM counter patches have been applied are all just for show in /proc/vmstat. They have no essential function for the VM. We use a simple increment of per-CPU variables. In order to avoid the most severe races we disable preemption. Disabling preemption does not prevent the race between an increment and an interrupt handler incrementing the same statistics counter. However, that race is exceedingly rare, we may only lose one increment or so, and there is no requirement (at least not in the kernel) that the VM event counters be accurate. In the non-preempt case this results in a simple increment for each counter. On many architectures the compiler will reduce this to a single instruction. This single instruction cannot be interrupted on i386 and x86_64, and therefore even the rare race with an interrupt handler is avoided on both architectures in most cases. The patchset also adds an off switch for embedded systems that allows building Linux kernels without these counters. The counters are implemented as inline code that hopefully results in only a single increment instruction being emitted (i386, x86_64) or in the increment being hidden through instruction concurrency (EPIC architectures such as ia64 can get that done). Benefits:
- VM event counter operations usually reduce to a single inline instruction on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case. Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.
References:
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
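A user-space model of the scheme described in this commit: each CPU owns a plain counter array, the hot path is a single non-atomic increment, and readers sum across CPUs when producing output. The fixed CPU count, the event names, and the helpers `count_vm_event_on()`/`sum_vm_event()` are assumptions for illustration; the kernel's own `count_vm_event()` uses real per-CPU variables and takes no CPU argument.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4   /* illustrative; the kernel uses per-CPU data */

/* Illustrative event names. */
enum vm_event_item { PGALLOC, PGFREE, PGFAULT, NR_VM_EVENT_ITEMS };

/* One plain (non-atomic) counter array per CPU. */
static unsigned long vm_events[NR_CPUS_SKETCH][NR_VM_EVENT_ITEMS];

/* Hot path: a single increment on one CPU's slot. In the kernel this runs
 * with preemption disabled; an interrupt on the same CPU may rarely cost
 * an increment, which the commit explicitly tolerates. */
static inline void count_vm_event_on(int cpu, enum vm_event_item item)
{
    vm_events[cpu][item]++;
}

/* Slow path (e.g. generating /proc/vmstat): sum every CPU's counter. */
static unsigned long sum_vm_event(enum vm_event_item item)
{
    unsigned long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += vm_events[cpu][item];
    return sum;
}
```

The cost is pushed entirely onto the rare reader, which is acceptable because these counters exist only for display.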
2006-06-30 | [PATCH] Use Zoned VM Counters for NUMA statistics | Christoph Lameter
The NUMA statistics are really event counters. But they are per node, and so we have had special treatment for these counters through additional fields on the pcp structure. We can now use the per-zone nature of the zoned VM counters to realize these. This will shrink the size of the pcp structure on NUMA systems. We will have some room to add additional per-zone counters that will all still fit in the same cacheline.
Bits | Prior pcp size       | Size after patch    | We can add
-----+----------------------+---------------------+-----------------------------------------------
64   | 128 bytes (16 words) | 80 bytes (10 words) | 48
32   | 76 bytes (19 words)  | 56 bytes (14 words) | 8 (64-byte cacheline), 72 (128-byte cacheline)
Remove the special statistics for NUMA and replace them with zoned VM counters. This has the side effect that global sums of these events now show up in /proc/vmstat. Also take the opportunity to move the zone_statistics() function from page_alloc.c into vmstat.c. Discussions: V2 http://marc.theaimsgroup.com/?t=115048227000002&r=1&w=2 Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] zoned-vm-counters: remove read_page_state() | Andrew Morton
No callers. Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] zoned vm counters: conversion of nr_bounce to per zone counter | Christoph Lameter
Conversion of nr_bounce to a per zone counter. nr_bounce is only used for proc output. So it could be left as an event counter. However, the event counters may not be accurate and nr_bounce is categorizing types of pages in a zone. So we really need this to also be a per zone counter. [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] zoned vm counters: conversion of nr_unstable to per zone counter | Christoph Lameter
Conversion of nr_unstable to a per zone counter. We need to do some special modifications to the nfs code since there are multiple cases of disposition and we need to have a page ref for proper accounting. This converts the last critical page state of the VM and therefore we need to remove several functions that were depending on GET_PAGE_STATE_LAST in order to make the kernel compile again. We are only left with event type counters in page state. [akpm@osdl.org: bugfixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] zoned vm counters: conversion of nr_writeback to per zone counter | Christoph Lameter
Conversion of nr_writeback to per zone counter. This removes the last page_state counter from arch/i386/mm/pgtable.c so we drop the page_state from there. [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] zoned vm counters: conversion of nr_dirty to per zone counter | Christoph Lameter
This makes nr_dirty a per zone counter. Looping over all processors is avoided during writeback state determination. The counter aggregation for nr_dirty had to be undone in the NFS layer since we summed up the page counts from multiple zones. Someone more familiar with NFS should probably review what I have done. [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] zoned vm counters: conversion of nr_pagetables to per zone counter | Christoph Lameter
Conversion of nr_page_table_pages to a per zone counter [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] zoned vm counters: conversion of nr_slab to per zone counter | Christoph Lameter
- Allows reclaim to access counter without looping over processor counts.
- Allows accurate statistics on how many pages are used in a zone by the slab. This may become useful to balance slab allocations over various zones.
[akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] zoned vm counters: zone_reclaim: remove /proc/sys/vm/zone_reclaim_interval | Christoph Lameter
The zone_reclaim_interval was necessary because we were not able to determine how many unmapped pages exist in a zone. Therefore we had to scan in intervals to figure out if any pages were unmapped. With the zoned counters and NR_ANON_PAGES we now know the number of pagecache pages and the number of mapped pages in a zone. So we can simply skip the reclaim if there is an insufficient number of unmapped pages. We use SWAP_CLUSTER_MAX as the boundary. Drop all support for /proc/sys/vm/zone_reclaim_interval. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
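The skip test can be sketched as follows, assuming SWAP_CLUSTER_MAX is 32 and that "unmapped" means page cache pages minus mapped page cache pages. `zone_reclaim_worthwhile()` is a hypothetical helper; this sketch treats exactly SWAP_CLUSTER_MAX unmapped pages as insufficient, and the exact comparison used in the kernel may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value; the real constant is defined in include/linux/swap.h. */
#define SWAP_CLUSTER_MAX 32UL

/* Decide from two zoned counters whether zone reclaim is worth trying,
 * instead of scanning periodically as zone_reclaim_interval did. */
static bool zone_reclaim_worthwhile(unsigned long nr_file_pages,
                                    unsigned long nr_file_mapped)
{
    /* Guard against transient skew between the two counters. */
    unsigned long unmapped = nr_file_pages > nr_file_mapped ?
                             nr_file_pages - nr_file_mapped : 0;

    return unmapped > SWAP_CLUSTER_MAX;
}
```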
2006-06-30 | [PATCH] zoned vm counters: split NR_ANON_PAGES off from NR_FILE_MAPPED | Christoph Lameter
The current NR_FILE_MAPPED is used by zone reclaim and the dirty load calculation as the number of mapped pagecache pages. However, that is not true: NR_FILE_MAPPED also includes the mapped anonymous pages. This patch separates those and therefore allows accurate tracking of the anonymous pages per zone. It then becomes possible to determine the number of unmapped pages per zone, and we can avoid scanning for unmapped pages if there are none. Also it may now be possible to determine the mapped/unmapped ratio in get_dirty_limit. Isn't the number of anonymous pages irrelevant in that calculation? Note that this will change the meaning of the number of mapped pages reported in /proc/vmstat, /proc/meminfo and in the per node statistics. This may affect user space tools that monitor these counters! NR_FILE_MAPPED works like NR_FILE_DIRTY: it is only valid for pagecache pages. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
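A minimal model of the split, assuming a single zone and plain `long` counters in place of the real ZVCs: anonymous and page cache mappings now feed different counters, so the number of unmapped page cache pages is a simple subtraction. `account_first_map()` and `unmapped_pagecache()` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

enum zone_stat_item {
    NR_ANON_PAGES,    /* mapped anonymous pages */
    NR_FILE_MAPPED,   /* mapped page cache pages only */
    NR_FILE_PAGES,    /* all page cache pages */
    NR_VM_ZONE_STAT_ITEMS
};

/* Counters for one zone; plain longs stand in for the zoned VM counters. */
static long zone_stat[NR_VM_ZONE_STAT_ITEMS];

/* Called when a page gains its first mapping. Before the split, both
 * cases would have been counted in the same "mapped" counter. */
static void account_first_map(bool anon)
{
    zone_stat[anon ? NR_ANON_PAGES : NR_FILE_MAPPED]++;
}

/* Only meaningful now that anonymous pages are excluded from NR_FILE_MAPPED. */
static long unmapped_pagecache(void)
{
    return zone_stat[NR_FILE_PAGES] - zone_stat[NR_FILE_MAPPED];
}
```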
2006-06-30 | [PATCH] zoned vm counters: conversion of nr_pagecache to per zone counter | Christoph Lameter
Currently a single atomic variable is used to establish the size of the page cache in the whole machine. The zoned VM counters have the same method of implementation as the nr_pagecache code but also allow the determination of the pagecache size per zone. Remove the special implementation for nr_pagecache and make it a zoned counter named NR_FILE_PAGES. Updates of the page cache counters are always performed with interrupts off. We can therefore use the __ variant here. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] zoned vm counters: convert nr_mapped to per zone counter | Christoph Lameter
nr_mapped is important because it allows a determination of how many pages of a zone are not mapped, which would allow a more efficient means of determining when we need to reclaim memory in a zone. We take the nr_mapped field out of the page state structure and define a new per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off from NR_MAPPED in the next patch). We replace the use of nr_mapped in various kernel locations. This avoids the looping over all processors in try_to_free_pages(), writeback, reclaim (swap + zone reclaim). [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | [PATCH] zoned vm counters: basic ZVC (zoned vm counter) implementation | Christoph Lameter
Per zone counter infrastructure. The counters that we currently have for the VM are split per processor. The processor, however, has not much to do with the zone these pages belong to. We cannot tell, e.g., how many ZONE_DMA pages are dirty. So we are blind to potential imbalances in the usage of memory in various zones. E.g., in a NUMA system we cannot tell how many pages are dirty on a particular node. If we knew, then we could put measures into the VM to balance the use of memory between different zones and different nodes in a NUMA system. For example, it would be possible to limit the dirty pages per node so that fast local memory is kept available even if a process is dirtying huge amounts of pages. Another example is zone reclaim. We do not know how many unmapped pages exist per zone. So we just have to try to reclaim. If it is not working then we pause and try again later. It would be better if we knew when it makes sense to reclaim unmapped pages from a zone. This patchset allows the determination of the number of unmapped pages per zone. We can remove the zone reclaim interval with the counters introduced here. Furthermore, the ability to have various usage statistics available will allow the development of new NUMA balancing algorithms that may be able to improve the decision making in the scheduler of when to move a process to another node, and hopefully will also enable automatic page migration through a user space program that can analyse the memory load distribution and then rebalance memory use in order to increase performance. The counter framework here implements differential counters for each processor in struct zone. The differential counters are consolidated when a threshold is exceeded (as is done in the current implementation for nr_pagecache), when slab reaping occurs, or when a consolidation function is called. Consolidation uses atomic operations and accumulates counters per zone in the zone structure and also globally in the vm_stat array.
VM functions can access the counts by simply indexing a global or zone specific array. The arrangement of counters in an array also simplifies processing when output has to be generated for /proc/*. Counters can be updated by calling inc/dec_zone_page_state or __inc/__dec_zone_page_state, analogous to *_page_state. The second group of functions can be called if it is known that interrupts are disabled. Special optimized increment and decrement functions are provided. These can avoid certain checks and use increment or decrement instructions that an architecture may provide. We also add a new CONFIG_DMA_IS_NORMAL that signifies that an architecture can do DMA to all memory and therefore ZONE_NORMAL will not be populated. This is currently only set for IA64 SGI SN2 and currently only affects node_page_state(). In the best case node_page_state can be reduced to retrieving a single counter for the one zone on the node. [akpm@osdl.org: cleanups] [akpm@osdl.org: export vm_stat[] for filesystems] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
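A single-threaded sketch of the differential-counter scheme described in this commit: each CPU keeps a small signed delta per item, and only when the delta crosses a threshold is it folded into the per-zone and global totals. The threshold of 32, the two-CPU layout, and all struct and function names are assumptions; the plain `+=` stands in for the atomic operations the kernel uses, and the interrupt-safe (non-`__`) variants are omitted.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 2
#define STAT_THRESHOLD 32   /* assumed; must fit in a signed char */

enum zone_stat_item { NR_FILE_PAGES, NR_FILE_DIRTY, NR_VM_ZONE_STAT_ITEMS };

struct zone_sketch {
    long vm_stat[NR_VM_ZONE_STAT_ITEMS];                             /* folded totals */
    signed char vm_stat_diff[NR_CPUS_SKETCH][NR_VM_ZONE_STAT_ITEMS]; /* per-CPU deltas */
};

static long global_vm_stat[NR_VM_ZONE_STAT_ITEMS];  /* sum over all zones */

/* Fold a delta into zone and global totals (atomic_long_add in the kernel). */
static void zone_page_state_add(long x, struct zone_sketch *z,
                                enum zone_stat_item item)
{
    z->vm_stat[item] += x;
    global_vm_stat[item] += x;
}

/* Analogue of a __-variant update: caller guarantees nothing else touches
 * this CPU's delta concurrently (interrupts off in the kernel). */
static void mod_zone_state(struct zone_sketch *z, int cpu,
                           enum zone_stat_item item, int delta)
{
    signed char *p = &z->vm_stat_diff[cpu][item];
    long x = *p + delta;

    if (x > STAT_THRESHOLD || x < -STAT_THRESHOLD) {
        zone_page_state_add(x, z, item);   /* spill the whole delta */
        x = 0;
    }
    *p = (signed char)x;
}

/* Reader: a plain array index; transiently negative totals read as 0. */
static long zone_state(const struct zone_sketch *z, enum zone_stat_item item)
{
    long x = z->vm_stat[item];

    return x < 0 ? 0 : x;
}

/* Consolidation: fold every outstanding per-CPU delta. */
static void refresh_stats(struct zone_sketch *z)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        for (int i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
            if (z->vm_stat_diff[cpu][i]) {
                zone_page_state_add(z->vm_stat_diff[cpu][i], z, i);
                z->vm_stat_diff[cpu][i] = 0;
            }
}
```

Because every delta stays within plus or minus STAT_THRESHOLD, a reader can lag the true value by at most that much per CPU per item, which is the trade-off that lets the hot path avoid touching shared cache lines.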
2006-06-30 | [PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h | Christoph Lameter
NOTE: ZVCs are *not* the lightweight event counters. ZVCs are reliable whereas event counters do not need to be. Zone based VM statistics are necessary to be able to determine what the state of memory in one zone is. In a NUMA system this can be helpful for local reclaim and other memory optimizations that may be able to shift VM load in order to get more balanced memory use. It is also useful to know how the computing load affects the memory allocations on various zones. This patchset allows the retrieval of that data from userspace. The patchset introduces a framework for counters that is a cross between the existing page_stats (which are simply global counters split per CPU) and the approach of deferred incremental updates implemented for nr_pagecache. Small per-CPU 8-bit counters are added to struct zone. If a counter exceeds certain thresholds, the counters are accumulated in an array of atomic_long in the zone and in a global array that sums up all zone values. The small 8-bit counters are next to the per-CPU page pointers, so they are likely to be hot in the CPU cache when pages are allocated and freed. Access to VM counter information for a zone and for the whole machine is then possible by simply indexing an array (thanks to Nick Piggin for pointing out that approach). Accessing the total number of pages of various types no longer requires summing up all per-CPU counters. Benefits of this patchset right now:
- Ability for UP and SMP configurations to determine how memory is balanced between the DMA, NORMAL and HIGHMEM zones.
- Loops over all processors are avoided in writeback and reclaim paths. We can avoid caching the writeback information because the needed information is directly accessible.
- Special handling for nr_pagecache removed.
- zone_reclaim_interval vanishes since VM stats can now determine when local reclaim is worthwhile.
- Fast inline per node page state determination.
- Accurate counters in /sys/devices/system/node/node*/meminfo. The current counters simply count which processor allocated a page somewhere and guesstimate based on that, so they were not useful to show the actual distribution of page use on a specific zone.
- The swap_prefetch patch requires per node statistics in order to figure out when processors of a node can prefetch. This patch provides some of the needed numbers.
- Detailed VM counters available in more /proc and /sys status files.
References to earlier discussions:
V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2
Performance tests with AIM7 did not show any regressions. Seems to be a tad faster even. Tested on ia64/NUMA. Builds fine on i386, SMP / UP. Includes fixes for s390/arm/uml arch code. This patch: Move counter code from page_alloc.c/page-flags.h to vmstat.c/h. Create vmstat.c/vmstat.h by separating the counter code and the proc functions. Move the vm_stat_text array before zoneinfo_show. [akpm@osdl.org: s390 build fix] [akpm@osdl.org: HOTPLUG_CPU build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
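With counters kept in arrays, reading statistics reduces to indexing. The sketch below sums one item over the zones of a node and prints every item in the "name value" style of /proc/vmstat. The three-zone node, the struct names, the two counter names, and `node_state()`/`format_node_stats()` are illustrative assumptions rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ZONES_PER_NODE 3   /* e.g. DMA, NORMAL, HIGHMEM; illustrative */

enum zone_stat_item { NR_FILE_PAGES, NR_FILE_DIRTY, NR_VM_ZONE_STAT_ITEMS };

/* Output names; order must match enum zone_stat_item. */
static const char *const vm_stat_text[NR_VM_ZONE_STAT_ITEMS] = {
    "nr_file_pages",
    "nr_dirty",
};

struct zone_sketch { long vm_stat[NR_VM_ZONE_STAT_ITEMS]; };
struct node_sketch { struct zone_sketch zones[ZONES_PER_NODE]; };

/* Per-node value: add the node's zone entries; no loop over CPUs. */
static long node_state(const struct node_sketch *n, enum zone_stat_item item)
{
    long sum = 0;

    for (int z = 0; z < ZONES_PER_NODE; z++)
        sum += n->zones[z].vm_stat[item];
    return sum;
}

/* Emit one "name value" line per counter into buf. */
static int format_node_stats(const struct node_sketch *n, char *buf, size_t len)
{
    int off = 0;

    for (int i = 0; i < NR_VM_ZONE_STAT_ITEMS && (size_t)off < len; i++)
        off += snprintf(buf + off, len - off, "%s %ld\n",
                        vm_stat_text[i], node_state(n, i));
    return off;
}
```

Keeping the name table and the enum in the same order is what lets one loop produce the whole output file.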
2006-06-30 | Remove obsolete #include <linux/config.h> | Jörn Engel
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 | typo fixes: infomation -> information | Adrian Bunk
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 | typo fixes: mecanism -> mechanism | Adrian Bunk
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-29 | [NET]: make skb_release_data() static | Adrian Bunk
skb_release_data() no longer has any users in other files. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 | [ATM]: basic sysfs support for ATM devices | Roman Kagan
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 | [NET]: Add ECN support for TSO | Michael Chan
In the current TSO implementation, NETIF_F_TSO and ECN cannot be turned on together in a TCP connection. The problem is that most hardware that supports TSO does not handle CWR correctly if it is set in the TSO packet. Correct handling requires CWR to be set in the first packet only if it is set in the TSO header. This patch adds the ability to turn on NETIF_F_TSO and ECN together, using GSO if necessary to handle TSO packets with CWR set. Hardware that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev->features flags. All TSO packets with CWR set will have SKB_GSO_TCPV4_ECN set. If the output device does not have the NETIF_F_TSO_ECN feature set, GSO will split the packet up correctly, with CWR set only in the first segment. With help from Herbert Xu <herbert@gondor.apana.org.au>. Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock flag is completely removed. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
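The CWR splitting rule can be sketched as follows. Everything here is hypothetical except the flag bit positions (CWR is 0x80 and ACK is 0x10 in the TCP flags byte, per RFC 3168 and RFC 793); a real GSO implementation also rewrites sequence numbers and checksums and keeps flags such as FIN and PSH only on the last segment, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

#define TCPHDR_CWR 0x80   /* Congestion Window Reduced (RFC 3168) */
#define TCPHDR_ACK 0x10

/* Minimal stand-in for one segment produced by software GSO. */
struct seg_sketch {
    unsigned char flags;   /* TCP flags byte */
    size_t len;            /* payload bytes */
};

/* Split `total` payload bytes into MSS-sized segments, copying the large
 * packet's flags but keeping CWR only on the first segment.
 * Returns the number of segments written (at most max_segs). */
static size_t gso_split_ecn(unsigned char flags, size_t total, size_t mss,
                            struct seg_sketch *out, size_t max_segs)
{
    size_t n = 0;

    while (total > 0 && n < max_segs) {
        size_t len = total < mss ? total : mss;

        out[n].flags = flags;
        out[n].len = len;
        flags &= (unsigned char)~TCPHDR_CWR;   /* later segments: no CWR */
        total -= len;
        n++;
    }
    return n;
}
```

For example, 3000 payload bytes with an MSS of 1448 yield segments of 1448, 1448 and 104 bytes, with CWR present only on the first.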
2006-06-29 | [AF_UNIX]: Datagram getpeersec | Catherine Zhang
This patch implements an API whereby an application can determine the label of its peer's Unix datagram sockets via the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of the peer of a Unix datagram socket. The application can then use this security context to determine the security context for processing on behalf of the peer who sent the packet. Patch design and implementation: The design and implementation is very similar to the UDP case for INET sockets. Basically we build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). To retrieve the security context, the application first indicates this to the kernel by setting the SO_PASSSEC option via setsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for Unix datagram socket should look like this:
    toggle = 1;
    toggle_len = sizeof(toggle);
    /* optlen is passed by value to setsockopt */
    setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, toggle_len);
    recvmsg(sockfd, &msg_hdr, 0);
    if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
        cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
        if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
            cmsg_hdr->cmsg_level == SOL_SOCKET &&
            cmsg_hdr->cmsg_type == SCM_SECURITY) {
            /* copy only the bytes the kernel actually delivered */
            memcpy(&scontext, CMSG_DATA(cmsg_hdr),
                   cmsg_hdr->cmsg_len - CMSG_LEN(0));
        }
    }
sock_setsockopt is enhanced with a new socket option, SO_PASSSEC, which sets the SOCK_PASSSEC sock flag to allow a server socket to receive the security context of the peer. Testing: We have tested the patch by setting up Unix datagram client and server applications. We verified that the server can retrieve the security context using the auxiliary data mechanism of recvmsg. Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 | [NET]: Fix logical error in skb_gso_ok | Herbert Xu
The test in skb_gso_ok is backwards. Noticed by Michael Chan <mchan@broadcom.com>. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
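The commit does not show the corrected test itself. As a sketch of what a correct check looks like, assume that each gso_type bit shifted left by NETIF_F_GSO_SHIFT is the matching device feature bit (the bit values here are chosen for illustration); the device is acceptable only if it has *all* the required bits. `gso_ok()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout for illustration: feature bit = gso_type bit << shift. */
#define NETIF_F_GSO_SHIFT  16
#define SKB_GSO_TCPV4      (1u << 0)
#define SKB_GSO_DODGY      (1u << 2)
#define NETIF_F_TSO        (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_GSO_ROBUST (SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)

/* True if the device advertises every feature the packet's gso_type needs,
 * i.e. the packet can be handed to the device without software GSO. */
static bool gso_ok(unsigned int gso_type, unsigned int dev_features)
{
    unsigned int needed = gso_type << NETIF_F_GSO_SHIFT;

    return (dev_features & needed) == needed;
}
```

Inverting this result sends capable devices through software segmentation and incapable devices straight to hardware, which is the kind of backwards behavior the fix addresses.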
2006-06-29[NETLINK]: Encapsulate eff_cap usage within security framework.Darrel Goeddel
This patch encapsulates the usage of eff_cap (in netlink_skb_params) within the security framework by extending security_netlink_recv to include a required capability parameter and converting all direct uses of eff_cap outside the LSM modules to use the interface. It also updates the SELinux implementation of the security_netlink_send and security_netlink_recv hooks to take advantage of the sid in the netlink_skb_params struct. This also enables SELinux to perform auditing of netlink capability checks. Please apply for 2.6.18 if possible. Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
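The shape of this change can be modeled in user space: callers pass the capability they require to a hook, and only the pluggable security module looks at the sender's effective capability set. Everything here (the _model structs, the function-pointer table, the bitmask encoding) is a hypothetical simplification, not the kernel's code; only the capability number 12 for CAP_NET_ADMIN is taken from Linux.

```c
#include <errno.h>
#include <stdint.h>

#define NL_CAP_NET_ADMIN 12               /* CAP_NET_ADMIN's number on Linux */
#define NL_CAP_TO_MASK(c) (1u << (c))

struct nl_msg_model {
    uint32_t eff_cap;   /* sender's effective capabilities as a bitmask */
};

/* A security module supplies the check; callers never read eff_cap. */
struct security_ops_model {
    int (*netlink_recv)(const struct nl_msg_model *msg, int cap);
};

/* Default capability-only policy: allow only if `cap` is raised. */
static int cap_netlink_recv_model(const struct nl_msg_model *msg, int cap)
{
    return (msg->eff_cap & NL_CAP_TO_MASK(cap)) ? 0 : -EPERM;
}

static const struct security_ops_model default_ops = {
    .netlink_recv = cap_netlink_recv_model,
};
static const struct security_ops_model *active_ops = &default_ops;

static int security_netlink_recv_model(const struct nl_msg_model *msg, int cap)
{
    return active_ops->netlink_recv(msg, cap);
}

/* Example caller: a netlink handler that needs CAP_NET_ADMIN. */
static int handle_admin_request(const struct nl_msg_model *msg)
{
    int err = security_netlink_recv_model(msg, NL_CAP_NET_ADMIN);
    if (err)
        return err;
    /* ... perform the privileged operation here ... */
    return 0;
}
```

Because the capability is a parameter of the hook, a module such as SELinux can substitute its own implementation (and audit the check) without touching any caller.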
2006-06-29[NET]: Added GSO header verificationHerbert Xu
When GSO packets come from an untrusted source (e.g., a Xen guest domain), we need to verify the header integrity before passing it to the hardware. Since the first step in GSO is to verify the header, we can reuse that code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this bit set can only be fed directly to devices with the corresponding bit NETIF_F_GSO_ROBUST. If the device doesn't have that bit, then the skb is fed to the GSO engine which will allow the packet to be sent to the hardware if it passes the header check. This patch changes the sg flag to a full features flag. The same method can be used to implement TSO ECN support. We simply have to mark packets with CWR set with SKB_GSO_ECN so that only hardware with a corresponding NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment the packet, or segment the first MTU and pass the rest to the hardware for further segmentation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
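The dispatch rules above can be sketched as a decision function. A DODGY packet goes straight to hardware only if the device also claims GSO_ROBUST; otherwise it passes through the software path, whose first step is header verification, after which it may still go to hardware or be segmented in software. ECN is handled the same way via its own bit. All names and bit values are illustrative assumptions, and header verification is reduced to a boolean input.

```c
#include <stdbool.h>

/* Illustrative bits; device features are gso bits shifted up. */
#define GSO_TCPV4     (1u << 0)
#define GSO_DODGY     (1u << 1)   /* header from an untrusted source */
#define GSO_TCP_ECN   (1u << 2)   /* packet has CWR set */
#define FEATURE_SHIFT 16
#define F_TSO         (GSO_TCPV4 << FEATURE_SHIFT)
#define F_GSO_ROBUST  (GSO_DODGY << FEATURE_SHIFT)
#define F_TSO_ECN     (GSO_TCP_ECN << FEATURE_SHIFT)

enum tx_path { TX_HARDWARE, TX_VERIFIED_HARDWARE, TX_SOFTWARE_SEGMENT, TX_DROP };

static bool features_cover(unsigned int gso_type, unsigned int dev_features)
{
    unsigned int needed = gso_type << FEATURE_SHIFT;
    return (dev_features & needed) == needed;
}

static enum tx_path choose_tx_path(unsigned int gso_type,
                                   unsigned int dev_features,
                                   bool header_valid)
{
    /* Device handles every required bit (ROBUST covers DODGY). */
    if (features_cover(gso_type, dev_features))
        return TX_HARDWARE;
    /* Software path: verify the header first. */
    if (!header_valid)
        return TX_DROP;
    /* Once verified, the DODGY bit no longer matters. */
    if (features_cover(gso_type & ~GSO_DODGY, dev_features))
        return TX_VERIFIED_HARDWARE;
    return TX_SOFTWARE_SEGMENT;
}
```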
2006-06-29merge linus into release branchLen Brown
Conflicts: drivers/acpi/acpi_memhotplug.c
2006-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6: (22 commits) [PATCH] devfs: Remove it from the feature_removal.txt file [PATCH] devfs: Last little devfs cleanups throughout the kernel tree. [PATCH] devfs: Rename TTY_DRIVER_NO_DEVFS to TTY_DRIVER_DYNAMIC_DEV [PATCH] devfs: Remove the tty_driver devfs_name field as it's no longer needed [PATCH] devfs: Remove the line_driver devfs_name field as it's no longer needed [PATCH] devfs: Remove the videodevice devfs_name field as it's no longer needed [PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed [PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree [PATCH] devfs: Remove devfs_remove() function from the kernel tree [PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree [PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree [PATCH] devfs: Remove devfs_mk_symlink() function from the kernel tree [PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree [PATCH] devfs: Remove devfs_*_tape() functions from the kernel tree [PATCH] devfs: Remove devfs support from the sound subsystem [PATCH] devfs: Remove devfs support from the ide subsystem. [PATCH] devfs: Remove devfs support from the serial subsystem [PATCH] devfs: Remove devfs from the init code [PATCH] devfs: Remove devfs from the partition code ...
2006-06-29Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (33 commits) [MIPS] Add missing backslashes to macro definitions. [MIPS] Death list of board support to be removed after 2.6.18. [MIPS] Remove BSD and Sys V compat data types. [MIPS] ioc3.h: Uses u8, so include <linux/types.h>. [MIPS] 74K: Assume it will also have an AR bit in config7 [MIPS] Treat CPUs with AR bit as physically indexed. [MIPS] Oprofile: Support VSMP on 34K. [MIPS] MIPS32/MIPS64 S-cache fix and cleanup [MIPS] excite: PCI makefile needs to use += if it wants a chance to work. [MIPS] excite: plat_setup -> plat_mem_setup. [MIPS] au1xxx: export dbdma functions [MIPS] au1xxx: dbdma, no sleeping under spin_lock [MIPS] au1xxx: fix PSC_SMBTXRX_RSR. [MIPS] Early printk for IP27. [MIPS] Fix handling of 0 length I & D caches. [MIPS] Typo fixes. [MIPS] MIPS32/MIPS64 secondary cache management [MIPS] Fix FIXADDR_TOP for TX39/TX49. [MIPS] Remove first timer interrupt setup in wrppmc_timer_setup() [MIPS] Fix configuration of R2 CPU features and multithreading. ...
2006-06-29elf-em.h: Define and explain both EM_MIPS_RS3_LE and EM_MIPS_RS4_BE.Ralf Baechle
They have been obsoleted by the ELF header EI_CLASS and EI_DATA fields in combination with e_flags. Afaics EM_MIPS_RS3_LE and EM_MIPS_RS4_BE never had any practical relevance. Binutils will not produce such binaries and the kernel will not accept them as MIPS binaries. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
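Identifying a MIPS binary's word size and byte order from e_ident rather than from e_machine can be sketched as follows, using the standard <elf.h> constants. The function names and result strings are hypothetical. e_machine is read in the byte order named by EI_DATA; it sits at offset 18 in both the 32- and 64-bit headers. The obsolete machine codes are assumed here to have the value 10.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read e_machine (offset 18) in the file's own byte order. */
static uint16_t read_e_machine(const unsigned char *hdr)
{
    const unsigned char *p = hdr + 18;
    if (hdr[EI_DATA] == ELFDATA2LSB)
        return (uint16_t)(p[0] | (p[1] << 8));
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Classify using EI_CLASS/EI_DATA; anything but EM_MIPS is rejected. */
static const char *describe_mips_elf(const unsigned char *hdr, size_t len)
{
    if (len < 20 || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return "not ELF";
    if (hdr[EI_CLASS] != ELFCLASS32 && hdr[EI_CLASS] != ELFCLASS64)
        return "bad class";
    if (hdr[EI_DATA] != ELFDATA2LSB && hdr[EI_DATA] != ELFDATA2MSB)
        return "bad data encoding";
    if (read_e_machine(hdr) != EM_MIPS)
        return "not EM_MIPS";

    int is64 = hdr[EI_CLASS] == ELFCLASS64;
    int le = hdr[EI_DATA] == ELFDATA2LSB;
    if (is64)
        return le ? "mips64 little-endian" : "mips64 big-endian";
    return le ? "mips32 little-endian" : "mips32 big-endian";
}
```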
2006-06-29Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: [PATCH] i386: export memory more than 4G through /proc/iomem [PATCH] 64bit Resource: finally enable 64bit resource sizes [PATCH] 64bit Resource: convert a few remaining drivers to use resource_size_t where needed [PATCH] 64bit resource: change pnp core to use resource_size_t [PATCH] 64bit resource: change pci core and arch code to use resource_size_t [PATCH] 64bit resource: change resource core to use resource_size_t [PATCH] 64bit resource: introduce resource_size_t for the start and end of struct resource [PATCH] 64bit resource: fix up printks for resources in misc drivers [PATCH] 64bit resource: fix up printks for resources in arch and core code [PATCH] 64bit resource: fix up printks for resources in pcmcia drivers [PATCH] 64bit resource: fix up printks for resources in video drivers [PATCH] 64bit resource: fix up printks for resources in ide drivers [PATCH] 64bit resource: fix up printks for resources in mtd drivers [PATCH] 64bit resource: fix up printks for resources in pci core and hotplug drivers [PATCH] 64bit resource: fix up printks for resources in networks drivers [PATCH] 64bit resource: fix up printks for resources in sound drivers [PATCH] 64bit resource: C99 changes for struct resource declarations Fixed up trivial conflict in drivers/ide/pci/cmd64x.c (the printk that was changed by the 64-bit resources had been deleted in the meantime ;)
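Once resource_size_t may be either 32 or 64 bits depending on configuration, a printk format has to work for both widths. A common idiom is to cast each value to unsigned long long and print with %llx. The sketch below shows that idiom in user space; the resource_model struct and format_resource() helper are hypothetical, and resource_size_t is fixed at 64 bits here only as an assumption.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 64-bit resources enabled; other configs may use 32 bits. */
typedef uint64_t resource_size_t;

struct resource_model {
    resource_size_t start;
    resource_size_t end;      /* inclusive */
    const char *name;
};

/* The casts keep %llx correct no matter how wide resource_size_t is. */
static int format_resource(char *buf, size_t size, const struct resource_model *r)
{
    return snprintf(buf, size, "%s: 0x%08llx-0x%08llx (size 0x%llx)",
                    r->name,
                    (unsigned long long)r->start,
                    (unsigned long long)r->end,
                    (unsigned long long)(r->end - r->start + 1));
}
```

For example, a range from 0x100000000 to 0x1ffffffff formats as "System RAM: 0x100000000-0x1ffffffff (size 0x100000000)", which a 32-bit format would have truncated.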
2006-06-29[PATCH] genirq: add chip->eoi(), fastack -> fasteoiIngo Molnar
Clean up the fastack concept by turning it into fasteoi and introducing the ->eoi() method for chips. This also allows the cleanup of an i386 EOI quirk - now the quirk is cleanly separated from the pure ACK implementation. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
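A fasteoi-style flow does not acknowledge the interrupt up front. It runs the handler and then tells the controller once, via ->eoi(), that the interrupt is finished. The user-space model below shows only that ordering. The irq_chip_model struct and handle_fasteoi_model() are hypothetical simplifications; the real flow handler also deals with locking, disabled lines, and IRQ status bits.

```c
#include <stddef.h>

struct irq_chip_model {
    const char *name;
    void (*eoi)(unsigned int irq);   /* end-of-interrupt, issued after handling */
};

typedef void (*irq_action_fn)(unsigned int irq);

static void handle_fasteoi_model(unsigned int irq,
                                 const struct irq_chip_model *chip,
                                 irq_action_fn action)
{
    /* No separate ack step before running the handler. */
    if (action != NULL)
        action(irq);      /* NULL models "no handler installed" */
    /* The EOI is always sent, exactly once, after handling. */
    if (chip->eoi != NULL)
        chip->eoi(irq);
}
```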
2006-06-29[PATCH] genirq: add IRQ_TYPE_SENSE_MASKBenjamin Herrenschmidt
Add a #define for the mask of the part of IRQ_TYPE that represents the trigger type. I use it in my in-progress work, where I've standardized how the IRQ descriptions in the firmware device tree get translated into Linux-usable values by using those constants. Having this mask to isolate the "trigger type" part of the flags is useful in a few places. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
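Masking with IRQ_TYPE_SENSE_MASK isolates the trigger-type bits from any other flags stored alongside them. The constant values below are our reading of include/linux/irq.h from this period and should be treated as assumptions. The helper functions are hypothetical.

```c
#include <stdbool.h>

/* Assumed values, mirroring include/linux/irq.h of this era. */
#define IRQ_TYPE_NONE         0x00000000
#define IRQ_TYPE_EDGE_RISING  0x00000001
#define IRQ_TYPE_EDGE_FALLING 0x00000002
#define IRQ_TYPE_EDGE_BOTH    (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_EDGE_RISING)
#define IRQ_TYPE_LEVEL_HIGH   0x00000004
#define IRQ_TYPE_LEVEL_LOW    0x00000008
#define IRQ_TYPE_SENSE_MASK   0x0000000f  /* trigger-type bits only */
#define IRQ_TYPE_PROBE        0x00000010  /* not a trigger type */

/* Strip everything except the trigger type. */
static unsigned int irq_trigger(unsigned int flags)
{
    return flags & IRQ_TYPE_SENSE_MASK;
}

static bool irq_is_level(unsigned int flags)
{
    return (irq_trigger(flags) & (IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW)) != 0;
}
```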
2006-06-29[PATCH] genirq: add irq-wake (power-management) supportThomas Gleixner
Enable platforms to set the irq-wake (power-management) properties of an IRQ. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
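A platform-facing wake API can be modeled as a thin wrapper that forwards the request to an optional per-chip callback. The sketch below is a hypothetical user-space model: the struct names and the _model helpers are invented, returning -ENXIO when the chip has no wake support is an assumption, and the descriptor locking a kernel implementation needs is omitted.

```c
#include <errno.h>
#include <stddef.h>

struct wake_chip {
    const char *name;
    int (*set_wake)(unsigned int irq, unsigned int on);  /* optional */
};

struct irq_desc_model {
    const struct wake_chip *chip;
};

/* Forward to the chip if it supports wakeup; otherwise report -ENXIO. */
static int set_irq_wake_model(struct irq_desc_model *desc,
                              unsigned int irq, unsigned int on)
{
    if (desc->chip != NULL && desc->chip->set_wake != NULL)
        return desc->chip->set_wake(irq, on);
    return -ENXIO;
}

#define enable_irq_wake_model(d, irq)  set_irq_wake_model((d), (irq), 1)
#define disable_irq_wake_model(d, irq) set_irq_wake_model((d), (irq), 0)
```

A driver would typically arm wakeup on its line before suspend and disarm it on resume.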
2006-06-29[PATCH] genirq: add irq-chip supportThomas Gleixner
Enable platforms to use the irq-chip and irq-flow abstractions: allow setting of the chip and the type, and provide high-level handlers for common irq-flows. [rostedt@goodmis.org: misroute-irq: Don't call desc->chip->end because of edge interrupts] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
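The split between chip and flow can be modeled in a few lines. Platform setup attaches a chip to each IRQ descriptor and picks a high-level flow handler according to the trigger type, and the arch entry code then just calls whatever flow is installed. In the kernel, helpers such as set_irq_chip_and_handler(), handle_level_irq() and handle_edge_irq() fill these roles. Everything below is a hypothetical user-space model with invented _model/_m names, and the model flows only count invocations; the real ones also mask and acknowledge the line.

```c
#include <stddef.h>

struct chip_m {
    const char *name;
};

struct desc_m;
typedef void (*flow_handler_t)(unsigned int irq, struct desc_m *desc);

struct desc_m {
    const struct chip_m *chip;
    flow_handler_t handle_irq;
    unsigned int count;
    const char *last_flow;
};

/* Model flows: record which flow ran and how often. */
static void handle_level_model(unsigned int irq, struct desc_m *d)
{
    (void)irq;
    d->count++;
    d->last_flow = "level";
}

static void handle_edge_model(unsigned int irq, struct desc_m *d)
{
    (void)irq;
    d->count++;
    d->last_flow = "edge";
}

/* Platform setup: attach the chip and choose a flow by trigger type. */
static void setup_irq_model(struct desc_m *d, const struct chip_m *chip,
                            int level_triggered)
{
    d->chip = chip;
    d->handle_irq = level_triggered ? handle_level_model : handle_edge_model;
}

/* Arch entry code only dispatches to the installed flow. */
static void dispatch_model(unsigned int irq, struct desc_m *d)
{
    d->handle_irq(irq, d);
}
```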
2006-06-29[PATCH] genirq MSI fixesIngo Molnar
This is a fixed-up and cleaned-up replacement for genirq-msi-fixes.patch, which should solve the i386 4KSTACKS problem. I also added Ben's idea of pushing the __do_IRQ() check into generic_handle_irq(). I booted this with MSI enabled, but I only have MSI devices, not MSI-X devices. I'd still expect MSI-X to work now. irqchip migration helper: call __do_IRQ() if a descriptor is attached to an irqtype-style controller. This also fixes MSI-X IRQ handling on i386 and x86_64. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
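The migration helper amounts to one branch in the common dispatch path. If a descriptor has a flow handler installed, it is used; otherwise the IRQ still belongs to an old irqtype-style controller and goes through the legacy __do_IRQ() path. The sketch below models that branch in user space. The desc_f struct and the _model functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

struct desc_f;
typedef void (*flow_fn)(unsigned int irq, struct desc_f *desc);

struct desc_f {
    flow_fn handle_irq;       /* NULL: still on an irqtype-style controller */
    const char *handled_by;   /* records which path ran, for illustration */
};

/* Stand-in for the legacy __do_IRQ() path. */
static void legacy_do_irq_model(unsigned int irq, struct desc_f *desc)
{
    (void)irq;
    desc->handled_by = "__do_IRQ";
}

/* Converted and unconverted controllers can coexist during migration. */
static void generic_handle_irq_model(unsigned int irq, struct desc_f *desc)
{
    if (desc->handle_irq != NULL)
        desc->handle_irq(irq, desc);
    else
        legacy_do_irq_model(irq, desc);
}
```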
2006-06-29[PATCH] genirq: coreThomas Gleixner
Core genirq support: add the irq-chip and irq-flow abstractions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>