Age | Commit message (Collapse) | Author |
|
Core functions for the rt-mutex subsystem.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add framework to boost/unboost the priority of RT tasks.
This consists of:
- caching the 'normal' priority in ->normal_prio
- providing a functions to set/get the priority of the task
- make sched_setscheduler() aware of boosting
The effective_prio() cleanups also fix a priority-calculation bug pointed out
by Andrey Gelman, in set_user_nice().
has_rt_policy() fix: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andrey Gelman <agelman@012.net.il>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add the priority-sorted list (plist) implementation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add debug_check_no_locks_freed(), as a central inline to add
bad-lock-free-debugging functionality to.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We are pleased to announce "lightweight userspace priority inheritance" (PI)
support for futexes. The following patchset and glibc patch implements it,
ontop of the robust-futexes patchset which is included in 2.6.16-mm1.
We are calling it lightweight for 3 reasons:
- in the user-space fastpath a PI-enabled futex involves no kernel work
(or any other PI complexity) at all. No registration, no extra kernel
calls - just pure fast atomic ops in userspace.
- in the slowpath (in the lock-contention case), the system call and
scheduling pattern is in fact better than that of normal futexes, due to
the 'integrated' nature of FUTEX_LOCK_PI. [more about that further down]
- the in-kernel PI implementation is streamlined around the mutex
abstraction, with strict rules that keep the implementation relatively
simple: only a single owner may own a lock (i.e. no read-write lock
support), only the owner may unlock a lock, no recursive locking, etc.
Priority Inheritance - why, oh why???
-------------------------------------
Many of you heard the horror stories about the evil PI code circling Linux for
years, which makes no real sense at all and is only used by buggy applications
and which has horrible overhead. Some of you have dreaded this very moment,
when someone actually submits working PI code ;-)
So why would we like to see PI support for futexes?
We'd like to see it done purely for technological reasons. We dont think it's
a buggy concept, we think it's useful functionality to offer to applications,
which functionality cannot be achieved in other ways. We also think it's the
right thing to do, and we think we've got the right arguments and the right
numbers to prove that. We also believe that we can address all the
counter-arguments as well. For these reasons (and the reasons outlined below)
we are submitting this patch-set for upstream kernel inclusion.
What are the benefits of PI?
The short reply:
----------------
User-space PI helps achieving/improving determinism for user-space
applications. In the best-case, it can help achieve determinism and
well-bound latencies. Even in the worst-case, PI will improve the statistical
distribution of locking related application delays.
The longer reply:
-----------------
Firstly, sharing locks between multiple tasks is a common programming
technique that often cannot be replaced with lockless algorithms. As we can
see it in the kernel [which is a quite complex program in itself], lockless
structures are rather the exception than the norm - the current ratio of
lockless vs. locky code for shared data structures is somewhere between 1:10
and 1:100. Lockless is hard, and the complexity of lockless algorithms often
endangers to ability to do robust reviews of said code. I.e. critical RT
apps often choose lock structures to protect critical data structures, instead
of lockless algorithms. Furthermore, there are cases (like shared hardware,
or other resource limits) where lockless access is mathematically impossible.
Media players (such as Jack) are an example of reasonable application design
with multiple tasks (with multiple priority levels) sharing short-held locks:
for example, a highprio audio playback thread is combined with medium-prio
construct-audio-data threads and low-prio display-colory-stuff threads. Add
video and decoding to the mix and we've got even more priority levels.
So once we accept that synchronization objects (locks) are an unavoidable fact
of life, and once we accept that multi-task userspace apps have a very fair
expectation of being able to use locks, we've got to think about how to offer
the option of a deterministic locking implementation to user-space.
Most of the technical counter-arguments against doing priority inheritance
only apply to kernel-space locks. But user-space locks are different, there
we cannot disable interrupts or make the task non-preemptible in a critical
section, so the 'use spinlocks' argument does not apply (user-space spinlocks
have the same priority inversion problems as other user-space locking
constructs). Fact is, pretty much the only technique that currently enables
good determinism for userspace locks (such as futex-based pthread mutexes) is
priority inheritance:
Currently (without PI), if a high-prio and a low-prio task shares a lock [this
is a quite common scenario for most non-trivial RT applications], even if all
critical sections are coded carefully to be deterministic (i.e. all critical
sections are short in duration and only execute a limited number of
instructions), the kernel cannot guarantee any deterministic execution of the
high-prio task: any medium-priority task could preempt the low-prio task while
it holds the shared lock and executes the critical section, and could delay it
indefinitely.
Implementation:
---------------
As mentioned before, the userspace fastpath of PI-enabled pthread mutexes
involves no kernel work at all - they behave quite similarly to normal
futex-based locks: a 0 value means unlocked, and a value==TID means locked.
(This is the same method as used by list-based robust futexes.) Userspace uses
atomic ops to lock/unlock these mutexes without entering the kernel.
To handle the slowpath, we have added two new futex ops:
FUTEX_LOCK_PI
FUTEX_UNLOCK_PI
If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to TID
fails], then FUTEX_LOCK_PI is called. The kernel does all the remaining work:
if there is no futex-queue attached to the futex address yet then the code
looks up the task that owns the futex [it has put its own TID into the futex
value], and attaches a 'PI state' structure to the futex-queue. The pi_state
includes an rt-mutex, which is a PI-aware, kernel-based synchronization
object. The 'other' task is made the owner of the rt-mutex, and the
FUTEX_WAITERS bit is atomically set in the futex value. Then this task tries
to lock the rt-mutex, on which it blocks. Once it returns, it has the mutex
acquired, and it sets the futex value to its own TID and returns. Userspace
has no other work to perform - it now owns the lock, and futex value contains
FUTEX_WAITERS|TID.
If the unlock side fastpath succeeds, [i.e. userspace manages to do a TID ->
0 atomic transition of the futex value], then no kernel work is triggered.
If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), then
FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the behalf of
userspace - and it also unlocks the attached pi_state->rt_mutex and thus wakes
up any potential waiters.
Note that under this approach, contrary to other PI-futex approaches, there is
no prior 'registration' of a PI-futex. [which is not quite possible anyway,
due to existing ABI properties of pthread mutexes.]
Also, under this scheme, 'robustness' and 'PI' are two orthogonal properties
of futexes, and all four combinations are possible: futex, robust-futex,
PI-futex, robust+PI-futex.
glibc support:
--------------
Ulrich Drepper and Jakub Jelinek have written glibc support for PI-futexes
(and robust futexes), enabling robust and PI (PTHREAD_PRIO_INHERIT) POSIX
mutexes. (PTHREAD_PRIO_PROTECT support will be added later on too, no
additional kernel changes are needed for that). [NOTE: The glibc patch is
obviously inofficial and unsupported without matching upstream kernel
functionality.]
the patch-queue and the glibc patch can also be downloaded from:
http://redhat.com/~mingo/PI-futex-patches/
Many thanks go to the people who helped us create this kernel feature: Steven
Rostedt, Esben Nielsen, Benedikt Spranger, Daniel Walker, John Cooper, Arjan
van de Ven, Oleg Nesterov and others. Credits for related prior projects goes
to Dirk Grambow, Inaky Perez-Gonzalez, Bill Huey and many others.
Clean up the futex code, before adding more features to it:
- use u32 as the futex field type - that's the ABI
- use __user and pointers to u32 instead of unsigned long
- code style / comment style cleanups
- rename hash-bucket name from 'bh' to 'hb'.
I checked the pre and post futex.o object files to make sure this
patch has no code effects.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in
/sys/devices/system/cpu/ control the MC/SMT power savings policy for the
scheduler.
Based on the values (1-enable, 0-disable) for these controls, sched groups
cpu power will be determined for different domains. When power savings
policy is enabled and under light load conditions, scheduler will minimize
the physical packages/cpu cores carrying the load and thus conserving
power(with a perf impact based on the workload characteristics... see OLS
2005 CMP kernel scheduler paper for more details..)
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Con Kolivas <kernel@kolivas.org>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Try to handle mem allocation failures in build_sched_domains by bailing out
and cleaning up thus-far allocated memory. The patch has a direct consequence
that we disable load balancing completely (even at sibling level) upon *any*
memory allocation failure.
[Lee.Schermerhorn@hp.com: bugfix]
Signed-off-by: Srivatsa Vaddagir <vatsa@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Problem:
The introduction of separate run queues per CPU has brought with it "nice"
enforcement problems that are best described by a simple example.
For the sake of argument suppose that on a single CPU machine with a
nice==19 hard spinner and a nice==0 hard spinner running that the nice==0
task gets 95% of the CPU and the nice==19 task gets 5% of the CPU. Now
suppose that there is a system with 2 CPUs and 2 nice==19 hard spinners and
2 nice==0 hard spinners running. The user of this system would be entitled
to expect that the nice==0 tasks each get 95% of a CPU and the nice==19
tasks only get 5% each. However, whether this expectation is met is pretty
much down to luck as there are four equally likely distributions of the
tasks to the CPUs that the load balancing code will consider to be balanced
with loads of 2.0 for each CPU. Two of these distributions involve one
nice==0 and one nice==19 task per CPU and in these circumstances the users
expectations will be met. The other two distributions both involve both
nice==0 tasks being on one CPU and both nice==19 being on the other CPU and
each task will get 50% of a CPU and the user's expectations will not be
met.
Solution:
The solution to this problem that is implemented in the attached patch is
to use weighted loads when determining if the system is balanced and, when
an imbalance is detected, to move an amount of weighted load between run
queues (as opposed to a number of tasks) to restore the balance. Once
again, the easiest way to explain why both of these measures are necessary
is to use a simple example. Suppose that (in a slight variation of the
above example) that we have a two CPU system with 4 nice==0 and 4 nice=19
hard spinning tasks running and that the 4 nice==0 tasks are on one CPU and
the 4 nice==19 tasks are on the other CPU. The weighted loads for the two
CPUs would be 4.0 and 0.2 respectively and the load balancing code would
move 2 tasks resulting in one CPU with a load of 2.0 and the other with
load of 2.2. If this was considered to be a big enough imbalance to
justify moving a task and that task was moved using the current
move_tasks() then it would move the highest priority task that it found and
this would result in one CPU with a load of 3.0 and the other with a load
of 1.2 which would result in the movement of a task in the opposite
direction and so on -- infinite loop. If, on the other hand, an amount of
load to be moved is calculated from the imbalance (in this case 0.1) and
move_tasks() skips tasks until it find ones whose contributions to the
weighted load are less than this amount it would move two of the nice==19
tasks resulting in a system with 2 nice==0 and 2 nice=19 on each CPU with
loads of 2.1 for each CPU.
One of the advantages of this mechanism is that on a system where all tasks
have nice==0 the load balancing calculations would be mathematically
identical to the current load balancing code.
Notes:
struct task_struct:
has a new field load_weight which (in a trade off of space for speed)
stores the contribution that this task makes to a CPU's weighted load when
it is runnable.
struct runqueue:
has a new field raw_weighted_load which is the sum of the load_weight
values for the currently runnable tasks on this run queue. This field
always needs to be updated when nr_running is updated so two new inline
functions inc_nr_running() and dec_nr_running() have been created to make
sure that this happens. This also offers a convenient way to optimize away
this part of the smpnice mechanism when CONFIG_SMP is not defined.
int try_to_wake_up():
in this function the value SCHED_LOAD_BALANCE is used to represent the load
contribution of a single task in various calculations in the code that
decides which CPU to put the waking task on. While this would be a valid
on a system where the nice values for the runnable tasks were distributed
evenly around zero it will lead to anomalous load balancing if the
distribution is skewed in either direction. To overcome this problem
SCHED_LOAD_SCALE has been replaced by the load_weight for the relevant task
or by the average load_weight per task for the queue in question (as
appropriate).
int move_tasks():
The modifications to this function were complicated by the fact that
active_load_balance() uses it to move exactly one task without checking
whether an imbalance actually exists. This precluded the simple
overloading of max_nr_move with max_load_move and necessitated the addition
of the latter as an extra argument to the function. The internal
implementation is then modified to move up to max_nr_move tasks and
max_load_move of weighted load. This slightly complicates the code where
move_tasks() is called and if ever active_load_balance() is changed to not
use move_tasks() the implementation of move_tasks() should be simplified
accordingly.
struct sched_group *find_busiest_group():
Similar to try_to_wake_up(), there are places in this function where
SCHED_LOAD_SCALE is used to represent the load contribution of a single
task and the same issues are created. A similar solution is adopted except
that it is now the average per task contribution to a group's load (as
opposed to a run queue) that is required. As this value is not directly
available from the group it is calculated on the fly as the queues in the
groups are visited when determining the busiest group.
A key change to this function is that it is no longer to scale down
*imbalance on exit as move_tasks() uses the load in its scaled form.
void set_user_nice():
has been modified to update the task's load_weight field when it's nice
value and also to ensure that its run queue's raw_weighted_load field is
updated if it was runnable.
From: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
With smpnice, sched groups with highest priority tasks can mask the imbalance
between the other sched groups with in the same domain. This patch fixes some
of the listed down scenarios by not considering the sched groups which are
lightly loaded.
a) on a simple 4-way MP system, if we have one high priority and 4 normal
priority tasks, with smpnice we would like to see the high priority task
scheduled on one cpu, two other cpus getting one normal task each and the
fourth cpu getting the remaining two normal tasks. but with current
smpnice extra normal priority task keeps jumping from one cpu to another
cpu having the normal priority task. This is because of the
busiest_has_loaded_cpus, nr_loaded_cpus logic.. We are not including the
cpu with high priority task in max_load calculations but including that in
total and avg_load calcuations.. leading to max_load < avg_load and load
balance between cpus running normal priority tasks(2 Vs 1) will always show
imbalanace as one normal priority and the extra normal priority task will
keep moving from one cpu to another cpu having normal priority task..
b) 4-way system with HT (8 logical processors). Package-P0 T0 has a
highest priority task, T1 is idle. Package-P1 Both T0 and T1 have 1 normal
priority task each.. P2 and P3 are idle. With this patch, one of the
normal priority tasks on P1 will be moved to P2 or P3..
c) With the current weighted smp nice calculations, it doesn't always make
sense to look at the highest weighted runqueue in the busy group..
Consider a load balance scenario on a DP with HT system, with Package-0
containing one high priority and one low priority, Package-1 containing one
low priority(with other thread being idle).. Package-1 thinks that it need
to take the low priority thread from Package-0. And find_busiest_queue()
returns the cpu thread with highest priority task.. And ultimately(with
help of active load balance) we move high priority task to Package-1. And
same continues with Package-0 now, moving high priority task from package-1
to package-0.. Even without the presence of active load balance, load
balance will fail to balance the above scenario.. Fix find_busiest_queue
to use "imbalance" when it is lightly loaded.
[kernel@kolivas.org: sched: store weighted load on up]
[kernel@kolivas.org: sched: add discrete weighted cpu load function]
[suresh.b.siddha@intel.com: sched: remove dead code]
Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Cc: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Use of dev_dbg() and friends is considered good practice. dev_dbg() needs a
struct device *devp, but nsc_gpio is only a helper module, so it doesnt
have/need its own. To provide devp to the user-modules (scx200 & pc8736x
_gpio), we add it to the vtable, and set it during init.
Also squeeze nsc_gpio_dump()'s format a little.
[ 199.259879] pc8736x_gpio.0: io09: 0x0044 TS OD PUE EDGE LO DEBOUNCE
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Now that the read(), write() file-ops are dispatching gpio-ops via the vtable,
they are generic, and can be moved 'verbatim' to the nsc_gpio common-support
module. After the move, various symbols are renamed to update 'scx200_' to
'nsc_', and headers are adjusted accordingly.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Abstract the gpio operations into a new nsc_gpio_ops vtable.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
unsigned ints
Per kernel headers, device minor numbers are unsigned ints. Do the same in
this driver.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
GPIO SUPPORT FOR SCx200 & PC8736x
The patch-set reworks the 2.4 vintage scx200_gpio driver for modern 2.6, and
refactors GPIO support to reuse it in a new driver for the GPIO on PC-8736x
chips. Its handy for the Soekris.com net-4801, which has both chips.
These patches have been seen recently on Kernel-Mentors, and then
Kernel-Newbies ML, where Jesper Juhl kindly reviewed it. His feedback has
been incorporated. Thanks Jesper !
Its also gone to soekris-tech@soekris.com for possible testing by linux folks,
I've gotten 1 promise so far. Theyre mostly BSD folk over there, but we'll
see..
Device-file & Sysfs
The driver preserves the existing device-file interface, including the
write/cmd set, but adds v to 'view' the pin-settings & configs by inducing,
via gpio_dump(), a dev_info() call. Its a fairly crappy way to get status,
but it sticks to the syslog approach, conservatively.
Allowing users to voluntarily trigger logging is good, it gives them a
familiar way to confirm their app's control & use of the pins, and I've thus
reduced the pin-mode-updates from dev_info to dev_dbg.
I've recently bolted on a proto sysfs interface for both new drivers. Im not
including those patches here; they (the patch + doc-pre-patch) are still quite
raw (and unreviewed on KNML), and since they 'invent' a convention for GPIO, a
proper vetting is needed. Since this patchset is much bigger than my previous
ones, Id like to keep things simpler, and address it 1st, before bolting on
more stuff.
The driver-split
The Geode CPU and the PC-87366 Super-IO chip have GPIO units which share a
common pin-architecture (same pin features, with same bits controlling), but
with different addressing mechanics and port organizations.
The vintage driver expresses the pin capabilities with pin-mode commands
[OoPpTt],etc that change the pin configurations, and since the 2 chips share
pin-arch, we can reuse the read(), write() commands, once the implementation
is suitably adjusted.
The patchset adds a vtable: struct nsc_gpio_ops, to abstract the existing gpio
operations, then adjusts fileops.write() code to invoke operations via that
vtable. Driver specific open()s set private_data to the vtable so its
available for use by write().
The vtable gets the gpio_dump() too, since its user-friendly, and (could be
construed as) part of the current device-file interface. To support use of
dev_dbg() in write() & _dump(), the vtable gets a dev ptr too, set by both
scx200 & pc8736x _gpio drivers.
heres how the pins are presented in syslog:
[ 1890.176223] scx200_gpio.0: io00: 0x0044 TS OD PUE EDGE LO DEBOUNCE
[ 1890.287223] scx200_gpio.0: io01: 0x0003 OE PP PUD EDGE LO
nsc_gpio.c: new file is new home of several file-ops methods, which are
modified to get their vtable from filp->private_data, and use it where needed.
scx200_gpio.c: keeps some of its existing gpio routines, but now wires them up
via the vtable (they're invoked by nsc_gpio.c:nsc_gpio_write() thru this
vtable). A driver-spcific open() initializes filp->private_data with the
vtable.
Once the split is clean, and the scx200_gpio driver is working, we copy and
modify the function and variable names, and rework the access-method bodies
for the different addressing scheme.
Heres a working overview of the patchset:
# series file for GPIO
# Spring Cleaning
gpio-scx/patch.preclean # scripts/Lindent fixes, editor-ctrl comments
# API Modernization
gpio-scx/patch.api26 # what I learned from LDD3
gpio-scx/patch.platform-dev-2 # get pdev, support for dev_dbg()
gpio-scx/patch.unsigned-minor # fix to match std practice
# Debuggability
gpio-scx/patch.dump-diet # shrink gpio_dump()
gpio-scx/patch.viewpins # add new 'command' to call dump()
gpio-scx/patch.init-refactor # pull shadow-register init to sub
# Access-Abstraction (add vtable)
gpio-scx/patch.access-vtable # introduce nsg_gpio_ops vtable, w dump
gpio-scx/patch.vtable-calls # add & use the vtable in scx200_gpio
gpio-scx/patch.nscgpio-shell # add empty driver for common-fops
# move code under abstraction
gpio-scx/patch.migrate-fops # move file-ops methods from scx200_gpio
gpio-scx/patch.common-dump # mv scx200.c:scx200_gpio_dump() to nsc_gpio.c
gpio-scx/patch.add-pc8736x-gpio # add new driver, like old, w chip adapt
# gpio-scx/patch.add-DEBUG # enable all dev_dbg()s
# Cleanups
# finish printk -> dev_dbg() etc
gpio-scx/patch.pdev-pc8736x # new drvr needs pdev too,
gpio-scx/patch.devdbg-nscgpio # add device to 'vtable', use in dev_dbg()
# gpio-scx/patch.pin-config-view # another 'c' 'command'
# gpio-scx/quiet-getset # take out excess dbg stuff (pretty quiet
now)
gpio-scx/patch.shadow-current # imitate scx200_gpio's shadow regs in
pc87*
# post KMentors-post patches ..
gpio-scx/patch.mutexes # use mutexes for config-locks
gpio-scx/patch.viewpins-values # extend dump to obsolete separate 'c' cmd
gpio-scx/patch.kconfig # add stuff for kbuild
# TBC
# combine api26 with pdev, which is just one step.
# merge c&v commands to single do-all-fn
# delay viewpins, dump-diet should also un-ifdef it too.
diff.sys-gpio-rollup-1
This patch:
Removed editor format-control comments, and used scripts/Lindent to clean up
whitespace, then deleted the bogus chunks :-(
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Define new macros register_hotcpu_notifier() and unregister_hotcpu_notifier()
that redefines register_cpu_notifier() and unregister_cpu_notifier() for use
only when HOTPLUG_CPU is defined.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
CPUs come online only at init time (unless CONFIG_HOTPLUG_CPU is defined).
So, cpu_notifier functionality need to be available only at init time.
This patch makes register_cpu_notifier() available only at init time, unless
CONFIG_HOTPLUG_CPU is defined.
This patch exports register_cpu_notifier() and unregister_cpu_notifier() only
if CONFIG_HOTPLUG_CPU is defined.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add operations for the call_rcu_bh() variant of RCU. Also add an
rcu_batches_completed_bh() function, which is needed by rcutorture.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We include config.h on the compiler command line. There's no need for it
to be included again.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
- add a proper prototype for the following global function:
- buffer_init()
- make the following needlessly global function static:
- end_buffer_async_write()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add more poison values to include/linux/poison.h. It's not clear to me
whether some others should be added or not, so I haven't added any of
these:
./include/linux/libata.h:#define ATA_TAG_POISON 0xfafbfcfdU
./arch/ppc/8260_io/fcc_enet.c:1918: memset((char *)(&(immap->im_dprambase[(mem_addr+64)])), 0x88, 32);
./drivers/usb/mon/mon_text.c:429: memset(mem, 0xe5, sizeof(struct mon_event_text));
./drivers/char/ftape/lowlevel/ftape-ctl.c:738: memset(ft_buffer[i]->address, 0xAA, FT_BUFF_SIZE);
./drivers/block/sx8.c:/* 0xf is just arbitrary, non-zero noise; this is sorta like poisoning */
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Update two drivers to use poison.h.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Localize poison values into one header file for better documentation and
easier/quicker debugging and so that the same values won't be used for
multiple purposes.
Use these constants in core arch., mm, driver, and fs code.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Move the i386 VDSO down into a vma and thus randomize it.
Besides the security implications, this feature also helps debuggers, which
can COW a vma-backed VDSO just like a normal DSO and can thus do
single-stepping and other debugging features.
It's good for hypervisors (Xen, VMWare) too, which typically live in the same
high-mapped address space as the VDSO, hence whenever the VDSO is used, they
get lots of guest pagefaults and have to fix such guest accesses up - which
slows things down instead of speeding things up (the primary purpose of the
VDSO).
There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
for older glibcs that still rely on a prelinked high-mapped VDSO. Newer
distributions (using glibc 2.3.3 or later) can turn this option off. Turning
it off is also recommended for security reasons: attackers cannot use the
predictable high-mapped VDSO page as syscall trampoline anymore.
There is a new vdso=[0|1] boot option as well, and a runtime
/proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
on/off.
(This version of the VDSO-randomization patch also has working ELF
coredumping, the previous patch crashed in the coredumping code.)
This code is a combined work of the exec-shield VDSO randomization
code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
started this patch and i completed it.
[akpm@osdl.org: cleanups]
[akpm@osdl.org: compile fix]
[akpm@osdl.org: compile fix 2]
[akpm@osdl.org: compile fix 3]
[akpm@osdl.org: revernt MAXMEM change]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Cc: Gerd Hoffmann <kraxel@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
With Goto-san's patch, we can add new pgdat/node at runtime. I'm now
considering node-hot-add with cpu + memory on ACPI.
I found acpi container, which describes node, could evaluate cpu before
memory. This means cpu-hot-add occurs before memory hot add.
In most part, cpu-hot-add doesn't depend on node hot add. But register_cpu(),
which creates symbolic link from node to cpu, requires that node should be
onlined before register_cpu(). When a node is onlined, its pgdat should be
there.
This patch-set holds off creating symbolic link from node to cpu
until node is onlined.
This removes node arguments from register_cpu().
Now, register_cpu() requires 'struct node' as its argument. But the array of
struct node is now unified in driver/base/node.c now (By Goto's node hotplug
patch). We can get struct node in generic way. So, this argument is not
necessary now.
This patch also guarantees add cpu under node only when node is onlined. It
is necessary for node-hot-add vs. cpu-hot-add patch following this.
Moreover, register_cpu calculates cpu->node_id by cpu_to_node() without regard
to its 'struct node *root' argument. This patch removes it.
Also modify callers of register_cpu()/unregister_cpu, whose args are changed
by register-cpu-remove-node-struct patch.
[Brice.Goglin@ens-lyon.org: fix it]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
pgdat and per node data
This is a patch to allocate pgdat and per node data area for ia64. The size
for them can be calculated by compute_pernodesize().
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
address array
This is to refresh node_data[] array for ia64. As I mentioned previous
patches, ia64 has copies of information of pgdat address array on each node as
per node data.
At v2 of node_add, this function used stop_machine_run() to update them. (I
wished that they were copied safety as much as possible.) But, in this patch,
this arrays are just copied simply, and set node_online_map bit after
completion of pgdat initialization.
So, kernel must touch NODE_DATA() macro after checking node_online_map().
(Current code has already done it.) This is more simple way for just
hot-add.....
Note : It will be problem when hot-remove will occur,
because, even if online_map bit is set, kernel may
touch NODE_DATA() due to race condition. :-(
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
When new node becomes enable by hot-add, new sysfs file must be created for
new node. So, if new node is enabled by add_memory(), register_one_node() is
called to create it. In addition, I386's arch_register_node() and a part of
register_nodes() of powerpc are consolidated to register_one_node() as a
generic_code().
This is tested by Tiger4(IPF) with node hot-plug emulation.
Signed-off-by: Keiichiro Tokunaga <tokuanga.keiich@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch allows hot-add memory which is not aligned to section.
Now, hot-added memory has to be aligned to section size. Considering big
section sized archs, this is not useful.
When hot-added memory is registerd as iomem resoruce by iomem resource
patch, we can make use of that information to detect valid memory range.
Note: With this, not-aligned memory can be registerd. To allow hot-add
memory with holes, we have to do more work around add_memory().
(It doesn't allows add memory to already existing mem section.)
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
When node is hot-added, kswapd for the node should start. This export kswapd
start function as kswapd_run() to use at add_memory().
[akpm@osdl.org: daemonize() isn't needed when using the kthread API]
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Refresh NODE_DATA() for generic archs. In this case, NODE_DATA(nid) ==
node_data[nid]. node_data[] is array of address of pgdat. So, refresh is
quite simple.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
For node hotplug, basically we have to allocate new pgdat. But, there are
several types of implementations of pgdat.
1. Allocate only pgdat.
This style allocate only pgdat area.
And its address is recorded in node_data[].
It is most popular style.
2. Static array of pgdat
In this case, all of pgdats are static array.
Some archs use this style.
3. Allocate not only pgdat, but also per node data.
To increase performance, each node has copy of some data as
a per node data. So, this area must be allocated too.
Ia64 is this style. Ia64 has the copies of node_data[] array
on each per node data to increase performance.
In this series of patches, treat (1) as generic arch.
generic archs can use generic function. (2) and (3) should have
its own if necessary.
This patch defines pgdat allocator.
Updating NODE_DATA() macro function is in other patch.
Signed-off-by: Yasonori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This is to find node id from acpi's handle of memory_device in DSDT. _PXM for
the new node can be found by acpi_get_pxm() by using new memory's handle. So,
node id can be found by pxm_to_nid_map[].
This patch becomes simpler than v2 of node hot-add patch.
Because old add_memory() function doesn't have node id parameter.
So, kernel must find its handle by physical address via DSDT again.
But, v3 just give node id to add_memory() now.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Change the name of old add_memory() to arch_add_memory. And use node id to
get pgdat for the node at NODE_DATA().
Note: Powerpc's old add_memory() is defined as __devinit. However,
add_memory() is usually called only after bootup.
I suppose it may be redundant. But, I'm not well known about powerpc.
So, I keep it. (But, __meminit is better at least.)
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
typo fixes
Clean up 'inline is not at beginning' warnings for usb storage
Storage class should be first
i386: Trivial typo fixes
ixj: make ixj_set_tone_off() static
spelling fixes
fix paniced->panicked typos
Spelling fixes for Documentation/atomic_ops.txt
move acknowledgment for Mark Adler to CREDITS
remove the bouncing email address of David Campbell
|
|
This reverts commit c7b2eff059fcc2d1b7085ee3d84b79fd657a537b.
Hugh Dickins explains:
"It seems too little tested: "losetup -d /dev/loop0" fails with
EINVAL because nothing sets lo_thread; but even when you patch
loop_thread() to set lo->lo_thread = current, it can't survive
more than a few dozen iterations of the loop below (with a tmpfs
mounted on /tst):
j=0
cp /dev/zero /tst
while :
do
let j=j+1
echo "Doing pass $j"
losetup /dev/loop0 /tst/zero
mkfs -t ext2 -b 1024 /dev/loop0 >/dev/null 2>&1
mount -t ext2 /dev/loop0 /mnt
umount /mnt
losetup -d /dev/loop0
done
it collapses with failed ioctl then BUG_ON(!bio).
I think the original lo_done completion was more subtle and safe
than the kthread conversion has allowed for."
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
kbuild: trivial fixes in Makefile
kbuild: adding symbols in Kconfig and defconfig to TAGS
kbuild: replace abort() with exit(1)
kbuild: support for %.symtypes files
kbuild: fix silentoldconfig recursion
kbuild: add option for stripping modules while installing them
kbuild: kill some false positives from modpost
kbuild: export-symbol usage report generator
kbuild: fix make -rR breakage
kbuild: append -dirty for updated but uncommited changes
kbuild: append git revision for all untagged commits
kbuild: fix module.symvers parsing in modpost
kbuild: ignore make's built-in rules & variables
kbuild: bugfix with initramfs
kbuild: modpost build fix
kbuild: check license compatibility when building modules
kbuild: export-type enhancement to modpost.c
kbuild: add dependency on kernel.release to the package targets
kbuild: `make kernelrelease' speedup
kconfig: KCONFIG_OVERWRITECONFIG
...
|
|
* master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6:
[CRYPTO] tcrypt: Forbid tcrypt from being built-in
[CRYPTO] aes: Add wrappers for assembly routines
[CRYPTO] tcrypt: Speed benchmark support for digest algorithms
[CRYPTO] tcrypt: Return -EAGAIN from module_init()
[CRYPTO] api: Allow replacement when registering new algorithms
[CRYPTO] api: Removed const from cra_name/cra_driver_name
[CRYPTO] api: Added cra_init/cra_exit
[CRYPTO] api: Fixed incorrect passing of context instead of tfm
[CRYPTO] padlock: Rearrange context structure to reduce code size
[CRYPTO] all: Pass tfm instead of ctx to algorithms
[CRYPTO] digest: Remove unnecessary zeroing during init
[CRYPTO] aes-i586: Get rid of useless function wrappers
[CRYPTO] digest: Add alignment handling
[CRYPTO] khazad: Use 32-bit reads on key
|
|
* master.kernel.org:/pub/scm/linux/kernel/git/dtor/input:
Input: iforce - remove some pointless casts
Input: psmouse - add support for Intellimouse 4.0
Input: atkbd - fix HANGEUL/HANJA keys
Input: fix misspelling of Hangeul key
Input: via-pmu - add input device support
Input: rearrange exports
Input: fix formatting to better follow CodingStyle
Input: reset name, phys and uniq when unregistering
Input: return correct size when reading modalias attribute
Input: change my e-mail address in MAINTAINERS file
Input: fix potential overflows in driver/input/keyboard
Input: fix potential overflows in driver/input/touchscreen
Input: fix potential overflows in driver/input/joystick
Input: fix potential overflows in driver/input/mouse
Input: fix accuracy of fixp-arith.h
Input: iforce - use ENOSPC instead of ENOMEM
Input: constify drivers/char/keyboard.c
|
|
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
V4L/DVB (4227): Update this driver for recent header file movement.
V4L/DVB (4223): Add V4L2_CID_MPEG_STREAM_VBI_FMT control
V4L/DVB (4222): Always switch tuner mode when calling VIDIOC_S_FREQUENCY.
V4L/DVB (4221): Add HM12 YUV format define.
V4L/DVB (4219): Av7110: analog sound output of DVB-C rev 2.3
V4L/DVB (4217): Fix a misplaced closing bracket/else, which caused swzigzag not to be called
V4L/DVB (4215): Make VIDEO_CX88_BLACKBIRD a separate build option
V4L/DVB (4214): Make VIDEO_CX2341X a selectable build option
V4L/DVB (4213): Cx88: cleanups
V4L/DVB (4211): Fix an Oops for all fe that have get_frontend_algo == NULL
|
|
* x86-64: (83 commits)
[PATCH] x86_64: x86_64 stack usage debugging
[PATCH] x86_64: (resend) x86_64 stack overflow debugging
[PATCH] x86_64: msi_apic.c build fix
[PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs
[PATCH] x86_64: Avoid broadcasting NMI IPIs
[PATCH] x86_64: fix apic error on bootup
[PATCH] x86_64: enlarge window for stack growth
[PATCH] x86_64: Minor string functions optimizations
[PATCH] x86_64: Move export symbols to their C functions
[PATCH] x86_64: Standardize i386/x86_64 handling of NMI_VECTOR
[PATCH] x86_64: Fix modular pc speaker
[PATCH] x86_64: remove sys32_ni_syscall()
[PATCH] x86_64: Do not use -ffunction-sections for modules
[PATCH] x86_64: Add cpu_relax to apic_wait_icr_idle
[PATCH] x86_64: adjust kstack_depth_to_print default
[PATCH] i386/x86-64: adjust /proc/interrupts column headings
[PATCH] x86_64: Fix race in cpu_local_* on preemptible kernels
[PATCH] x86_64: Fix fast check in safe_smp_processor_id
[PATCH] x86_64: x86_64 setup.c - printing cmp related boottime information
[PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
...
Manual resolve of trivial conflict in arch/i386/kernel/Makefile
|
|
In timekeeping code, one often does need to use conversion constants. Naming
these leads to code that's easier to understand, showing the reader between
which units the conversion is made.
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add proper conditionals to be able to build with CONFIG_MODULES=n.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
If no unwinding is possible at all for a certain exception instance,
fall back to the old style call trace instead of not showing any trace
at all.
Also, allow setting the stack trace mode at the command line.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
These are the generic bits needed to enable reliable stack traces based
on Dwarf2-like (.eh_frame) unwind information. Subsequent patches will
enable x86-64 and i386 to make use of this.
Thanks to Andi Kleen and Ingo Molnar, who pointed out several possibilities
for improvement.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Use inline code bitmaps <= BITS_PER_LONG in bitmap_weight. This
gives _much_ better code.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Sometimes e.g. with crashme the compat layer warnings can be noisy.
Add a way to turn them off by gating all output through compat_printk
that checks a global sysctl. The default is not changed.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
[IOAT]: Do not dereference THIS_MODULE directly to set unsafe.
[NETROM]: Fix possible null pointer dereference.
[NET] netpoll: break recursive loop in netpoll rx path
[NET] netpoll: don't spin forever sending to stopped queues
[IRDA]: add some IBM think pads
[ATM]: atm/mpc.c warning fix
[NET]: skb_find_text ignores to argument
[NET]: make net/core/dev.c:netdev_nit static
[NET]: Fix GSO problems in dev_hard_start_xmit()
[NET]: Fix CHECKSUM_HW GSO problems.
[TIPC]: Fix incorrect correction to discovery timer frequency computation.
[TIPC]: Get rid of dynamically allocated arrays in broadcast code.
[TIPC]: Fixed link switchover bugs
[TIPC]: Enhanced & cleaned up system messages; fixed 2 obscure memory leaks.
[TIPC]: First phase of assert() cleanup
[TIPC]: Disallow config operations that aren't supported in certain modes.
[TIPC]: Fixed memory leak in tipc_link_send() when destination is unreachable
[TIPC]: Added missing warning for out-of-memory condition
[TIPC]: Withdrawing all names from nameless port now returns success, not error
[TIPC]: Optimized argument validation done by connect().
...
|
|
- record the 'event' count on each individual device (they
might sometimes be slightly different now)
- add a new value for 'sb_dirty': '3' means that the super
block only needs to be updated to record a clean<->dirty
transition.
- Prefer odd event numbers for dirty states and even numbers
for clean states
- Using all the above, don't update the superblock on
a spare device if the update is just doing a clean-dirty
transition. To accomodate this, a transition from
dirty back to clean might now decrement the events counter
if nothing else has changed.
The net effect of this is that spare drives will not see any IO requests
during normal running of the array, so they can go to sleep if that is what
they want to do.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
If md is asked to store a bitmap in a file, it tries to hold onto the page
cache pages for that file, manipulate them directly, and call a cocktail of
operations to write the file out. I don't believe this is a supportable
approach.
This patch changes the approach to use the same approach as swap files. i.e.
bmap is used to enumerate all the block address of parts of the file and we
write directly to those blocks of the device.
swapfile only uses parts of the file that provide a full pages at contiguous
addresses. We don't have that luxury so we have to cope with pages that are
non-contiguous in storage. To handle this we attach buffers to each page, and
store the addresses in those buffers.
With this approach the pagecache may contain data which is inconsistent with
what is on disk. To alleviate the problems this can cause, md invalidates the
pagecache when releasing the file. If the file is to be examined while the
array is active (a non-critical but occasionally useful function), O_DIRECT io
must be used. And new version of mdadm will have support for this.
This approach simplifies a lot of code:
- we no longer need to keep a list of pages which we need to wait for,
as the b_endio function can keep track of how many outstanding
writes there are. This saves a mempool.
- -EAGAIN returns from write_page are no longer possible (not sure if
they ever were actually).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
md/bitmap currently has a separate thread to wait for writes to the bitmap
file to complete (as we cannot get a callback on that action).
However this isn't needed as bitmap_unplug is called from process context and
waits for the writeback thread to do it's work. The same result can be
achieved by doing the waiting directly in bitmap_unplug.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch makes the needlessly global md_print_devices() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|