Age | Commit message (Collapse) | Author |
|
By default, tun.c running in TUN_TUN_DEV mode will set the protocol of
packet to IPv4 if TUN_NO_PI is set. My program failed to work when I
assumed that the driver will check the first nibble of packet,
determine IP version and set the appropriate protocol.
Signed-off-by: Ang Way Chuang <wcang@nav6.org>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When generating the ip header for the transformed packet we just copy
the frag_off field of the ip header from the original packet to the ip
header of the new generated packet. If we receive a packet as a chain
of fragments, all but the last of the new generated packets have the
IP_MF flag set. We have to mask the frag_off field to only keep the
IP_DF flag from the original packet. This got lost with git commit
36cf9acf93e8561d9faec24849e57688a81eb9c5 ("[IPSEC]: Separate
inner/outer mode processing on output")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The H.245 helper is not registered/unregistered, but assigned to
connections manually from the Q.931 helper. This means on unload
existing expectations and connections using the helper are not
cleaned up, leading to the following oops on module unload:
CPU 0 Unable to handle kernel paging request at virtual address c00a6828, epc == 802224dc, ra == 801d4e7c
Oops[#1]:
Cpu 0
$ 0 : 00000000 00000000 00000004 c00a67f0
$ 4 : 802a5ad0 81657e00 00000000 00000000
$ 8 : 00000008 801461c8 00000000 80570050
$12 : 819b0280 819b04b0 00000006 00000000
$16 : 802a5a60 80000000 80b46000 80321010
$20 : 00000000 00000004 802a5ad0 00000001
$24 : 00000000 802257a8
$28 : 802a4000 802a59e8 00000004 801d4e7c
Hi : 0000000b
Lo : 00506320
epc : 802224dc ip_conntrack_help+0x38/0x74 Tainted: P
ra : 801d4e7c nf_iterate+0xbc/0x130
Status: 1000f403 KERNEL EXL IE
Cause : 00800008
BadVA : c00a6828
PrId : 00019374
Modules linked in: ip_nat_pptp ip_conntrack_pptp ath_pktlog wlan_acl wlan_wep wlan_tkip wlan_ccmp wlan_xauth ath_pci ath_dev ath_dfs ath_rate_atheros wlan ath_hal ip_nat_tftp ip_conntrack_tftp ip_nat_ftp ip_conntrack_ftp pppoe ppp_async ppp_deflate ppp_mppe pppox ppp_generic slhc
Process swapper (pid: 0, threadinfo=802a4000, task=802a6000)
Stack : 801e7d98 00000004 802a5a60 80000000 801d4e7c 801d4e7c 802a5ad0 00000004
00000000 00000000 801e7d98 00000000 00000004 802a5ad0 00000000 00000010
801e7d98 80b46000 802a5a60 80320000 80000000 801d4f8c 802a5b00 00000002
80063834 00000000 80b46000 802a5a60 801e7d98 80000000 802ba854 00000000
81a02180 80b7e260 81a021b0 819b0000 819b0000 80570056 00000000 00000001
...
Call Trace:
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801d4e7c>] nf_iterate+0xbc/0x130
[<801d4e7c>] nf_iterate+0xbc/0x130
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801d4f8c>] nf_hook_slow+0x9c/0x1a4
One way to fix this would be to split helper cleanup from the unregistration
function and invoke it for the H.245 helper, but since ctnetlink needs to be
able to find the helper for synchonization purposes, a better fix is to
register it normally and make sure its not assigned to connections during
helper lookup. The missing l3num initialization is enough for this, this
patch changes it to use AF_UNSPEC to make it more explicit though.
Reported-by: liannan <liannan@twsz.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
path
Properly free h323_buffer when helper registration fails.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix three ct_extend/NAT extension related races:
- When cleaning up the extension area and removing it from the bysource hash,
the nat->ct pointer must not be set to NULL since it may still be used in
a RCU read side
- When replacing a NAT extension area in the bysource hash, the nat->ct
pointer must be assigned before performing the replacement
- When reallocating extension storage in ct_extend, the old memory must
not be freed immediately since it may still be used by a RCU read side
Possibly fixes https://bugzilla.redhat.com/show_bug.cgi?id=449315
and/or http://bugzilla.kernel.org/show_bug.cgi?id=10875
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
From: "Robert T. Johnson" <rtjohnso@eecs.berkeley.edu>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
|
|
From: Eric Kinzie <ekinzie@cmf.nrl.navy.mil>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This causes the suni driver to oops if you try to use sonetdiag to get
the statistics. Also add the corresponding phy->stop call to fix another
oops if you try to remove the module.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It happens that if a packet arrives in a VC between the call to open it on
the hardware and the call to change the backend to br2684, br2684_regvcc
processes the packet and oopses dereferencing skb->dev because it is
NULL before the call to br2684_push().
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
|
|
1) Remove ICMP_MIN_LENGTH, as it is unused.
2) Remove unneeded tcp_v4_send_check() declaration.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I just noticed "cat /proc/net/raw" was buggy, missing '\n' separators.
I believe this was introduced by commit 8cd850efa4948d57a2ed836911cfd1ab299e89c6
([RAW]: Cleanup IPv4 raw_seq_show.)
This trivial patch restores correct behavior, and applies to current
Linus tree (should also be applied to stable tree as well.)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Selected device feature bits can be propagated to VLAN devices, so we
can make use of TX checksum offload and TSO on VLAN-tagged packets.
However, if the physical device does not do VLAN tag insertion or
generic checksum offload then the test for TX checksum offload in
dev_queue_xmit() will see a protocol of htons(ETH_P_8021Q) and yield
false.
This splits the checksum offload test into two functions:
- can_checksum_protocol() tests a given protocol against a feature bitmask
- dev_can_checksum() first tests the skb protocol against the device
features; if that fails and the protocol is htons(ETH_P_8021Q) then
it tests the encapsulated protocol against the effective device
features for VLANs
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Right now, any time we set a primary transport we set
the changeover_active flag. As a result, we invoke SFR-CACC
even when there has been no changeover events.
Only set changeover_active, when there is a true changeover
event, i.e. we had a primary path and we are changing to
another transport.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch remove the proc fs entry which has been created if fail to
set up proc fs entry for the SCTP protocol.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ingo's system is still seeing strange behavior, and he
reports that is goes away if the rest of the deferred
accept changes are reverted too.
Therefore this reverts e4c78840284f3f51b1896cf3936d60a6033c4d2c
("[TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack") and
539fae89bebd16ebeafd57a87169bc56eb530d76 ("[TCP]: TCP_DEFER_ACCEPT
updates - defer timeout conflicts with max_thresh").
Just like the other revert, these ideas can be revisited for
2.6.27
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We've introduced extra need of compat layer for ip_tunnel_prl{}
for PRL (Potential Router List) management. Though compat_ioctl
is still missing in ipv4/ipv6, let's make the interface more
straight-forward and eliminate extra need for nasty compat layer
anyway since the interface is new for 2.6.26.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a htb_hysteresis parameter to htb_sch.ko and by sysfs magic make
it runtime adjustable via
/sys/module/sch_htb/parameters/htb_hysteresis mode 640.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Martin Devera <devik@cdi.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The HTB hysteresis mode reduce the CPU load, but at the
cost of scheduling accuracy.
On ADSL links (512 kbit/s upstream), this inaccuracy introduce
significant jitter, enought to disturbe VoIP. For details see my
masters thesis (http://www.adsl-optimizer.dk/thesis/), chapter 7,
section 7.3.1, pp 69-70.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Martin Devera <devik@cdi.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6
|
|
Add new rt73usb USB ID for D-Link DWA111
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
This patch adds '\n' in debug printk (wme.c HT DEBUG)
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
The patch checks interface status, if it is in IBSS_JOINED mode
show cell id it is associated with.
Signed-off-by: Abhijeet Kolekar <abhijeet.kolekar@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
This fixes setting the coherent DMA mask for PCI devices.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
Config symbols that select LEDS_CLASS need to depend on NEW_LEDS so that
undefined symbols are not used in the build.
The alternative is to select NEW_LEDS, which some drivers do.
This patch fixes the led_* symbols build errors.
(.text+0x174cdc): undefined reference to `input_unregister_device'
(.text+0x174d9f): undefined reference to `input_allocate_device'
(.text+0x174e2d): undefined reference to `input_register_device'
(.text+0x174e53): undefined reference to `input_free_device'
rt2x00rfkill.c:(.text+0x176dc4): undefined reference to `input_allocate_polled_device'
rt2x00rfkill.c:(.text+0x176e8b): undefined reference to `input_event'
rt2x00rfkill.c:(.text+0x176e9f): undefined reference to `input_event'
(.text+0x176eca): undefined reference to `input_unregister_polled_device'
(.text+0x176efc): undefined reference to `input_free_polled_device'
(.text+0x176f37): undefined reference to `input_free_polled_device'
(.text+0x176fd8): undefined reference to `input_register_polled_device'
(.text+0x1772c0): undefined reference to `led_classdev_resume'
(.text+0x1772d4): undefined reference to `led_classdev_resume'
(.text+0x1772e8): undefined reference to `led_classdev_resume'
(.text+0x17730a): undefined reference to `led_classdev_suspend'
(.text+0x17731e): undefined reference to `led_classdev_suspend'
(.text+0x17732f): undefined reference to `led_classdev_suspend'
rt2x00leds.c:(.text+0x177348): undefined reference to `led_classdev_unregister'
rt2x00leds.c:(.text+0x1773c0): undefined reference to `led_classdev_register'
rfkill-input.c:(.text+0x209e4c): undefined reference to `input_close_device'
rfkill-input.c:(.text+0x209e53): undefined reference to `input_unregister_handle'
rfkill-input.c:(.text+0x209ea1): undefined reference to `input_register_handle'
rfkill-input.c:(.text+0x209eae): undefined reference to `input_open_device'
rfkill-input.c:(.text+0x209ebb): undefined reference to `input_unregister_handle'
rfkill-input.c:(.init.text+0x17405): undefined reference to `input_register_handler'
rfkill-input.c:(.exit.text+0x194f): undefined reference to `input_unregister_handler'
make[1]: *** [vmlinux] Error 1
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
Config symbols that select RFKILL need to depend on INPUT so that
undefined symbols are not used in the build.
This patch fixes the input_* symbols build errors.
(.text+0x174cdc): undefined reference to `input_unregister_device'
(.text+0x174d9f): undefined reference to `input_allocate_device'
(.text+0x174e2d): undefined reference to `input_register_device'
(.text+0x174e53): undefined reference to `input_free_device'
rt2x00rfkill.c:(.text+0x176dc4): undefined reference to `input_allocate_polled_device'
rt2x00rfkill.c:(.text+0x176e8b): undefined reference to `input_event'
rt2x00rfkill.c:(.text+0x176e9f): undefined reference to `input_event'
(.text+0x176eca): undefined reference to `input_unregister_polled_device'
(.text+0x176efc): undefined reference to `input_free_polled_device'
(.text+0x176f37): undefined reference to `input_free_polled_device'
(.text+0x176fd8): undefined reference to `input_register_polled_device'
(.text+0x1772c0): undefined reference to `led_classdev_resume'
(.text+0x1772d4): undefined reference to `led_classdev_resume'
(.text+0x1772e8): undefined reference to `led_classdev_resume'
(.text+0x17730a): undefined reference to `led_classdev_suspend'
(.text+0x17731e): undefined reference to `led_classdev_suspend'
(.text+0x17732f): undefined reference to `led_classdev_suspend'
rt2x00leds.c:(.text+0x177348): undefined reference to `led_classdev_unregister'
rt2x00leds.c:(.text+0x1773c0): undefined reference to `led_classdev_register'
rfkill-input.c:(.text+0x209e4c): undefined reference to `input_close_device'
rfkill-input.c:(.text+0x209e53): undefined reference to `input_unregister_handle'
rfkill-input.c:(.text+0x209ea1): undefined reference to `input_register_handle'
rfkill-input.c:(.text+0x209eae): undefined reference to `input_open_device'
rfkill-input.c:(.text+0x209ebb): undefined reference to `input_unregister_handle'
rfkill-input.c:(.init.text+0x17405): undefined reference to `input_register_handler'
rfkill-input.c:(.exit.text+0x194f): undefined reference to `input_unregister_handler'
make[1]: *** [vmlinux] Error 1
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
This removes a WARN_ON that is responsible for the following koops:
http://www.kerneloops.org/searchweek.php?search=b43_generate_noise_sample
The comment in the patch describes why it's safe to simply remove
the check.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
This fixes a possible NULL pointer dereference in an error path of the
DMA allocation error checking code. This is also necessary for a future
DMA API change that is on its way into the mainline kernel that adds
an additional dev parameter to dma_mapping_error().
This patch moves the whole struct b43_dmaring struct initialization
right before any DMA allocation operation.
Reported-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
None of the rt2x00 PCI devices support 64-bit DMA addresses (they all
only accept 32-bit buffer addresses). Hence it makes no sense to try to
enable 64-bit DMA addresses. Only try to enable 32-bit DMA addresses.
Signed-off-by: Gertjan van Wingerde <gwingerde@kpnplanet.nl>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
This fixes a "BUG: unable to handle kernel paging request"
bug in rt73usb which was caused by killing the guardian_urb
while it had never been allocated for rt73usb.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
parisc: update my email address
parisc: fix miscompilation of ip_fast_csum with gcc >= 4.3
parisc: fix off by one in setup_sigcontext32
parisc: export empty_zero_page
parisc: export copy_user_page_asm
parisc: move head.S to head.text section
Revert "parisc: fix trivial section name warnings"
|
|
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
ip_fast_csum needs an asm "memory" clobber, otherwise the aggressive
optimizations in gcc-4.3 cause it to be miscompiled.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
Thankfully, the values were irrelevant... Spotted by
newer gcc.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
Needed by ext4 when built as a module.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
Needed by fuse (via copy_highpage).
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
And explicitly list it in vmlinux.lds...
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
This reverts commit bd3bb8c15b9a80dbddfb7905b237a4a11a4725b4.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ahci: Workaround HW bug for SB600/700 SATA controller PMP support
ahci: workarounds for mcp65
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
tcp: Revert 'process defer accept as established' changes.
ipv6: Fix duplicate initialization of rawv6_prot.destroy
bnx2x: Updating the Maintainer
net: Eliminate flush_scheduled_work() calls while RTNL is held.
drivers/net/r6040.c: correct bad use of round_jiffies()
fec_mpc52xx: MPC52xx_MESSAGES_DEFAULT: 2nd NETIF_MSG_IFDOWN => IFUP
ipg: fix receivemode IPG_RM_RECEIVEMULTICAST{,HASH} in ipg_nic_set_multicast_list()
netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info()
netfilter: Make nflog quiet when no one listen in userspace.
ipv6: Fail with appropriate error code when setting not-applicable sockopt.
ipv6: Check IPV6_MULTICAST_LOOP option value.
ipv6: Check the hop limit setting in ancillary data.
ipv6 route: Fix route lifetime in netlink message.
ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER).
dccp: Bug in initial acknowledgment number assignment
dccp ccid-3: X truncated due to type conversion
dccp ccid-3: TFRC reverse-lookup Bug-Fix
dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets
dccp: Fix sparse warnings
dccp ccid-3: Bug-Fix - Zero RTT is possible
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: get leo framebuffer working
|
|
There is one bug in ATI SATA PMP of SB600 and SB700 old revision, which leads
to soft reset failure. This patch can fix the bug.
Signed-off-by: Shane Huang <shane.huang@amd.com>
Acked-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
|
MCP65 ahci can do NCQ but doesn't set the CAP bit and rev A0 and A1
can't do MSI but have MSI capability. Implement AHCI_HFLAG_YES_NCQ
and apply appropriate workarounds.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Peer Chen <pchen@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
|
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 5091/1: Add missing bitfield include to regs-lcd.h
[ARM] 5090/1: Correct pxafb palette typo error
[ARM] 5077/1: spi: fix list scan success verification in PXA ssp driver
|
|
Ramtron FM3130 is a chip with two separate devices inside, RTC clock and
FRAM. This driver provides only RTC functionality.
This chip is met in lots of custom boards with AT91SAMXXXX CPU I work
with, is cheap and in no way better or worse than any other RTC on market.
While it is mostly met on much smaller devices, I think it is great to
have it supported in Linux.
Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
More Kconfig tweaks related to the legacy PC RTC code:
- Describe the legacy PC RTC driver as such ... it's never quite
been clear that this driver is for PC RTCs, and now it's fair
to call this the "legacy" driver.
- Force it to understand about HPET stealing its IRQs ... kernel
code does this always when HPET is in use, there should be no
option for users to goof up the config.
This seems to fix kernel bugzilla #10729.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Recently (around 2.6.25) I've noticed that RTC no longer works for me. It
turned out this is because I use pnpacpi=off kernel option to work around
the parport_pc bugs. I always did so, but RTC used to work fine in the
past, and now it have regressed.
The patch fixes the problem by creating the platform device for the RTC
when PNP is disabled. This may also help running the PNP-enabled kernel
on an older PCs.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Cc: David Brownell <david-b@pacbell.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We shrink a radix tree when its root node has only one child, in the left
most slot. The child becomes the new root node. To perform this
operation in a manner compatible with concurrent lockless lookups, we
atomically switch the root pointer from the parent to its child.
However a concurrent lockless lookup may now have loaded a pointer to the
parent (and is presently deciding what to do next). For this reason, we
also have to keep the parent node in a valid state after shrinking the
tree, until the next RCU grace period -- otherwise this lookup with the
parent pointer may not do the right thing. Notably, we need to keep the
child in the left most slot there in case that is requested by the lookup.
This is all pretty standard RCU stuff. It is worth repeating because in
my eagerness to obey the radix tree node constructor scheme, I had broken
it by zeroing the radix tree node before the grace period.
What could happen is that a lookup can load the parent pointer, then
decide it wants to follow the left most child slot, only to find the slot
contained NULL due to the concurrent shrinker having zeroed the parent
node before waiting for a grace period. The lookup would return a false
negative as a result.
Fix it by doing that clearing in the RCU callback. I would normally want
to rip out the constructor entirely, but radix tree nodes are one of those
places where they make sense (only few cachelines will be touched soon
after allocation).
This was never actually found in any lockless pagecache testing or by the
test harness, but by seeing the odd problem with my scalable vmap rewrite.
I have not tickled the test harness into reproducing it yet, but I'll
keep working at it.
Fortunately, it is not a problem anywhere lockless pagecache is used in
mainline kernels (pagecache probe is not a guarantee, and brd does not
have concurrent lookups and deletes).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|