Age | Commit message (Collapse) | Author |
|
NFS unregisters sysctls only if V4 support is compiled in. However, sysctl
table is not V4 specific, so unregister it always.
Steps to reproduce:
[build nfs.ko with CONFIG_NFS_V4=n]
modrobe nfs
rmmod nfs
ls /proc/sys
Unable to handle kernel paging request at ffffffff880661c0 RIP:
[<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
PGD 203067 PUD 207063 PMD 7e216067 PTE 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: lockd nfs_acl sunrpc
Pid: 3335, comm: ls Not tainted 2.6.23-rc3-bloat #2
RIP: 0010:[<ffffffff802af8e3>] [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
RSP: 0018:ffff81007fd93e78 EFLAGS: 00010286
RAX: ffffffff880661c0 RBX: ffffffff80466370 RCX: ffffffff880661c0
RDX: 00000000000014c0 RSI: ffff81007f3ad020 RDI: ffff81007efd8b40
RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff802a8570 R12: ffffffff880661c0
R13: ffff81007e219640 R14: ffff81007efd8b40 R15: ffff81007ded7280
FS: 00002ba25ef03060(0000) GS:ffff81007ff81258(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff880661c0 CR3: 000000007dfaf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 3335, threadinfo ffff81007fd92000, task ffff81007d8a0000)
Stack: ffff81007f3ad150 ffffffff80283f30 ffff81007fd93f48 ffff81007efd8b40
ffff81007ee00440 0000000422222222 0000000200035593 ffffffff88037e9a
2222222222222222 ffffffff80466500 ffff81007e416400 ffff81007e219640
Call Trace:
[<ffffffff80283f30>] filldir+0x0/0xf0
[<ffffffff80283f30>] filldir+0x0/0xf0
[<ffffffff802840c7>] vfs_readdir+0xa7/0xc0
[<ffffffff80284376>] sys_getdents+0x96/0xe0
[<ffffffff8020bb3e>] system_call+0x7e/0x83
Code: 41 8b 14 24 85 d2 74 dc 49 8b 44 24 08 48 85 c0 74 e7 49 3b
RIP [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
RSP <ffff81007fd93e78>
CR2: ffffffff880661c0
Kernel panic - not syncing: Fatal exception
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Ryusuke Konishi says:
The recent truncate_complete_page() clears the dirty flag from a page
before calling a_ops->invalidatepage(),
^^^^^^
static void
truncate_complete_page(struct address_space *mapping, struct page *page)
{
...
cancel_dirty_page(page, PAGE_CACHE_SIZE); <--- Inserted here at
kernel 2.6.20
if (PagePrivate(page))
do_invalidatepage(page, 0); ---> will call
a_ops->invalidatepage()
...
}
and this is disturbing nfs_wb_page_priority() from calling
nfs_writepage_locked() that is expected to handle the pending
request (=nfs_page) associated with the page.
int nfs_wb_page_priority(struct inode *inode, struct page *page, int how)
{
...
if (clear_page_dirty_for_io(page)) {
ret = nfs_writepage_locked(page, &wbc);
if (ret < 0)
goto out;
}
...
}
Since truncate_complete_page() will get rid of the page after
a_ops->invalidatepage() returns, the request (=nfs_page) associated
with the page becomes a garbage in nfs_inode->nfs_page_tree.
------------------------
Fix this by ensuring that nfs_wb_page_priority() recognises that it may
also need to clear out non-dirty pages that have an nfs_page associated
with them.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
According to the mount(2) man page, the proper error return code for the
mount(2) system call when the special device name or the mounted-on
directory name is too long is ENAMETOOLONG.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The hostname was getting truncated in the new text-based NFS mount API.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Don't filter the return code from the in-kernel rpcbind or NFS mount
clients. Return the real error code so that callers of the new NFS
text-based mount API can apply a useful retry strategy.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The new text-based NFS mount option parsing logic doesn't recognize any
valid transport protocols due to a silly mistake in the protocol token
matching logic. This prevents basic mount requests such as:
mount.nfs server:/export /mnt -o proto=tcp
from working with the new text-based NFS mount API.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This patch fixes an Oops that was reported by Gabriel Barazer.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This should fix the following Oops reported by Jeff Garzik:
kernel BUG at fs/nfs/nfs4xdr.c:1040!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: nfs lockd sunrpc af_packet
ipv6 cpufreq_ondemand acpi_cpufreq battery floppy nvram sg snd_hda_intel
ata_generic snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_page_alloc e1000
firewire_ohci ata_piix i2c_core sr_mod cdrom sata_sil ahci libata sd_mod
scsi_mod ext3 jbd ehci_hcd uhci_hcd
Pid: 16353, comm: 10.10.10.1-recl Not tainted 2.6.23-rc3 #1
RIP: 0010:[<ffffffff88240980>] [<ffffffff88240980>] :nfs:encode_open+0x1c0/0x330
RSP: 0018:ffff8100467c5c60 EFLAGS: 00010202
RAX: ffff81000f89b8b8 RBX: 00000000697a6f6d RCX: ffff81000f89b8b8
RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff8100467c5c80
RBP: ffff8100467c5c80 R08: ffff81000f89bc30 R09: ffff81000f89b83f
R10: 0000000000000001 R11: ffffffff881e79e0 R12: ffff81003cbd1808
R13: ffff81000f89b860 R14: ffff81005fc984e0 R15: ffffffff88240af0
FS: 0000000000000000(0000) GS:ffffffff8052a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002adb9e51a030 CR3: 000000007ea7e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process 10.10.10.1-recl (pid: 16353, threadinfo ffff8100467c4000, task ffff8100038ce780)
Stack: ffff81004aeb6a40 ffff81003cbd1808 ffff81003cbd1808 ffffffff88240b5d
ffff81000f89b8bc ffff81005fc984e8 ffff81000f89bc30 ffff81005fc984e8
0000000300000000 0000000000000000 0000000000000000 ffff81003cbd1800
Call Trace:
[<ffffffff88240b5d>] :nfs:nfs4_xdr_enc_open_noattr+0x6d/0x90
[<ffffffff881e74b7>] :sunrpc:rpcauth_wrap_req+0x97/0xf0
[<ffffffff88240af0>] :nfs:nfs4_xdr_enc_open_noattr+0x0/0x90
[<ffffffff881df57a>] :sunrpc:call_transmit+0x18a/0x290
[<ffffffff881e5e7b>] :sunrpc:__rpc_execute+0x6b/0x290
[<ffffffff881dff76>] :sunrpc:rpc_do_run_task+0x76/0xd0
[<ffffffff882373f6>] :nfs:_nfs4_proc_open+0x76/0x230
[<ffffffff88237a2e>] :nfs:nfs4_open_recover_helper+0x5e/0xc0
[<ffffffff88237b74>] :nfs:nfs4_open_recover+0xe4/0x120
[<ffffffff88238e14>] :nfs:nfs4_open_reclaim+0xa4/0xf0
[<ffffffff882413c5>] :nfs:nfs4_reclaim_open_state+0x55/0x1b0
[<ffffffff882417ea>] :nfs:reclaimer+0x2ca/0x390
[<ffffffff88241520>] :nfs:reclaimer+0x0/0x390
[<ffffffff8024e59b>] kthread+0x4b/0x80
[<ffffffff8020cad8>] child_rip+0xa/0x12
[<ffffffff8024e550>] kthread+0x0/0x80
[<ffffffff8020cace>] child_rip+0x0/0x12
Code: 0f 0b eb fe 48 89 ef c7 00 00 00 00 02 be 08 00 00 00 e8 79
RIP [<ffffffff88240980>] :nfs:encode_open+0x1c0/0x330
RSP <ffff8100467c5c60>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Doh! We can't use cancel_delayed_work_sync because we may have been called
from an unmount that was being performed by nfs_automount_task.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This avoids the recent NFS mount regression (returning EBUSY when
mounting the same filesystem twice with different parameters).
The best I can do given the constraints appears to be to have the kernel
first look for a superblock that matches both the fsid and the
user-specified mount options, and then spawn off a new superblock if
that search fails.
Note that this is not the same as specifying nosharecache everywhere
since nosharecache will never attempt to match an existing superblock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Hua Zhong <hzhong@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This will avoid deadlocks of the form:
stack backtrace:
[<c0104fda>] show_trace_log_lvl+0x1a/0x30
[<c0105c02>] show_trace+0x12/0x20
[<c0105d15>] dump_stack+0x15/0x20
[<c013ee42>] __lock_acquire+0xc22/0x1030
[<c013f2b1>] lock_acquire+0x61/0x80
[<c012edd9>] flush_workqueue+0x49/0x70
[<c012ee0d>] flush_scheduled_work+0xd/0x10
[<dcf55c0c>] nfs_release_automount_timer+0x2c/0x30 [nfs]
[<dcf45d8e>] nfs_free_server+0x9e/0xd0 [nfs]
[<dcf4e626>] nfs_kill_super+0x16/0x20 [nfs]
[<c017b38d>] deactivate_super+0x7d/0xa0
[<c018f94b>] mntput_no_expire+0x4b/0x80
[<c018fd94>] expire_mount_list+0xe4/0x140
[<c0191219>] mark_mounts_for_expiry+0x99/0xb0
[<dcf55d1d>] nfs_expire_automounts+0xd/0x40 [nfs]
[<c012e61b>] run_workqueue+0x12b/0x1e0
[<c012f05b>] worker_thread+0x9b/0x100
[<c0131c72>] kthread+0x42/0x70
[<c0104c0f>] kernel_thread_helper+0x7/0x18
=======================
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Doing so would require us to introduce bh-safe locks into put_rpccred().
This patch fixes the lockdep complaint reported by Marc Dietrich:
inconsistent {softirq-on-W} -> {in-softirq-W} usage.
swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(rpc_credcache_lock){-+..}, at: [<c01dc487>]
_atomic_dec_and_lock+0x17/0x60
{softirq-on-W} state was registered at:
[<c013e870>] __lock_acquire+0x650/0x1030
[<c013f2b1>] lock_acquire+0x61/0x80
[<c02db9ac>] _spin_lock+0x2c/0x40
[<c01dc487>] _atomic_dec_and_lock+0x17/0x60
[<dced55fd>] put_rpccred+0x5d/0x100 [sunrpc]
[<dced56c1>] rpcauth_unbindcred+0x21/0x60 [sunrpc]
[<dced3fd4>] a0 [sunrpc]
[<dcecefe0>] rpc_call_sync+0x30/0x40 [sunrpc]
[<dcedc73b>] rpcb_register+0xdb/0x180 [sunrpc]
[<dced65b3>] svc_register+0x93/0x160 [sunrpc]
[<dced6ebe>] __svc_create+0x1ee/0x220 [sunrpc]
[<dced7053>] svc_create+0x13/0x20 [sunrpc]
[<dcf6d722>] nfs_callback_up+0x82/0x120 [nfs]
[<dcf48f36>] nfs_get_client+0x176/0x390 [nfs]
[<dcf49181>] nfs4_set_client+0x31/0x190 [nfs]
[<dcf49983>] nfs4_create_server+0x63/0x3b0 [nfs]
[<dcf52426>] nfs4_get_sb+0x346/0x5b0 [nfs]
[<c017b444>] vfs_kern_mount+0x94/0x110
[<c0190a62>] do_mount+0x1f2/0x7d0
[<c01910a6>] sys_mount+0x66/0xa0
[<c0104046>] syscall_call+0x7/0xb
[<ffffffff>] 0xffffffff
irq event stamp: 5277830
hardirqs last enabled at (5277830): [<c017530a>] kmem_cache_free+0x8a/0xc0
hardirqs last disabled at (5277829): [<c01752d2>] kmem_cache_free+0x52/0xc0
softirqs last enabled at (5277798): [<c0124173>] __do_softirq+0xa3/0xc0
softirqs last disabled at (5277817): [<c01241d7>] do_softirq+0x47/0x50
other info that might help us debug this:
no locks held by swapper/0.
stack backtrace:
[<c0104fda>] show_trace_log_lvl+0x1a/0x30
[<c0105c02>] show_trace+0x12/0x20
[<c0105d15>] dump_stack+0x15/0x20
[<c013ccc3>] print_usage_bug+0x153/0x160
[<c013d8b9>] mark_lock+0x449/0x620
[<c013e824>] __lock_acquire+0x604/0x1030
[<c013f2b1>] lock_acquire+0x61/0x80
[<c02db9ac>] _spin_lock+0x2c/0x40
[<c01dc487>] _atomic_dec_and_lock+0x17/0x60
[<dced55fd>] put_rpccred+0x5d/0x100 [sunrpc]
[<dcf6bf83>] nfs_free_delegation_callback+0x13/0x20 [nfs]
[<c012f9ea>] __rcu_process_callbacks+0x6a/0x1c0
[<c012fb52>] rcu_process_callbacks+0x12/0x30
[<c0124218>] tasklet_action+0x38/0x80
[<c0124125>] __do_softirq+0x55/0xc0
[<c01241d7>] do_softirq+0x47/0x50
[<c0124605>] irq_exit+0x35/0x40
[<c0112463>] smp_apic_timer_interrupt+0x43/0x80
[<c0104a77>] apic_timer_interrupt+0x33/0x38
[<c02690df>] cpuidle_idle_call+0x6f/0x90
[<c01023c3>] cpu_idle+0x43/0x70
[<c02d8c27>] rest_init+0x47/0x50
[<c03bcb6a>] start_kernel+0x22a/0x2b0
[<00000000>] 0x0
=======================
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Do not allow cached open for O_RDONLY or O_WRONLY unless the file has been
previously opened in these modes.
Also Fix the calculation of the mode in nfs4_close_prepare. We should only
issue an OPEN_DOWNGRADE if we're sure that we will still be holding the
correct open modes. This may not be the case if we've been doing delegated
opens.
Finally, there is no need to adjust the open mode bit flags in
nfs4_close_done(): that has already been done in nfs4_close_prepare().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We don't really need to clear &state->inode_states inside
nfs4_set_mode_locked, and doing so without holding the inode->i_lock would
in any case be a bug...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We need to grab the inode->i_lock atomically with the last reference put in
order to remove the open context that is being freed from the
nfsi->open_files list.
Fix by converting the kref to a standard atomic counter and then using
atomic_dec_and_lock()...
Thanks to Arnd Bergmann for pointing out the problem.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Obviously broken on little-endian; fortunately, the option is not
frequently used...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ Hey, sparse is wonderful, but even better than sparse is having people
like Al that actually _run_ it and fix bugs using it. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
If a NFSv4 mount is attempted with string based options, and the
option string doesn't contain a clientaddr= option, the kernel will
currently oops. Check for this situation and return a proper error.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
status in nfs client callback xdr code is passed in network order.
print it in host order for better readability.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Fix a couple of bugs:
- Don't rely on the parent dentry still being valid when the call completes.
Fixes a race with shrink_dcache_for_umount_subtree()
- Don't remove the file if the filehandle has been labelled as stale.
Fix a couple of inefficiencies
- Remove the global list of sillyrenamed files. Instead we can cache the
sillyrename information in the dentry->d_fsdata
- Move common code from unlink_setup/unlink_done into fs/nfs/unlink.c
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We need a common structure for setting up an unlink() rpc call in order to
fix the asynchronous unlink code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This will free up the d_fsdata field for other use.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Try harder to recover the open state if the server failed to return a
filehandle.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We can already easily recover from that inside _nfs4_proc_open().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Ensure that opendata->state is always initialised when we do state
recovery.
Ensure that we set the filehandle in the case where we're doing an
"OPEN_CLAIM_PREVIOUS" call due to a server reboot.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or
presence of a conflict lock by setting fl_type to, respectively, F_UNLCK
or something other than F_UNLCK, the return value is no longer needed.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
|
|
As Peter Staubach says elsewhere
(http://marc.info/?l=linux-kernel&m=118113649526444&w=2):
> The problem is that some file system such as NFSv2 and NFSv3 do
> not have sufficient support to be able to support leases correctly.
> In particular for these two file systems, there is no over the wire
> protocol support.
>
> Currently, these two file systems fail the fcntl(F_SETLEASE) call
> accidentally, due to a reference counting difference. These file
> systems should fail more consciously, with a proper error to
> indicate that the call is invalid for them.
Define an nfs setlease method that just returns -EINVAL.
If someone can demonstrate a real need, perhaps we could reenable
them in the presence of the "nolock" mount option.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I can never remember what the function to register to receive VM pressure
is called. I have to trace down from __alloc_pages() to find it.
It's called "set_shrinker()", and it needs Your Help.
1) Don't hide struct shrinker. It contains no magic.
2) Don't allocate "struct shrinker". It's not helpful.
3) Call them "register_shrinker" and "unregister_shrinker".
4) Call the function "shrink" not "shrinker".
5) Reduce the 17 lines of waffly comments to 13, but document it properly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Chinner <dgc@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This includes /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes entries.
Both need to show the header and use the list_head.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I ran into a curious issue when a lock is being canceled. The
cancellation results in a lock request to the vfs layer instead of an
unlock request. This is particularly insidious when the process that
owns the lock is exiting. In that case, sometimes the erroneous lock is
applied AFTER the process has entered zombie state, preventing the lock
from ever being released. Eventually other processes block on the lock
causing a slow degredation of the system. In the 2.6.16 kernel this was
investigated on, the problem is compounded by the fact that the cl_sem
is held while blocking on the vfs lock, which results in most processes
accessing the nfs file system in question hanging.
In more detail, here is how the situation occurs:
first _nfs4_do_setlk():
static int _nfs4_do_setlk(struct nfs4_state *state, int cmd, struct file_lock *fl, int reclaim)
...
ret = nfs4_wait_for_completion_rpc_task(task);
if (ret == 0) {
...
} else
data->cancelled = 1;
then nfs4_lock_release():
static void nfs4_lock_release(void *calldata)
...
if (data->cancelled != 0) {
struct rpc_task *task;
task = nfs4_do_unlck(&data->fl, data->ctx, data->lsp,
data->arg.lock_seqid);
The problem is the same file_lock that was passed in to _nfs4_do_setlk()
gets passed to nfs4_do_unlck() from nfs4_lock_release(). So the type is
still F_RDLCK or FWRLCK, not F_UNLCK. At some point, when cancelling the
lock, the type needs to be changed to F_UNLCK. It seemed easiest to do
that in nfs4_do_unlck(), but it could be done in nfs4_lock_release().
The concern I had with doing it there was if something still needed the
original file_lock, though it turns out the original file_lock still
needs to be modified by nfs4_do_unlck() because nfs4_do_unlck() uses the
original file_lock to pass to the vfs layer, and a copy of the original
file_lock for the RPC request.
It seems like the simplest solution is to force all situations where
nfs4_do_unlck() is being used to result in an unlock, so with that in
mind, I made the following change:
Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Consider the case where the user has mounted the remote filesystem
server:/foo on the two local directories /bar and /baz using the
nosharedcache mount option. The files /bar/file and /baz/file are
represented by different inodes in the local namespace, but refer to the
same file /foo/file on the server.
Consider the case where a process opens both /bar/file and /baz/file, then
closes /bar/file: because the nfs4_state is not shared between /bar/file
and /baz/file, the kernel will see that the nfs4_state for /bar/file is no
longer referenced, so it will send off a CLOSE rpc call. Unless the
open_owners differ, then that CLOSE call will invalidate the open state on
/baz/file too.
Conclusion: we cannot share open state owners between two different
non-shared mount instances of the same filesystem.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Unless the user sets the NFS_MOUNT_NOSHAREDCACHE mount flag, we should
return EBUSY if the filesystem is already mounted on a superblock that
has set conflicting mount options.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Prior to David Howell's mount changes in 2.6.18, users who mounted
different directories which happened to be from the same filesystem on the
server would get different super blocks, and hence could choose different
mount options. As long as there were no hard linked files that crossed from
one subtree to another, this was quite safe.
Post the changes, if the two directories are on the same filesystem (have
the same 'fsid'), they will share the same super block, and hence the same
mount options.
Add a flag to allow users to elect not to share the NFS super block with
another mount point, even if the fsids are the same. This will allow
users to set different mount options for the two different super blocks, as
was previously possible. It is still up to the user to ensure that there
are no cache coherency issues when doing this, however the default
behaviour will be to share super blocks whenever two paths result in
the same fsid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Hook in final components required for supporting in-kernel mount option
parsing for NFSv2 and NFSv3 mounts.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
For NFSv2 and v3 mounts, the first step is to contact the server's MOUNTD
and request the file handle for the root of the mounted share. Add a
function to the NFS client that handles this operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This generic infrastructure works for both NFS and NFSv4 mounts.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Clean up white space and coding conventions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
In preparation for supporting NFSv2 and NFSv3 mount option handling in the
kernel NFS client, convert mount_clnt.c to be a permanent part of the NFS
client, instead of built only when CONFIG_ROOT_NFS is enabled.
In addition, we also replace the "struct sockaddr_in *" argument with
something more generic, to help support IPv6 at some later point.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
A couple of callers just use a stringified IP address for the rpc client's
hostname. Move the logic for constructing this into rpc_create(), so it can
be shared.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
In preparation for handling NFS mount option parsing in the kernel,
rename rpcb_getport_external as rpcb_get_port_sync, and make it available
always (instead of only when CONFIG_ROOT_NFS is enabled).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Refactor NFSv4 mount processing to break out mount data validation
in the same way it's broken out in the NFSv2/v3 mount path.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Move error handling code out of the main code path. The switch statement
was also improperly indented, according to Documentation/CodingStyle. This
prepares nfs_validate_mount_data for the addition of option string parsing.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
NFS and NFSv4 mounts can now share server address sanity checking. And, it
provides an easy mechanism for adding IPv6 address checking at some later
point.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|