From 90586523eb4b349806887c62ee70685a49415124 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Thu, 21 May 2009 17:01:20 -0400 Subject: fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris Acked-by: Al Viro Cc: Christoph Hellwig --- fs/notify/fsnotify.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) create mode 100644 fs/notify/fsnotify.c (limited to 'fs/notify/fsnotify.c') diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c new file mode 100644 index 00000000000..56bee0f10c3 --- /dev/null +++ b/fs/notify/fsnotify.c @@ -0,0 +1,79 @@ +/* + * Copyright (C) 2008 Red Hat, Inc., Eric Paris + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; see the file COPYING. If not, write to + * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include +#include +#include +#include + +#include +#include "fsnotify.h" + +/* + * This is the main call to fsnotify. The VFS calls into hook specific functions + * in linux/fsnotify.h. Those functions then in turn call here. Here will call + * out to all of the registered fsnotify_group. Those groups can then use the + * notification event in whatever means they feel necessary. + */ +void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is) +{ + struct fsnotify_group *group; + struct fsnotify_event *event = NULL; + int idx; + + if (list_empty(&fsnotify_groups)) + return; + + if (!(mask & fsnotify_mask)) + return; + + /* + * SRCU!! the groups list is very very much read only and the path is + * very hot. The VAST majority of events are not going to need to do + * anything other than walk the list so it's crazy to pre-allocate. + */ + idx = srcu_read_lock(&fsnotify_grp_srcu); + list_for_each_entry_rcu(group, &fsnotify_groups, group_list) { + if (mask & group->mask) { + if (!event) { + event = fsnotify_create_event(to_tell, mask, data, data_is); + /* shit, we OOM'd and now we can't tell, maybe + * someday someone else will want to do something + * here */ + if (!event) + break; + } + group->ops->handle_event(group, event); + } + } + srcu_read_unlock(&fsnotify_grp_srcu, idx); + /* + * fsnotify_create_event() took a reference so the event can't be cleaned + * up while we are still trying to add it to lists, drop that one. + */ + if (event) + fsnotify_put_event(event); +} +EXPORT_SYMBOL_GPL(fsnotify); + +static __init int fsnotify_init(void) +{ + return init_srcu_struct(&fsnotify_grp_srcu); +} +subsys_initcall(fsnotify_init); -- cgit v1.2.3 From 3be25f49b9d6a97eae9bcb96d3292072b7658bd8 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Thu, 21 May 2009 17:01:26 -0400 Subject: fsnotify: add marks to inodes so groups can interpret how to handle those inodes This patch creates a way for fsnotify groups to attach marks to inodes. These marks have little meaning to the generic fsnotify infrastructure and thus their meaning should be interpreted by the group that attached them to the inode's list. dnotify and inotify will make use of these markings to indicate which inodes are of interest to their respective groups. But this implementation has the useful property that in the future other listeners could actually use the marks for the exact opposite reason, aka to indicate which inodes it had NO interest in. Signed-off-by: Eric Paris Acked-by: Al Viro Cc: Christoph Hellwig --- fs/notify/fsnotify.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'fs/notify/fsnotify.c') diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 56bee0f10c3..d5654629c65 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -25,6 +25,15 @@ #include #include "fsnotify.h" +/* + * Clear all of the marks on an inode when it is being evicted from core + */ +void __fsnotify_inode_delete(struct inode *inode) +{ + fsnotify_clear_marks_by_inode(inode); +} +EXPORT_SYMBOL_GPL(__fsnotify_inode_delete); + /* * This is the main call to fsnotify. The VFS calls into hook specific functions * in linux/fsnotify.h. Those functions then in turn call here. Here will call @@ -43,6 +52,8 @@ void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is) if (!(mask & fsnotify_mask)) return; + if (!(mask & to_tell->i_fsnotify_mask)) + return; /* * SRCU!! the groups list is very very much read only and the path is * very hot. The VAST majority of events are not going to need to do @@ -51,6 +62,8 @@ void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is) idx = srcu_read_lock(&fsnotify_grp_srcu); list_for_each_entry_rcu(group, &fsnotify_groups, group_list) { if (mask & group->mask) { + if (!group->ops->should_send_event(group, to_tell, mask)) + continue; if (!event) { event = fsnotify_create_event(to_tell, mask, data, data_is); /* shit, we OOM'd and now we can't tell, maybe -- cgit v1.2.3 From c28f7e56e9d95fb531dc3be8df2e7f52bee76d21 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Thu, 21 May 2009 17:01:29 -0400 Subject: fsnotify: parent event notification inotify and dnotify both use a similar parent notification mechanism. We add a generic parent notification mechanism to fsnotify for both of these to use. This new machanism also adds the dentry flag optimization which exists for inotify to dnotify. Signed-off-by: Eric Paris Acked-by: Al Viro Cc: Christoph Hellwig --- fs/notify/fsnotify.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 91 insertions(+) (limited to 'fs/notify/fsnotify.c') diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index d5654629c65..7fc760067a6 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -34,6 +34,97 @@ void __fsnotify_inode_delete(struct inode *inode) } EXPORT_SYMBOL_GPL(__fsnotify_inode_delete); +/* + * Given an inode, first check if we care what happens to our children. Inotify + * and dnotify both tell their parents about events. If we care about any event + * on a child we run all of our children and set a dentry flag saying that the + * parent cares. Thus when an event happens on a child it can quickly tell if + * if there is a need to find a parent and send the event to the parent. + */ +void __fsnotify_update_child_dentry_flags(struct inode *inode) +{ + struct dentry *alias; + int watched; + + if (!S_ISDIR(inode->i_mode)) + return; + + /* determine if the children should tell inode about their events */ + watched = fsnotify_inode_watches_children(inode); + + spin_lock(&dcache_lock); + /* run all of the dentries associated with this inode. Since this is a + * directory, there damn well better only be one item on this list */ + list_for_each_entry(alias, &inode->i_dentry, d_alias) { + struct dentry *child; + + /* run all of the children of the original inode and fix their + * d_flags to indicate parental interest (their parent is the + * original inode) */ + list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) { + if (!child->d_inode) + continue; + + spin_lock(&child->d_lock); + if (watched) + child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED; + else + child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED; + spin_unlock(&child->d_lock); + } + } + spin_unlock(&dcache_lock); +} + +/* Notify this dentry's parent about a child's events. */ +void __fsnotify_parent(struct dentry *dentry, __u32 mask) +{ + struct dentry *parent; + struct inode *p_inode; + bool send = false; + bool should_update_children = false; + + if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) + return; + + spin_lock(&dentry->d_lock); + parent = dentry->d_parent; + p_inode = parent->d_inode; + + if (fsnotify_inode_watches_children(p_inode)) { + if (p_inode->i_fsnotify_mask & mask) { + dget(parent); + send = true; + } + } else { + /* + * The parent doesn't care about events on it's children but + * at least one child thought it did. We need to run all the + * children and update their d_flags to let them know p_inode + * doesn't care about them any more. + */ + dget(parent); + should_update_children = true; + } + + spin_unlock(&dentry->d_lock); + + if (send) { + /* we are notifying a parent so come up with the new mask which + * specifies these are events which came from a child. */ + mask |= FS_EVENT_ON_CHILD; + + fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE); + dput(parent); + } + + if (unlikely(should_update_children)) { + __fsnotify_update_child_dentry_flags(p_inode); + dput(parent); + } +} +EXPORT_SYMBOL_GPL(__fsnotify_parent); + /* * This is the main call to fsnotify. The VFS calls into hook specific functions * in linux/fsnotify.h. Those functions then in turn call here. Here will call -- cgit v1.2.3 From 62ffe5dfba056f7ba81d710fee9f28c58a42fdd6 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Thu, 21 May 2009 17:01:43 -0400 Subject: fsnotify: include pathnames with entries when possible When inotify wants to send events to a directory about a child it includes the name of the original file. This patch collects that filename and makes it available for notification. Signed-off-by: Eric Paris Acked-by: Al Viro Cc: Christoph Hellwig --- fs/notify/fsnotify.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'fs/notify/fsnotify.c') diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 7fc760067a6..675129fa9fd 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -114,7 +114,8 @@ void __fsnotify_parent(struct dentry *dentry, __u32 mask) * specifies these are events which came from a child. */ mask |= FS_EVENT_ON_CHILD; - fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE); + fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE, + dentry->d_name.name); dput(parent); } @@ -131,7 +132,7 @@ EXPORT_SYMBOL_GPL(__fsnotify_parent); * out to all of the registered fsnotify_group. Those groups can then use the * notification event in whatever means they feel necessary. */ -void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is) +void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *file_name) { struct fsnotify_group *group; struct fsnotify_event *event = NULL; @@ -156,7 +157,7 @@ void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is) if (!group->ops->should_send_event(group, to_tell, mask)) continue; if (!event) { - event = fsnotify_create_event(to_tell, mask, data, data_is); + event = fsnotify_create_event(to_tell, mask, data, data_is, file_name); /* shit, we OOM'd and now we can't tell, maybe * someday someone else will want to do something * here */ -- cgit v1.2.3 From 47882c6f51e8ef41fbbe2bbb746a1ea3228dd7ca Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Thu, 21 May 2009 17:01:47 -0400 Subject: fsnotify: add correlations between events As part of the standard inotify events it includes a correlation cookie between two dentry move operations. This patch includes the same behaviour in fsnotify events. It is needed so that inotify userspace can be implemented on top of fsnotify. Signed-off-by: Eric Paris Acked-by: Al Viro Cc: Christoph Hellwig --- fs/notify/fsnotify.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'fs/notify/fsnotify.c') diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 675129fa9fd..f11d75f0236 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -115,7 +115,7 @@ void __fsnotify_parent(struct dentry *dentry, __u32 mask) mask |= FS_EVENT_ON_CHILD; fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE, - dentry->d_name.name); + dentry->d_name.name, 0); dput(parent); } @@ -132,7 +132,7 @@ EXPORT_SYMBOL_GPL(__fsnotify_parent); * out to all of the registered fsnotify_group. Those groups can then use the * notification event in whatever means they feel necessary. */ -void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *file_name) +void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *file_name, u32 cookie) { struct fsnotify_group *group; struct fsnotify_event *event = NULL; @@ -157,7 +157,7 @@ void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const if (!group->ops->should_send_event(group, to_tell, mask)) continue; if (!event) { - event = fsnotify_create_event(to_tell, mask, data, data_is, file_name); + event = fsnotify_create_event(to_tell, mask, data, data_is, file_name, cookie); /* shit, we OOM'd and now we can't tell, maybe * someday someone else will want to do something * here */ -- cgit v1.2.3 From e42e27736de80045f925564ea27a1d32957219e7 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Thu, 11 Jun 2009 11:09:47 -0400 Subject: inotify/dnotify: should_send_event shouldn't match on FS_EVENT_ON_CHILD inotify and dnotify will both indicate that they want any event which came from a child inode. The fix is to mask off FS_EVENT_ON_CHILD when deciding if inotify or dnotify is interested in a given event. Signed-off-by: Eric Paris --- fs/notify/fsnotify.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'fs/notify/fsnotify.c') diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index f11d75f0236..ec2f7bd7681 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -137,14 +137,16 @@ void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const struct fsnotify_group *group; struct fsnotify_event *event = NULL; int idx; + /* global tests shouldn't care about events on child only the specific event */ + __u32 test_mask = (mask & ~FS_EVENT_ON_CHILD); if (list_empty(&fsnotify_groups)) return; - if (!(mask & fsnotify_mask)) + if (!(test_mask & fsnotify_mask)) return; - if (!(mask & to_tell->i_fsnotify_mask)) + if (!(test_mask & to_tell->i_fsnotify_mask)) return; /* * SRCU!! the groups list is very very much read only and the path is @@ -153,7 +155,7 @@ void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const */ idx = srcu_read_lock(&fsnotify_grp_srcu); list_for_each_entry_rcu(group, &fsnotify_groups, group_list) { - if (mask & group->mask) { + if (test_mask & group->mask) { if (!group->ops->should_send_event(group, to_tell, mask)) continue; if (!event) { -- cgit v1.2.3