/sys/doc/ Documentation archive



Re: bind() and mount() confusion



First, Phil sez:
>this is incorrect. A process has a namespace which may be added to using
>mount or by binding a # device. The # is simply an escape into a special
>area of file systems provided by the kernel. Neither a bind nor a mount
>ever result in a new thread being created.

It doesn't matter if that's not how it actually works; that's not what
I was trying to say.  What I was trying to say is that's how it can be
_modeled_.  I think the model can simplify how the Inferno namespace and
the bind-v-mount distinction are described.  (In retrospect, I should have
emphasized that; it's not as obvious in the words as it was in my head.)

In actual fact, mount takes a file handle---but it has to be connected
to a thread somewhere, either locally or remotely.  A thread being created
every time is indeed an overstatement---one thread can handle a number of
connections---but a thread (or its moral equivalent) has to be behind the
file handle, and that handle is what's actually mounted.

----

Then Rob has some interesting commentary, which I'm going to cut and hack
out of all recognition in an attempt to elucidate something.

Let's start with this:
>	A process starts with nothing, and attaches only those resources
>	it needs into a private space.

This is a cool concept, but it never really happens.  What one tends to
do instead is to clone the current namespace, including all the stuff
already bound/mounted, and modify the clone.  In fact, I just grepped all
the code in the beta 16 distribution for NEWNS and didn't find a single use.

Furthermore, unless NEWNS also includes the local root filesystem (it's
not documented either way), creating an empty namespace would not let
one reach those files ever again.  And if it does pull it in, that means
that to create a restricted namespace, one would have to "white out"
everything that shouldn't be in it.  Since there doesn't seem to be any
direct way of "whiting out" a name, this means that one would have to
bind all the unrestricted names into a subtree and bind the subtree to
the root---so one may as well work from a clone in the first place.
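
The clone-then-restrict approach in the paragraph above can be sketched
with a toy model.  This is Python, not Limbo, and nothing in it is an
Inferno API: the Namespace class, the "disk:" label for the local root
tree, and the use of #I as a stand-in network device are all assumptions
made for illustration.  The mount table is keyed by the resolved
resource, and at most one mount is followed per path element.

```python
class Namespace:
    """Toy mount table: resolved resource -> resource bound over it."""
    ROOT = "disk:"  # hypothetical label for the local root file tree

    def __init__(self, table=None):
        self.table = dict(table or {})

    def fork(self):
        # Models the idea of FORKNS: the copy can change without
        # affecting the original.
        return Namespace(self.table)

    def _walk(self, path, follow_last=True):
        parts = [p for p in path.split("/") if p]
        cur = self.ROOT
        if parts or follow_last:
            cur = self.table.get(cur, cur)
        for i, part in enumerate(parts):
            cur = cur + "/" + part
            if follow_last or i < len(parts) - 1:
                cur = self.table.get(cur, cur)
        return cur

    def resolve(self, path):
        return self._walk(path)

    def bind(self, source, target):
        # Replace-style bind only; union binds are not modeled.
        src = source if source.startswith("#") else self.resolve(source)
        self.table[self._walk(target, follow_last=False)] = src


parent = Namespace()
parent.bind("#c", "/dev")
parent.bind("#I", "/net")

# Build the cage from a clone: collect the permitted names in a
# subtree, then bind the subtree over the root.
cage = parent.fork()
cage.bind("/dev", "/cage/dev")
cage.bind("/cage", "/")

print(parent.resolve("/net/tcp"))  # "#I/tcp"
print(cage.resolve("/dev/cons"))   # "#c/cons"
print(cage.resolve("/net/tcp"))    # "disk:/cage/net/tcp", not the device
```

In the caged copy, /net no longer leads to a device; it falls through to
a path under the subtree, which exists only if someone put it there.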

(BTW, what use is it to have #/ provide some directories if those same
directories must also be in the root of the local file tree?  If the
directory isn't in the local root, the directories provided by #/ don't
work, so there doesn't seem to be any value added there.)

>The # notation was invented long ago (a scarily long time ago - in the
>earliest days of Plan 9) as a hack to give names to kernel devices.  It
>always bothered me that local trees were attached using bind but remote
>ones with mount. But it works out fine in practice, so it stayed that way.

OK, let me see if I can describe this.  I believe this is something that
I could build in Inferno as it exists today; tell me if there's any
difference in functionality.  (I remember when you presented the #
notation in a talk on Plan 9, BTW.  In response to a question from the
floor, you said it was an egregious hack and you really wanted it to go
away.  It's the only time I've ever heard anyone apply "egregious" to a
hack; I think it shows how strongly you felt about it.  Consider this a
way of getting rid of it, at least to the public.)

Let's assume that there's a program that's started first thing; if
you want, it replaces the normal /dis/sh.dis (in which case, it'd also
have to make sure it bound the real shell over itself so that it was
only executed first thing).  It starts a server thread that does the
file2chan bit, which the initial thread then mounts on, say, /drivers.
The server thread does a FORKNS so that its namespace is protected.
After it does that, the initial thread does a pctl(NODEVS) as a parting
gift, so that nobody else can access the # namespace except the thread
serving /drivers.

Whenever a process makes a reference to /drivers/foo, the server looks
it up.  If it's a kernel driver, it returns a handle to the corresponding
driver in the sharp namespace---thus, /drivers/c is equivalent to #c so
that "bind /drivers/c /dev" does what you'd expect.  This server could
also be configured to recognize other names, so that something
like "bind /drivers/db /database" would mount a new instance of the
database files on /database.
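
The lookup just described can be sketched as a toy model.  It is Python
rather than Limbo and stands in for the server thread only; the
DriversServer class, its lookup method, and the "db-instance-N" handles
are hypothetical names, and no file2chan or mount machinery is modeled.

```python
import itertools


class DriversServer:
    """Toy stand-in for the thread that serves /drivers."""

    def __init__(self, kernel_devices, services):
        self.kernel_devices = set(kernel_devices)  # e.g. {"c"} for #c
        self.services = dict(services)             # name -> factory
        self._ids = itertools.count(1)

    def lookup(self, name):
        # A kernel driver: hand back the corresponding # device.
        if name in self.kernel_devices:
            return "#" + name
        # A configured user-level name: each reference gets a new instance.
        if name in self.services:
            return self.services[name](next(self._ids))
        raise FileNotFoundError("/drivers/" + name)


server = DriversServer({"c"}, {"db": lambda n: "db-instance-%d" % n})

print(server.lookup("c"))   # "#c"
print(server.lookup("db"))  # "db-instance-1"
print(server.lookup("db"))  # "db-instance-2"
```

Because only the server resolves names, it is also the one place where a
policy about which names are visible could be applied.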

Another way you could look at this is to think of /drivers as something
that automatically mounts something for you; each reference creates a
new instance of a mounted object.  (To introduce even more confusion,
however, this could eliminate the mount command itself---a command like
"bind /drivers/rmt/foo.bar.com /n" could do what the mount command does
now, with only a little more mechanism.)

To have a kernel that actually started this way would need some magic,
but no more than is currently required for #/ and possibly even less,
since there wouldn't have to be machinery for the # namespace.  You'd
still need a server for access to user-level drivers, of course.

>Doing what you suggest muddies that design, and along the way makes it
>more awkward to control access to resources (e.g. pctl(NODEVS) to disable
>binds to local devices), to acquire guaranteed access to local ones (look
>at #c rather than /dev to be sure we're not being substituted), to sniff
>a device to see if it's local (stat it to see what type of file it is),
>and so on.

I'm not sure I agree.  For one thing, I resist the all-or-nothing way
that pctl(NODEVS) works.  Suppose I don't care if a process uses the
local screen, but I don't want it to access the network?  There's no
way I can selectively provide access to local resources without binding
the permitted ones into the namespace and then turning off ALL access.
How does the process get guaranteed access then?
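
A toy model of the workaround and of the question it raises.  This is
Python, not Limbo; the Proc class and its methods are hypothetical, and
the nodevs flag is a deliberately simplified picture of pctl(NODEVS)
that ignores any details of the real call.

```python
class Proc:
    """Toy process: a few binds plus an all-or-nothing device switch."""

    def __init__(self):
        self.binds = {}      # mount point -> device
        self.nodevs = False  # simplified stand-in for pctl(NODEVS)

    def bind_device(self, device, mount_point):
        if self.nodevs:
            raise PermissionError(device)
        self.binds[mount_point] = device

    def open(self, path):
        if path.startswith("#"):
            # Direct device access is what the switch turns off.
            if self.nodevs:
                raise PermissionError(path)
            return path
        for mount_point, device in self.binds.items():
            if path == mount_point or path.startswith(mount_point + "/"):
                return device + path[len(mount_point):]
        raise FileNotFoundError(path)


p = Proc()
p.bind_device("#c", "/dev")  # bind the permitted device first
p.nodevs = True              # then turn off ALL direct access

print(p.open("/dev/cons"))   # "#c/cons": the earlier bind still works
try:
    p.open("#c/cons")        # the "guaranteed" route is now closed
except PermissionError as err:
    print("denied:", err)
```

After the switch is thrown, the process can reach the device only
through /dev, so it has no independent way to confirm what /dev is.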

I have other concerns about namespace management---and I don't think it's
because of my UNIX hat---I've followed Plan 9 since its inception and I
think it has a lot of wonderful ideas.  If you find this awkward, there
are places where I find Inferno clumsy.

>Are the #'ed trees a separate name space?  Well, yes, but it's not useful
>to think of them that way.  It is useful, however, to consider the process
>to have one true namespace with the ability to attach resources, so that's
>how we think about it.  It's an accident of history that some resources
>come by bind, others by mount, but an advantageous accident.

Perhaps the difference is one of focus.  You're concerned with the
ability to pull in additional resources into your namespace, while I'm
concerned with building cages for untrusted programs where I don't want
them to pull in additional resources.  A difference in philosophy, as it
were---but then, I started off by saying that what I was talking about
was probably a philosophical distinction.

One final thing, just to make sure I'm not misinterpreted:  I'm not
trying to criticise Inferno---it just re-used a mechanism that was well
known from Plan 9---and it's certainly true that nothing we say here is
going to change how it works.  But maybe there will be a Son-of-Inferno
(er, I guess that has to be Child-of-Inferno to be politically correct)
someday that could use these ideas.

Sigh.  I probably shouldn't have started this just as I'm going on
travel for a week.  If there's more traffic on this subject and I don't
respond immediately, don't assume I'm not interested in it, but I may
not be able to reply until next weekend.  There's more I want to do in
this note, but I've got to go get on an airplane, so I'll close now and
hope what I wrote isn't as confusing as the first one.

-- Greg