Grading Policy, Course Syllabus
Operating system concepts you should already know:
This course covers file systems and distributed systems. It may also discuss some advanced operating systems topics, such as multiprocessor operating systems and real-time systems.
Look at concept map (topics/innovations that have effected changes in distributed systems).
Look at general issues then at specific file systems.
A file is a collection of permanent data that has a name assigned to it.
The data is permanent in the sense that it remains in existence after the process that creates it has terminated, and remains in existence until explicitly deleted. Moreover, the data remains in existence across machine shut-downs/crashes.
Note: files are abstract entities provided by the operating system, not physical entities provided by hardware.
The operating system defines:
A name may be divided into components. File names often consist of a basename and extension. The basename names the file, while the extension indicates the type of data the file contains.
Organizing files into types is convenient because it allows similar programs to be grouped together.
In addition, it allows applications to reject operations on files of the ``wrong type''. Types can also guide the system to the correct application for a file, as in Windows.
However, should the operating system require and force all files into this mold?
Should the operating system restrict the length and format of filenames?
MS-DOS uses an 8.3 style for names. Windows 98/NT limits the file name portion to 255 characters (and actually keeps both a long and a short name). Unix variants have a large limit or no limit on name length.
A directory (folder) is a list of filenames, along with information about the file such as:
Often, operating systems store the contents of directories in files, and the file manager treats those files specially.
In a flat namespace all files are contained in a single, global directory.
Disadvantages:
In a hierarchical file system, directories may contain pointers to other directories. All directories can be accessed by starting at the root directory, and names consist of a sequence of directories followed by the file name.
Now, file names can be specified as full path names, in which case the file is named relative to the root of the directory tree. The names of directories are separated by a reserved character, such as ``/'' in Unix (e.g., /usr/include/stdio.h), ``\'' in DOS/Windows, and ``>'' in MULTICS.
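As a rough illustration of how a path name breaks into components, the following Python sketch uses the standard pathlib module on the Unix path above and on a DOS-style name. The variable names are chosen here for illustration only, and the "pure" path classes only manipulate names without touching a real file system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate names; they never touch a real file system.
unix_path = PurePosixPath("/usr/include/stdio.h")
dos_path = PureWindowsPath(r"A:\MYDOC.DOC")

# Components, starting from the root of the directory tree.
unix_parts = unix_path.parts      # ('/', 'usr', 'include', 'stdio.h')
dos_parts = dos_path.parts        # ('A:\\', 'MYDOC.DOC')

# Basename and extension of the final component.
base, ext = unix_path.stem, unix_path.suffix   # 'stdio', '.h'

# The DOS-style name carries the device (drive) as part of the name.
drive = dos_path.drive            # 'A:'
```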
The working directory is associated with each process, and a system call is used to change to another working directory.
In Unix/Linux, for instance, devices are in the same namespace with files. Each device foo is given a corresponding entry of /dev/foo in the filesystem.
Thus, users can read from and write to terminals by opening (say) /dev/tty01 and issuing reads and writes. The device-independent layer (of course!) translates these high-level operations into device-specific ones.
As another example, programs can read and write memory through the devices /dev/mem and /dev/kmem.
MS-DOS/Windows variants include the device as part of the name (A:\MYDOC.DOC), so the name is not device independent. Devices named this way include the floppy drive, CD-ROM, hard drive, zip drive, ...
Some systems allow a file to be referenced by several aliases. Aliases allow the use of shorter names for files that do not belong to one directory.
Typically, the operating system supplies a system call of the form:
In Unix/Linux, this alias is called a ``hard link'' and is created with the ln command (which invokes the link() system call). Example:
ln curname aliasname
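The same operation can be tried from Python, whose os.link() wraps the link() system call. This is a minimal sketch assuming a Unix-like system; the helper name make_hard_link and the file contents are made up for illustration.

```python
import os
import tempfile

def make_hard_link():
    """Create a file plus an alias; return (same_inode, link_count, alias_text)."""
    with tempfile.TemporaryDirectory() as d:
        curname = os.path.join(d, "curname")
        aliasname = os.path.join(d, "aliasname")
        with open(curname, "w") as f:
            f.write("some data")
        os.link(curname, aliasname)   # same effect as: ln curname aliasname
        cur, alias = os.stat(curname), os.stat(aliasname)
        with open(aliasname) as f:
            text = f.read()
        return cur.st_ino == alias.st_ino, alias.st_nlink, text

same_inode, link_count, alias_text = make_hard_link()
```

Both names refer to the same underlying file, so they report the same inode number and a link count of 2, and the data can be read through either name.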
Aliasing brings up two issues:
Finding all of a file's aliases could be difficult, or might force the operating system to maintain a special table for aliases.
Alternatively, a reference count could be associated with every file, with a count of the number of aliases. Deleting an alias decrements the reference count, deleting the file only if the count goes to zero.
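The reference-count scheme can be sketched with a toy in-memory model. Everything here (the class ToyFileSystem and its method names) is hypothetical and only mirrors the bookkeeping described above; it is not how any particular file system is implemented.

```python
class ToyFileSystem:
    """In-memory model: names map to file ids; each file has a reference count."""

    def __init__(self):
        self.names = {}      # alias name -> file id
        self.data = {}       # file id -> contents
        self.refcount = {}   # file id -> number of aliases
        self._next_id = 0

    def create(self, name, contents):
        fid = self._next_id
        self._next_id += 1
        self.names[name] = fid
        self.data[fid] = contents
        self.refcount[fid] = 1

    def link(self, curname, aliasname):
        fid = self.names[curname]
        self.names[aliasname] = fid
        self.refcount[fid] += 1

    def unlink(self, name):
        fid = self.names.pop(name)
        self.refcount[fid] -= 1
        if self.refcount[fid] == 0:   # last alias gone: reclaim the data
            del self.data[fid]
            del self.refcount[fid]

fs = ToyFileSystem()
fs.create("curname", b"hello")
fs.link("curname", "aliasname")
fs.unlink("curname")
survives = fs.data[fs.names["aliasname"]]   # data is still reachable
fs.unlink("aliasname")
remaining_files = len(fs.data)              # count reached zero, data reclaimed
```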
We could charge the original creator of the file, but this is unfair, especially if the original creator deletes his or her own alias for the file.
The best alternative is to charge each alias owner equally.
In Unix (Windows), the mkdir (CreateDirectory()) system call or command creates a directory. It also creates two aliases in the new directory:
Note: no special processing is needed to handle ``.'' and ``..''. The same holds in Windows file systems.
In Unix/Linux, this alias is called a ``symbolic link'' (or soft link) and is created with the ln -s command. Example:
ln -s curname symlink
Use of indirect files raises the following issues:
To prevent infinite loops, systems typically limit the number of indirect links to a small number such as 5.
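A small experiment shows the limit in action. The sketch below assumes a Unix-like system where symbolic links can be created without special privileges; the helper name open_looping_link is made up here. The exact limit is system-dependent, but two links that point at each other exceed any limit.

```python
import errno
import os
import tempfile

def open_looping_link():
    """Create two symbolic links that point at each other and try to open one."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        os.symlink("b", a)   # a -> b
        os.symlink("a", b)   # b -> a
        try:
            open(a).close()
        except OSError as exc:
            return exc.errno   # the kernel gave up following links
    return None

loop_errno = open_looping_link()
```

On Unix-like systems the open fails with ELOOP ("too many levels of symbolic links") rather than looping forever.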
Unix interprets the name relative to the directory containing the indirect file.
% ls -l
total 1
-rw-rw-r-- 1 cew system 9 Mar 15 11:08 origfile
% ln origfile linkfile    # create hard link
% ls -l
total 2
-rw-rw-r-- 2 cew system 9 Mar 15 11:08 linkfile
-rw-rw-r-- 2 cew system 9 Mar 15 11:08 origfile
% ln -s origfile symlink    # create symbolic link
% ls -l
total 2
-rw-rw-r-- 2 cew system 9 Mar 15 11:08 linkfile
-rw-rw-r-- 2 cew system 9 Mar 15 11:08 origfile
lrwxr-xr-x 1 cew system 8 Mar 15 11:10 symlink@ -> origfile
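The session above can be reproduced programmatically. The following sketch assumes a Unix-like system; the helper name inspect_links and the 9-byte file contents are invented for illustration. It shows that the hard link is indistinguishable from the original name, while the symbolic link is a separate object that merely stores the name origfile.

```python
import os
import tempfile

def inspect_links():
    """Recreate origfile, linkfile, and symlink; report what the system stores."""
    with tempfile.TemporaryDirectory() as d:
        orig = os.path.join(d, "origfile")
        hard = os.path.join(d, "linkfile")
        soft = os.path.join(d, "symlink")
        with open(orig, "w") as f:
            f.write("contents\n")          # 9 bytes, matching the listing
        os.link(orig, hard)                # ln origfile linkfile
        os.symlink("origfile", soft)       # ln -s origfile symlink
        with open(soft) as f:              # opening follows the symbolic link
            via_symlink = f.read()
        return {
            "orig_nlink": os.stat(orig).st_nlink,
            "symlink_target": os.readlink(soft),
            "symlink_is_link": os.path.islink(soft),
            "hard_is_link": os.path.islink(hard),
            "read_via_symlink": via_symlink,
        }

report = inspect_links()
```

The link count of origfile rises to 2 because of the hard link only; the symbolic link does not contribute to it.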
Instead of focusing solely on file access, we will consider the more general case of controlling access to arbitrary objects.
An object provides operations through which a service is invoked. A process can invoke an operation only if it has an appropriate access right or privilege to do so.
We can represent access rights by an access matrix.
The following matrix gives sample protections:
         | /dev/console | fred/prog.c | fred/letter | /usr/ucb/vi |
Fred's P | RW | RW | RW | X |
Fred's Q | RW | RW | RW | X |
Jane's P | RW | R | | X |
All three processes can execute /usr/ucb/vi, but none can read or write it.
Fred's processes P & Q have the same permissions. Usually, a process inherits its access rights from its parent.
Jane can read (but not write) fred/prog.c.
The access matrix is a conceptual framework and is rarely implemented as an actual matrix. (Why?) A full matrix must have an entry for every object and every user, which is wasteful.
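One way to avoid storing the whole matrix is to keep only the non-empty entries. The sketch below is one possible representation, not a description of any real system; the names ACCESS and allowed are hypothetical, and the entries are taken from the sample matrix.

```python
# Only non-empty entries are stored; a missing key means "no access".
ACCESS = {
    ("Fred's P", "/dev/console"): "RW",
    ("Fred's P", "fred/prog.c"): "RW",
    ("Fred's P", "fred/letter"): "RW",
    ("Fred's P", "/usr/ucb/vi"): "X",
    ("Fred's Q", "/dev/console"): "RW",
    ("Fred's Q", "fred/prog.c"): "RW",
    ("Fred's Q", "fred/letter"): "RW",
    ("Fred's Q", "/usr/ucb/vi"): "X",
    ("Jane's P", "/dev/console"): "RW",
    ("Jane's P", "fred/prog.c"): "R",
    ("Jane's P", "/usr/ucb/vi"): "X",
}

def allowed(subject, obj, right):
    """Return True if the subject holds the given right ('R', 'W', or 'X') on obj."""
    return right in ACCESS.get((subject, obj), "")

jane_reads_prog = allowed("Jane's P", "fred/prog.c", "R")     # True
jane_writes_prog = allowed("Jane's P", "fred/prog.c", "W")    # False
jane_reads_letter = allowed("Jane's P", "fred/letter", "R")   # False: no entry
```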
One aspect of access control concerns when access rights are verified. Possibilities:
While more efficient, checking rights only when the object is opened cannot change the access rights for a process that has already opened the object.
Using our previous example, the lists for Fred and Jane would be:
Fred | /dev/console(RW) | fred/prog.c(RW) | fred/letter(RW) | /usr/ucb/vi(X) |
Jane | /dev/console(RW) | fred/prog.c(R) | fred/letter() | /usr/ucb/vi(X) |
This arrangement has several drawbacks:
The dual of capability lists is access lists, which divide the access matrix by columns. The access list is associated with objects rather than with users.
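The row/column duality can be made concrete by building both views from the same set of matrix entries. This is an illustrative sketch only; ENTRIES, capability_lists, and access_lists are names chosen here, and empty matrix cells are simply omitted.

```python
from collections import defaultdict

# Non-empty cells of the sample matrix, one (user, object, rights) per cell.
ENTRIES = [
    ("Fred", "/dev/console", "RW"),
    ("Fred", "fred/prog.c", "RW"),
    ("Fred", "fred/letter", "RW"),
    ("Fred", "/usr/ucb/vi", "X"),
    ("Jane", "/dev/console", "RW"),
    ("Jane", "fred/prog.c", "R"),
    ("Jane", "/usr/ucb/vi", "X"),
]

capability_lists = defaultdict(dict)   # rows:    user   -> {object: rights}
access_lists = defaultdict(dict)       # columns: object -> {user: rights}
for user, obj, rights in ENTRIES:
    capability_lists[user][obj] = rights
    access_lists[obj][user] = rights

# Who may touch fred/prog.c? Answered directly from the object's access list.
prog_acl = access_lists["fred/prog.c"]   # {'Fred': 'RW', 'Jane': 'R'}
```

With capability lists, the question "what can Jane access?" is a single lookup; with access lists, the question "who can access fred/prog.c?" is a single lookup. Each representation makes the other question expensive.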
Disadvantages:
Solution: group users into classes. Each member in a class has the same access privileges for the object.
For instance, the file fred/prog.c might be given access rights:
self | RW |
group | R |
others | no access |
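Unix packs this kind of three-class list into nine mode bits. The sketch below assumes that self, group, and others correspond to the Unix owner, group, and other bits; the helper class_rights is hypothetical, while stat.filemode() is the standard-library function that renders a mode the way ls -l does.

```python
import stat

# Regular file with mode 640: self RW, group R, others no access.
mode = stat.S_IFREG | 0o640

def class_rights(mode, cls):
    """Return the rights ('R', 'W', 'X' subset) granted to one class."""
    shift = {"self": 6, "group": 3, "others": 0}[cls]
    bits = (mode >> shift) & 0o7
    return "".join(ch for ch, bit in (("R", 4), ("W", 2), ("X", 1)) if bits & bit)

self_rights = class_rights(mode, "self")       # 'RW'
group_rights = class_rights(mode, "group")     # 'R'
other_rights = class_rights(mode, "others")    # ''
listing = stat.filemode(mode)                  # '-rw-r-----'
```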
In addition to groups, Multics allowed individual users to be added to the access list of an object. Thus, fred/prog.c might be given the following access list:
self | RW |
group | R |
others | no access |
bob | R |
This scheme is also used in Windows NT.
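A lookup against such a list might proceed as in the sketch below. The function rights_for, the group membership, and in particular the precedence order (individual entry first, then self, group, others) are assumptions made for illustration; real systems define their own matching rules.

```python
def rights_for(user, owner, group_members, acl):
    """Return the rights string for user under an assumed precedence order."""
    if user in acl["users"]:          # individually listed users come first
        return acl["users"][user]
    if user == owner:
        return acl["self"]
    if user in group_members:
        return acl["group"]
    return acl["others"]

# Access list for fred/prog.c from the table above.
acl = {"self": "RW", "group": "R", "others": "", "users": {"bob": "R"}}
group_members = {"fred", "jane"}      # hypothetical group membership

fred_rights = rights_for("fred", "fred", group_members, acl)       # 'RW'
jane_rights = rights_for("jane", "fred", group_members, acl)       # 'R'
bob_rights = rights_for("bob", "fred", group_members, acl)         # 'R'
stranger_rights = rights_for("carol", "fred", group_members, acl)  # ''
```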
The semantics of access control are fairly straightforward in the context of files. What about directories?
Here is one possibility:
Note: should execute permission be required for the full path name? What if relative names are given?
If a file has several aliases, should the file have different permissions for each name?
If permissions are associated with the directory entry, the answer is yes. However, this raises security concerns, because it becomes more difficult to verify all possible permissions.
If permissions are associated with the file itself, only one set of permissions can exist. This approach is taken by Unix.
How should the permissions of indirect files be interpreted?
The former approach becomes awkward because the interpretation of the modes also depends on whether the indirect link points to a file or a directory.
Unix adopts the middle approach, ignoring protections of the indirect file.
An access method defines the way processes read and write files.
Where should the mark be stored?
Disadvantage:
Advantages:
seek() call in Unix.
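Seeking can be tried from Python, whose file objects expose a seek() method. This is a minimal sketch; the helper read_at and the sample data are made up for illustration.

```python
import os
import tempfile

def read_at(path, offset, length):
    """Random access: move the file position to offset, then read length bytes."""
    with open(path, "rb") as f:
        f.seek(offset, os.SEEK_SET)   # reposition without reading earlier bytes
        return f.read(length)

# Build a small sample file to seek within.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
    sample_path = tmp.name

chunk = read_at(sample_path, 10, 3)   # bytes at offsets 10, 11, 12
os.remove(sample_path)
```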
Access to the file contents is then managed by the virtual memory system, with the contents of memory backed by the mapped file.
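Python's standard mmap module gives a feel for this access method. In the sketch below (the helper name patch_through_mapping and the sample data are invented), the file is read and modified purely through memory indexing, with no explicit read or write calls on the mapped region.

```python
import mmap
import os
import tempfile

def patch_through_mapping():
    """Map a small file, read and modify it via memory, return (before, after)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
        path = tmp.name
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:   # length 0 maps the whole file
            first = m[:5]                      # read by indexing memory
            m[0:5] = b"HELLO"                  # write by assigning to memory
            m.flush()                          # push changes back to the file
    with open(path, "rb") as f:
        after = f.read()
    os.remove(path)
    return first, after

before_bytes, after_bytes = patch_through_mapping()
```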
This differs from treating files as unstructured byte streams. Some applications, such as databases, prefer viewing files as repositories of records that can be accessed with a key.
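When the system offers only byte streams, an application can layer keyed access on top by keeping its own index from keys to byte offsets. The sketch below is one simple way to do this; the ``key=value'' record format and the helpers build_index and lookup are assumptions made purely for illustration.

```python
import io

def build_index(stream):
    """Scan 'key=value' lines once, recording the byte offset of each record."""
    index = {}
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break
        key = line.split(b"=", 1)[0]
        index[key] = offset
    return index

def lookup(stream, index, key):
    """Seek straight to the record for key and return its value."""
    stream.seek(index[key])
    line = stream.readline().rstrip(b"\n")
    return line.split(b"=", 1)[1]

# An in-memory byte stream stands in for a file of records.
store = io.BytesIO(b"fred=prog.c\njane=notes\nbob=letter\n")
index = build_index(store)
value = lookup(store, index, b"jane")
```

A system with built-in keyed access would maintain such an index itself, so that applications never deal with byte offsets at all.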