From Andrew File System work:
- most files are small--transfer files rather than disk blocks?
- reading more common than writing
- most access is sequential
- most files have a short lifetime--many applications (such as
compilers) generate temporary files.
- file sharing is unusual--argues for client caching
- processes use few files
- files can be divided into classes--handle ``system'' files and
``user'' files differently.
Primarily look at three distributed file systems as we look at issues.
1. File Transfer Protocol (FTP). Motivation is to provide file sharing
(not a distributed file system). 1970s.
Connect to a remote machine and
interactively send or fetch an arbitrary file. FTP deals with
authentication, listing directory contents, ASCII or binary files, etc.
Typically, a user connecting to an FTP server must specify an account
and password. Often, it is convenient to set up a special account for
which no password is needed. Such systems provide a service
called anonymous FTP, where the userid is ``anonymous'' and the password
is typically the user's email address.
2. Sun's Network File System (NFS). Motivated by wanting to extend a
Unix file system to a distributed environment. Easy file sharing and
compatibility with existing systems. Mid-1980s.
Stateless in that servers do not maintain state about clients. RPC calls
supported:
- searching for a file within a directory
- reading a set of directory entries
- manipulating links and directories
- accessing file attributes
- reading/writing file data
3. Andrew File System (AFS). Research project at CMU in the 1980s.
Commercialized by a company called Transarc. Primary motivation was to
build a scalable distributed file system. Look at pictures.
Other file systems:
1. CODA: AFS spin-off at CMU. Disconnection and fault recovery.
2. Sprite: research project at UCB in the 1980s. To build a distributed
Unix system.
3. Echo. Digital SRC.
4. Amoeba Bullet File Server: Tanenbaum research project.
5. NTFS
How are files named? Is the name access independent? Is the name location independent?
- FTP. Location and access dependent.
- NFS. Location dependent through client mount points. Largely
transparent for ordinary users, but the same remote file system could be
mounted differently on different machines. Access independent. See Fig
9-3. Has an automount feature so that file systems can be mounted on
demand. All clients could be configured to have the same naming structure.
- AFS. Location independent. Each client has the same view within a
cell. Have a cell at each site. See Fig 13-15.
NTFS uses the uniform naming convention \\server_name\share_name\x\y\z.
For example, in the CS department, a Unix file system is mounted under NT
as \\Rav2\cew\tmp\file.doc. Access, but not location transparent.
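A minimal sketch of why mount-point naming is location dependent: each client keeps its own mount table, so the same remote file can have different names on different machines. The server name, export path, and mount points below are made up for illustration, and the resolver is a toy, not how an NFS client is implemented.

```python
# Hypothetical per-client mount tables: local mount point -> (server, exported path).
# Host names and paths are invented for this example.
CLIENT_A_MOUNTS = {"/home": ("fs1", "/export/home")}
CLIENT_B_MOUNTS = {"/users": ("fs1", "/export/home")}


def resolve(mounts, path):
    """Translate a local path to (server, remote path) via the longest matching mount point.

    Simplification: mount points other than "/" are assumed.
    """
    best = None
    for mount_point in mounts:
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return ("local", path)  # not under any mount point
    server, export = mounts[best]
    return (server, export + path[len(best):])


# The same remote file is reached under two different client-side names.
print(resolve(CLIENT_A_MOUNTS, "/home/cew/notes.txt"))
print(resolve(CLIENT_B_MOUNTS, "/users/cew/notes.txt"))
```

Both calls resolve to the same server and remote path even though the client-side names differ. Configuring every client with the same mount table is what gives a uniform name space.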
Can files be migrated between file server machines? What must clients be
aware of?
- FTP. Sure, but end-user must be aware.
- NFS. Must change mount points on the client machines.
- AFS. On a per-volume (collection of files managed as a single unit)
basis.
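A rough sketch of per-volume migration: if clients find a volume through a lookup table rather than a hardwired server name, moving the volume changes only the table. The class, the path-to-volume map, and all names below are hypothetical; this is a toy model in the spirit of a volume location database, not the AFS implementation.

```python
class VolumeLocationDB:
    """Toy cell-wide table mapping volume name -> file server (illustrative only)."""

    def __init__(self):
        self._location = {}

    def place(self, volume, server):
        self._location[volume] = server

    def lookup(self, volume):
        return self._location[volume]

    def migrate(self, volume, new_server):
        # Only the table entry changes; path names seen by clients stay the same.
        self._location[volume] = new_server


# Hypothetical mapping from path prefix to volume, identical on every client.
PATH_TO_VOLUME = {"/afs/example.edu/user/cew": "user.cew"}


def server_for(vldb, path):
    """Find the volume covering `path`, then ask the table which server holds it."""
    for prefix, volume in PATH_TO_VOLUME.items():
        if path == prefix or path.startswith(prefix + "/"):
            return vldb.lookup(volume)
    raise FileNotFoundError(path)


vldb = VolumeLocationDB()
vldb.place("user.cew", "fs1")
print(server_for(vldb, "/afs/example.edu/user/cew/notes.txt"))  # before migration
vldb.migrate("user.cew", "fs2")
print(server_for(vldb, "/afs/example.edu/user/cew/notes.txt"))  # same path, new server
```

Contrast this with the NFS case, where the server name is part of each client's mount configuration and so must be edited on every client.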
Are directories and files handled with the same or a different mechanism?
- FTP. Directory listing handled as remote command.
- NFS. Unix-like.
- AFS. Unix-like.
Amoeba has separate mechanism for directories and files.
What type of file sharing semantics are supported if two processes
are accessing the same file?
Possibilities shown in Fig. 13-5.
- FTP. User-level copies. No support.
- NFS. Mostly Unix semantics.
- AFS. Session semantics.
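To make session semantics concrete, here is a small sketch: each open works on a private copy, and changes become visible to later opens only when the file is closed. The `Server` and `Session` classes are hypothetical names for this toy model, and it ignores how the copy is actually fetched or cached.

```python
class Server:
    """Toy file server holding whole files by name."""

    def __init__(self):
        self.files = {}  # name -> bytes


class Session:
    """One open/close session; reads and writes go to a private copy."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.data = server.files.get(name, b"")  # copy taken at open time

    def read(self):
        return self.data

    def write(self, data):
        self.data = data  # not visible to anyone else yet

    def close(self):
        # Changes are propagated only on close; the last close wins.
        self.server.files[self.name] = self.data


server = Server()
server.files["f"] = b"v1"
a = Session(server, "f")
b = Session(server, "f")
a.write(b"v2")
print(b.read())                    # b still sees its own copy from open time
a.close()
print(Session(server, "f").read())  # a new session sees a's update
```

Under Unix semantics, by contrast, the read by `b` after `a.write` would return the new contents immediately.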
What, if any, file caching is supported?
Different options are shown in Fig 13-11.
Immutable files in Amoeba.
Does the system support locking of files?
- FTP. N/A.
- NFS. Has mechanism, but external to NFS.
- AFS. Does support.
Is file replication/reliability supported and how?
- FTP. No.
- NFS. Unknown.
- AFS. For read-only volumes within a cell. For example binaries and
system libraries.
Look at Fig 13-13 for a more theoretical approach, based on version
numbers and voting.
With N replicas, a read needs Nr votes and a write needs Nw votes, where
Nr + Nw > N. Usually keep Nr small and Nw close to N. Must
assemble a read quorum or a write quorum to read/write a
file.
Can have ghosts cast write votes on behalf of down machines; when a down
machine comes up it immediately gets the new version.
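A minimal sketch of quorum voting with version numbers, assuming every replica has one vote. Because Nr + Nw > N, any read quorum overlaps any write quorum, so a read always sees at least one copy of the newest version. The class names are hypothetical, ghosts are not modeled, and the extra check 2*Nw > N (so that two write quorums also overlap) is an added assumption from the standard voting scheme rather than something stated in these notes.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.data = b""
        self.up = True


class VotingFile:
    """Toy replicated file with N replicas, read quorum nr, write quorum nw."""

    def __init__(self, n, nr, nw):
        if nr + nw <= n:
            raise ValueError("need Nr + Nw > N so read and write quorums overlap")
        if 2 * nw <= n:
            # Assumption beyond the notes: write quorums must overlap each other.
            raise ValueError("need 2*Nw > N so version numbers stay consistent")
        self.replicas = [Replica() for _ in range(n)]
        self.nr, self.nw = nr, nw

    def _up(self):
        return [r for r in self.replicas if r.up]

    def write(self, data):
        up = self._up()
        if len(up) < self.nw:
            raise RuntimeError("cannot assemble a write quorum")
        quorum = up[:self.nw]  # deliberately the first available replicas
        version = max(r.version for r in quorum) + 1
        for r in quorum:
            r.version, r.data = version, data

    def read(self):
        up = self._up()
        if len(up) < self.nr:
            raise RuntimeError("cannot assemble a read quorum")
        quorum = up[-self.nr:]  # deliberately the last available replicas
        return max(quorum, key=lambda r: r.version).data


f = VotingFile(n=5, nr=2, nw=4)
f.write(b"hello")
print(f.read())  # the read quorum includes a stale replica but picks the newest version
```

Picking write quorums from the front and read quorums from the back forces the two to differ, which exercises the overlap argument. With Nr small and Nw close to N, reads stay cheap but a few failures are enough to block writes.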
Is the system scalable?
- FTP. Yes. Millions of users.
- NFS. Not so much. 10s to 100s.
- AFS. Better than NFS, keep traffic away from file servers. 1000s.
Is hardware/software homogeneity required?
- FTP. No.
- NFS. No.
- AFS. No.
Is the application interface compatible with Unix or is another
interface used?
- FTP. Separate.
- NFS. The same.
- AFS. The same.
What security and protection features are available to control access?
- FTP. Account/password authorization.
- NFS. RPC Unix authentication. Anyone who is root on a client machine can
create an arbitrary login/userid and gain access to that user's files on
the file server. Could also gain access as root unless account mapping is
used to map root to ``nobody''.
- AFS. Unix permissions for files, access control lists for directories.
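The NFS weakness can be sketched as follows: the server trusts the userid claimed by the client, so mapping root to ``nobody'' blocks root access but not impersonation of an ordinary user. The function names are hypothetical, group permissions are ignored, and the numeric uid for ``nobody'' is an assumed conventional value that varies between systems.

```python
NOBODY_UID = 65534  # assumed value for "nobody"; differs between systems


def effective_uid(claimed_uid, map_root=True):
    """Apply account mapping: a client claiming root is treated as nobody."""
    if map_root and claimed_uid == 0:
        return NOBODY_UID
    return claimed_uid


def may_read(owner_uid, mode, claimed_uid, map_root=True):
    """Toy permission check using only owner and other read bits (no groups)."""
    uid = effective_uid(claimed_uid, map_root)
    if uid == 0:
        return True  # unmapped root bypasses permission bits
    if uid == owner_uid:
        return bool(mode & 0o400)
    return bool(mode & 0o004)


# A private file owned by uid 1000 (mode rw-------).
print(may_read(1000, 0o600, claimed_uid=0))                  # root mapped to nobody
print(may_read(1000, 0o600, claimed_uid=0, map_root=False))  # root trusted as root
print(may_read(1000, 0o600, claimed_uid=1000))               # client simply claims uid 1000
```

The last call is the point: a client machine that can claim any userid gets that user's access, which is why the design advice is not to trust workstations.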
Do file system servers maintain state about clients? Look at Fig 13-8.
- FTP. No.
- NFS. No.
- AFS. Yes.
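A small sketch contrasting the two designs. In the stateless case every request carries everything the server needs (handle, offset, count), so the server remembers nothing between calls. In the stateful case the server records which clients hold cached copies so it can notify them on update; AFS callbacks are a well-known example of this kind of state. Both classes are toy models with hypothetical names, not the real protocols.

```python
class StatelessServer:
    """Each read is self-describing; no per-client state is kept."""

    def __init__(self, files):
        self.files = files  # handle -> bytes

    def read(self, handle, offset, count):
        return self.files[handle][offset:offset + count]


class CallbackServer:
    """Remembers which clients cache each file so stale copies can be invalidated."""

    def __init__(self, files):
        self.files = files   # name -> bytes
        self.callbacks = {}  # name -> set of client ids holding a cached copy

    def fetch(self, client, name):
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, client, name, data):
        self.files[name] = data
        stale = self.callbacks.get(name, set()) - {client}
        self.callbacks[name] = {client}
        return sorted(stale)  # clients that must be told their copy is invalid


stateless = StatelessServer({"h1": b"hello world"})
print(stateless.read("h1", 6, 5))  # same request gives the same answer after a server restart

stateful = CallbackServer({"f": b"v1"})
stateful.fetch("A", "f")
stateful.fetch("B", "f")
print(stateful.store("A", "f", b"v2"))  # B holds a stale copy and must be notified
```

The trade-off: the stateless server recovers from a crash trivially, while the stateful server can keep clients from polling it, which helps scalability.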
Things to think about for file systems and other large distributed systems:
- Workstations have cycles to burn. Make clients do work whenever
possible.
- Cache whenever possible.
- Exploit file usage properties. Understand them.
One-third of Unix files are temporary.
- Minimize system-wide knowledge and change. Do not hardwire
locations.
- Trust the fewest possible entities. Do not trust workstations.
- Batch if possible to group operations.