Distributed Systems

Overview

The impact of moving from a single machine to a networked set of machines has been tremendous. Look at Fig 9-1 for a comparison of centralized vs. distributed systems. Also compare to isolated computers (9-2) and note the disadvantages (9-3).

Distributed System--"Autonomous machines linked by a network with software designed to produce an integrated computing facility."--CDK.

Historical Background. Look at Fig 1.9 and 1.10.

Hardware

Begin with the notion of a distributed system: what is the interconnection hardware?

Computer system classification (Flynn's taxonomy):

SISD (single instruction, single data) traditional uniprocessor
SIMD (single instruction, multiple data) vector and array processors
MISD (multiple instruction, single data) many different operations applied to the same data stream; rarely built in practice
MIMD (multiple instruction, multiple data) multiprocessors and multicomputers, including distributed systems

multiprocessor--a single address space shared among the processors.
multicomputer--each machine has its own private memory.

tightly-coupled vs. loosely-coupled. Look at Fig 9-4.

Bus

Two types of multiprocessor organization:

1. Uniform Memory Access (UMA) model. Show picture.

Caching is vital to get reasonable performance. For example, caches on a shared memory multiprocessor.

Want to maintain cache coherency. Write-through cache--any change to the cache is written through to memory. Combine this with the other processors' caches watching the bus (snooping, or snoopy cache) so that they notice writes to addresses they hold. Possible to go to 32-64 CPUs on a bus.
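The write-through plus snooping idea can be sketched as a small in-memory simulation. This is only an illustration: the class names `Bus` and `Cache` are hypothetical, and the choice to invalidate a snooped line (rather than update it) is an assumption, since the notes do not say which policy is used.

```python
class Bus:
    """Shared bus: carries writes to memory and lets every cache snoop them."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, source, addr, value):
        self.memory[addr] = value        # write-through: memory is always current
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)  # other caches observe the bus write

    def read(self, addr):
        return self.memory.get(addr, 0)


class Cache:
    """Per-CPU cache attached to the shared bus."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:       # miss: fetch from memory over the bus
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr):
        # Assumed policy: invalidate our copy so the next read misses.
        self.lines.pop(addr, None)


bus = Bus()
cpu0, cpu1 = Cache(bus), Cache(bus)
cpu0.write(0x10, 1)
first = cpu1.read(0x10)    # miss, loads the value from memory
cpu0.write(0x10, 2)        # cpu1 snoops this write and drops its copy
second = cpu1.read(0x10)   # miss again, so cpu1 never sees the stale value
```

Every write crosses the bus in this scheme, which is one reason the number of CPUs a single bus can support is limited.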

2. Non-Uniform Memory Access (NUMA) model. Show picture. Alternatively, a hierarchy in which CPUs have their own memory (not the same as a cache). The cost of a memory access is non-uniform: a CPU's own memory is faster to reach than another CPU's memory. Either use memory locally or copy remote data to our own machine.
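Why locality matters under NUMA can be seen from a simple weighted average. The function below is a sketch; its name and the timing figures in the usage lines are illustrative assumptions, not measurements from any real machine.

```python
def average_access_ns(local_fraction, local_ns, remote_ns):
    """Mean memory access time when a fraction of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Assumed costs: 100 ns for local memory, 1000 ns for remote memory.
all_local = average_access_ns(1.0, 100, 1000)
half_local = average_access_ns(0.5, 100, 1000)
all_remote = average_access_ns(0.0, 100, 1000)
```

With these assumed costs, the average moves from the local cost toward the remote cost as fewer accesses stay local, which is the motivation for keeping or copying data near the CPU that uses it.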

Switched

Can also have switched multiprocessors.

Software

The hard part of making the system work. Extensions of a single-machine system (Unix), now with more secure versions (such as ssh and scp):
- rlogin--establish a context on a remote machine
- rsh--execute a single command on a remote machine
- rcp--copy a file to or from a remote machine
All are accomplished with servers on the remote machine: processes waiting to handle requests. The systems are heterogeneous and autonomous (each makes its own decisions).
Some software makes the distributed nature of the system transparent to the user. One widely used example is Sun's Network File System (NFS), which allows access to files on remote machines by mounting (exported) remote directories within the local file system. It is an example of a network operating system (users are aware of the machines).
Distributed applications such as WWW, gopher, netnews. These are not an entire distributed operating system (DOS), but they need underlying support.
Also multimedia applications--varying data requirements concerning speed and delay.
True distributed systems (users are not aware of the machines) and multiprocessors. Look at 9-13.

Design Issues

Resource Sharing

Fundamental use of networked computers (why network them otherwise?). Share files, information, and work (computer-supported cooperative work, CSCW). Need policies and mechanisms for sharing the resources. Servers manage resources and clients use them. See Fig 1.6.

Can also be done with the object model. Figure 1.7.

Transparency

location--unaware of resource location
access--identical operations used for local and remote resources
migration--resources can move without changing names
replication--system can have any number of copies
concurrency--protection of shared resources in the face of parallel users
parallelism--user need not be aware of how many parallel activities occur
failure--concealment of faults
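Location, access, and migration transparency can be illustrated with one small sketch. All names here (`LocalStore`, `RemoteStore`, `Namespace`) are hypothetical, and `RemoteStore` is only a stand-in: a real one would send a request over the network instead of reading a dictionary.

```python
class LocalStore:
    """Resources held on this machine."""

    def __init__(self):
        self.data = {}

    def read(self, name):
        return self.data[name]


class RemoteStore:
    """Stand-in for a remote server holding resources."""

    def __init__(self, server_data):
        self.server_data = server_data

    def read(self, name):
        # In a real system this would be a network round trip.
        return self.server_data[name]


class Namespace:
    """Maps each resource name to whichever store currently holds it."""

    def __init__(self):
        self.location = {}

    def place(self, name, store):
        self.location[name] = store

    def read(self, name):
        # Same operation whether the resource is local or remote.
        return self.location[name].read(name)


local = LocalStore()
local.data["notes"] = "lecture 9"
remote = RemoteStore({"notes": "lecture 9"})

ns = Namespace()
ns.place("notes", local)
before = ns.read("notes")
ns.place("notes", remote)   # the resource migrates; its name does not change
after = ns.read("notes")
```

The caller uses the same name and the same `read` operation before and after the move, which is what the location, access, and migration entries above ask for.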

Openness (Flexibility)

Can the system be extended?

Openness must be designed into the system, and the interfaces must be published. Unix was an early open system. Look at DCE and CORBA as standards for creating open systems. These have been extended into a new class of middleware for building systems.

Ability to adapt. Tanenbaum gives the example of monolithic versus microkernel designs.

Concurrency

By the nature of having multiple computing machines, there are possibilities for tasks to be done concurrently (rather than simply timesharing in a single CPU system).

Reliability

If one machine goes down, there is a high probability that other machines are still available. But distributed systems often have dependencies on one or a few machines.

Leslie Lamport's description of a distributed system: "One on which I cannot get any work done because some machine I have never heard of has crashed."
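Both sides of this point can be put in numbers. The sketch below assumes that machines fail independently and that each is up with the same probability; both are simplifying assumptions, and the function names and the 0.9 figure are illustrative only.

```python
def available_if_any(per_machine, n):
    """Probability that at least one of n interchangeable machines is up."""
    return 1.0 - (1.0 - per_machine) ** n


def available_if_all(per_machine, k):
    """Probability that all k machines a task depends on are up."""
    return per_machine ** k


# Assumed per-machine availability of 0.9.
redundant = available_if_any(0.9, 3)   # any one of three replicas will do
dependent = available_if_all(0.9, 3)   # all three machines are needed
```

Under these assumptions, redundancy pushes availability above that of a single machine, while a chain of dependencies pulls it below. The second case is the one Lamport's remark describes.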

Fault tolerance (ability to recover from faults).

Two approaches to fault tolerance:

1. hardware redundancy (look at Fig 1.3)
2. software recovery

Issue of availability (what fraction of the time is the system usable?). The impact depends on what crashed: a workstation or a server machine?

A name service resolves names to their underlying values.

Names have contexts: a name is resolved relative to some context.
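A minimal sketch of name resolution within contexts follows. The `Context` class, the `/` separator, and the example names and addresses are all assumptions made for illustration; they do not describe any particular name service.

```python
class Context:
    """A naming context: maps one name component to a value or a nested context."""

    def __init__(self):
        self.bindings = {}

    def bind(self, name, value):
        self.bindings[name] = value

    def resolve(self, path):
        # Resolve the first component here, then hand the rest to the next context.
        head, _, rest = path.partition("/")
        target = self.bindings[head]     # raises KeyError if the name is unbound
        if rest:
            return target.resolve(rest)
        return target


root = Context()
lab = Context()
office = Context()
root.bind("lab", lab)
root.bind("office", office)
lab.bind("printer", "10.0.0.7")        # made-up addresses
office.bind("printer", "10.0.1.7")

from_root = root.resolve("lab/printer")
from_lab = lab.resolve("printer")
from_office = office.resolve("printer")
```

The same name, `printer`, resolves to different values depending on the context in which it is looked up.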

Communication

Paradigms:

message passing
remote procedure call
group communication
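The three paradigms can be contrasted in a single process, with threads and queues standing in for machines and the network. This is only a sketch under that assumption; the function names and the `add` operation are hypothetical.

```python
import queue
import threading


def server(inbox):
    """Message passing: receive (op, args, reply_queue) messages until told to stop."""
    while True:
        op, args, reply = inbox.get()
        if op == "stop":
            break
        if op == "add":
            reply.put(sum(args))


def rpc_add(inbox, a, b):
    """RPC-style client stub: looks like a local call, but is a send then a receive."""
    reply = queue.Queue()
    inbox.put(("add", (a, b), reply))
    return reply.get(timeout=5)


def multicast(group, message):
    """Group communication: deliver the same message to every member's inbox."""
    for member_inbox in group:
        member_inbox.put(message)


inbox = queue.Queue()
worker = threading.Thread(target=server, args=(inbox,))
worker.start()

result = rpc_add(inbox, 2, 3)             # caller never touches the message format
multicast([inbox], ("stop", (), None))    # a group with a single member here
worker.join()
```

The stub hides the explicit send and receive behind a procedure call, which is the essential difference between remote procedure call and raw message passing.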

Structure

monolithic vs. microkernel.

Workload Allocation

Approaches:

Processor pool (dedicated computation servers).
Use of idle workstations (absorption of idle power).
Shared memory multiprocessor.
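The processor pool approach resembles a worker pool to which jobs are submitted. The sketch below uses threads from Python's standard `concurrent.futures` module as a stand-in for dedicated computation servers; the `job` function and the pool size are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def job(n):
    """Placeholder for a unit of work that a pool processor would run."""
    return n * n


# Four workers play the role of four pool processors.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(job, range(8)))
```

The submitter does not choose which worker runs which job. The pool allocates the work, and `map` returns the results in submission order.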

Consistency

1. update (by multiple applications)
2. replication (replicated copies consistent)
3. cache (cached copies consistent--Web)
4. failure
5. clock
6. user interface