Distributed Systems

Overview

The impact of moving from a single machine to a networked set of machines has been tremendous. See Fig 9-1 for a comparison of centralized vs. distributed systems, Fig 9-2 for a comparison with isolated computers, and Fig 9-3 for the disadvantages.

Distributed System: "Autonomous machines linked by a network with software designed to produce an integrated computing facility." (CDK)

Historical Background. Look at Fig 1.9 and 1.10.

Hardware

Begin with notion of distributed system (what is the interconnection hardware?)

Computer system classification:

Tightly coupled vs. loosely coupled. See Fig 9-4.

Bus

Two types of multiprocessor organization:

1. Uniform Memory Access (UMA) model. Show picture.

Caching is vital to get reasonable performance. For example, a cache per CPU on a shared-memory multiprocessor keeps most memory accesses off the bus.

Want to maintain cache coherency. Write-through cache: any change to the cache is also written through to memory. Combine this with the other caches watching the bus for writes (snooping, or a snoopy cache). Possible to go to 32-64 CPUs on a bus.

2. Non-Uniform Memory Access (NUMA) model. Show picture. Alternatively, a hierarchy where each CPU has its own memory (not the same as a cache). The cost of a memory access is non-uniform (hence NUMA). Use memory locally or copy to our own machine.
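The write-through, snooping scheme described under the UMA model can be sketched in a few lines. This is a toy simulation, not a description of any real hardware: the class names `Bus` and `SnoopyCache` are made up for illustration, and invalidating a snooped line (rather than updating it in place) is an assumption, since the notes only say that the other caches watch the bus.

```python
class Bus:
    """Shared bus: carries every write to memory and lets caches snoop on it."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, source, addr, value):
        self.memory[addr] = value            # write-through: memory is always current
        for cache in self.caches:
            if cache is not source:
                cache.snoop(addr)            # every other cache sees the write

    def read(self, addr):
        return self.memory.get(addr, 0)


class SnoopyCache:
    """Per-CPU cache (hypothetical model): write-through plus bus snooping."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from memory over the bus
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop(self, addr):
        # Assumption: invalidate the stale copy; an update policy is also possible.
        self.lines.pop(addr, None)


bus = Bus()
cpu0, cpu1 = SnoopyCache(bus), SnoopyCache(bus)
cpu0.write(0x10, 1)
first = cpu1.read(0x10)      # cpu1 misses and loads the value from memory
cpu0.write(0x10, 2)          # cpu1 snoops this write and drops its copy
second = cpu1.read(0x10)     # cpu1 misses again and sees the new value
```

Every write goes over the bus, so bus traffic grows with the number of CPUs, which is one reason a single bus only scales to a limited number of processors.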

Switched

Can also have switched multiprocessors.

Software

Design Issues

Resource Sharing

Fundamental use of networked computers (why network them otherwise?). Shared resources include files, information, and work (computer-supported cooperative working, CSCW). Need policies and mechanisms for sharing the resources. Have clients and servers of information. See Fig 1.6.

Can also be done with the object model. Figure 1.7.

Transparency

Openness (Flexibility)

Can the system be extended?

Need to design openness into the system and publish the interfaces. Unix was an early open system. Look at DCE and CORBA as standards for creating open systems. These have been extended into a new class of middleware for building systems.

Ability to adapt. Tanenbaum gives example of monolithic versus microkernel.

Concurrency

By the nature of having multiple computing machines, there are possibilities for tasks to be done concurrently (rather than simply timesharing in a single CPU system).
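A small sketch of the difference between doing tasks one after another and doing them concurrently. Worker threads in one process stand in for separate machines here, which is a simplification; the function `task` and the 0.1 second delay are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def task(n):
    time.sleep(0.1)              # stands in for work done on some machine
    return n * n


def run_sequential(inputs):
    # One task at a time, as on a single CPU with nothing else to overlap.
    return [task(n) for n in inputs]


def run_concurrent(inputs, workers=4):
    # Several tasks in flight at once; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, inputs))


inputs = [1, 2, 3, 4]

start = time.perf_counter()
seq = run_sequential(inputs)
seq_time = time.perf_counter() - start

start = time.perf_counter()
conc = run_concurrent(inputs)
conc_time = time.perf_counter() - start

print(f"sequential: {seq_time:.2f}s  concurrent: {conc_time:.2f}s")
```

Both runs produce the same results; only the elapsed time differs, because the waits overlap in the concurrent version.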

Reliability

If one machine goes down, there is a high probability that other machines are still available. But distributed systems often have dependencies on one or a few machines.

Leslie Lamport on a distributed system: "One on which I cannot get any work done because some machine I have never heard of has crashed."

Fault tolerance (ability to recover from faults).

Two approaches to fault tolerance:

1. Hardware redundancy (see Fig 1.3)
2. Software recovery

Issue of availability: what fraction of the time is the system usable? The impact depends on what crashed: a workstation or a server machine?
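The two sides of the reliability argument (redundant machines help, dependencies hurt) can be put into numbers. This is a sketch that assumes machines fail independently and that each one is up a fraction `a` of the time; the 0.95 figure is made up for illustration.

```python
def availability_replicated(a, n):
    """Service is usable if at least one of n independent replicas is up."""
    return 1 - (1 - a) ** n


def availability_dependent(a, n):
    """Service is usable only if all n machines it depends on are up."""
    return a ** n


a = 0.95  # hypothetical availability of a single machine
print(f"1 machine:             {a:.6f}")
print(f"3 replicas (any one):  {availability_replicated(a, 3):.6f}")
print(f"3 dependencies (all):  {availability_dependent(a, 3):.6f}")
```

Under these assumptions three replicas raise availability to 0.999875, while a service that needs all three machines drops to 0.857375, which is the situation Lamport's remark describes.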

Performance

Different measurements. Actual applications versus low-level benchmarks.

Scalability

How big in terms of machines and distance? Affects resource management and location. Example: a distributed file system vs. the Internet. Affects issues of caching and replication.

Should be able to extend any resource.

Naming

Need an identifier for an entity. Often two levels of names: those used by users and those used by the system.

Structured or flat?

Name service to resolve names to their underlying value.

Names have contexts: the same name can resolve to different values in different contexts.
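A minimal sketch of a name service that resolves structured names through nested contexts. The `Context` class, the `/` separator, and the example names and addresses are all hypothetical; real name services differ in syntax and in how resolution is distributed.

```python
class Context:
    """A naming context: binds a name component to a value or to another context."""

    def __init__(self):
        self.bindings = {}

    def bind(self, component, value):
        self.bindings[component] = value

    def resolve(self, path):
        """Resolve a structured name such as 'cs/fileserver' one component at a time."""
        head, _, rest = path.partition("/")
        target = self.bindings[head]         # KeyError if the component is unbound
        if rest:
            if not isinstance(target, Context):
                raise KeyError(f"{head!r} is not a context")
            return target.resolve(rest)
        return target


root = Context()
cs, math = Context(), Context()
root.bind("cs", cs)
root.bind("math", math)

# The same user-level name is bound in two contexts to different system-level values.
cs.bind("fileserver", "10.0.0.7")
math.bind("fileserver", "10.0.1.3")

print(root.resolve("cs/fileserver"))
print(root.resolve("math/fileserver"))
```

A flat name space would be a single `Context` with no nesting; the structured form lets each context be managed separately.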

Communication

Paradigms:

Structure

Monolithic vs. microkernel.

Workload Allocation

Approaches:

Consistency

1. Update (by multiple applications)
2. Replication (replicated copies consistent)
3. Cache (cached copies consistent, e.g. the Web)
4. Failure
5. Clock
6. User interface