Two directions that traditional operating systems have gone in terms of
processing:
1. support for threads
2. distributed processing
Have talked about threads in CS3013, but review a little.
Also called lightweight processes. Contain an execution state within
a shared address space.
Threads are natural to use for a server handling requests. Each request
can be handled by a thread.
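The thread-per-request idea can be sketched with Python's standard thread pool (the handler name `handle_request` and the request values are illustrative, not from the notes):

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(request):
    # Hypothetical handler: each request is independent, so serving
    # each one on its own thread (or pool thread) is a natural fit.
    return f"served:{request}"

# A pool amortizes thread-creation cost across many requests; all
# threads share the server's address space (caches, tables, etc.).
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, ["r1", "r2", "r3"]))

print(results)
```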
Threads vs. Processes
- threads are cheaper to create
- switching between threads in same process is much cheaper
- easier sharing of resources between threads
- lack of protection between threads within a process
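The creation-cost difference can be observed directly with the standard library (a rough measurement sketch, not a careful benchmark):

```python
import multiprocessing
import threading
import time

def work():
    pass  # trivial body; we only measure creation/startup/teardown cost

def time_creation(factory, n=50):
    # Create, start, and join n units of execution, returning elapsed time.
    start = time.perf_counter()
    units = [factory(target=work) for _ in range(n)]
    for u in units:
        u.start()
    for u in units:
        u.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Threads share the address space, so creation is typically much
    # cheaper than for full processes, which need a new address space.
    print("threads:  ", time_creation(threading.Thread))
    print("processes:", time_creation(multiprocessing.Process))
```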
Terminology among different systems:
Distributed OS kernel | Thread name | Exec. Env. Name
Amoeba                | Thread      | Process
Chorus                | Thread      | Actor
Mach                  | Thread      | Task
V System              | Process     | Team
Unix                  | -           | Process
User vs. Kernel threads. Fig 12-8 (from perspective of user threads):
+ can implement on a system not supporting kernel threads
+ fast creation of threads (kernel not involved)
+ fast switching between threads (kernel not involved)
+ customized scheduling algorithm
- must have jackets around system calls that may block
- no clock interrupts for time slicing
- when there are many system calls, user threads gain little--switching
threads in the kernel is not much more work
- do not gain on a multiprocessor
- for all threads, worry about non-reentrant code (errno)
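The "jacket" idea can be sketched as a wrapper that polls before issuing the real system call, so one blocked call does not stall the whole process; `yield_to_scheduler` is a hypothetical hook into the user-level thread scheduler:

```python
import os
import select

def jacketed_read(fd, n, yield_to_scheduler):
    # Jacket around a potentially blocking read: probe with a
    # zero-timeout select first, and hand the CPU to another
    # user-level thread instead of blocking in the kernel.
    while True:
        ready, _, _ = select.select([fd], [], [], 0)  # non-blocking poll
        if ready:
            return os.read(fd, n)   # guaranteed not to block now
        yield_to_scheduler()        # run some other user thread
```

In a real user-level threads package the scheduler hook would suspend the calling thread and resume it when the descriptor becomes ready.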
Amoeba. Best argument for using this approach comes from queueing
theory: ``Replacing n small resources by one big one that is n times
more powerful reduces the average response time n-fold.''
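The n-fold claim follows from the M/M/1 mean response time T = 1/(mu - lambda): multiplying both the arrival and service rates by n divides T by n. A quick numeric check (rates chosen arbitrarily, kept stable with lambda < mu):

```python
def mm1_response_time(arrival_rate, service_rate):
    # Mean response time of an M/M/1 queue: T = 1 / (mu - lambda),
    # valid when arrival_rate < service_rate.
    return 1.0 / (service_rate - arrival_rate)

n, lam, mu = 10, 0.5, 1.0
small = mm1_response_time(lam, mu)        # one of n small servers
big = mm1_response_time(n * lam, n * mu)  # one big server, n-fold rate and load
print(small / big)  # the big server's response time is n times smaller
```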
Dedicated processors. Do not support user interaction.
Every user has their own computer. Can share computing resources.
Models for workstations and their use of files. Look at Fig 12-18 as a
summary. Talk about load balancing later.
Can show both theoretically and practically that many idle nodes exist at
any one time. Look at Fig 11.1 and 11.2 from Singhal.
Would like to use these idle nodes. What is idle?
- idle process
- no user logged on
- threshold on system parameters
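The idleness criteria above can be combined into a simple predicate (the function name, parameters, and threshold values here are all hypothetical, for illustration only):

```python
def node_is_idle(cpu_queue_len, user_logged_on, cpu_util,
                 queue_threshold=1, util_threshold=0.25):
    # Hypothetical idleness test: no interactive user plus thresholds
    # on system parameters (CPU queue length and utilization).
    return (not user_logged_on
            and cpu_queue_len < queue_threshold
            and cpu_util < util_threshold)

print(node_is_idle(0, False, 0.1))  # unloaded, nobody logged on
print(node_is_idle(2, False, 0.1))  # queue too long
```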
What is performance we are trying to optimize?
- response time
- throughput (jobs through system)
- load measure--resource queue lengths. Most common is CPU queue
length. V-System uses the CPU utilization. Other measures--memory, file,
display.
- Static/Dynamic. An adaptive algorithm goes further and changes how it
operates (may quit collecting status, or collect it less often).
- Load Balancing/Load Sharing. Higher overhead for load balancing
because it tries to keep the loads equal (as opposed to just keeping
processors busy).
- Preemptive/Non-preemptive transfers. Migration during execution
implies passing state.
- transfer policy--should a task be transferred? Usually based on a
threshold.
- selection policy--which task to transfer. Next task, tasks that
will take a long time (to justify migration costs).
- location policy--which node to transfer to. Use polling,
multicasting or broadcasting.
- information policy--when to collect system state information
- demand-driven--only collect information when needed.
- periodic--non-adaptive
- state-change-driven--only when state has changed
- Sender-Initiated
- Probe as needed--ELZ (Eager, Lazowska and Zahorjan) is a good example.
- Centralized. Zhou. A central server keeps track of idle and busy
nodes. All changes are reported to this node.
- Buddy System. Shin and Chang. 10-20 workstations keep complete
information amongst themselves.
- Receiver-Initiated--Idle nodes search out for tasks, leads to
preemptive transfers since tasks often have received some amount of
service.
- Symmetrically-Initiated--do both at once. Can have an upper and
lower threshold; each node tries to keep its load within an
acceptable range and adjusts its thresholds.
Look at Singhal Fig 11.6
- Adaptive Algorithms--can change thresholds; may want to do so for
sender-initiated at high loads. Keep track of responses from the probes,
so a cache of receivers, senders, and OKs is kept at each node.
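The symmetric two-threshold idea can be sketched as a classifier that maps a node's load to a role (threshold values here are illustrative):

```python
def classify(load, lower=1, upper=3):
    # Symmetrically-initiated sketch: two thresholds bound the
    # acceptable load range; outside it a node acts as a receiver
    # (searches for tasks) or a sender (tries to shed tasks).
    if load < lower:
        return "receiver"   # underloaded: search out tasks
    if load > upper:
        return "sender"     # overloaded: try to transfer tasks
    return "ok"             # within the acceptable range

print([classify(l) for l in [0, 2, 5]])
```

An adaptive variant would also move `lower` and `upper` as the node observes system-wide load.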
Co-scheduling or gang scheduling gets processes that are cooperating to
run at the same time.
- scalability--how large can the mechanism scale?
- location transparency--hide distributed nature
- determinism--works the same in all locations
- preemption--be able to move away from workstations that are no
longer idle.
- heterogeneity--must account for the particular hardware of each machine.
Migration mechanism: freeze the process, transfer its state, then
unfreeze it at the destination.
Receiver-initiated approach for Internet-wide scale.
http://distriblets.wpi.edu/
``Adaptive Load Sharing in Homogeneous Distributed Systems''
by Eager, Lazowska and Zahorjan
Sender-initiated policies.
It is adaptive in that policies react to system state. Basic idea is to
understand how relatively simple load sharing policies work. Problems with
complex policies:
- complexity causes more overhead (hence increasing response time)
- can make poorer decisions if information is out of date.
- instability if decisions are too fine.
Identified a transfer policy (when to transfer) and a location policy
(where to transfer).
Assume a system model of 20 homogeneous nodes with some processor cost
assigned to a task transfer.
Exponentially distributed interarrival and service times. Used an
analytical study backed up by a simulation.
Policies: all try to transfer a new task if the queue length is greater
than or equal to a threshold T.
- Random: pick another node and transfer task to that node. Have a
transfer limit Lt.
- Threshold: probe nodes until a satisfactory one (using T) is
found. Have a probe limit (3 found to be best) beyond which the task is
kept locally.
- Shortest: probe a fixed number of nodes (probe limit) and select the
best.
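The Threshold policy can be sketched in a few lines (the node representation, a mapping from node id to queue length, is invented for illustration; real queue lengths would come from probe messages):

```python
import random

def threshold_policy(local_node, queue_len, nodes, T=2, probe_limit=3):
    # Sketch of the ELZ Threshold policy: transfer a new task only if
    # the local queue length is at or above threshold T; probe up to
    # probe_limit random nodes and send to the first whose queue is
    # below T, otherwise keep the task locally.
    if queue_len < T:
        return local_node                 # transfer policy says keep it
    candidates = [n for n in nodes if n != local_node]
    for n in random.sample(candidates, min(probe_limit, len(candidates))):
        if nodes[n] < T:
            return n                      # satisfactory node found
    return local_node                     # probe limit exhausted
```

The Random policy skips the probing, and Shortest probes exactly `probe_limit` nodes and picks the one with the shortest queue.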
Look at primary results. Basically show that simple is good.
``Load Sharing Using Multicasting''
Buddy policy by Shin and Chang is to broadcast state changes to a buddy
set of nodes.
Node states: underloaded (eligible to receive tasks), medium, fully-loaded
(try to transfer tasks)
Policies:
- Multithresh: maintain a multicast group for lightly-loaded nodes. If
a node is lightly loaded then join the group. If overloaded then send to
the group and use the first response. Initiator can get swamped with
replies.
- Multileader: Two multicast addresses--group and leader. On state
changes, send message to group. If overloaded, send message to leader who
returns a lightly loaded node. Must deal with movement and loss of leader.
Look at results. Can also look at scaling.
Implemented as MQP on WPI's Beowulf cluster (Spring 2000). System is
called PANTS (PANTS Application Node Transparency System).