This is a 4000-level undergraduate course in which you will study the concepts, design, and implementation of distributed computing systems. The course includes a substantial practical component in which students will write programs that exercise distributed computing features.
The official
course description on the WPI Computer Science web site says that this course
“extends the study of the design and implementation of operating systems begun
in CS 3013 to distributed and advanced computer systems. Topics include
principles and theories of resource allocation, file systems, protection
schemes, and performance evaluation as they relate to distributed and advanced
computer systems.”
You must have a solid background in CS-3013 and/or an equivalent Operating Systems course, and you must also be proficient in programming in a low-level language such as C, including the use of pointers, casts, malloc(), and free().
You will need to know something about computer networks, protocols, and the 7-layer OSI stack. This is covered in CS-4514, Computer Networks. For those who have not taken CS-4514, a brief tutorial will be offered from 11:00 AM to 1:00 PM on March 19 in Fuller Labs 320. The slides for the tutorial are available here (.ppt, .html).
A tutorial on the use of sockets can be found here:– http://beej.us/guide/bgnet.
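As a minimal sketch of the kind of code the guide walks through (not part of any assignment), the function below opens a TCP socket listening on a given port using the standard POSIX calls getaddrinfo(), socket(), bind(), and listen(). The name open_listener and the backlog of 10 are our own illustrative choices, not anything prescribed by the course.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP socket listening on `port` on all
 * local addresses. Returns the listening descriptor, or -1 on failure. */
int open_listener(const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    assert(port != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    hints.ai_flags = AI_PASSIVE;      /* wildcard address, suitable for bind() */

    int rc = getaddrinfo(NULL, port, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }

    /* Try each returned address until one can be bound and listened on. */
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd == -1)
            continue;

        int yes = 1;                  /* allow quick restarts of a server */
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

        if (bind(fd, p->ai_addr, p->ai_addrlen) == 0 && listen(fd, 10) == 0)
            break;                    /* success: keep this descriptor */

        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

A server would then call accept() on the returned descriptor to obtain one connected socket per client. Passing the port "0" asks the kernel for any free port, which is convenient for quick local tests.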
Time and Place: Tuesdays and Fridays, 8:00 — 9:50 AM, Goddard Hall 227
March 11 — April 29, 2008
No class on Tuesday, April 15 (campus-wide project day)
Fossil Lab will not be available from approximately 6:00 PM Friday, March 21 to approximately 6:00 PM Saturday, March 22
Professor: Hugh C. Lauer
Email: <professor’s last name>@cs.wpi.edu
Office hours: by appointment, or (normally) 1 hour after each class
Office: Fuller Labs, room 239
Teaching Assistants:
Rick Skowyra
Email: rskowyra **at** cs.wpi.edu
Office hours: 2:00-4:00 PM, Mondays and Tuesdays
Office: Fossil Lab
Issac Chanin
Email: chanin **at** wpi.edu
Office hours: 5:30-7:30 PM, Tuesdays and Thursdays
Office: Fossil Lab
Textbooks: Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems: Principles and Paradigms, 2nd edition, Prentice Hall, 2007. Note:– Be sure that you have the 2nd edition of this book. There are copies of the 1st edition floating around, including on the web, and the chapters of the two editions are in different orders.
You also need one of the following from your Operating Systems course:–
o Silberschatz, Galvin, and Gagne, Operating System Concepts, Seventh Edition, John Wiley and Sons, 2005.
o Andrew S. Tanenbaum, Modern Operating Systems, 2nd edition, Prentice Hall, 2001.
Class e-mail lists: The following two lists are in the domain cs.wpi.edu:–
o cs4513-all: to reach all students, TAs, and the professor
o cs4513-staff: to reach just the TAs and the professor
Course web site: http://web.cs.wpi.edu/~cs4513/d08/
Students needing to be absent from class
should notify the professor by e-mail or in person as soon as possible, especially if an exam is scheduled for the
day of absence!
CS-4513 meets for two 2-hour classes per
week for a seven-week undergraduate term (28 hours). There will be no class on
April 15, the campus-wide Project Day.
This course will be a combination of
lecture and class discussion, programming projects, and quizzes and tests.
There may be one or more unannounced quizzes during the term, and there will be
a scheduled mid-term exam and a scheduled final exam.
There will be two programming projects during the term: a simple project to be done individually and a larger one to be done in teams.
Class participation is an essential part of the grade for the course. If you attend lectures but never speak up or engage in discussions, your grade will likely be reduced by one full letter or more. Students will be expected to be familiar with, and to discuss, the relevant sections of the textbook and other reading assigned during the term.
Final grades will be computed as follows:
Final grades will reflect the extent to which you have demonstrated understanding of the material and completed the assigned projects. The base level grade will be a "B", which indicates that the basic objectives on assignments and exams have been met. A grade of "A" will indicate significant achievement beyond the basic objectives, and a grade of "C" will indicate that not all basic objectives were met but the work was satisfactory for credit.
If there are any circumstances that limit
or restrict your participation in the class or the completion of assignments,
please contact the professor as soon as possible in order to work something
out.
Testing
o There will be an in-class mid-term exam on Tuesday,
April 1 of approximately one hour duration.
o There will be an in-class final exam on Tuesday,
April 29 of approximately one hour duration.
o One or more unannounced quizzes may be held during
the term.
The mid-term and final exams will be closed book, but you may bring one 8½ × 11 inch sheet of prepared notes (double-sided). Unannounced quizzes are closed book and closed notes. Each student should have a calculator available for quizzes and exams.
There is no make-up for missed quizzes.
It is not in your interest to miss an exam, but if extraordinary circumstances
apply, please contact the professor and
your advisor beforehand so that we can work something out.
Academic Honesty
Unless explicitly noted, all work is to
be done on an individual basis. Any violation of the WPI guidelines for
academic honesty will result in no credit for the course and referral to the
Student Affairs Office. More information can be found at
http://www.wpi.edu/Pubs/Policies/Honesty/policy.html
That being said, you are strongly encouraged to discuss with each other ideas, distributed system concepts, course material, and especially the challenges you encounter in working on projects and/or in the Fossil Lab.
Late Policy
Unless you have arranged otherwise with
the professor at least one day prior to
the due date, late submissions will be penalized 10% of total assignment
value per day or partial day (with the weekend counting as one day), and no
assignments will be accepted after seven days beyond the due date. All
assignments are due at the start of class on the due date. Projects must be
submitted as directed in class. Exceptions to these rules can be made only
beforehand.
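The penalty is simple arithmetic. For example, work that would have earned 85 of 100 points and is submitted three days late loses 3 × 10 = 30 points and receives 55. A minimal sketch of the calculation, using a hypothetical helper late_adjusted_score and assuming that (a) the number of late days has already been counted per the rule above, with each day or partial day counting as one and a weekend counting as one day, and (b) a submission exactly seven days late is still accepted:

```c
#include <assert.h>

/* Hypothetical helper illustrating the late policy.
 * earned:    points the work earned before any penalty
 * total:     total value of the assignment
 * days_late: late days, already counted per the policy (a day or
 *            partial day counts as one; a weekend counts as one)
 * Returns the score after the penalty, never below zero. */
double late_adjusted_score(double earned, double total, int days_late)
{
    assert(total > 0.0 && days_late >= 0);

    if (days_late > 7)
        return 0.0;                 /* not accepted after seven days */

    /* Deduct 10% of the total assignment value per late day. */
    double score = earned - 0.10 * total * days_late;
    return score > 0.0 ? score : 0.0;
}
```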
Students wishing to receive both BS and MS credit for this course must complete an additional research project. The project for this term is specified here (.doc, .html).
Copies of
the slides and notes used for lectures will be posted here just before or just
after each class. The textbook chapters refer to Tanenbaum & van Steen, Distributed Systems: Principles and
Paradigms, 2nd edition.
| Date | Topics | Textbook |
|---|---|---|
| Week 1 | Course Introduction & | 1 |
| | Remote Procedure Call | 4.2 |
| Week 2 | Naming | 5 |
| | Introduction to File Systems | OS text |
| | File System Implementations | OS text |
| Week 3 | Distributed File Systems (and related topics) | 11 |
| | Learning about MapReduce | |
| | Atomic Transactions | |
| Week 4 | The Grapevine Distributed System | |
| | Synchronization in Distributed Systems | 6 |
| | Election algorithms | 6 |
| | Replication and Consistency | 7 |
| Week 5 | More on Replication and Consistency | 7 |
| | Security and Authentication | 9 |
| | MapReduce: algorithms | Not in text |
| | MapReduce: distributed functions | Not in text |
| Week 6 | MapReduce: applications | Not in text |
| | Google File System | Not in text |
| Week 7 | Security and Authentication (continued) | 9 |
| | More on Authentication | 9 |
Fossil Lab
Project assignments this term will involve
the development of programs that spawn multiple processes and that communicate
across the network. Since the normal mistakes that occur during learning and
development of such programs can wreak havoc on the daily operation of the
campus computing facilities and networks, we will use the newly refurbished and
re-equipped Fossil Lab. This enables students to work in a realistic but
protected and isolated environment without impacting other computational
activities.
If you elect to develop a programming
assignment outside the Fossil Lab, please be advised that you are entirely responsible
for any errors or chaos it might cause.
In the new Fossil Lab, students work on
virtual machines. General instructions for setting up your virtual machine can
be found here:– (.doc, .html). The
specific virtual machine for this course can be found in the following folder,
which is accessible from the desktop of your Fossil Lab workstation:–
P:\Clonable-SUSE-Linux-10.3
This virtual machine requires about 8-9 gigabytes of storage on the Fossil server. You may make a full clone, but it will not fit within your Fossil server quota. A linked clone, however, will generally be less than 2 gigabytes and will fit within your quota. Note, though, that a linked clone can only be used in the Fossil Lab. If you wish to use the virtual machine on your own PC, make a full clone.
This virtual machine has a single user registered, namely
“student”. The root password and also the password for “student” is
“Fossil-B17”. It is strongly suggested that you create a user identity for
yourself and delete the student identity. It is also strongly suggested that you
change the root password of your virtual operating system to something else.
General Requirements for Programming Assignments
Each programming project submitted by a student will be compiled and tested by the graders. In order to make the grading process reasonable, a certain uniformity of project submissions and a reasonable standard of programming practice are expected.
Programming projects this term will be compiled and tested on the Fossil Lab virtual machines, using the version of Linux, C, C++, or Java provided with this course. All student programs must compile without warnings. Using the “-Wno-deprecated” switch is strongly discouraged without prior permission from the instructor.
When grading any project, the grader or
professor will do the following to compile the components of the project submission:–
   download from Turnin to student’s directory
   cd {student’s directory}
   make
and the following to erase all compiled components:–
   make clean
If this does not work, we will not try to figure out how to compile and
run your program and the assignment will not be graded any further.
To test your program, we will execute
   {name of program} {arguments}
For example, if you were asked to write a
program called life with three arguments, we will type the following
to a shell to test your program:–
life argument1 argument2 argument3
We will not look through your submission to try to figure out what you
decided to call your program.
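Because the graders invoke the program exactly as shown, it is worth having the program check its own command line. A minimal sketch in C, using hypothetical names (check_life_args, LIFE_NARGS) and assuming only that life expects exactly three arguments. Note that for `life argument1 argument2 argument3`, argc is 4, because argv[0] holds the program name.

```c
#include <assert.h>
#include <stdio.h>

#define LIFE_NARGS 3   /* hypothetical: arguments the example program expects */

/* Returns 0 if exactly LIFE_NARGS arguments follow the program name;
 * otherwise prints a usage line to stderr and returns -1. */
int check_life_args(int argc, char *argv[])
{
    assert(argv != NULL);

    if (argc != LIFE_NARGS + 1) {
        fprintf(stderr, "usage: %s argument1 argument2 argument3\n",
                argc > 0 ? argv[0] : "life");
        return -1;
    }
    return 0;
}
```

In this sketch, main() would begin with `if (check_life_args(argc, argv) != 0) return 1;` and then read argv[1] through argv[3].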
Submission of Programming Projects
Projects must be submitted using the
web-based version of the Turnin system. The instructor will provide accounts to
students at the start of term. When accounts are set up, each student will
receive an e-mail message with a password for the Turnin system. Students
should log into Turnin and change their passwords to something they can
remember. In the event that a password is forgotten, the instructor or grader
can reset it. The web-based Turnin system can be accessed from within a virtual
machine or from anywhere on the Internet by visiting
http://web.cs.wpi.edu/~kfisler/turnin.html
http://turnin.cs.wpi.edu:8088/servlets/turnin.ss
Project components should not be zipped
together or be partitioned into folders for different parts of the project. The
reason is that the Turnin system already zips them together.
When you submit an assignment, your submission must include not only the solution but also the test code or test cases, and it must be well commented and easy for others to read. Similarly, any output must be cleanly formatted and easy for others to read.
Be
sure to put your name at the top of every file submitted as part of a
programming project. You would be surprised at how often students
forget this!
“Getting the correct answer” on a programming project is not sufficient to meet the objectives of the assignment. Successfully meeting the objectives also includes testing your program and paying attention to its robustness. On the team project, it also means multiple “deliverables,” each of which must be completed on time.
Programming Project Assignments:
Project #1 – Remote execution of a command (.doc, .html)
Slides for assignment of project (.ppt, .html)
Project #2 – Distributed system project (.doc, .html)
The
following papers are relevant to the material presented in class:–
Allman, Eric, “E-mail Authentication: What? Why? How?,” ACM Queue, November 2006, pp. 30-34. (.pdf)
Anderson, Thomas E., Dahlin, Michael D., Neefe, Jeanna M., Patterson, David A., Roselli, Drew S., and Wang, Randolph Y., “Serverless Network File Systems,” Proceedings of the 1995 Symposium on Operating Systems Principles, Copper Mountain, Colorado, December 1995. (.pdf)
Birrell, Andrew D., Levin, Roy,
Birrell, Andrew D., and Nelson, Bruce Jay, “Implementing Remote Procedure Calls,” ACM Transactions on Computer Systems, vol. 2, #1, February 1984, pp. 39-59. (.pdf)
Dean, Jeffrey, and Ghemawat, Sanjay, “MapReduce: Simplified Data Processing on Large Clusters,” Communications of the ACM, vol. 51, #1, January 2008, pp. 107-113. (.pdf)
Dean, Jeffrey, and Ghemawat, Sanjay, “MapReduce: Simplified Data Processing on Large Clusters,” Proceedings of Operating Systems Design and Implementation (OSDI), San Francisco, CA, 2004, pp. 137-150. (.pdf) Note: This paper is an earlier version of the CACM paper above, but it contains some details not included in the CACM paper.
Ghemawat, Sanjay, Gobioff, Howard, and Leung, Shun-Tak, “The Google File System,” Proceedings of the 2003 Symposium on Operating Systems Principles, Bolton Landing (Lake George), NY, October 2003. (.pdf)
Lämmel, Ralf, “Google’s MapReduce Programming
Model — Revisited,” Microsoft Corp.,
McDougall, Richard, “Extreme Software Scaling,” ACM Queue, September 2005, pp. 38-48. (.pdf)
Patterson, David A., “The Data Center is the Computer,” Communications of the ACM, vol. 51, #1, January 2008, p. 105. (.pdf) Note: This paper is a technical perspective on the CACM paper by Dean and Ghemawat, listed above.
Rosenblum, M., and Ousterhout, J. K., “The Design and Implementation of a Log-Structured File System,” Proceedings of the 13th ACM Symposium on Operating Systems Principles, Pacific Grove, California, October 1991, pp. 1-15. (.pdf)
Schroeder, Michael D., Birrell, Andrew
D.,
Smed, J., Kaukoranta, T., and Hakonen, H., “Aspects of Networking in Multiplayer Computer Games,” The Electronic Library, vol. 20, #2, 2002, pp. 87-97. (.pdf)
Sutter, Herb, and Larus, James, “Software and the Concurrency Revolution,” ACM Queue, September 2005, pp. 54-62. (.pdf)