Web as a Distributed System

The World Wide Web is a large distributed system. It comprises 70-75% of Internet traffic (Claffy:INET98).

Active area of personal research. Look at my home page for online papers.

Also an extensive list of publications in the area http://www.cs.wpi.eduwebbib/

E2E Factors and Relevance to Distributed Systems

Build up end-to-end performance picture. Basic picture of clients and servers. How DNS and HTTP are used for each object.
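A minimal sketch of the per-object steps, assuming plain HTTP/1.0 over TCP port 80. The function names (build_get_request, fetch) and the example URL are hypothetical and only illustrate the DNS lookup followed by the HTTP request.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url: str) -> tuple[str, int, bytes]:
    """Return (host, port, raw HTTP/1.0 request bytes) for one object URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    return host, port, request


def fetch(url: str) -> bytes:
    """Fetch one object: DNS lookup first, then an HTTP request over TCP."""
    host, port, request = build_get_request(url)
    # Step 1 (DNS): resolve the server name to a network address.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    # Step 2 (HTTP): open a TCP connection, send the request, read the reply.
    with socket.socket(family, socktype, proto) as sock:
        sock.connect(addr)
        sock.sendall(request)
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("http://www.example.com/index.html") needs network access. build_get_request can be inspected offline.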

Static Objects

Served from a static file. Similarities with a distributed file system, but differences.

Composite Objects

Pages with embedded objects (images). In the simplest case just make subsequent requests to the same server over separate TCP connections.

Browsers use parallel connections to speed up delivery.
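A rough sketch of how a browser might retrieve embedded objects in parallel, assuming a thread pool stands in for parallel TCP connections. The names fetch_embedded and fetch_one, and the default limit of 4 connections, are illustrative assumptions, not taken from any particular browser.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_embedded(
    urls: list[str],
    fetch_one: Callable[[str], bytes],
    max_connections: int = 4,  # assumed limit, chosen only for illustration
) -> dict[str, bytes]:
    """Retrieve every embedded object, at most max_connections at a time."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # map() preserves the order of urls even though fetches overlap.
        bodies = list(pool.map(fetch_one, urls))
    return dict(zip(urls, bodies))
```

Here fetch_one is whatever routine retrieves a single object over its own connection. Passing it in keeps the sketch independent of the network.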

Network Protocols

Use of HTTP/1.1 for persistent connections (retrieval of multiple objects over the same connection) and pipelining (multiple requests before receiving any replies).

When persistence and pipelining work, they are better than parallel connections.
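A toy model of the comparison, counted in round-trip times (RTTs). It assumes each new TCP connection costs one RTT to set up and each request/reply costs one RTT. It ignores transmission time, server processing, and TCP slow start. All function names are hypothetical, and the numbers show only the shape of the trade-off, not measured results.

```python
import math


def serial_nonpersistent(n: int) -> int:
    """One new connection per object, one after another."""
    return n * 2  # setup RTT + request/reply RTT for each object


def parallel_nonpersistent(n: int, k: int) -> int:
    """Up to k separate connections open at the same time."""
    return math.ceil(n / k) * 2


def persistent(n: int) -> int:
    """One connection, requests sent one at a time."""
    return 1 + n  # one setup RTT, then one RTT per object


def pipelined(n: int) -> int:
    """One connection, all requests sent before any reply arrives."""
    return 1 + 1 if n > 0 else 0  # one setup RTT + one RTT for the batch


if __name__ == "__main__":
    n, k = 8, 4
    print("serial:    ", serial_nonpersistent(n))
    print("parallel:  ", parallel_nonpersistent(n, k))
    print("persistent:", persistent(n))
    print("pipelined: ", pipelined(n))
```

Under these assumptions persistence alone does not beat parallel connections. The gain in this model comes from combining persistence with pipelining.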

Caches

Dynamic Content

Retrieval is no longer a file retrieval, but a remote procedure call with one or more parameters.

Query to a search engine where key words are parameters.
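A small sketch of a search query as a call with parameters, using the query string of a URL. The host search.example.com, the path /find, and the parameter name q are assumptions made up for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base: str, keywords: list[str]) -> str:
    """Client side: encode the key words as a parameter of the request."""
    return base + "?" + urlencode({"q": " ".join(keywords)})


def parse_search_request(url: str) -> list[str]:
    """Server side: recover the key words before running the search."""
    params = parse_qs(urlsplit(url).query)
    return params.get("q", [""])[0].split()
```

The server treats the URL path as the procedure name and the query string as its arguments, so the reply is computed rather than read from a file.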

Use of cookies, which are pieces of information sent by browser on each request for server to track requests and customize content.
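A minimal sketch of the cookie exchange using Python's standard http.cookies module. The cookie name visitor and the helper names are hypothetical.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name: str, value: str) -> str:
    """Server side: header line that asks the browser to store a cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output(header="Set-Cookie:")


def read_cookie_header(header_value: str) -> dict[str, str]:
    """Server side: parse the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


def customize(header_value: str) -> str:
    """Pick content based on whether the request carries a known cookie."""
    visitor = read_cookie_header(header_value).get("visitor")
    return f"Welcome back, {visitor}" if visitor else "Welcome"
```

The browser returns the stored value on each later request to the same server, which is what lets the server link requests together.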

Server-side computation is being performed (CGI, JavaScript, ASP (VBScript)). May include database access.

Server Farms

A single server is not enough. Have a farm of servers. How to organize? Make it visible to the world or not?
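One possible organization, shown only as an illustration: a dispatcher hides the farm behind a single visible name and hands requests to the servers in round-robin order. Round-robin is an assumption here, not a policy stated in these notes, and the class and server names are hypothetical.

```python
import itertools


class RoundRobinDispatcher:
    """Front end that spreads requests evenly over a fixed set of servers."""

    def __init__(self, backends: list[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend server")
        self._cycle = itertools.cycle(list(backends))

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._cycle)
```

A farm that is visible to the world would instead expose the individual server names, for example through DNS, and let clients reach them directly.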

Remote Content Server

Content Adaptation Server

Adapt content to needs/capabilities of the client.

Other Factors

Interaction Between Factors

Impact of one factor may be more or less depending on another factor.

For example, impact of caching or remote content distribution may depend on the HTTP protocol option being used.

Secure Sockets Layer

With increased use of electronic commerce, secure transactions are important. The SSL protocol was developed by Netscape for encrypted communication between clients and servers.

SSL sits between TCP/IP and HTTP layers. Invoked by browsers using the protocol "https".
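A sketch of the layering using Python's standard ssl module, which implements TLS, the successor to SSL. The function names are hypothetical. The point is only that the secure layer wraps the TCP socket and HTTP is then spoken over the wrapped socket.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Default client settings: verify the certificate and the host name."""
    return ssl.create_default_context()


def https_get(host: str, path: str = "/", port: int = 443) -> bytes:
    """Fetch one object over HTTP carried inside a TLS session."""
    context = make_client_context()
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    # Bottom layer: an ordinary TCP connection.
    with socket.create_connection((host, port)) as tcp_sock:
        # Middle layer: the handshake runs here, before any HTTP is sent.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # Top layer: unchanged HTTP, now encrypted in transit.
            tls_sock.sendall(request)
            chunks = []
            while data := tls_sock.recv(4096):
                chunks.append(data)
    return b"".join(chunks)
```

Calling https_get needs network access. The HTTP request itself is identical to the unencrypted case.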

Uses public-key cryptography. Features:

Two sub-protocols: SSL record protocol and SSL handshake protocol. The SSL record protocol defines the format used to transmit data. The SSL handshake protocol uses the SSL record protocol to establish a connection between an SSL-enabled server and an SSL-enabled client. Exchange of messages does the following:

SSL Handshake

Outline using one of the cipher suites previously given.

1. Client sends SSL version number, cipher settings, randomly generated data and other information server needs.
2. Server sends server SSL version number, cipher settings, randomly generated data, other information and server's own certificate. Server may optionally request client's certificate. Client authenticates server certificate by using public key of certificate authority (CA). See Figure 2.
3. Client creates premaster key for session, encrypts it with server's public key (obtained from server's certificate) and sends it to server.
4. Client sends data signed with its own private key (along with its certificate) if server requested client authentication.
5. Server decrypts premaster key with its private key and generates master secret, after optional client authentication.
6. Both client and server use master secret to generate session keys, which are symmetric keys for encryption/decryption of exchanged information during SSL session.
7. Client and server inform each other that session keys have been created.
8. SSL handshake is complete.
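A toy sketch of the key derivation in the steps above, showing why both sides end up with the same session keys. It is not the real SSL key derivation: HMAC-SHA256 stands in for the actual function, the public-key encryption of the premaster key and all certificate checks are omitted, and every name is hypothetical.

```python
import hashlib
import hmac
import os


def derive_master_secret(premaster: bytes, client_random: bytes,
                         server_random: bytes) -> bytes:
    """Mix the premaster key with both sides' random data (stand-in function)."""
    message = b"master secret" + client_random + server_random
    return hmac.new(premaster, message, hashlib.sha256).digest()


def derive_session_keys(master: bytes, client_random: bytes,
                        server_random: bytes) -> dict[str, bytes]:
    """Expand the master secret into one symmetric key per direction."""
    def expand(label: bytes) -> bytes:
        message = label + server_random + client_random
        return hmac.new(master, message, hashlib.sha256).digest()
    return {
        "client_write": expand(b"client write key"),
        "server_write": expand(b"server write key"),
    }


def simulate_handshake() -> tuple[dict[str, bytes], dict[str, bytes]]:
    client_random = os.urandom(32)  # step 1: client's random data
    server_random = os.urandom(32)  # step 2: server's random data
    # Step 3: in real SSL this travels encrypted under the server's public
    # key; here it is simply shared (assumption made to keep the sketch short).
    premaster = os.urandom(48)
    # Steps 5 and 6: each side derives the keys independently.
    client_master = derive_master_secret(premaster, client_random, server_random)
    server_master = derive_master_secret(premaster, client_random, server_random)
    client_keys = derive_session_keys(client_master, client_random, server_random)
    server_keys = derive_session_keys(server_master, client_random, server_random)
    return client_keys, server_keys
```

Only the premaster key is secret. The random data from steps 1 and 2 travels in the clear and makes each session's keys different.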

Look at timings for cached and uncached key from IBM technical report.

Client needs to verify that the domain name in the server's certificate matches the server's actual domain name, to make sure a "man-in-the-middle" does not intercept messages and substitute its own keys.