The World Wide Web is a large distributed system. Comprises 70-75% of
Internet traffic (Claffy:INET98).
Active area of personal research. Look at my home page for online papers.
Also an extensive list of publications in the area:
http://www.cs.wpi.edu/~webbib/
Build up end-to-end performance picture. Basic picture of clients and
servers. How DNS and HTTP are used for each object.
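A minimal sketch of the per-object sequence, assuming a plain HTTP/1.0 GET with no caching or proxies: the client resolves the server name with DNS, opens a TCP connection, and sends one HTTP request. The function names `build_request` and `fetch` and the example URL are illustrative assumptions.

```python
import socket
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, request_bytes) for a simple HTTP/1.0 GET."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    return host, port, request.encode("ascii")


def fetch(url):
    """Fetch one object: DNS lookup, TCP connect, HTTP request, read reply."""
    host, port, request = build_request(url)
    address = socket.gethostbyname(host)        # step 1: DNS maps name to IP address
    with socket.create_connection((address, port), timeout=10) as conn:
        conn.sendall(request)                   # step 2: HTTP request over TCP
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:                        # HTTP/1.0 server closes when done
                break
            chunks.append(data)
    return b"".join(chunks)                     # raw reply: headers plus body
```

Calling `fetch("http://www.example.com/index.html")` would perform both steps for one object; a page with embedded images repeats them for each object.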
In the basic case each object is served from a static file. Similarities
with a distributed file system (DFS), but also differences:
- much larger scale (number of clients)
- heterogeneous clients (machine type/operating system)
- Web generally has a single writer and multiple readers per object, whereas
a DFS must handle multiple writers.
- no central administration
Pages with embedded objects (images). In the simplest case just make
subsequent requests to the same server over separate TCP connections.
Browsers use parallel connections to speed up delivery.
Use of HTTP/1.1 for persistent connections (retrieval of multiple objects
over the same connection) and pipelining (multiple requests before
receiving any replies).
When persistence and pipelining work, they perform better than parallel connections.
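A rough way to compare the connection options is to count round trips for a page with one HTML file and n embedded objects. The sketch below is a simplified model, not a measurement: it assumes one round trip for the TCP handshake and one per request/reply, and it ignores transfer time, TCP slow start and server processing. The function name and mode names are illustrative.

```python
import math


def round_trips(n_embedded, mode, parallel=4):
    """Estimate round trips to fetch one HTML page plus n_embedded objects."""
    if mode == "serial":
        # New TCP connection (1 RTT) plus request/reply (1 RTT) per object.
        return 2 * (1 + n_embedded)
    if mode == "parallel":
        # HTML first, then embedded objects in batches of `parallel` connections.
        return 2 + 2 * math.ceil(n_embedded / parallel)
    if mode == "persistent":
        # One connection setup, then one request/reply per object in turn.
        return 1 + (1 + n_embedded)
    if mode == "pipelined":
        # One setup, one RTT for the HTML, one RTT for all embedded requests.
        return 1 + 1 + (1 if n_embedded else 0)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("serial", "parallel", "persistent", "pipelined"):
    print(mode, round_trips(8, mode))
```

Under these assumptions a persistent connection without pipelining can need more round trips than parallel connections; the gain comes when persistence and pipelining are used together.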
- Browser vs. proxy caching (benefit of a proxy depends on shared interests of users).
- Cache Replacement. LRU, GreedyDual-Size (value divided by size with
credit for recent accesses)
- Cache Coherency. Strong consistency (validate each access, server
invalidation like AFS). Weak consistency using a heuristic TTL.
- Hierarchy of caches. How should caches work together?
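The GreedyDual-Size replacement policy mentioned above can be sketched as follows. This is a minimal illustration, assuming a cost of 1 per object unless given and a linear scan to find the victim; the class and method names are hypothetical. Each object gets a value H = L + cost/size, the object with the smallest H is evicted, and L is raised to the evicted value so that recently accessed objects carry credit over older ones.

```python
class GreedyDualSizeCache:
    """Toy GreedyDual-Size cache; capacity and sizes are in bytes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.inflation = 0.0       # the running value L
        self.entries = {}          # key -> (H, size, cost)

    def _value(self, size, cost):
        return self.inflation + cost / size

    def access(self, key, size, cost=1.0):
        """Return True on a hit; on a miss, admit the object if it can fit."""
        if key in self.entries:
            _, size, cost = self.entries[key]
            # A hit restores the object's value relative to the current L.
            self.entries[key] = (self._value(size, cost), size, cost)
            return True
        if size > self.capacity:
            return False           # too large to cache at all
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            h, victim_size, _ = self.entries.pop(victim)
            self.inflation = h     # raise L to the evicted object's value
            self.used -= victim_size
        self.entries[key] = (self._value(size, cost), size, cost)
        self.used += size
        return False
```

With equal costs, large objects have the lowest value and are evicted first, which tends to keep many small objects and favors hit rate over byte hit rate.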
Retrieval is no longer a file retrieval, but a remote procedure call with
one or more parameters.
Query to a search engine where key words are parameters.
Use of cookies, which are pieces of information sent by browser on each
request for server to track requests and customize content.
Server-side computation is being performed (CGI, JavaScript, ASP
(VBScript)). May include database access.
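A minimal sketch of dynamic content generation, treating the request as a procedure call: the query string carries the parameters and a cookie identifies the user. The parameter name `q`, the cookie name `user` and the function `handle_request` are assumptions made for illustration; a real script would also consult a database or search index.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


def handle_request(url, cookie_header=""):
    """Build a customized reply from query parameters and cookies."""
    query = parse_qs(urlsplit(url).query)        # e.g. {"q": ["web caching"]}
    keywords = query.get("q", [""])[0].split()   # key words are the parameters
    cookies = SimpleCookie()
    cookies.load(cookie_header)                  # cookies sent by the browser
    user = cookies["user"].value if "user" in cookies else "guest"
    return f"Results for {user}: " + ", ".join(keywords)


print(handle_request("http://search.example.com/find?q=web+caching",
                     "user=alice; theme=dark"))
```

Because the reply depends on the parameters and the cookie, it cannot simply be cached and reused like a static file.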
A single server is not enough. Have a farm of servers. How to organize?
Make it visible to the world or not?
- HTTP redirection
- multiple IP addresses for the same server name.
- a single high-speed switch in front of a group of web servers.
Transparent to clients.
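The second option (multiple IP addresses for the same server name) is often realized as round-robin DNS. The sketch below is an assumption-laden toy, not a real name server: the class `RoundRobinDNS` and the addresses (from a documentation range) are hypothetical. Each query returns the full address list rotated by one, so clients that use the first address are spread across the farm.

```python
from collections import deque


class RoundRobinDNS:
    """Toy name server that rotates the address list on every query."""

    def __init__(self, records):
        self.records = {name: deque(addrs) for name, addrs in records.items()}

    def resolve(self, name):
        addrs = self.records[name]
        answer = list(addrs)   # reply with the current ordering
        addrs.rotate(-1)       # next query sees a different first address
        return answer


dns = RoundRobinDNS({"www.example.com": ["192.0.2.1", "192.0.2.2", "192.0.2.3"]})
for _ in range(4):
    print(dns.resolve("www.example.com")[0])
```

This spreads requests evenly only if DNS answers are not cached for long, and it takes no account of server load, which is one reason a switch in front of the servers is an alternative.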
- How to use them. Talk about Akamai's use of DNS.
- What content to store there?
- How to update?
Adapt content to needs/capabilities of the client.
- Client-side scripts (Java applets)--distriblet use
- Secure transactions (SSL)
- Streaming audio/video content
Impact of one factor may be more or less depending on another factor.
For example, impact of caching or remote content distribution may depend on
the HTTP protocol option being used.
With increased use of electronic commerce, secure transactions are
important. SSL protocol developed by Netscape for encrypted communication
between clients and servers.
SSL sits between the TCP/IP and HTTP layers. Invoked by browsers using the
protocol "https".
Uses public-key cryptography. Features:
- Clients can authenticate servers through a
certificate authority. Important as a client should not send credit card
information to an unknown server.
- Servers can authenticate clients in the same way.
- Encrypted SSL connection (no plain text).
Two sub-protocols: SSL record protocol and SSL handshake protocol. SSL
record protocol defines the format used to transmit data. SSL handshake
protocol uses SSL record protocol to establish a connection between an
SSL-enabled server and an SSL-enabled client. Exchange of messages does the
following:
- authenticate the server to the client
- allow client and server to select the cryptographic algorithms, or
ciphers, they both support. Choices include DES, Digital Signature
Algorithm (DSA), MD5, RSA, RSA key exchange. Most common is RSA key
exchange developed for SSL. Also FORTEZZA cipher suites for SSL 3.0.
- Optionally authenticate client to the server.
- Use public-key encryption techniques to generate shared secrets.
- Establish encrypted SSL connection.
Outline of the handshake using one of the cipher suites given previously:
1. Client sends its SSL version number, cipher settings, randomly generated
data and other information the server needs.
2. Server sends its SSL version number, cipher settings, randomly
generated data, other information and the server's own certificate. Server may
optionally request the client's certificate. Client authenticates the server's
certificate using the public key of the certificate authority
(CA). See Figure 2.
3. Client creates the premaster key for the session, encrypts it with the
server's public key (obtained from the server's certificate) and sends it to
the server.
4. If client authentication is needed, client also sends data signed
(encrypted) with its own private key.
5. After optional client authentication, server decrypts the premaster key
with its private key and generates the master secret; the client generates
the same master secret from the premaster key.
6. Both client and server use the master secret to generate session keys,
which are symmetric keys for encryption/decryption of information exchanged
during the SSL session.
7. Client and server inform each other that the session keys have been created.
8. SSL handshake is complete.
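The key-exchange part of the outline (steps 1-3, 5 and 6) can be illustrated with a toy simulation. Everything here is an assumption for illustration: the RSA key uses tiny textbook primes, and the key derivation uses HMAC-SHA256 in place of SSL's actual derivation function. It is not secure and is not the real protocol; it only shows that both sides end up with the same session keys without the premaster key crossing the network in the clear.

```python
import hashlib
import hmac
import secrets

# Toy RSA key pair for the server (tiny primes, illustration only).
P, Q = 61, 53
N = P * Q                            # public modulus, from the certificate
E = 17                               # public exponent, from the certificate
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, known only to server


def derive_keys(premaster, client_random, server_random):
    """Master secret, then two session keys (assumed scheme, not SSL's own)."""
    seed = client_random + server_random
    master = hmac.new(premaster.to_bytes(2, "big"), seed, hashlib.sha256).digest()
    client_key = hmac.new(master, b"client write", hashlib.sha256).digest()
    server_key = hmac.new(master, b"server write", hashlib.sha256).digest()
    return client_key, server_key


def handshake():
    client_random = secrets.token_bytes(16)      # step 1: client's random data
    server_random = secrets.token_bytes(16)      # step 2: server's random data
    premaster = secrets.randbelow(N - 2) + 2     # step 3: client picks premaster
    encrypted = pow(premaster, E, N)             # ...encrypts with server public key
    recovered = pow(encrypted, D, N)             # step 5: server decrypts it
    client_keys = derive_keys(premaster, client_random, server_random)   # step 6
    server_keys = derive_keys(recovered, client_random, server_random)   # step 6
    return client_keys, server_keys


client_keys, server_keys = handshake()
print(client_keys == server_keys)
```

Only the holder of the server's private key can recover the premaster key, which is why authenticating the server's certificate in step 2 matters.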
Look at timings for cached and uncached key from IBM technical report.
Client needs to verify that the domain name in the server's certificate
matches the server's actual domain name, so that a "man-in-the-middle" cannot
intercept messages and substitute its own keys.
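As an illustration of the client-side checks, the sketch below uses Python's standard `ssl` module, which implements TLS, the successor of SSL. It assumes the default context settings, under which the server's certificate must chain to a trusted certificate authority and the name in the certificate must match the requested host. The function names are hypothetical.

```python
import socket
import ssl


def make_client_context():
    """Context that requires a CA-signed certificate and a matching name."""
    context = ssl.create_default_context()   # loads the system's trusted CAs
    return context


def open_secure_connection(host, port=443):
    """Connect and run the handshake; raises ssl.SSLError if a check fails."""
    context = make_client_context()
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # server_hostname is the name compared against the certificate.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

If the name check were disabled, an intermediary holding any valid certificate could pose as the server, which is the man-in-the-middle case described above.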