Physical Representation of Files

A disk drive consists of a disk pack containing one or more platters stacked like phonograph records. Information is stored on both sides of each platter.

Each platter is divided into concentric rings called tracks, and each track is divided into sectors. All transfers to and from the disk are performed at the sector level.

For example, to modify a single byte, the system reads the sector containing the byte off the disk, modifies the byte, and rewrites the entire sector.

The disk uses a read/write head for each platter surface to transfer data to and from the platters. Moving the heads from one track to another is called seeking, and the heads generally move together as a unit.

Because the disk heads move together as a unit, it is useful to group the tracks at the same head position (one on each surface) into logical units called cylinders. Sectors in the same cylinder can be read without an intervening seek.

Formatting

Before a disk can be used, the raw disk must be formatted. Formatting creates sectors of the size appropriate for the target operating system. Some systems allow disks to be formatted by user programs; usually, the manufacturer formats a disk before sending it to a customer.

One or more disk drives connect to a disk controller, which handles the details of moving the heads, etc. The controller communicates with the CPU through a host interface. Moreover, through direct memory access (DMA), the controller can transfer entire sectors to and from main memory without the CPU copying each byte; the CPU is typically interrupted only when the transfer completes.

(See Fig. 5-3 in Tanenbaum.)

Three factors influence the delay, or latency, that occurs in transferring data to/from a disk:

1. seek latency -- time needed to move the head to the desired track
2. rotational delay -- time spent waiting for the desired sector to rotate under the head
3. transfer delay -- time required to read or write the sector

Representative figures (Maxtor Diamond Max 1750 drive):

seek time: 10 ms average
rotational latency: 6 ms average (half a rotation at 5200 RPM)
transfer rate: up to 14 MB per second

File Implementation

An important empirical observation is that:

1. the majority of files are small
2. a few files are large

We want to handle both file types efficiently.

The operating system may choose to use a larger block size than the sector size of the physical disk. Each block consists of consecutive sectors. Motivation:

a larger block size increases transfer efficiency, because the seek and rotational delays are paid once per block rather than once per sector
it may be more convenient for the block size to match the machine's page size

Note:

because some systems interrupt after each sector operation, "consecutive" sectors may mean "every other physical sector" (interleaving), giving the CPU time to start the next transfer before the desired sector passes under the head
some systems can transfer many sectors between interrupts

The unit of transfer convenient for the operating system is a disk block. It may be the same size as a sector or larger, and the trend is toward larger block sizes. NTFS uses a 4 KB block size for disks larger than 2 GB. FAT-32 uses 4 KB blocks up to 8 GB, 8 KB up to 16 GB, 16 KB up to 32 GB, and 32 KB above 32 GB.

Blocks can also be allocated contiguously, but the total number needed for a file may not be known when the file is created. Contiguous allocation can also lead to external fragmentation: free space is broken into many runs of blocks that are each too small to use.

Details of policies for reducing the latency of retrieving blocks from disk were discussed in a previous course.

Block Management

Approaches for keeping track of free blocks:

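1. Bit map -- devote a single bit to each block, recording whether the block is free or allocated.

2. Linked list -- keep the free blocks on a linked list of disk blocks, where each block on the list holds the numbers of free blocks. Thus for 16-bit block numbers (FAT-16) and 1 KB (1024-byte) blocks, 512 block numbers fit in one block; typically one of these slots holds the number of the next block in the list.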

This approach can be simplified by keeping track of just one free block in each linked-list element, so that each free block stores the number of the next free block.

Runs of consecutive free blocks can also be grouped using an address/count approach, recording each run as its starting block number and its length.

Each approach makes different space/time tradeoffs.

Caching

Caching is crucial to improve performance. Why?

many file blocks will be accessed again soon
consider the cycle of editing, compiling, and running a program
consider frequently used commands such as ls, vi, etc.

Unfortunately, caching may cause data in memory to become out of step with data on disk. This is known as the cache coherency problem. The problem is most significant in the following contexts:

database applications that must ensure that the database is in a known state.
One solution is for the operating system to provide a "flush" system call that writes to disk the file blocks associated with a file descriptor. The system might also flush the cache when the file is closed.
machine crashes, where the disk must be in a consistent state. For instance, it wouldn't be good to have a directory entry point to a file that hadn't been written to disk (or vice versa).