Hashtables
Our bank from the last few lectures has grown to have hundreds of thousands of customers and accounts. The system has long since outgrown the LinkedList data structure for these data; even the AVL-tree based implementation has gotten too slow for looking up customers. Ideally, the bank needs data structures that provide even faster techniques for getting from account numbers to Account objects and from names to Customers.
This lecture is about hashtables, a data structure that can perform lookups in near-constant time. A hashtable maps keys to values without traversing a data structure over all the keys. The diagram at the top right of the Wikipedia entry on hashtables illustrates the concept nicely. Every hashtable has a fixed number of "buckets" to which it maps keys. If you have more buckets than actual keys, hashtables provide a perfect match between keys and data. If you have more keys than buckets, keys sometimes collide, costing the accuracy of the data retrieved from the table. We won’t talk about hashtables in depth here, as you’ll see them in detail in Algorithms.
We will study hashtables in the context of providing a better data structure for implementing the IAcctSet interface in our banking system.
1 Java Classes and Operations on Hashtables
Java provides two data structures for hashtables: one called Hashtable and one called HashMap. HashMap is generally preferred, unless you have to deal with threads and synchronization (not a topic for this course). Hashtable is a legacy data structure from earlier versions of Java. We will therefore use the HashMap classes in Java as our implementation of the general hashtable data structure.
Hashtables are built into Java. To use them, include the line
import java.util.HashMap; |
HashMap is a generic class, defined over type variables for the key (K) and value (V). The most useful HashMap<K,V> methods are:
V put(K key, V value) : stores the value under the key. If there was already a value associted with the key, put returns the previous value; otherwise it returns null.
V get(Object key) : returns the value corresponding to the given key. If there is no value for the key, returns null.
For the full list of HashMap methods, see the Java HashMap documentation.
2 Using HashMaps
Here’s a simple AcctSetHM class that implements IAccountSet using a HashMap:
class AcctSetHM implements IAccountSet { |
private HashMap<Integer,Account> accounts = new HashMap<Integer,Account>(); |
|
public Account findByNum(int num) { |
return accounts.get(num); |
} |
} |
While this has the right spirit, it fails to properly handle cases in which the given account number is not found. The above description of the get method indicated that it returns null if the requested key isn’t in the hashtable. A null value should not leak out of findByNum. Instead, findByNum should check for it, and throw an exception if necessary:
class AcctSetHM implements IAccountSet { |
private HashMap<Integer,Account> accounts = new HashMap<Integer,Account>(); |
|
public Account findByNum(int num) throws AcctNotFoundException { |
Account acct = accounts.get(num); |
if (null == acct) |
throw new AcctNotFoundException(num); |
return acct; |
} |
} |
For an example of put, we need a method that adds new accounts or new customers, neither of which we have including in the banking system to date. Where should we put a method to add new customers?
In the Customer class?: No. Methods in the customer class are for existing Customer objects.
Isn’t this what the Customer constructor does?: No. The Customer constructor will create a new Customer object, but it doesn’t put that object into the overall data structures for the banking system. We will use the Customer constructor while adding a new customer, but it is not sufficient.
In the BankingConsole, since somebody would have to issue the command to create a new customer?: Remember that the BankingConsole is only for the user-interface for operations. We would need to extend the console with a command for adding customers, but actually modifying the set of customers in the console would break the encapsulation of the BankingService.
In the BankingService class?: Yes. BankingService controls the data structure we are trying to modify (the set of customers), so that class is the right place for the core method that adds new customers.
The method in BankingService might appear as follows:
class BankingService { |
... |
|
// create a new customer with an initial password and add to the customer set |
public Customer newCustomer (String custname) { |
Customer newC = new Customer(custname, this.genInitPwd()); |
Customer oldC = this.customers.add(newC); |
return newC; |
} |
|
// generates default passwords for new customers. Would not |
// generate the same password for all customers in a real system |
// Note: this is your first example of a private method |
private int genInitPwd() { |
return 1234; |
} |
} |
For this to work, we have to add the add method to the ICustSet interface and each of its implementing classes.
interface ICustSet { |
Customer findByName(String name) throws CustNotFoundException; |
Customer add(Customer newC); |
} |
|
class CustSetHM implements ICustSet { |
private HashMap<String,Customer> customers = new HashMap<String,Customer>(); |
|
public Customer add(Customer newC) { |
return this.customers.put(newC.getUniqueID(), newC); |
} |
} |
As we did when we called get in findByNum, we should augment this method to check for the null return value and throw an exception if a customer with this name already exists.
3 Defining Equality on Keys
Hashmaps store and retrieve values based on the keys. For hashmaps to work, Java must be able to tell whether data of the key type for a hashmap are the same. When we use integers or Strings as keys, Java can easily determine whether two keys are equal. What if we instead want to use objects of some class that we defined as the keys? For example, we might choose to use a hashmap to retrieve the accounts affiliated with each customer, rather than store a list of those accounts inside the customer.
If you need to create a hashmap using some class that you defined as the key, you need to tell Java two things:
how to determine whether two objects of the class are equal
how to generate a positive number from an object (to be used as the actual key under the hood).
You do this by adding two methods to classes that you will use as keys: equals takes another object of the same type and returns a boolean if they should be considered the same object; hashcode returns an int that must be unique (to high probability) across all the objects in the class. If you are working in Eclipse, there is a menu item that helps you generate these methods based on essential fields in your class. Items 7 and 8 in Chapter 3 of Effective Java provide a method for writing your own hashcode methods (this one chapter is available online.
4 Summary
Hashtables and hashmaps are a particular data structure that implement a more general ADT called a dictionary or map. These ADTs are used to map from keys to values. Hashtables and hashmaps are particularly fast implementations that achieves performance by converting keys to positive integers that are largely unique across the set of hashed items. Hashmaps themselves use these positive integers to directly access the locations where data is stored in memory (rather than having to search through memory for values).