Chapter 20 Java Utilities Package and Bit Manipulation 1161 20.5 Hashtable Class Object-oriented programming languages facilitate creating new data types. When a program creates objects of new or existing types, the program then needs to manage those objects efficiently. This includes storing and retrieving objects. Storing and retrieving information with arrays is efficient if some aspect of your data directly matches the key value and if those keys are unique and tightly packed. If you have 100 employees with 9digit Social Security numbers and you want to store and retrieve employee data by using the Social Security number as a key, it would nominally require an array with 999,999,999 elements, because there are 999,999,999 unique 9-digit numbers. This is impractical for virtually all applications that use Social Security numbers as keys. If the program could have an array that large, the program could get high performance for both storing and retrieving employee records by simply using the Social Security number as the array index. There are numerous applications that have this problem, namely, that either the keys are of the wrong type (i.e., not nonnegative integers), or they may be of the right type, but sparsely spread over a huge range. What is needed is a high-speed scheme for converting keys such as Social Security numbers, inventory part numbers and the like into unique array subscripts. Then, when an application needs to store something, the scheme could convert the application key rapidly into a subscript and the record of information could be stored at that slot in the array. Retrieval is accomplished the same way: Once the application has a key for which it wants to retrieve the data record, the application simply applies the conversion to the key this produces the array subscript where the data is stored and the data is retrieved. The scheme we describe here is the basis of a technique called hashing. Why the name? When we convert a key into an array subscript, we literally scramble the bits, forming a kind of mishmashed number. The number actually has no real significance beyond its usefulness in storing and retrieving this particular number data record. A glitch in the scheme occurs when collisions occur (i.e., when two different keys hash into the same cell (or element) in the array). We cannot store two different data records in the same space, so we need to find an alternative home for all records beyond the first that hash to a particular array subscript. There are many schemes for doing this. One is to hash again (i.e., to reapply the hashing transformation to the key to provide a next candidate cell in the array). The hashing process is designed to distribute the values throughout the table, so the assumption is that with just a few hashes an available cell will be found. Another scheme uses one hash to locate the first candidate cell. If that cell is occupied, successive cells are searched linearly until an available cell is found. Retrieval works the same way: The key is hashed once to determine the initial location to check to see whether it contains the desired data. If it does, the search is finished. If it does not, successive cells are searched linearly until the desired data is found. The most popular solution to hash table collisions is to have each cell of the table be a hash bucket, typically a linked list of all the key value pairs that hash to that cell. This is the solution that Java s Hashtable class (from package java.util) implements. One factor that affects the performance of hashing schemes is called the load factor. This is the ratio of the number of occupied cells in the hash table to the size of the hash table. The closer this ratio gets to 1.0, the greater the chance of collisions.
Note: In case you are looking for affordable webhost to host and run your web application check Vision http web server services