Hash Tables And Spellcheck

Chris Tralie

In this exercise, you will fill in the methods for a hash table class and use it to do spell check by quickly checking to see if typed words are actually in our list of ~300k words.

Click here to download the starter code for this exercise

Step 1: HashTable Class

First, you should fill in the set ADT instance methods of the HashTable class. A linked list class has already been provided for you, and the constructor sets up the hash table with a list of linked lists according to how many buckets there are. The methods you have to fill in are:

  • contains(obj): Return True if obj is in the hash table, or False otherwise
  • add(obj): Add obj to the hash table
  • remove(obj): Remove obj from the hash table if it exists in the table, or return None if it's not there

Step 2: Testing with Programmable Hash Codes

In order for an object to be compatible with this data structure, it must be hashable; that is, it must implement a method hash_code that returns a number for this particular object that's used to place it in the appropriate bin. We'll use "duck typing," so that as long as an object has this method, it doesn't matter what class it's come from, and we can use it in our HashTable class. As an example, I've provided the file harrypotter.py which defines a Wizard class with the hash code based on the wizard's birthday:

So, if you were to run the following code:

It should print

Also, if you make a hash table with 10 buckets and add all of the wizards:

Then you should see the following output

Step 3: String Hash Codes

In order to use this for spell check, we will need to devise a hash code for strings. We'll use Java's default hash code scheme for strings, which is defined with the following recurrence relation (which can be implemented with recursion or a loop)

\[ h_0 = 0 \] \[ h_i = 31 * h_{i-1} + c_i \]

where ci is an int associated to the ith character, and the hash code for an entire string of length N is taken to be hN. In python, an int associated with a character can be obtained with the ord method. For instance,

would yield the number 99.

Your Task: Implement the hash_code method in the StrWrapper class in strwrapper.py according to the above definition. If it works and you run the following code, for example:

Then you should get

Step 4: Spellchecker

A file spellcheck.py has been provided which uses the hash table you've implemented to do spellcheck efficiently. I've provided a jupyter notebook SpellCheckInteractive.ipynb to interactively use to do spell check, using Jupyter widgets. If you've done all of the above properly, it should just work! Below shows an example running the spellcheck interface

This will be noticeably laggy if you didn't implement the buckets or the string hash code properly and everything maps to the same bucket