Hash Tables And Spellcheck
Chris Tralie
In this exercise, you will fill in the methods for a hash table class and use it to do spell check by quickly checking to see if typed words are actually in our list of ~300k words.
Click here to download the starter code for this exercise
Step 1: HashTable Class
First, you should fill in the set ADT instance methods of the HashTable
class. A linked list class has already been provided for you, and the constructor sets up the hash table with a list of linked lists according to how many buckets there are. The methods you have to fill in are:
-
contains(obj)
: ReturnTrue
ifobj
is in the hash table, orFalse
otherwise -
add(obj)
: Addobj
to the hash table -
remove(obj)
: Removeobj
from the hash table if it exists in the table, or return None if it's not there
Step 2: Testing with Programmable Hash Codes
In order for an object to be compatible with this data structure, it must be hashable; that is, it must implement a method hash_code
that returns a number for this particular object that's used to place it in the appropriate bin. We'll use "duck typing," so that as long as an object has this method, it doesn't matter what class it's come from, and we can use it in our HashTable
class. As an example, I've provided the file harrypotter.py
which defines a Wizard
class with the hash code based on the wizard's birthday:
So, if you were to run the following code:
It should print
Also, if you make a hash table with 10 buckets and add all of the wizards:
Then you should see the following output
Step 3: String Hash Codes
In order to use this for spell check, we will need to devise a hash code for strings. We'll use Java's default hash code scheme for strings, which is defined with the following recurrence relation (which can be implemented with recursion or a loop)
\[ h_0 = 0 \] \[ h_i = 31 * h_{i-1} + c_i \]
where ci is an int associated to the ith character, and the hash code for an entire string of length N is taken to be hN. In python, an int associated with a character can be obtained with the ord
method. For instance,
would yield the number 99.
Your Task: Implement the hash_code
method in the StrWrapper
class in strwrapper.py
according to the above definition. If it works and you run the following code, for example:
Then you should get
Step 4: Spellchecker
A file spellcheck.py
has been provided which uses the hash table you've implemented to do spellcheck efficiently. I've provided a jupyter notebook SpellCheckInteractive.ipynb
to interactively use to do spell check, using Jupyter widgets. If you've done all of the above properly, it should just work! Below shows an example running the spellcheck interface
This will be noticeably laggy if you didn't implement the buckets or the string hash code properly and everything maps to the same bucket