Simple example:
Write a function to compute a hash code for lowercase alphabetic English caracters (a-z). The function returns integers from 0 to 25, with each char getting a different int. The function returns a wild card for not lowercase alphabetic English chars.
public int getHash(char c) {
c = Character.toLowerCase(c);
int hash = (int) c - 97;
if (hash < 0 || hash > 25) return -1;
return hash;
}What do you get when you run this?
Write a function to compute a hash code for lowercase alphabetic English caracters (a-z). The function returns integers from 0 to 25, with each char getting a different int. The function returns a wild card for not lowercase alphabetic English chars.
Minimal Perfect Hash function \(h(key)\)
valuesput(key, value) – values[\(h(key)\)] = valuecontains(key) – return values[\(h(key)\)] != nullget(key) – return values[\(h(key)\)]remove(key) – values[\(h(key)\)] = nullpublic class CharHash<T> {
private T[] table = (T[]) new Object[26];
private int getHash(char c) {
c = Character.toLowerCase(c);
int hash = (int) c - 97;
if (hash < 0 || hash > 25) return -1;
return hash;
}
public void put(char key, T value) {
int index = getHash(key);
if (index >= 0) table[index] = value;
}
public boolean contains(char key) {
int index = getHash(key);
if (index >= 0) return table[index] != null;
return false;
}
public T get(char key) {
int index = getHash(key);
if (index >= 0) return table[index];
return null;
}
public void remove(char key) {
int index = getHash(key);
if (index >= 0) table[index] = null;
}
}Advantages:
Disadvantages:
Go to canvas, on the menu on the left you will see a “Student Rating of Teaching” option.
Please complete the survey: it’s anonymous, I don’t get the results until winter break, results are used for improving my teaching (and providing feedback to my boss)
You have 10 minutes to complete the quiz
Constraints:
if X.equals(y) then X.hashCode() == Y.hashCode()if !X.equals(y) then X.hashCode() != Y.hashCode()Two main issues:
Here are the values that are to be stored in the hash table:
37, 7, 73, 77, 11, 68, 50, 66, 19, 4, 1, 18, 76, 71, 31, 45, 12, 25, 49, 53,
69, 87, 41, 2, 43, 23, 62, 38, 47, 51, 55, 65, 58, 91, 17, 63, 27, 88, 48
Here’s a hash function in java:
public class Hash {
int tableSize;
public Hash(int size) {
tableSize = size;
}
public int getHash(int value) {
return Math.abs(value % tableSize);
}
}Table size: 10
0 - 1 1 - 7 2 - 3 3 - 5 4 - 1
5 - 4 6 - 2 7 - 7 8 - 6 9 - 3
int[] values = new int[]{37, 7, 73, 77, 11, 68, 50, 66, 19, 4, 1, 18, 76, 71,
31, 45, 12, 25, 49, 53, 69, 87, 41, 2, 43, 23, 62, 38, 47, 51, 55, 65, 58,
91, 17, 63, 27, 88, 48};
int size = 20;
Hash hash = new Hash(size);
int[] indices = new int[size];
for (int v : values) {
indices[hash.getHash(v)] += 1;
}
for (int i=0; i<size; i++) {
System.out.println(i + " - " + indices[i] );
}Simple and fast solution:
Works well for any table size:
Consider an initially empty hash table \(H\) of size 15, designed to store integer keys. The hash function is defined as:
hash(k) = ⌊k mod length(\(H\))⌋
Insert the following sequence of keys in the given order: -524, 16, 322, -15, 78, -212
hash(k) = ⌊k mod length(\(H\))⌋
Insert the following sequence of keys in the given order: -524, 16, 322, -15, 78, -212
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
hash(k) = ⌊k mod length(\(H\))⌋
Insert the following sequence of keys in the given order: -524, 16, 322, -15, 78, -212
-524 % 15 = -14 (Java doesn’t use the mathematical definition)
⌊-14⌋ = 14
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| -524 |
hash(k) = ⌊k mod length(\(H\))⌋
Insert the following sequence of keys in the given order: 16, 322, -15, 78, -212
16 % 15 = 1
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| 16 | -524 |
hash(k) = ⌊k mod length(\(H\))⌋
Insert the following sequence of keys in the given order: 322, -15, 78, -212
322 % 15 = 7
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| 16 | 322 | -524 |
hash(k) = ⌊k mod length(\(H\))⌋
Insert the following sequence of keys in the given order: -15, 78, -212
-15 % 15 = 0
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| -15 | 16 | 322 | -524 |
hash(k) = ⌊k mod length(\(H\))⌋
Insert the following sequence of keys in the given order: 78, -212
78 % 15 = 3
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| -15 | 16 | 78 | 322 | -524 |
hash(k) = ⌊k mod length(\(H\))⌋
Insert the following sequence of keys in the given order: -212
⌊-212 % 15⌋ = 2
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| -15 | 16 | 212 | 78 | 322 | -524 |
hash(k) = ⌊k mod length(\(H\))⌋
Insert the following sequence of keys in the given order: -212
⌊-212 % 15⌋ = 2
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| -15 | 16 | 212 | 78 | 322 | -524 |
What if the next value to be stored is 120?
Two main issues:
Two problems to solve: hash code range, collisions
We compute index from hashcode as follows:
int index = abs(hashCode % tableSize)
hashCode is negative - modulo is negative - use absolute value to fixWe compute index from hashcode as follows:
int index = abs(hashCode % tableSize)
Consequences:
This one is hard. Two general approaches, both compromise efficiency:
Open Addressing: If a slot is taken, probe for the next empty one
Chaining: Each array slot holds a linked list of all items that hashed there. If there’s a collision, just add to the list.
If space is full put it somewhere else
Consider an initially empty hash table \(H\) of size 5, designed to store integer keys with linear probing. The hash function is defined as:
hash(k) = k mod length(\(H\))
Insert the following sequence of keys in the given order: 5, 7, 10
Key compromise: hash table doesn’t actually “finish” the job - just sorts into buckets.
put → find “bucket”, search bucket, add or setget → find “bucket”, search bucketcontains → find “bucket” search bucketremove → find “bucket” search bucketλ – the “load factor”, n current size of key set, k current size of table
λ = n / k
For one of our previous exercises, we had 39 values to store (n). What would be the load factor λ for different table sizes?
For one of our previous exercises, we had 39 values to store (n). What would be the load factor λ for different table sizes?
Why?
When?
How?
Design a Simple Spell Checker.
You’re building a spell checker that needs to quickly determine if a word exists in a dictionary of 100,000 English words
Implement a hash table to store the dictionary. Write a simple hash function for strings that takes a word as input and returns an integer index for an array.
Test your hash function with a sample of words. Does it:
public class WordHash {
private String[] table;
private int tableSize = 70000;
private int count;
public WordHash() {
table = new String[tableSize];
count = 0;
}
/**
* Hash function
* @param word is a string, a word in English
* @return an index, which is an int between 0 and tableSize-1
*/
private int getIndex(String word) {
int total = 0;
for (char c : word.toCharArray()) {
total += (int) c;
}
return (total * 997) % tableSize;
}
public void put(String word) {
int index = getIndex(word);
while (table[index] != null) {
index++;
if (index == tableSize -1) index = 0;
}
table[index] = word;
count++;
if (load() > 0.7) expandArray();
}
public boolean contains(String word) {
int index = getIndex(word);
while (table[index]!= null && !table[index].equals(word)) {
index++;
if (index == tableSize-1) index = 0;
}
if (table[index] == null) return false;
return true;
}
private int whereToPut(String word, String[] newTable) {
int index = getIndex(word);
while (newTable[index] != null) {
index++;
if (index == tableSize-1) index = 0;
}
return index;
}
private void expandArray() {
tableSize *= 2;
String[] newTable = new String[tableSize];
for (String word : table) {
newTable[whereToPut(word, newTable)] = word;
}
table = newTable;
}
public double load() {
return (double) count / (double) tableSize;
}
@Override
public String toString() {
String retVal = "";
for (int i=0; i < tableSize; i++) {
if (table[i] != null) retVal += i + " - " + table[i] + "\n";
}
return retVal.trim();
}
}