FIT1008 Introduction to Computer Science (FIT2085: for Engineers)
Interview Prac 3 - Weeks 11 and 12
Semester 1, 2019
Objectives of this practical session
To be able to implement and use hash tables in Python.
Important requirements
For this prac, you are required to:
• Create a new file/module for each task named task[num].
• Provide documentation for each basic piece of functionality in your code (see below)
• The documentation needs to include:
– pre and post conditions
– Big O best and worst time complexity
– information on any parameters used
• Provide testing for those functions/methods explicitly specified in the questions. The
testing needs to include:
– a function to test it
– at least two test cases per test function. These cases need to show that your
functions/methods can handle both valid and invalid inputs (if they take inputs).
– assertions for testing the cases, rather than just printing out messages
• VERY IMPORTANT: As before, the code cannot use any of the Python functionality
that would make the job trivial (like dictionaries, slicing, etc.). You need to implement
the functions yourself by going through the array, as we did in the lectures.
Important: Checkpoint for Week 11
To reach the checkpoint you must complete Tasks 1, 2 and 3 by the end of your lab in Week 11, and
submit all Python files corresponding to these tasks. Remember: reaching the checkpoint is a hurdle.
Task 1 6 marks
Define a class HashTable implementing a hash table, using Linear Probing to resolve collisions. As in
Prac 2, you should use a list to represent the underlying array. Include implementations for the following
methods:
• __init__(self, table_capacity, hash_base): Creates a new hash table, with initial table capacity
table_capacity, and using base hash_base for the hash function (see hash below).
If table_capacity or hash_base are not specified, the hash table should use appropriate default
values.
• __getitem__(self, key): Returns the value corresponding to key in the hash table. Raises a
KeyError if key does not exist in the hash table. Remember that this can be called from Python
code as table[key].
• __setitem__(self, key, value): Sets the value corresponding to key in the hash table to be value.
If the hash table is full and the key does not exist in the table yet, it first calls the rehash method
(see below) and then reinserts the key and value. Called as table[key] = value.
• __contains__(self, key): Returns True if key is in the table and False otherwise.
• hash(self, key): Calculates the hash value for the given key, using the universal hash function
specified in lecture 27 (page 35) with the base hash_base given on table creation (a sketch of one
possible shape for this function is given after the Primes list below).
• rehash(self): It first creates a new array, using as its size the smallest prime number in the Primes
list below that is larger than twice the current size. It then updates the size of the hash table
and reinserts all key-value pairs from the old array into the new array using the new size. Raises a
ValueError if there is no such prime in the list. For now, this method is only called when the table
is full and we want to insert an element.
Primes = [ 3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761,
919, 1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, XXXXXXXXXX, XXXXXXXXXX,
1674319, XXXXXXXXXX, XXXXXXXXXX, XXXXXXXXXX, XXXXXXXXXX, XXXXXXXXXX, XXXXXXXXXX, XXXXXXXXXX, XXXXXXXXXX]
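As a rough illustration (not necessarily the exact formula from lecture 27, so check it against your
notes), a universal string hash of this general shape is often written using Horner's rule; the attribute
names self.hash_base and self.table_capacity below are assumptions based on the __init__ parameters
above:

    def hash(self, key):
        # Horner's rule: value = (value * base + ord(char)) mod table_capacity, per character
        value = 0
        for char in key:
            value = (value * self.hash_base + ord(char)) % self.table_capacity
        return value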
We have provided a file TestHashTable.py which defines tests for some of the HashTable methods.
But be warned: these tests are not comprehensive, and it is in your interest to think about what kinds of
inputs we have not covered.
Important: you must define your own tests for the __setitem__ and rehash methods.
Task 2 5 marks
• Write a function load_dictionary(hash_table, filename) which reads a file filename containing
one word per line, and adds each word to hash_table with integer 1 as the associated data.
Important: you must define tests for the load_dictionary function. You should use one or more
small dictionary files to do so.
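A minimal sketch of the shape this function could take, assuming the HashTable class from Task 1 and
its __setitem__ method; the file-handling details (encoding, blank lines) are assumptions you may need
to adjust:

    def load_dictionary(hash_table, filename):
        # Read one word per line and store it with the integer 1 as its data.
        with open(filename, 'r', encoding='utf-8') as file:
            for line in file:
                word = line.strip()
                if word:                    # skip any blank lines
                    hash_table[word] = 1    # uses HashTable.__setitem__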
• Download from Moodle the dictionary files english_small.txt, english_large.txt and french.txt.
For each of these dictionaries, use a methodology similar to that studied in the Workshop of week 5
to time how long it takes for load_dictionary to run. Do this for each combination of the values
specified below for TABLESIZE and b in the universal hash function (giving a total of 3*3=9 combinations
per file – nested for loops would be a good way to implement this – you can use "in" for this).
TABLESIZE
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
Present the times recorded in a table. Write a short analysis regarding what values work best
and which work poorly. Explain why these might be the case. Be sure to include your table and
explanation in task_2.py (as a multi-line string).
Note: You can use Ctrl+c to stop execution of your program if some combination of values takes
too long. If a single combination of values takes more than 3 minutes to complete, you may report it as
TIMEOUT (but think about the reasons why!).
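One possible way to drive the nine combinations per file with nested loops is sketched below; the
TABLE_SIZES and BASES values shown are placeholders only (substitute the TABLESIZE and b values from
the table above), and the timing call uses timeit, which may differ from the exact approach you used in
the week 5 workshop:

    import timeit

    # Assumes HashTable (Task 1) and load_dictionary (Task 2) are defined or imported here.
    TABLE_SIZES = [101, 1009, 10007]        # placeholders: use the TABLESIZE values above
    BASES = [1, 31, 101]                    # placeholders: use the b values above
    FILENAMES = ["english_small.txt", "english_large.txt", "french.txt"]

    for filename in FILENAMES:
        for size in TABLE_SIZES:
            for base in BASES:
                table = HashTable(size, base)
                start = timeit.default_timer()
                load_dictionary(table, filename)
                elapsed = timeit.default_timer() - start
                print(filename, size, base, elapsed)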
Task 3 4 marks
When discussing open addressing, we introduced several concepts:
• A collision occurs when we attempt to insert a key, but the location for that key is already occupied
by a different key.
• A probe chain is the sequence of positions we inspect before finding an empty space.
Consider inserting keys s1, s2 and s3 (which all hash to location 10) into an empty table with linear probing:
• When we insert s1, location 10 is empty, so there is no conflict.
• If we then insert s2, location 10 is full, but 11 is empty. This is a conflict, with probe chain length 1.
• When we insert s3, 10 and 11 are full, but 12 is free. This is one conflict, with probe chain length 2.
Extend your HashTable class with a method statistics(self) which returns a tuple (collision_count,
probe_total, probe_max, rehash_count), consisting of: (a) the total number of collisions, (b) the total
length of the probe chains, (c) the length of the longest probe chain, and (d) the number of times rehash
has been called.
Important: you must define appropriate tests for your statistics method.
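As an example of the kind of assertion-based test expected (the capacity and base values below are
arbitrary, and the expected tuple assumes the order defined above):

    def test_statistics_new_table():
        # A freshly created table has had no collisions, no probing and no rehashing.
        table = HashTable(11, 31)
        assert table.statistics() == (0, 0, 0, 0)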
Once the hash table has been modified, repeat Task 2, printing not only the run times but also the four
counters above, and comment on the relationship between these counters and the runtime. Also, comment
on the length of the longest probe chain and the promise of O(1) time complexity. rehash_count should
be 0 in all runs – explain why. Again, remember to include your table and explanations in your submission.
CHECKPOINT
(You should reach this point during week 11)
Task 4 3 marks
Modify your hash table from the previous task to use Quadratic Probing to resolve collisions. Compare
the running time, and the value of the four counters obtained when loading each dictionary with Quadratic
Probing against those found when using Linear Probing.
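A sketch of how the probe sequence changes under quadratic probing, assuming the self.hash and
self.table_capacity names from Task 1 (assumptions, not requirements); __getitem__ and __setitem__
would then step through this sequence instead of moving one slot at a time:

    def _probe_sequence(self, key):
        # Yields the slots to inspect: home, home + 1, home + 4, home + 9, ... (mod capacity)
        home = self.hash(key)
        for j in range(self.table_capacity):
            yield (home + j * j) % self.table_capacity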
Task 5 4 marks
Implement a hash table that uses Separate Chaining to resolve collisions, where the linked data structure
used is a Binary Search Tree (we will see this data type in Week 11, and you will be provided with an
implementation of the Binary Search Tree, so do not wo
y about this just yet).
Note that the content of the array cells should initially be None (if you want to store the number
of elements per chain – up to you – then you should use instead something like a tuple that starts as
(None,0), where 0 is the initial number of elements). Once you need to add a (key,data) pair in that
cell, you should create an empty binary search tree (say my_tree = BinarySearchTree()), then insert
the (key,data) into my_tree, and then put my_tree in the cell (or in the tuple as (my_tree,1)).
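A sketch of the insertion step just described, assuming the self.hash and self.table names from the
earlier tasks and a BinarySearchTree that supports item assignment (the actual interface of the provided
class may differ):

    def __setitem__(self, key, data):
        # Each occupied cell holds a BinarySearchTree; empty cells hold None.
        position = self.hash(key)
        if self.table[position] is None:
            self.table[position] = BinarySearchTree()   # first key hashing to this slot
        self.table[position][key] = data                # insert (key, data) into this chain's BST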
Once this is done, compare again the runtime and value of the counters obtained for Separate Chaining
against those obtained for Quadratic and for Linear Probing.
Background
In most large collections of written language, the frequency of a given word in that collection is inversely
proportional to its rank by frequency. That is to say: the second most common word appears about half
as often as the most common word, the third most common word appears about a third as often as the
most common word, and so on (this is known as Zipf's law; see https://en.wikipedia.org/wiki/Zipf%27s_law).
If we count the number of occurrences of each word in such a collection, we can use just the number
of occurrences to determine the approximate rank of each word. Taking the number of occurrences of the
most common word to be max and the relationship described earlier, we can give a rarity score to each
word:
• Any word that appears at least max/100 times is considered to be common (score 0).
• Any word that appears less than max/1000 times is considered to be rare (score 2).
• Any word in between (less than max/100 and greater than or equal to max/1000) is considered to be
uncommon (score 1).
• Any word which has never been observed is a misspelling (score 3).
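The thresholds above translate directly into comparisons; a sketch as a plain function on counts (treating
an unseen word as a count of 0 is an assumption here, since your table may instead raise KeyError for a
missing word):

    def rarity_score(count, max_count):
        if count == 0:
            return 3        # never observed: misspelling
        if count >= max_count / 100:
            return 0        # common
        if count < max_count / 1000:
            return 2        # rare
        return 1            # uncommon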
In the following we will use a hash table to facilitate a frequency analysis on a given text file.
Task 6 4 marks
Create a new class Freq that uses the Linear Probing hash table to perform a frequency analysis of a set
of files as follows. First, implement the method add_file(self, filename), which reads each word from
the file into the hash table, in such a way that the data associated to the word is its "occurrence count",
i.e., the number of times the word has been "added" to the hash table (which is the same as the number
of times it has already appeared in the file). The class (and its methods) must keep track of the number
of occurrences max for the most common word read.
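A sketch of one way add_file could update the counts, assuming words are whitespace-separated, the
Task 1 table is stored in self.table, and the running maximum in self.max_count (both attribute names
invented here for illustration):

    def add_file(self, filename):
        with open(filename, 'r', encoding='utf-8') as file:
            for line in file:
                for word in line.split():
                    # Increment this word's occurrence count (1 on first sighting).
                    count = self.table[word] + 1 if word in self.table else 1
                    self.table[word] = count
                    if count > self.max_count:
                        self.max_count = count      # track the most common word's count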
Then, implement a method rarity(self, word) that given a word, returns its rarity score.
Download some ebooks as text files from https://www.gutenberg.org/ and use Freq to read these
files into your hash table. At the end of this process the table will