
# HashTables.pptx

http://algs4.cs.princeton.edu
Algorithms
Robert Sedgewick | Kevin Wayne
3.4 Hash Tables
hash functions
separate chaining
linear probing
context
[wayne f11] include memory of separate chaining and linear probing vs. red-black BST?
Symbol table implementations: summary
Q. Can we do better?
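| implementation | guarantee: search | guarantee: insert | guarantee: delete | average case: search hit | average case: insert | average case: delete | ordered ops? | key interface |
|---|---|---|---|---|---|---|---|---|
| sequential search (unordered list) | N | N | N | ½ N | N | ½ N | | equals() |
| binary search (ordered array) | lg N | N | N | lg N | ½ N | ½ N | ✔ | compareTo() |
| BST | N | N | N | 1.39 lg N | 1.39 lg N | √N | ✔ | compareTo() |
| red-black BST | 2 lg N | 2 lg N | 2 lg N | 1.0 lg N | 1.0 lg N | 1.0 lg N | ✔ | compareTo() |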
Hashing: basic plan
Save items in a key-indexed table (index is a function of the key).
Hash function. Method for computing array index from key.
Issues.
Computing the hash function.
Equality test: Method for checking whether two keys are equal.
Collision resolution: Algorithm and data structure to handle two keys that hash to the same array index.
No space limitation: trivial hash function with key as index.
No time limitation: trivial collision resolution with sequential search.
Space and time limitations: hashing (the real world).
Figure: table with indices 0 to 5. hash("it") = 3 stores "it" at index 3; hash("times") = 3 collides with it (??).
Note: hashing at its core is a space-time tradeoff. If the keys are 32-bit integers, we can use a hash table of size 2^32.
Computing the hash function
Idealistic goal. Scramble the keys uniformly to produce a table index.
Efficiently computable.
Each table index equally likely for each key.
Ex 1. Phone numbers.
Bad: first three digits.
Better: last three digits.
Ex 2. Social Security numbers.
Bad: first three digits (573 = California, 574 = Alaska; assigned in chronological order within geographic region).
Better: last three digits.
Practical challenge. Need different approach for each key type.
Note: this is a thoroughly researched problem, yet still problematic in practical applications.
Figure: the hash function maps each key to a table index.
Goal: the hash function scrambles the keys, so each index corresponds to roughly a 1/M fraction of the elements.
Assumes table size is M = XXXXXXXXXXdecimal digits)
Java’s hash code conventions
All Java classes inherit a method hashCode(), which returns a 32-bit int.
Requirement. If x.equals(y), then (x.hashCode() == y.hashCode()).
Highly desirable. If !x.equals(y), then (x.hashCode() != y.hashCode()).
Default implementation. Memory address of x.
Legal (but poor) implementation. Always return 17.
Customized implementations. Integer, Double, String, File, URL, Date, …
User-defined types. Users are on their own.
Note: hashCode() returns an int between -2^31 and 2^31 - 1.
Note: repeated calls to hashCode() must return the same value (provided no info used in equals() is changed).
Ensures hashing can be used for every object type.
Enables expert implementations for each type. This is the sweet spot for hashing.
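The requirement above can be checked on a small example. The following is only a sketch: `Point` and `HashContractDemo` are hypothetical names that do not appear in the slides, and `Objects.hash` is used for brevity rather than the 31x + y recipe.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical key type, used only to illustrate the equals()/hashCode() contract.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return x == that.x && y == that.y;
    }

    // Equal points must report equal hash codes.
    @Override
    public int hashCode() { return Objects.hash(x, y); }
}

class HashContractDemo {
    // True unless a and b are equal but disagree on hashCode().
    static boolean consistent(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        Set<Point> set = new HashSet<>();
        set.add(p);
        System.out.println(consistent(p, q));   // true
        System.out.println(set.contains(q));    // true, because q hashes like p
    }
}
```

If `hashCode()` were left at its default while `equals()` was overridden, the `contains` call would most likely fail, since `p` and `q` would usually land in different buckets.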
Implementing hash code: integers, booleans, and doubles
    public final class Integer
    {
        private final int value;
        ...

        public int hashCode()
        {  return value;  }
    }
    public final class Double
    {
        private final double value;
        ...

        public int hashCode()
        {
            // convert to IEEE 64-bit representation;
            // xor most significant 32 bits with least significant 32 bits
            long bits = doubleToLongBits(value);
            return (int) (bits ^ (bits >>> 32));
        }
    }
Warning: -0.0 and +0.0 have different hash codes.
    public final class Boolean
    {
        private final boolean value;
        ...

        public int hashCode()
        {
            if (value) return 1231;
            else       return 1237;
        }
    }
Java library implementations.
Warning: this means that special care is needed when implementing equals(). The equals() method in Double considers -0.0 and 0.0 to be not equal even though 0.0 == -0.0.
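To see the -0.0 versus 0.0 behaviour concretely, here is a minimal sketch. `PrimitiveHashDemo` and `hashDouble` are hypothetical names; `hashDouble` repeats the xor formula from the slide, while the static `Boolean.hashCode` and `Integer.hashCode` helpers assume Java 8 or later.

```java
class PrimitiveHashDemo {
    // Same formula as on the slide: xor the high and low 32 bits.
    static int hashDouble(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        System.out.println(0.0 == -0.0);                        // true
        System.out.println(Double.valueOf(0.0).equals(-0.0));   // false
        System.out.println(hashDouble(0.0));                    // 0
        System.out.println(hashDouble(-0.0));                   // -2147483648
        System.out.println(Boolean.hashCode(true));             // 1231
        System.out.println(Integer.hashCode(42));               // 42
    }
}
```

Only the sign bit differs between the two zeros, and the xor carries it into the top bit of the 32-bit result.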
Implementing hash code: strings
Horner's method to hash string of length L: L multiplies/adds.
Equivalent to h = s[0] · 31^(L-1) + … + s[L-3] · 31^2 + s[L-2] · 31^1 + s[L-1] · 31^0.

    // Java library implementation
    public final class String
    {
        private final char[] s;
        ...
        public int hashCode()
        {
            int hash = 0;
            for (int i = 0; i < length(); i++)
                hash = s[i] + (31 * hash);   // s[i] is the ith character of s
            return hash;
        }
    }

Ex.
    String s = "call";
    int code = s.hashCode();

    3045982 = 99·31^3 + 97·31^2 + 108·31^1 + 108·31^0
            = 108 + 31·(108 + 31·(97 + 31·(99)))      (Horner's method)

char        Unicode
…        …
'a'        97
'b'        98
'c'        99
…        …

Note: the reference Java implementation caches string hash codes.
Note: this is essentially Java's implementation of hashCode() for strings since Java 1.5.
Note: it is OK to add an int and a char. It is also OK if the result overflows, since this is well-defined in Java (two's complement integers).
Why 31? It is a Mersenne prime (one less than a power of 2), so multiplying by 31 is easy to implement with a shift and a subtract: 31 * h = (h << 5) - h.
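The arithmetic for "call" can be reproduced with a few lines of Java. This is a sketch only: `HornerHashDemo` and `hornerHash` are hypothetical names, and the loop mirrors the one on the slide rather than the actual library source.

```java
class HornerHashDemo {
    // Horner's method with multiplier 31, mirroring the loop on the slide.
    static int hornerHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++)
            hash = s.charAt(i) + (31 * hash);
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(hornerHash("call"));   // 3045982
        System.out.println("call".hashCode());    // 3045982
    }
}
```

For longer strings the intermediate value overflows and wraps around, which is expected and matches the behaviour of `String.hashCode()`.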
Implementing hash code: strings
Performance optimization.
Cache the hash value in an instance variable.
Return cached value.

    public final class String
    {
        private int hash = 0;                // cache of hash code
        private final char[] s;
        ...
        public int hashCode()
        {
            int h = hash;
            if (h != 0) return h;            // return cached value
            for (int i = 0; i < length(); i++)
                h = s[i] + (31 * h);
            hash = h;                        // store cache of hash code
            return h;
        }
    }

Q. What if hashCode() of string is 0?
Note: if the hash value is 0, it is recomputed every time, e.g., "pollinating sandboxes".
Note: the code is a bit verbose in case threads are calling hashCode() concurrently.
Note: another place where caching a value is useful is the 8-puzzle assignment: cache the Manhattan distance in Board (or SearchNode).
Note: String is still immutable even though the instance variable hash can change.
Implementing hash code: user-defined types
    public final class Transaction implements Comparable<Transaction>
    {
        private final String who;
        private final Date   when;
        private final double amount;

        public Transaction(String who, Date when, double amount)
        {  /* as before */  }
        ...
        public boolean equals(Object y)
        {  /* as before */  }

        public int hashCode()
        {
            int hash = 17;                                   // nonzero constant
            hash = 31*hash + who.hashCode();                 // 31: typically a small prime
            hash = 31*hash + when.hashCode();                // for reference types, use hashCode()
            hash = 31*hash + ((Double) amount).hashCode();   // for primitive types, use hashCode() of wrapper type
            return hash;
        }
    }
Note: the 17 helps reduce collisions when there are leading fields with 0s. Java often uses 1 as the constant instead of 17.
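Since the slide elides the constructor and equals(), here is one way the pieces might fit together in a compilable form. Everything outside hashCode() is an assumption: `SimpleTransaction` is a hypothetical stand-in for the slide's class, and its equals() body is just one reasonable choice.

```java
import java.util.Date;

// Hypothetical stand-in for the Transaction class; the equals() body is an assumption.
final class SimpleTransaction {
    private final String who;
    private final Date when;
    private final double amount;

    SimpleTransaction(String who, Date when, double amount) {
        this.who = who;
        this.when = new Date(when.getTime());   // defensive copy, since Date is mutable
        this.amount = amount;
    }

    @Override
    public boolean equals(Object y) {
        if (y == this) return true;
        if (y == null || y.getClass() != getClass()) return false;
        SimpleTransaction that = (SimpleTransaction) y;
        return who.equals(that.who)
            && when.equals(that.when)
            && Double.compare(amount, that.amount) == 0;
    }

    @Override
    public int hashCode() {
        int hash = 17;                                // nonzero starting constant
        hash = 31 * hash + who.hashCode();            // reference type
        hash = 31 * hash + when.hashCode();           // reference type
        hash = 31 * hash + Double.hashCode(amount);   // primitive, via wrapper class
        return hash;
    }

    public static void main(String[] args) {
        SimpleTransaction a = new SimpleTransaction("Turing", new Date(0L), 11.99);
        SimpleTransaction b = new SimpleTransaction("Turing", new Date(0L), 11.99);
        System.out.println(a.equals(b));                    // true
        System.out.println(a.hashCode() == b.hashCode());   // true
    }
}
```

`Double.compare` is used in equals() so that it treats -0.0 and 0.0 the same way `Double.hashCode` does, which keeps the two methods consistent.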
Hash code design
"Standard" recipe for user-defined types.
Combine each significant field using the 31x + y rule.
If field is a primitive type, use wrapper type hashCode().
If field is null, return 0.
If field is a reference type, use hashCode() (applies rule recursively).
If field is an array, apply to each entry (or use Arrays.deepHashCode()).
In practice. Recipe works reasonably well; used in Java libraries such as Arrays.deepHashCode() and String.
In theory. Keys are bitstrings; "universal" hash functions exist.
Basic rule. Need to use the whole key to compute hash code; consult an expert for state-of-the-art hash codes.
Note: Java makes it difficult to implement universal hashing and tabulation hashing because there is only one hash function (from hashCode()).
Modular hashing
Hash code. An int between -2^31 and 2^31 - 1.
Hash function. An int between 0 and M - 1 (for use as array index). M is typically a prime or power of 2.

    // bug: result is negative when hashCode() is negative
    private int hash(Key key)
    {  return key.hashCode() % M;  }

    // 1-in-a-billion bug: hashCode() of "polygenelubricants" is -2^31
    private int hash(Key key)
    {  return Math.abs(key.hashCode()) % M;  }

    // correct
    private int hash(Key key)
    {  return (key.hashCode() & 0x7fffffff) % M;  }

Note: can't use h % M for the index since h can be negative, so % can return a negative number. The plausible fix |h| % M doesn't work since |h| can be negative if h = -2^31. Should also count 1 bit-whacking operation in the work to hash a string of length W.
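The three variants can be compared side by side on raw hash codes. This is an illustrative sketch: `ModularHashDemo`, its method names, and the table size 97 are all assumptions made for the demo.

```java
class ModularHashDemo {
    // Buggy: % keeps the sign of a negative hash code.
    static int naive(int hashCode, int m) { return hashCode % m; }

    // Buggy: Math.abs(Integer.MIN_VALUE) is still negative.
    static int viaAbs(int hashCode, int m) { return Math.abs(hashCode) % m; }

    // Masks off the sign bit, so the result is always between 0 and m - 1.
    static int masked(int hashCode, int m) { return (hashCode & 0x7fffffff) % m; }

    public static void main(String[] args) {
        int m = 97;                  // table size, chosen arbitrarily for the demo
        int h = Integer.MIN_VALUE;   // -2^31
        System.out.println(naive(-7, m));   // -7
        System.out.println(viaAbs(h, m));   // a negative number
        System.out.println(masked(h, m));   // 0
    }
}
```

Masking is not the same as taking the absolute value: it maps a negative hash code to a different non-negative int. For choosing a table index that difference does not matter.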
Uniform hashing assumption
Uniform hashing assumption. Each key is equally likely to hash to an integer between 0 and M - 1.
Bins and balls. Throw balls uniformly at random into M bins.
Birthday problem. Expect two balls in the same bin after ~ √(π M / 2) tosses.
Ex. After 23 people enter a room, expect two with same birthday.
Coupon collector. Expect every bin has ≥ 1 ball after ~ M ln M tosses.
Load balancing. After M tosses, expect most loaded bin has Θ(log M / log log M) balls.
Figure: balls thrown into 16 bins, numbered 0 to 15.
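As a rough numerical check of these estimates, the leading-order formulas can be evaluated for M = 365. This is a sketch with hypothetical names (`BallsAndBinsEstimates`, `birthday`, `couponCollector`). The values are asymptotic approximations, not exact expectations.

```java
class BallsAndBinsEstimates {
    // ~ sqrt(pi * M / 2): tosses until some bin holds two balls.
    static double birthday(int m) { return Math.sqrt(Math.PI * m / 2.0); }

    // ~ M ln M: tosses until every bin holds at least one ball.
    static double couponCollector(int m) { return m * Math.log(m); }

    public static void main(String[] args) {
        System.out.printf("%.1f%n", birthday(365));          // 23.9
        System.out.printf("%.0f%n", couponCollector(365));
    }
}
```

For M = 365 the birthday estimate comes out just under 24, in line with the 23-people example.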
Answered Same Day Jun 23, 2021 DSAA204

## Solution

Akriti answered on Jun 25 2021
Data structure and algorithm
Student name
Student ID
Introduction
A data structure is a way of collecting data and organizing it effectively and efficiently, so that the user can perform various operations on that data. It concerns how the data elements are represented, how they relate to one another, and how they are organized and stored.
An algorithm, by contrast, is a finite set of instructions and logic stated in a specific order to accomplish a predefined task. An algorithm is not the complete code; it is the core logic for solving the problem, and it can be expressed either as a flowchart or as an informal high-level description.
Content analysis
Week 1
In the first week I learned about stacks and queues and the algorithms related to them. Although they look similar in structure, they are used differently. Like arrays and lists, these structures help users like us organize data in a defined order. The difference I noticed between them is in how data is removed. A stack is last in, first out (LIFO), whereas a queue is first in, first out (FIFO). A good example of a stack is a toy ring stacker: new rings are put on top of the others, and the top ring has to be taken off before the rings under it. A good example of a queue is people standing in a straight line, waiting for their turn to receive what they want. Here are some of the related terms:
Node: consists of the key and the indicator
Key: represents the data or information stored in the respective string
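The LIFO and FIFO orders described above can be seen in a small Java sketch. `StackQueueDemo` is a hypothetical name, and `ArrayDeque` from the standard library is used here in place of a hand-written linked list of nodes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

class StackQueueDemo {
    // Pushes 1, 2, 3 and pops them all: last in, first out.
    static String stackOrder() {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int i = 1; i <= 3; i++) stack.push(i);
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) sb.append(stack.pop());
        return sb.toString();
    }

    // Enqueues 1, 2, 3 and dequeues them all: first in, first out.
    static String queueOrder() {
        Queue<Integer> queue = new ArrayDeque<>();
        for (int i = 1; i <= 3; i++) queue.add(i);
        StringBuilder sb = new StringBuilder();
        while (!queue.isEmpty()) sb.append(queue.remove());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stackOrder());   // 321
        System.out.println(queueOrder());   // 123
    }
}
```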
Week 2
In this week we learned about bags and queues, including subtopics such as resizing arrays, queues, generics, and various applications. I looked into how to resize an array, but I faced several challenges when implementing it (Ebert, Wu, et al., 2017). Resizing an array is a comparatively complex and expensive procedure. A major limitation of an array is that its size is fixed, which means the user has to specify ahead of time the number of elements the array will hold. I also noticed that adding a new element beyond that capacity adds cost to the data structure, which makes the operation time-consuming and expensive for users.
Figure 1: selection array sorting
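To make the cost of resizing concrete, here is a minimal sketch of a stack backed by an array that doubles when full. `ResizingIntStack` is a hypothetical class, and doubling is only one common resizing policy, assumed here for illustration.

```java
import java.util.Arrays;

// Hypothetical resizing-array stack of ints; doubling is one common policy.
class ResizingIntStack {
    private int[] items = new int[1];
    private int size = 0;

    void push(int item) {
        if (size == items.length)
            items = Arrays.copyOf(items, 2 * items.length);   // copy into a larger array
        items[size++] = item;
    }

    int pop() {
        if (size == 0) throw new IllegalStateException("stack is empty");
        return items[--size];
    }

    int size()     { return size; }
    int capacity() { return items.length; }

    public static void main(String[] args) {
        ResizingIntStack stack = new ResizingIntStack();
        for (int i = 0; i < 5; i++) stack.push(i);
        System.out.println(stack.capacity());   // 8
        System.out.println(stack.pop());        // 4
    }
}
```

Each resize copies every element, which is the expensive step. With doubling, those copies happen rarely enough that the average cost per push stays constant.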
Week 3
In the third week I studied the structure of algorithms and the rules for analysing them. In the theoretical analysis of an algorithm, I worked with estimates of its complexity and of whether the program can solve the problem at hand, which are of paramount importance to users and to organizations that rely on it. Exact measures of efficiency can sometimes be computed, but they usually require certain assumptions.
To simplify the calculation, it is convenient to use a measure of the amount of work done in the computation. We can count the number of times the various elementary operations are applied.
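Counting elementary operations can be illustrated by instrumenting two search routines. This is a sketch under assumptions: `CompareCounter` and its methods are hypothetical, and a key comparison is taken as the elementary operation.

```java
class CompareCounter {
    // Number of equality comparisons sequential search makes before it
    // finds key (a.length comparisons if key is absent).
    static int sequentialCompares(int[] a, int key) {
        int compares = 0;
        for (int value : a) {
            compares++;
            if (value == key) break;
        }
        return compares;
    }

    // Number of three-way comparisons binary search makes on a sorted array.
    static int binaryCompares(int[] a, int key) {
        int compares = 0, lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            compares++;
            if      (key < a[mid]) hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else break;
        }
        return compares;
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        for (int i = 0; i < a.length; i++) a[i] = i;
        System.out.println(sequentialCompares(a, 1023));   // 1024
        System.out.println(binaryCompares(a, 1023));       // 11
    }
}
```

The counts depend only on the input, not on the machine, which is what makes them convenient for comparing algorithms.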
Week 4
In this week I learned about elementary sorts and their various elements which have...