CS161 Homework 2
Summer 2022
For the most up-to-date version of this document, please see: https:
www.overleaf.
com
ead/mrzy
kxxgsj
Problem 1: Word Representations
As described in class, RadixSort runs in O(d(n+b)) time, where n is the number of elements,
is the base, and d (“digits”) is the max length of the elements.
(a) (2 pts.) Suppose we want to use RadixSort to sort the following list of words W , each
of which comes from the uppercase English alphabet, into lexicographic order (like in
a dictionary):
W = [HOW, VEXINGLY, QUICK, DAFT, ZEBRAS, JUMP]
Before sorting, we right-pad all words with ! characters (assume ! comes before
a alphabetically) to make them the same length.
NOTE: the problem originally said left-pad, but now that I think about it, it’s more
natural to sort words in lexicographic / dictionary order, so I’ve switched to right
padding. You can do either in your solution and we will accept either answer as long
as you’re consistent.)
Provide numerical values for the three variables n, b, and d. No explanation required.
Your answer goes here!
(b) (2 pts.) Now you happen to have the date (the day, month, year, all as base-10 num-
ers) that each word inW was first published in an English dictionary. (Assume that all
such dates are within the last 500 years.) You want to sort first by date (going forward
in time), then use lexicographic ordering (as in part (a)) to
eak ties. You will do this
y converting each of the original words in W into words with date information (digits)
prepended, appended, or inserted somewhere in the string, in an order that you choose.
Provide the string that you would use to represent the word ZEBRAS (first published,
say, May 4, 1604) so that it will be co
ectly sorted by RadixSort for the given ob-
jective, regardless of what the dates for the other words in W are. No explanation
equired.
Your answer goes here!
1
(c) (2 pts.) Hereafter we will leave the list W behind. You have a new list V representing
the text of a document (just the words, all consisting only of uppercase English alpha-
et letters). V has n words in it in all, where n > 2, and there are at least two distinct
words (but words may be repeated within the list).
You decide to save space and simply represent the first distinct word in V with the
inary value 0, the second distinct word with 1, the third distinct word with 10,
etc., continuing to increment by one each time in binary. All subsequent instances
of a word you have already seen get the same number. So, for instance, the text
[CS, ONE, SIXTY, ONE, CS, IS, FUN] would become [0, 1, 10, 1, 0, 11, 100].
Now you want to sort your list V (an a
itrary list, not the example above!) us-
ing RadixSort. Before running RadixSort, you left-pad the strings with 0s as needed
to make them equal-length.
Give the time complexity of RadixSort on this converted list, as a simplified big-
O expression in terms of n. No explanation required. Note that we are asking only fo
the time complexity of the RadixSort, not of converting the wordlist into binary.
Your answer goes here!
(d) (2 pts.) Now, assume the same setup as in the previous part, but because you don’t
want to mess with binary conversions, you decide instead to represent the distinct words
in V with ”one-hot” vectors (vectors with all 0’s except for a single ’1’ in a position
co
esponding to a particular word). For example, the [CS, ONE, SIXTY, ONE, CS, IS, FUN]
example from before has five distinct words, and it would become
[10000, 01000, 00100, 01000, 10000, 00010, 00001].
As before, give the time complexity of RadixSort on this converted list, as a sim-
plified big-O expression in terms of n. No explanation required. Again, we only want
the time complexity of the RadixSort part itself.
Your answer goes here!
Problem 2: Proof Planning
Waverly has gotten into the habit of running proofs by Te
y. Te
y is preoccupied with a lot
of work, but they still want to help Waverly as much as possible. Waverly has an estimate
of how long it takes to run each proof by Te
y. So she sends an a
ay A = [a1, ..., an] of
these time lengths (measured in minutes) to Te
y. Te
y has put aside a total of L minutes
to help Waverly today, and they want to go over as many proofs as possible.
2
(a) (6 pts.) Design an algorithm, MaximumProofs, that calculates the maximum numbe
of proofs that fit within L minutes and runs in time O(n), where n is the length of A.
You may assume that all time lengths in A are distinct and positive, but (clarifica-
tion added July 1) you may not assume that the times are, e.g., integers –
i.e., RadixSort is not viable for this problem. You may ignore divisibility issues
(such as floors and ceilings) throughout this problem. You may use any algorithm we
have learned in class as a function.
For this problem, please provide pseudocode (we don’t care about the exact format),
or, if you prefer, code in a language like Python or C++. We think the solution, though
not super complicated, has enough details that it’s worth writing out the algorithm
igorously. However, you do not need to prove co
ectness or justify the runtime – just
the algorithm is sufficient.
Input: A
ay A of n distinct positive values (not necessarily integers) indicat-
ing the time it takes to verify each proof, and number L ≥ 0 indicating how much time
Te
y has put aside.
Output: The maximum number of proofs that Te
y can verify in ≤ L minutes.
Here are some sample inputs and the co
ect outputs below.
• For A = [5, 1, 6, 2, 3] and L = 3, the co
ect output is 2.
• For A = [20, 10, 30] and L = 5, the co
ect output is 0.
• For A = [4, 1, 2, 3] and L = 1000, the co
ect output is 4.
• ForA = [1.56287683628568 XXXXXXXXXX, 1.43712316371431 XXXXXXXXXX]
and L = 3, the co
ect output is 2.
Your answer goes here!
(b) (2 pts.) Now prove the runtime of the MaximumProofs algorithm which you have just
designed. That is, argue using a recu
ence relation, or otherwise, that your algorithm
satisfies the time bound of O(n). (There is a way to do this pretty
iefly.)
Your answer goes here!
Problem 3: Randomized Algorithms
In this exercise, we’ll explore different types of randomized algorithms. Recall from class
that a randomized algorithm is a Las Vegas algorithm if it is always co
ect (that is, it
eturns the right answer with probability 1), but the running time is a random variable. We
3
say that a randomized algorithm is a Monte Carlo algorithm if there is some probability
that it is inco
ect. For example, QuickSort (with a random pivot) is a Las Vegas algorithm,
since it always returns a sorted a
ay, but it might be slow if we get very unlucky.
Consider the following three algorithms for finding a majority element in a list that cer-
tainly contains a strict majority element. (That is, there is one element in the list
that appears strictly more than n
2
times.) Suppose that nothing else is known about the
distribution of the list.
(8 pts.) Fill in the following table. You may use asymptotic notation for the running
times, or ∞ for a procedure that may not terminate. For the probability of returning a
majority element, give the tightest lower bound that you can given the information provided;
this may entail making the most conservative assumptions possible about the contents of
the list. (Remember that big-O always assumes a worst-case input, even for randomized
algorithms where the randomness of the choices might vary.) No explanations required.
Algorithm Monte Carlo o
Las Vegas?
Expected
unning
time
Worst-case
unning
time
Probability of return-
ing a majority element
Algorithm 1
Algorithm 2
Algorithm 3
Algorithm 1: findMajorityElement1
Input: A population P of n elements
while true do
Choose a random p ∈ P ;
if isMajority(P , p) then
eturn p;
Algorithm 2: findMajorityElement2
Input: A population P of n elements
for 100 iterations do
Choose a random p ∈ P ;
if isMajority(P , p) then
eturn p;
eturn P [0];
4
Algorithm 3: findMajorityElement3
Input: A population P of n elements
Put the elements in P in a random order.
* Assume it takes time Θ(n) to put the n elements in a random orde
*
for p ∈ P do
if isMajority(P , p) then
eturn p;
Algorithm 4: isMajority
* This is a helper called by the other three algorithms. *
Input: A population P of n elements and a element p ∈ P
Output: True if p is the majority element of the list
count ← 0;
for q ∈ P do
if p = q then
count ++;
if count > n/2 then
eturn True;
else
eturn False;
Your answer goes here!
Problem 4: What Happens If I...?
Some algorithms have constants or properties that seem magical at best and a
itrary at
worst, and it’s not clear why they’re there. In this problem, we’ll tweak a couple of the
algorithms from this unit to see what
eaks.
(a) (5 pts.) Show that if the median-of-medians part of the k-Select algorithm is modified
to use groups of 3 instead of groups of 5, the argument from lecture that the overall
k-Select algorithm is O(n) does not work. To do this, first derive a recu
ence T (n) fo
the running time of k-Select, and then show that there is no way to choose a constant
c such that the recu
ence is O(n).
We recommend closely following what we did in Lecture 3, and seeing what
eaks.
You do not need to reiterate the entire argument from Lecture 3, but you do need to
show where the differences arise. Also, to make your life easier:
• Assume that n ≥ 9, and that all the list elements are distinct.
5
• Be very precise about the number of elements that are guaranteed to be less than
the estimated median of medians, because the analysis is very tight. You do not
need to argue about the equivalent greater-than case.
• You do not need to address the possible issue of the length of the list (or subse-
quent sublists) not being divisible by 3, which we also glossed over in lecture.