Problem 4 (9 marks, 1.5 pages). A database of n records, each of which has size m bits
(we assume that m is large) is maintained at two different servers X and Y. As sometimes
a record may get updated at one server and not at the other, the servers X and Y have
to sync their (possibly different) databases x = (x1,...,x,) and y = (y1,...,,) from time
to time to make sure they are the same. At the synchronization time, the two servers
exchange some data in one or several rounds of communications.
A trivial synchronisation algorithm is as follows: X sends x via the internet to Y,
which verifies every record of x against that of its database y; foreach 1
then Y does nothing, otherwise, it will either update y; — x; if x; is newer or send (i, y;)
to X if its record is newer; upon receiving a pair (i,y;), X updates x; — y;. We assume
there is an efficient mechanism (which is not our focus) to decide which version of the
ecord is newer (e.g., using a naming convention like ver-1.2.1 is newer than ver-1.2.0).
While this solution works, it requires a huge bandwidth because at least n records have
to be sent across the network.
Our goal is to design a more efficient protocol/algorithms using a tool called crypto-
graphic hash function. A cryptographic hash function A(-) takes as input any piece of
data x of any size and returns a fixed-size string h(x) (e.g., h(x) is of size 256 bits fo
SHA-256). Moreover, different from a non-cryptographic hash function (Week 11), they
are collision-resistant, that is, it’s hard to find two different inputs that hash to the same
output. As a consequence, in practice, we can assume that if a # b then h(a) # h(b). Fo
simplicity, we assume that the following assumptions hold.
* At most one item is different between the two databases, that is, either x; = y; fo
alli=1,...,n, or there exists an index i* such that x;+ # y;» and x; = y; forall i #i*.
* The computation complexity is measured by the number of hashes required to
e computed by both X and Y and also by the amount of data that are hashed. Fo
instance, the trivial protocol described above requires no hash computation, and
hence, the computation complexity is zero.
* The communication complexity is measured by the number of hashes and the
number of records communicated between X and Y. For instance, the trivial proto-
col sends no hashes and n records, which is very expensive (a record is much large
than a hash).
Your task is to design a synchronisation algorithm that requires a small communi-
cation complexity (most important) and also a reasonable computation complexity.
More specifically, you need to include the following in your answer.
a) (Overview - 2 mark) An overview of how your algorithm works, why it solves the
problem, and what are the communication and computation complexities.
.
(Detailed description - 3 marks) A detailed description of the main steps of the
algorithm, describing what X and Y do to sync the two databases. You could also
write your description using a pseudocode. The goal is to guarantee that other peo-
ple can understand clearly how the algorithm works. The format of the pseudocode
is not important.
c) (Demonstration - 2 marks) A demonstration of how the algorithm works on a
small example, e.g., when n = 8.
d
(Analysis - 2 marks) State the communication and computation complexities of
your algorithm and explain why it has these complexities. The input size is (n,m).
The given marks are maximum possible for the parts. Your actual marks depend on
how efficient your algorithm is (regarding the communication complexity (major crite-
ia) AND the computation complexity) and also how well it is presented. A co
ect and
efficient but poorly written solution could still attract a low mark. An inco
ect solution
will get a zero mark.
Problem 5 (5 marks, 2 pages). Research a well-known problem of your own interest in
any field (science, engineering, technology) that can be solved by a computer algorithm.
Write a 1- to 2-page report on a popular algorithm that solves that particular prob-
lem and include in your report: (1) a problem description and why it is important and/o
interesting, (2) the algorithm description, (3) a pseudocode, (4) a demonstration on a toy
example, and (5) a complexity analysis.
You could include a few (1-5) references that you used when researching the prob-
lem/algorithm, but the writing should be your own. A similarity score of 25% and above
etween your report and any existing source may indicate plagiarism. The report should
e typed in a text editor, e.g., words or Latex, and not handwritten. Marks will be
decided based on the co
ectness, clarity, and the sophistication of the problem/al-
gorithm discussed. A report that is not well written or about a trivial/straightforward
problem/algorithm will receive a low mark.
Note that the problem/algorithm should NOT be among those already discussed in
the pre-recorded lectures/workshops. If you present a problem/algorithm that has been
discussed in the lectures/workshops, you will get a zero mark for Problem 5.
You could start from our textbook or check the following list from Wiki for a start.
https:
en.wikipedia.org/wiki/List_of algorithms