CSCE 411 Design and Analysis of Algorithms Project 1 I. Some Basics on Line Fitting Suppose that there is a set P of n points in the two-dimensional plane, denoted by (x1, y1), (x2, y2), · · · , (xn,...

1 answer below »

CSCE 411 Design and Analysis of Algorithms
Project 1
I. Some Basics on Line Fitting
Suppose that there is a set P of n points in the two-dimensional plane, denoted by
(x1, y1), (x2, y2), · · · , (xn, yn),
and suppose x1 < x2 < · · · < xn. Given a line L defined by the equation
y = ax+ b,
we say that the e
or of L with respect to P is the sum of its squared “distances” to the points in P :
E
or(L,P ) =
n∑
i=1
(yi − axi − b)2.
The above e
or shows how well the line L approximates the points in P : the smaller the e
or is, the better it fits.
We show an example in Fig. 1. In this example, there are n = 7 points, and a line that optimally fits them is shown.
Fig. 1. A line that optimally fits 7 points.
It is known that when n ≥ 2, given the set of points P , the line L that minimizes the e
or E
or(L,P ) is y = ax+
with
a =
n
∑n
i=1 xiyi − (
∑n
i=1 xi)(
∑n
i=1 yi)
n
∑n
i=1 x
2
i − (
∑n
i=1 xi)
2
and b =
∑n
i=1 yi − a
∑n
i=1 xi
n
.
If n = 1, then any line passing through the only point trivially fits the point perfectly well, and the e
or would be
E
or(L,P ) = 0.
In the following, given a set of n ≥ 1 points P , let’s use
ε(P )
to denote the minimum e
or that a line fitting them can achieve. (Clearly, we can achieve the minimum e
or ε(P )
y using the solution above.)
Fig. 2. Two lines that fit 12 points.
Fig. 3. Three lines that fit 18 points.
II. Multi-Line Fitting Problem
Sometimes it is hard to fit points well using just one line. Instead, it is more natural to fit the points with multiple
lines. For example, for the points in Fig. 2, it is more natural to fit them using two lines. And for the points in Fig. 3,
it is more natural to fit them using three lines.
It is not always simple, however, to know how many lines are the best choice beforehand. On the other hand, if
there is no restriction on how many lines can be used, it would become trivial to fit the points perfectly: just have a
different line pass through each pair of consecutive points in P . But that would make the problem meaningless. The
problem is essentially a “modeling” problem: we want to model a set of “observed points” using a simple yet accurate
model, and in this case, the model is a set of lines, each for a different interval of x. If too many lines are used, the
model would not be “simple” any more. Thus, intuitively, we would like a problem formulation that requires us to fit
the points well, using as few lines as possible.
Let’s formulate our problem, which we shall call the “Multi-Line Fitting Problem”:
Input: We have the following inputs:
1) A set of n points in a 2-dimensional plane
P = {(x1, y1), (x2, y2), · · · , (xn, yn)}
with x1 < x2 < · · · < xn. We shall use pi to denote the point (xi, yi), for i = 1, 2, · · · , n.
2) A real number C > 0.
Output: A partition of the points P into k segments:
P1 = {p1, p2, · · · , pm1}
P2 = {pm1+1, pm1+2, · · · , pm2}
P3 = {pm2+1, pm2+2, · · · , pm3}
...
Pk = {pmk−1+1, pmk−1+2, · · · , pmk = pn}
that minimizes the cost function
cost(P,C) = ε(P1) + ε(P2) + · · ·+ ε(Pk) + Ck.
In the cost function cost(P,C), the part ε(P1) + ε(P2) + · · · + ε(Pk) is the total e
or of using k lines to fit the n
points, and Ck is the penalty for using k lines (namely, for partitioning the n points into k segments). The greater k
is, the greater the penalty term Ck is. Note that the value of
k,
as well as the turning points
pm1 , pm2 , · · · , pmk−1 ,
are what an algorithm solving the problem needs to decide on.
III. What to Submit
In this project, you need to submit three items: (1) an algorithm report, (2) a program that implements you
algorithm, (3) output of your program on test instances.
A. Submission Item One: An Algorithm Report
First, you are to design an efficient algorithm for the “Multi-Line Fitting Problem”, and submit it as a report. The
eport is just like what we usually do for algorithm design in a homework, and should have the four elements as usual:
1) The main idea of your algorithm.
2) The pseudo code of your algorithm.
3) The proof of co
ectness for your algorithm.
4) the analysis of time complexity for your algorithm.
B. Submission Item Two: A Program That Implements Your Algorithm
You need to implement your algorithm using a commonly used programming language, such as Python, C++, Java,
etc. We encourage you to use Python if possible, but other languages are fine, too.
Your program will take a list of instances of the “Multi-Line Fitting Problem” as input, and return a list of solutions
(co
esponding to those instances) as its output.
To test your program, we will provide you with input instances as a “dictionary” in Python (after you have submitted
the program), and ask you to submit the output solutions as a “dictionary” in Python. (Details on the two files,
including their formats, will be explained in the next subsection for “Submission Item Three”.)
When you submit your program, you need to explain clearly how to run your code. If your programming language
is not Python, then you also need to include an explanation on how you turned the provided “dictionary” of “input
instances” in Python to your programming language, and how you turned your “output solutions” in your programming
language to a “dictionary” in Python (following the formats specified in the next subsection).
C. Submission Item Three: Output of Your Program on Test Instances
To test your algorithm/program, we will provide you with three sets of “input instances”, all in the Python language:
1) A set of instances of relatively small sizes.
2) A set of instances of medium sizes.
3) A set of instances of relatively large sizes.
Of course, if your algorithm/program is co
ect, then you should get the co
ect solutions for all three sets of input
instances. However, an algorithm is not only about co
ectness, but also about efficiency. If your algorithm’s time
complexity is low, then computing should be easy. (When we tested the algorithm in Google Colab using Python, the
set of small instances took less than 1 minute 20 seconds, the set of medium-sized instances took about 14 minutes
30 seconds, and the set of large instances took about 13 minutes 30 seconds.) However, if your algorithm’s time
complexity is high, you may experience much longer running times, or may not be able to get the solutions to the
large instances.
To give you an idea what the three sets of input instances are like, we provide three “example” files (down-
loadable from our course webpage): “examples_of_small_instances”, “examples_of_medium_instances”, “exam-
ples_of_large_instances”. They are all Python dictionaries, whose format will be explained below. You can down-
load them by clicking on the links in our course webpage. After downloading each file, you can open it using
pickle.load(open(filePath, ‘
’)), where “filePath” is the path of your downloaded file. (Of course, these three files are
different from the three files we will send you for testing your program. But they are similar.)
After you run your program on the three sets of “input instances”, save your results as three Python “dictionaries”
in three files:
1) File “small_solutions” co
esponding to the input instances of small sizes.
2) File “medium_solutions” co
esponding to the input instances of medium sizes.
3) File “large_solutions” co
esponding to the input instances of large sizes.
Now let’s explain the formats of the “input instances” and “output solutions”.
The “input instances” is a “dictionary” (in Python) that contains 4 key-value pairs (in the following, let’s use m to
denote the number of provided instances of the “Multi-Line Fitting Problem”):
1) “key” is “n_list”, “value” is a list of m numbers. For i = 0, 1, · · · ,m − 1, n_list[i] is the number of points in
the i-th instance (that are to be fitted by lines). (Note that here and in the following, the points will be indexed
y 0, 1, 2, · · · instead of by 1, 2, 3, · · · .)
2) “key” is “ x_list, “value” is a list of m items. For i = 0, 1, · · · ,m − 1, x_list[i] is a list of n_list[i] numbers.
For i = 0, 1, · · · ,m − 1 and j = 0, 1, · · · , n_list[i] − 1, x[i][j] is the x-coordinate of the j-th point in the i-th
instance. Here x[i][0] < x[i][1] < x[i][2] < · · · < x[i][n_list[i]− 1].
3) “key” is “ y_list, “value” is a list of m items. For i = 0, 1, · · · ,m − 1, y_list[i] is a list of n_list[i] numbers.
For i = 0, 1, · · · ,m − 1 and j = 0, 1, · · · , n_list[i] − 1, y[i][j] is the y-coordinate of the j-th point in the i-th
instance.
4) “key” is C_list, “value” is a list of m numbers. For i = 0, 1, · · · ,m−1, C_list[i] is a positive real number, which
is the input parameter C specified in the definition of the “Multi-Line Fitting Problem” for the i-th instance.
(That is, for the i-th instance, C_list[i] is the extra cost for every extra line we use to fit points.)
We can see that in the above dictionary, for i = 0, 1, · · · ,m−1, the four items – n_list[i], x_list[i], y_list[i], C_list[i]
– describe all the information we need to know about the i-th instance.
The “output solutions”

2022fallcsce411project1-3lcjaeij.pdf examplesofinstances-jycm1jb1-l54fdjbw-3c5vew4d.zip

Answered 1 days After Sep 25, 2022

Solution

Ximi answered on Sep 27 2022

68 Votes

SOLUTION.PDF

CSCE 411 Design and Analysis of Algorithms Project 1 I. Some Basics on Line Fitting Suppose that there is a set P of n points in the two-dimensional plane, denoted by (x1, y1), (x2, y2), · · · , (xn,...

Solution

Answer To This Question Is Available To Download

Related Questions & Answers

Submit New Assignment