CSI 5810 Assignment #4
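1. Let x1 = (4, 5)^t, x2 = (1, 4)^t, x3 = (0, 1)^t, and x4 = (5, 0)^t be four items
for clustering. Consider the following three partitions:
A. P1 = {x1, x2}, P2 = {x3, x4}
B. P1 = {x1, x4}, P2 = {x2, x3}
C. P1 = {x1, x2, x3}, P2 = {x4}
Determine the partition favored by the sum-of-squared-error (SSE) clustering
criterion.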
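As a starting point, the SSE of a partition can be computed as the sum, over clusters, of squared distances from each item to its cluster mean. The sketch below assumes the item coordinates read x1 = (4, 5), x2 = (1, 4), x3 = (0, 1), x4 = (5, 0); the readings for x2 and x3 are inferred from the assignment's data listing, and the helper names are illustrative only.

```python
# Item coordinates (assumed reading of the assignment's data listing).
items = {
    "x1": (4.0, 5.0),
    "x2": (1.0, 4.0),
    "x3": (0.0, 1.0),
    "x4": (5.0, 0.0),
}

# The three candidate partitions, each a list of clusters.
partitions = {
    "A": [["x1", "x2"], ["x3", "x4"]],
    "B": [["x1", "x4"], ["x2", "x3"]],
    "C": [["x1", "x2", "x3"], ["x4"]],
}

def cluster_sse(names):
    """Sum of squared distances from each item to the cluster mean."""
    pts = [items[n] for n in names]
    dim = len(pts[0])
    mean = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
    return sum((p[d] - mean[d]) ** 2 for p in pts for d in range(dim))

def partition_sse(clusters):
    """SSE of a partition: the per-cluster SSE values added together."""
    return sum(cluster_sse(c) for c in clusters)

sse_by_partition = {label: partition_sse(cl) for label, cl in partitions.items()}
for label, value in sse_by_partition.items():
    print(f"Partition {label}: SSE = {value:.3f}")
```

The criterion favors the partition with the smallest SSE value.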
2. Consider the following eight records; each record is described by two quantitative
attributes:
A = (2, 10)^t, B = (2, 5)^t, C = (8, 4)^t, D = (5, 8)^t, E = (7, 5)^t, F = (6, 4)^t,
G = (1, 2)^t, H = (4, 9)^t.
Your task is to apply complete link clustering to this data and produce the
dendrogram.
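Complete link clustering repeatedly merges the two clusters whose farthest pair of members is closest. The sketch below is a naive pure-Python version under two assumptions the assignment does not state: Euclidean distance, and ties broken by taking the first pair found. The function name complete_link is illustrative.

```python
import math
from itertools import combinations

points = {
    "A": (2, 10), "B": (2, 5), "C": (8, 4), "D": (5, 8),
    "E": (7, 5), "F": (6, 4), "G": (1, 2), "H": (4, 9),
}

def complete_link(points):
    """Return the merge sequence as (cluster_a, cluster_b, merge_distance)."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete link: cluster distance is the largest pairwise distance.
            d = max(math.dist(points[p], points[q]) for p in a for q in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b), d))
    return merges

merges = complete_link(points)
for left, right, dist in merges:
    print(f"merge {left} + {right} at distance {dist:.3f}")
```

The merge distances give the heights at which branches join in the dendrogram. If SciPy is available, scipy.cluster.hierarchy.linkage with method="complete" together with scipy.cluster.hierarchy.dendrogram is one way to draw it and to check a hand-drawn result.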
3. In this exercise, you will perform k-means clustering on the wine data. You will
repeat the clustering using the following values of k: 2, 3, 4, and 5. In each case,
determine the SSE value, calculate the Rand index, and tabulate your
results.
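The loop over k could be organized as in the sketch below. It is not tied to the wine file: the array X and the reference labels true_labels are synthetic stand-ins, and the plain Lloyd-style kmeans function, its random initialization, and the pair-counting rand_index helper are all illustrative assumptions. Replace the stand-in data with the wine features and class labels.

```python
import numpy as np
from itertools import combinations

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns cluster labels and the SSE."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and SSE with respect to the final centers.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sse = float(dists[np.arange(len(X)), labels].sum())
    return labels, sse

def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

# Stand-in data (NOT the wine data): three shifted Gaussian blobs.
rng = np.random.default_rng(0)
true_labels = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 2)) + 5.0 * true_labels[:, None]

results = []
for k in (2, 3, 4, 5):
    labels, sse = kmeans(X, k)
    results.append((k, sse, rand_index(true_labels, labels)))

print(" k        SSE   Rand")
for k, sse, ri in results:
    print(f"{k:2d} {sse:10.3f} {ri:6.3f}")
```

Because k-means depends on its starting centers, running several seeds per k and keeping the lowest SSE is a common precaution.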
4. In this exercise, you will build a linear predictive model to predict crime rate based
on a number of factors. The data is in the “crime-rate” file. You will build the model
by writing your own script for gradient search. Experiment with two or three learning
rates to see how the learning rate affects the search.
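A gradient search script for a linear model might follow the outline below. The data here is a synthetic stand-in, not the "crime-rate" file, and the function name gradient_descent, the mean-squared-error objective, the step count, and the three learning rates are assumptions chosen for illustration.

```python
import numpy as np

def gradient_descent(X, y, lr, n_steps=500):
    """Minimize mean squared error of a linear model with an intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    history = []
    for _ in range(n_steps):
        residual = Xb @ w - y
        history.append(float(residual @ residual) / len(y))
        grad = 2.0 * Xb.T @ residual / len(y)  # gradient of the MSE
        w -= lr * grad
    return w, history

# Stand-in data (NOT the crime-rate file): 100 records, 3 factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, 2.0, -1.0, 0.5])
y = true_w[0] + X @ true_w[1:] + 0.1 * rng.normal(size=100)

runs = {}
for lr in (0.001, 0.01, 0.1):
    w, history = gradient_descent(X, y, lr)
    runs[lr] = (w, history)
    print(f"lr={lr}: first MSE={history[0]:.4f}, last MSE={history[-1]:.4f}")
```

Comparing the recorded MSE histories shows how quickly each learning rate converges. With real data whose factors are on very different scales, standardizing the columns first usually makes the choice of learning rate much less delicate.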
5. Build a model to predict corn yield from two independent variables, fertilizer
and insecticides. The data for this task is given below. Use the pseudoinverse
approach to build the model. Also calculate the R-squared coefficient
to assess the model's goodness of fit.
Corn   Fertilizer   Insecticides
40     6            4
44     10           4
46     12           5
48     14           7
52     16           9
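One way to set up the pseudoinverse solution for the table above is sketched here. Including an intercept column of ones is an assumption (the assignment does not say whether the model has a constant term), and the variable names are illustrative.

```python
import numpy as np

# Data from the table above.
corn = np.array([40.0, 44.0, 46.0, 48.0, 52.0])
fertilizer = np.array([6.0, 10.0, 12.0, 14.0, 16.0])
insecticides = np.array([4.0, 4.0, 5.0, 7.0, 9.0])

# Design matrix with an assumed intercept column.
X = np.column_stack([np.ones_like(corn), fertilizer, insecticides])

# Least-squares weights via the Moore-Penrose pseudoinverse.
w = np.linalg.pinv(X) @ corn

# R-squared: 1 - (residual sum of squares) / (total sum of squares).
pred = X @ w
ss_res = float(((corn - pred) ** 2).sum())
ss_tot = float(((corn - corn.mean()) ** 2).sum())
r_squared = 1.0 - ss_res / ss_tot

print("weights (intercept, fertilizer, insecticides):", w)
print("R-squared:", r_squared)
```

When X has full column rank, the pseudoinverse equals (X^t X)^(-1) X^t, so the same weights can be obtained by writing out that product explicitly.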
Data for Question 1:
x1 = (4, 5)^t, x2 = (1, 4)^t, x3 = (0, 1)^t, x4 = (5, 0)^t
A. P1 = {x1, x2}, P2 = {x3, x4}
B. P1 = {x1, x4}, P2 = {x2, x3}
C. P1 = {x1, x2, x3}, P2 = {x4}
6. In this exercise, you will use the 250 examples of two classes that were generated in
Q6 of Assignment #3. Each example therein had two features, x1 and x2. You
will augment these features with three new features defined as follows: feature x3 =
x1*x2, feature x4 = x1*x1, and feature x5 = x2*x2. You will then train and test a
logistic classifier on the augmented vectors. Use an 80:20 ratio to split your data into
training and test sets. Once the model is trained and tested, use the model
parameters to plot the decision boundary of the logistic classifier.
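The workflow could be organized as in the sketch below. The 250 examples here are synthetic stand-ins (class 1 inside a circle), not the data from Assignment #3, and the helper names, the gradient-descent trainer, its learning rate, and the grid range are all assumptions made for illustration.

```python
import numpy as np

def augment(X):
    """Map (x1, x2) to (x1, x2, x1*x2, x1*x1, x2*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1 * x1, x2 * x2])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(F, y, lr=0.1, n_steps=5000):
    """Gradient descent on the mean logistic loss; w[0] is the bias."""
    Fb = np.column_stack([np.ones(len(F)), F])
    w = np.zeros(Fb.shape[1])
    for _ in range(n_steps):
        grad = Fb.T @ (sigmoid(Fb @ w) - y) / len(y)
        w -= lr * grad
    return w

def decision_values(F, w):
    """Linear score; the decision boundary is where this equals zero."""
    return np.column_stack([np.ones(len(F)), F]) @ w

# Stand-in data (NOT the Assignment #3 data): 250 examples, two classes.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(250, 2))
y = ((X ** 2).sum(axis=1) < 1.5).astype(int)

# 80:20 split into training and test sets.
perm = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train, test = perm[:n_train], perm[n_train:]

w = train_logistic(augment(X[train]), y[train])
test_pred = (decision_values(augment(X[test]), w) > 0).astype(int)
test_acc = float((test_pred == y[test]).mean())
print("test accuracy:", test_acc)

# Evaluate the score on a grid; its zero level set is the boundary.
g1, g2 = np.meshgrid(np.linspace(-2, 2, 200), np.linspace(-2, 2, 200))
grid = np.column_stack([g1.ravel(), g2.ravel()])
Z = decision_values(augment(grid), w).reshape(g1.shape)
```

With matplotlib, calling plt.contour(g1, g2, Z, levels=[0]) on top of a scatter plot of the examples draws the boundary w0 + w1*x1 + w2*x2 + w3*x1*x2 + w4*x1*x1 + w5*x2*x2 = 0, which is a conic section in the original (x1, x2) plane.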