
# Assignment 4: Prediction

3/6/22, 2:27 AM
https://wssu.instructure.com/courses/19153/assignments/325905?module_item_id= XXXXXXXXXX/2
Due Tuesday by 11:59pm · Points 80 · Submitting a file upload

Start Assignment
I. Do the following experiments:

1. Run Weka's Naive Bayes on the original loan dataset (Loan_original.arff) and on the 2-bin and 3-bin discretized data, using the "Use training set" test option. Examine the error rate and identify the wrongly classified instances. To do the latter, right-click on the current line in the result list window and then select "Visualize classifier errors". The wrongly classified instances will show in the plot as small squares. Click on an instance to get its information.
a) How is the model represented? What do the counts mean and why are they incremented by 1 (i.e., actual value count + 1)?
b) What actually happens when you test the classifier on the training set?
c) How do errors occur? What could be the reasons?
d) How does the error rate change over the three different data sets (original, 2-bin, 3-bin)? Any guess why?
2. Run Weka's IBk algorithm on the original loan data (no discretization) with the "Use training set" test option and examine the evaluation results (correctly/incorrectly classified instances). Vary KNN (the number of neighbors) with and without distance weighting. Try, for example, KNN = 1, 3, 5, 20 without distance weighting and with weight = 1/distance. Compare the results and find explanations.
Answer the following questions by looking at what the algorithm does for each instance from the test set (which in this case is also the training set). Find conceptual-level explanations; no need to go into computing distances:
e) What actually happens when you test the classifier on the training set?
f) How do errors occur? What could be the reasons?
g) How does the error rate change with the KNN parameter in IBk?
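Conceptually, IBk stores the training instances and classifies a query by a (possibly distance-weighted) vote among its k nearest neighbours. The following is a minimal sketch of that idea in plain Python, not Weka's actual implementation; the toy data and function names are made up for illustration:

```python
from collections import Counter

def knn_predict(train, query, k, weighted=False):
    """Predict a class label with k-nearest-neighbours.

    train    -- list of (feature_vector, label) pairs
    query    -- feature vector to classify
    weighted -- if True, vote with weight 1/distance instead of equal votes
    """
    # Euclidean distance to every stored training instance
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, label)
        for x, label in train
    )
    votes = Counter()
    for d, label in dists[:k]:
        # A training instance at distance 0 (e.g. the query itself when
        # testing on the training set) dominates under 1/d weighting.
        votes[label] += (1.0 / d if d > 0 else float("inf")) if weighted else 1.0
    return votes.most_common(1)[0][0]

# Toy data: with k=1 a point predicts its own label when it is also in the
# training set, which is why IBk looks perfect when tested on training data.
train = [((1.0, 1.0), "yes"), ((1.2, 0.9), "yes"),
         ((4.0, 4.2), "no"), ((4.1, 3.9), "no")]
print(knn_predict(train, (1.0, 1.0), k=1))  # -> yes (its own stored copy)
print(knn_predict(train, (1.0, 1.0), k=3))  # -> yes (two "yes" vs one "no")
```

Larger k smooths the decision but lets distant, unrelated instances outvote the local neighbourhood, which is one conceptual reason error rates can rise with k on small datasets.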
3. Decide on the loan application of a new customer.
Prepare a test set using the information of the new customer described in Assignment 3: a customer applying for a 30-month loan with 80,000 yen monthly pay to buy a car, with the following data: male, employed, 22 years old, not married, does not live in a problematic area, has worked 1 year for his last employer and has 500,000 yen in a bank (see "classification and prediction").
Run Naive Bayes and IBk with different parameters for KNN and distance weighting, all with the "Supplied test set" test option.
Compare the prediction results obtained with the different algorithms.
Decide on the loan application of the new customer by using the outputs from the prediction algorithms.
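For Weka's "Supplied test set" option the new customer must be saved as an ARFF file whose header matches the training data exactly. The fragment below is a sketch only: the attribute names mirror the Weka run information further down this page, but the value encodings (e.g. money amounts in units of 10,000 yen, so 500,000 yen → 50 and 80,000 yen → 8, which the training means suggest) are assumptions that must be checked against Loan_original.arff before use:

```
@relation Loan_test

@attribute Employed {Yes,No}
@attribute LoanPurpose {Computer,Car}
@attribute Gender {Male,Female}
@attribute Married {Yes,No}
@attribute ProblematicArea {Yes,No}
@attribute Age numeric
@attribute MoneyInBank numeric
@attribute Salary numeric
@attribute LoanMonths numeric
@attribute YearsEmployed numeric
@attribute Approved {Yes,No}

@data
Yes,Car,Male,No,No,22,50,8,30,1,?
```

The class value is "?" because the approval decision is what the classifiers are asked to predict.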
II. Write a report on the prediction experiments described above. Include the following information (DO NOT include data sets or classifier outputs):
The original 7 questions (4 about Bayes and 3 about IBk) with short answers to EACH ONE.
ONE Naive Bayes model (any version of the loan data set) with explanations of its parameters (the answer to #1(a) may be included here).
Results from predicting the new customer's classification (Experiments with Prediction, #3) with a short comment.
https://wssu.instructure.com/courses/19153/files/2875098?wrap=1

Chapter 4

Data Mining: Practical Machine Learning Tools and Techniques
Slides for Chapter 4, "Algorithms: the basic methods", of Data Mining by I. H. Witten, E. Frank, M. A. Hall and C. J. Pal
Algorithms: The basic methods
• Simple probabilistic modeling
• Linear models
• Instance-based learning
Can combine probabilities using Bayes's rule
• Famous rule from probability theory due to Thomas Bayes (born 1702 in London, England; died 1761 in Tunbridge Wells, Kent, England)
• Probability of an event H given observed evidence E:
  P(H | E) = P(E | H) P(H) / P(E)
• A priori probability of H, P(H): probability of the event before evidence is seen
• A posteriori probability of H, P(H | E): probability of the event after evidence is seen
Naïve Bayes for classification
• Classification learning: what is the probability of the class given an instance?
• Evidence E = instance's non-class attribute values
• Event H = class value of instance
• Naïve assumption: evidence splits into parts (i.e., attributes) that are conditionally independent
• This means, given n attributes, we can write Bayes' rule using a product of per-attribute probabilities:
  P(H | E) = P(E1 | H) P(E2 | H) … P(En | H) P(H) / P(E)
Weather data example

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

Evidence E; probability of class "yes":
  P(yes | E) = P(Outlook = Sunny | yes) × P(Temperature = Cool | yes) × P(Humidity = High | yes) × P(Windy = True | yes) × P(yes) / P(E)
             = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / P(E)
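Since P(E) is the same for both classes, it cancels when the two class scores are normalised. A small sketch of the full calculation (the "no" conditionals are not on this slide; they are read off the standard weather dataset, so treat them as assumptions if your copy of the data differs):

```python
# Numerators of Bayes' rule for the instance (Sunny, Cool, High, True).
# Factors: P(outlook|class), P(temp|class), P(humidity|class),
#          P(windy|class), and the class prior.
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)

# P(E) cancels: normalise the two numerators to get class probabilities.
total = p_yes + p_no
print(round(p_yes / total, 3), round(p_no / total, 3))  # -> 0.205 0.795
```

So naïve Bayes predicts "no" for this day, with about 79.5% probability.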
The "zero-frequency problem"
• What if an attribute value does not occur with every class value? (e.g., "Humidity = High" for class "yes")
• The conditional probability will be zero: P(Humidity = High | yes) = 0
• The a posteriori probability will then also be zero: P(yes | E) = 0 (regardless of how likely the other values are!)
• Remedy: add 1 to the count for every attribute value–class combination (Laplace estimator)
• Result: probabilities will never be zero (also stabilizes probability estimates computed from small samples of data)
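The Laplace remedy is a one-line formula: add the smoothing constant to the numerator count and, to keep the probabilities summing to 1, add it once per possible attribute value to the denominator. A minimal sketch (function name is mine):

```python
def laplace_prob(count, class_total, n_values, mu=1.0):
    """Laplace-smoothed estimate of P(attribute value | class).

    count       -- observed count of this attribute value within the class
    class_total -- number of training instances of the class
    n_values    -- number of distinct values the attribute can take
    mu          -- smoothing constant (1 gives the classic Laplace estimator)
    """
    return (count + mu) / (class_total + mu * n_values)

# In the weather data, Outlook = Overcast never occurs with class "no"
# (0 of 5 instances), yet the smoothed estimate is nonzero:
print(laplace_prob(0, 5, 3))  # -> 0.125  (i.e. 1/8 instead of 0/5)
```

This is exactly why the counts in Weka's Naive Bayes model output appear as "actual count + 1".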
Modified probability estimates
• In some cases, adding a constant different from 1 might be more appropriate
• Example: attribute Outlook for class yes, with smoothing constant μ split by weights p1, p2, p3:
  Sunny: (2 + μp1)/(9 + μ)   Overcast: (4 + μp2)/(9 + μ)   Rainy: (3 + μp3)/(9 + μ)
• Weights don't need to be equal (but they must sum to 1)
Missing values
• Training: the instance is not included in the frequency count for that attribute value–class combination
• Classification: the attribute is omitted from the calculation
• Example (Outlook missing):

Outlook  Temp.  Humidity  Windy  Play
?        Cool   High      True   ?

Likelihood of "yes" = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
Likelihood of "no"  = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
P("yes") = 0.0238 / (0.0238 + 0.0343) = 41%
P("no")  = 0.0343 / (0.0238 + 0.0343) = 59%
Numeric attributes
• Usual assumption: attributes have a normal or Gaussian probability distribution (given the class)
• The probability density function for the normal distribution is defined by two parameters:
  • Sample mean μ
  • Standard deviation σ
• Then the density function f(x) is
  f(x) = (1 / (√(2π) σ)) · exp(−(x − μ)² / (2σ²))
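The density function above translates directly into code. A short sketch, using the temperature statistics for class "yes" from the weather data (μ = 73, σ = 6.2) to reproduce the example density value on the next slide:

```python
import math

def gaussian_density(x, mu, sigma):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Temperature 66 under class "yes" (mu = 73, sigma = 6.2):
print(round(gaussian_density(66, 73, 6.2), 4))  # -> 0.034
```

In naïve Bayes, this density value is multiplied into the product in place of a conditional probability for the numeric attribute.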
Statistics for weather data

Outlook            Yes       No
  Sunny            2 (2/9)   3 (3/5)
  Overcast         4 (4/9)   0 (0/5)
  Rainy            3 (3/9)   2 (2/5)

Temperature        Yes            No
  values           64, 68, 69,    65, 71, 72,
                   70, 72, …      80, 85, …
  mean μ           73             75
  std. dev. σ      6.2            7.9

Humidity           Yes            No
  values           65, 70, 70,    70, 85, 90,
                   75, 80, …      91, 95, …
  mean μ           79             86
  std. dev. σ      10.2           9.7

Windy              Yes       No
  False            6 (6/9)   2 (2/5)
  True             3 (3/9)   3 (3/5)

Play               Yes       No
                   9 (9/14)  5 (5/14)

• Example density value:
  f(Temperature = 66 | yes) = (1/(√(2π)·6.2)) · exp(−(66 − 73)² / (2·6.2²)) = 0.0340
Classifying a new day
• A new day:

Outlook  Temp.  Humidity  Windy  Play
Sunny    66     90        true   ?

Likelihood of "yes" = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
Likelihood of "no"  = 3/5 × 0.0221 × 0.0381 × 3/5 × 5/14 = 0.000108
P("yes") = 0.000036 / (0.000036 + 0.000108) = 25%
P("no")  = 0.000108 / (0.000036 + 0.000108) = 75%

• Missing values during training are not included in the calculation of mean and standard deviation
Probability densities
• Probability densities f(x) can be greater than 1; hence, they are not probabilities
• However, they must integrate to 1: the area under the probability density curve must be 1
• The approximate relationship between probability and probability density can be stated as
  P(x − ε/2 ≤ X ≤ x + ε/2) ≈ ε f(x)
  assuming ε is sufficiently small
• When computing likelihoods, we can treat densities just like probabilities
Multinomial naïve Bayes I
• Version of naïve Bayes used for document classification using the bag-of-words model
• n1, n2, …, nk: number of times word i occurs in the document
• P1, P2, …, Pk: probability of obtaining word i when sampling from documents in class H
• Probability of observing a particular document E given class H (based on the multinomial distribution):
  P(E | H) ≈ N! × Π_{i=1..k} (Pi^ni / ni!),  where N = n1 + n2 + … + nk
• Note that this expression ignores the probability of generating a document of the right length
  • This probability is assumed to be constant for all classes
Multinomial naïve Bayes II
• Suppose the dictionary has two words, yellow and blue
• Suppose P(yellow | H) = 75% and P(blue | H) = 25%
• Suppose E is the document "blue yellow blue"
• Probability of observing the document:
  P({blue yellow blue} | H) = 3! × (0.75¹ / 1!) × (0.25² / 2!) = 9/64
• Suppose there is another class H' that has P(yellow | H') = 10% and P(blue | H') = 90%:
  P({blue yellow blue} | H') = 3! × (0.1¹ / 1!) × (0.9² / 2!) = 243/1000
• Need to take the prior probability of the class into account to make the final classification using Bayes' rule
• Factorials do not actually need to be computed: they drop out
• Underflows can be prevented by using logarithms
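The multinomial formula can be checked with a few lines of Python. This sketch (function name is mine) reproduces the two worked values above, which is also an easy way to verify the arithmetic in the slide:

```python
import math
from collections import Counter

def multinomial_likelihood(doc_words, word_probs):
    """P(document | class) under the multinomial model (length term ignored)."""
    counts = Counter(doc_words)
    n = sum(counts.values())
    p = math.factorial(n)               # N! out front
    for word, k in counts.items():
        p *= word_probs[word] ** k / math.factorial(k)   # Pi^ni / ni!
    return p

doc = ["blue", "yellow", "blue"]
print(multinomial_likelihood(doc, {"yellow": 0.75, "blue": 0.25}))  # -> 0.140625 (= 9/64)
print(multinomial_likelihood(doc, {"yellow": 0.10, "blue": 0.90}))  # ≈ 0.243 (= 243/1000)
```

Since H' assigns the document a much higher likelihood, the classification would favour H' unless the prior P(H) strongly outweighs P(H').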
Naïve Bayes: discussion
• Naïve Bayes works surprisingly well even if the independence assumption is clearly violated
• Why? Because classification does not require accurate probability estimates as long as the maximum probability is assigned to the correct class
• However: adding too many redundant attributes will cause problems (e.g., identical attributes)
• Note also: many numeric attributes are not normally distributed (kernel density estimators can be used instead)
Classification
• Any regression technique can be used for classification
  • Training: perform a regression for each class, setting the output to 1 for training instances that belong to the class, and 0 for those that don't
  • Prediction: predict the class corresponding to the model with the largest output value (membership value)
• For linear regression this method is also known as multi-response linear regression
• Problem: membership values are not in the [0,1] range, so they cannot be considered proper probability estimates
• In practice, they are often simply clipped into the [0,1] range and normalized to sum to 1
Linear models: logistic regression
• Can we do better than using linear regression for classification?
• Yes, we can, by applying logistic regression
• Logistic regression builds a linear model for a transformed target variable
• Assume we have two classes
• Logistic regression replaces the target P(1 | a1, a2, …, ak) by this target:
  log( P(1 | a1, …, ak) / (1 − P(1 | a1, …, ak)) )
• This logit transformation maps [0,1] to (−∞, +∞), i.e., the new target values are no longer restricted to the unit interval
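The logit transformation and its inverse (the sigmoid, which maps the linear model's output back to a probability) can be sketched in a few lines; the function names here are mine:

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Map a real-valued linear-model output back to a probability."""
    return 1 / (1 + math.exp(-z))

print(logit(0.5))                  # -> 0.0 (the decision boundary)
print(round(inverse_logit(2.0), 3))  # a large positive score -> high probability
```

Because logit and inverse_logit are inverses, fitting a linear model to the logit is equivalent to fitting a sigmoid-shaped probability curve to the 0/1 class labels.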
Answered 2 days after Mar 06, 2022

## Solution

Mohd answered on Mar 08, 2022
a) How is the model represented? What do the counts mean and why are they incremented by 1 (i.e., actual value count + 1)?
In naïve Bayes classification, the model assigns a class label, drawn from a fixed set of values, to a vector of attribute values. The model is represented by per-class counts and statistics: for each nominal attribute, how often each attribute value occurs with each class (plus the class priors), and for each numeric attribute, the per-class mean and standard deviation. The counts are incremented by 1 (the Laplace estimator) to address the zero-frequency problem: it eliminates the possibility of a zero probability and also stabilizes the probability estimates.
b) What actually happens when you test the classifier on the training set?
When we test the classifier on the training set, the result is overly optimistic: the model has already seen every instance, so there is minimal chance of error in the evaluation and the estimate says little about performance on unseen data.
c) How do errors occur? What could be the reasons?
On the original loan dataset, 100 percent of instances are correctly classified, i.e., a 0 percent error rate. On the 2-bin dataset, 85 percent are correctly classified, so 15 percent are misclassified. On the 3-bin dataset, 95 percent are correctly classified, so 5 percent are misclassified. Increasing the number of bins from 2 to 3 raised accuracy by 10 percentage points, which suggests that more levels per attribute can lead to higher accuracy. More generally, several approaches exist for assessing the Bayes error rate: one seeks analytic bounds, which depend on the distribution parameters and are therefore hard to estimate; another focuses on the class densities; and yet another combines and compares different classifiers.
d) How does the error rate change over the three different data sets (original, 2-bin, 3-bin)? Any guess why?
The error rate is zero on the original loan dataset, 15 percent on the 2-bin dataset, and 5 percent on the 3-bin dataset. The original dataset has numeric and nominal attributes with their full range of values. In the 2-bin version, each numeric attribute is narrowed down to just two categories, which discards information. Moving from 2 bins to 3 bins restores some of that detail, and the error rate drops by 10 percentage points.
e) What actually happens when you test the classifier on the training set?
As with Naive Bayes, testing on the training set gives an overly optimistic result: each test instance is also a stored training instance, so there is minimal chance of error in the evaluation, and the estimate is a poor indicator of the model's effectiveness on new data.
f) How do errors occur? What could be the reasons?
Without distance weighting, the error rate increased as we increased KNN (the number of neighbors), except at KNN = 5. With distance weighting by weight = 1/distance, the error rate did not change as KNN varied, although the mean squared error increased slightly with larger KNN. In KNN classification, accuracy is maximized (and the error rate minimized) at some particular number of neighbors, so the optimal value of KNN should be determined before building the final model.
g) How does the error rate change with the KNN parameter in IBk?
Without distance weighting, the error rate increased as KNN increased, except at KNN = 5. With weight = 1/distance, the error rate did not change with KNN, although the mean squared error increased slightly for larger KNN.
Model output:
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: Loan
Instances: 20
Attributes: 11
Employed
LoanPurpose
Gender
Married
ProblematicArea
Age
MoneyInBank
Salary
LoanMonths
YearsEmployed
Approved
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute Yes No
(0.64) (0.36)
==================================
Employed
Yes 14.0 5.0
No 1.0 4.0
[total] 15.0 9.0
LoanPurpose
Computer 9.0 3.0
Car 6.0 6.0
[total] 15.0 9.0
Gender
Male 8.0 4.0
Female 7.0 5.0
[total] 15.0 9.0
Married
Yes 7.0 5.0
No 8.0 4.0
[total] 15.0 9.0
ProblematicArea
Yes 2.0 2.0
No 13.0 7.0
[total] 15.0 9.0
Age
mean 31.453 29.4603
std. dev. 11.2303 13.128
weight sum 13 7
precision 3.5556 3.5556
MoneyInBank
mean 50.1923 34.5238
std. dev. 58.0167 21.8348
weight sum 13 7
precision 24.1667 24.1667
Salary
mean 5.8571 6.6327
std. dev. 3.7742 2.1878
weight sum 13 7
precision 1.8571 1.8571
LoanMonths
mean 16.9231 20.7429
std. dev. 5.6837 5.6221
weight sum 13 7
precision 4.4 4.4
YearsEmployed
mean 7.9327 2.2321
std. dev. 7.4169 2.187
weight sum 13 7
precision 3.125 3.125
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0 seconds
=== Summary ===
Correctly Classified Instances 20 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.114
Root mean squared error 0.1722
Relative absolute error 24.84 %
Root relative squared error 36.0861 %
Total Number of Instances 20
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Yes
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 No
Weighted Avg. 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000
=== Confusion Matrix ===
a b <-- classified as
13 0 | a = Yes
0 7 | b = No
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: Loan-weka.filters.unsupervised.attribute.Discretize-B2-M-1.0-Rfirst-last-precision6
Instances: 20
Attributes: 11
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute Yes No
(0.64) (0.36)
================================
Employed
Yes 14.0 5.0
No 1.0 4.0
[total] 15.0 9.0
LoanPurpose
Computer 9.0 3.0
Car 6.0 6.0
[total] 15.0 9.0
Gender
Male 8.0 4.0
Female 7.0 5.0
[total] 15.0 9.0
Married
Yes 7.0 5.0
No 8.0 4.0
[total] 15.0 9.0
ProblematicArea
Yes 2.0 2.0
No 13.0 7.0
[total] 15.0 9.0
Age
'(-inf-34]' 8.0 6.0
'(34-inf)' 7.0 3.0
[total] 15.0 9.0
MoneyInBank
'(-inf-77.5]' 10.0 8.0
'(77.5-inf)' 5.0 1.0
[total] 15.0 9.0
Salary
'(-inf-8.5]' 10.0 6.0
'(8.5-inf)' 5.0 3.0
[total] 15.0 9.0
LoanMonths
'(-inf-19]' 7.0 3.0
'(19-inf)' 8.0 6.0
[total] 15.0 9.0
YearsEmployed
'(-inf-12.5]' ...