Lab 3_Naive Bayes1.docx
MIS 545 Lab 3: Naive Bayes Classifier
Predicting Mushroom Types
1 Overview
In this lab, we will apply Naive Bayes to a Mushroom dataset. You can find a Mushroom dataset under D2L > Labs > Lab 3, called Mushroom.csv. Save it in your working directory.
The Mushroom dataset contains 8123 observations drawn from 23 species of gilled mushrooms in the Agaricus and Lepiota families. With respect to edibility, there are two types of mushroom: if classes = e, the mushroom is edible; if classes = p, it is poisonous. We want to distinguish the edible mushrooms from the poisonous ones by looking at some of their characteristics.
Note: The original dataset was downloaded from the University of California, Irvine machine learning data repository. For more details, please go to https://archive.ics.uci.edu/ml/datasets/Mushroom.
2 Data Packages
We will need to install the e1071 package for this lab, which is a well-developed public package on CRAN.
# install package "e1071"
install.packages("e1071")
# to use the package in an R session, we need to load it via library()
library(e1071)
3 Preprocessing
Save Mushroom.csv under your working directory. Unlike a clean dataset, null values in Mushroom.csv are denoted by a question mark, so we will slightly adjust our read.csv() call.
# read in the csv file Mushroom.csv; note the question mark represents a null value
mushroom <- read.csv('Mushroom.csv', na.strings = '?')
Call summary() to see what the data looks like in general. Look at the column stalk_root: it is the only column that includes NAs.
summary(mushroom)
# count the observations that are not complete (contain NA values)
nrow(mushroom[!complete.cases(mushroom),])
## [1] 2480
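To confirm that stalk_root is the only column with missing values, a quick optional check (not part of the original lab steps) is to count the NAs in each column:
# count missing values per column; only stalk_root should be non-zero
colSums(is.na(mushroom))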
Recall that Naive Bayes is a probability-based algorithm: to predict the class, it combines the prior probability of each class with the conditional probability of each predictive variable. Therefore, null values in the dataset pose a risk for our prediction.
# we can retain only the observations that do not contain NA (null) values
mushroom = mushroom[complete.cases(mushroom),]
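As a quick sanity check (optional), the number of rows should now have dropped by exactly the number of incomplete observations found above:
# remaining rows after removing incomplete observations
nrow(mushroom)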
4 Training and testing sets
Next, we will create training and test sets of the data. We will fit the model with the training set, and use the test set to evaluate the model. We will do a 70/30 split (70% will be training data).
# 70% of original data will be used for training
sample_size <- floor(0.7 * nrow(mushroom))
# randomly select index of observations for training
training_index <- sample(nrow(mushroom), size = sample_size, replace = FALSE)
train <- mushroom[training_index,]
test <- mushroom[-training_index,]
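Note that sample() draws a different random split every time the code runs. If you want a reproducible split (optional; the seed value 545 below is arbitrary), call set.seed() immediately before the sample() line above:
# fix the random number generator so the same 70/30 split is drawn each run
set.seed(545)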
5 Fitting and model performance
There is a Naive Bayes classifier in the e1071 package, which was already loaded into our current session via library(e1071). Fit the model to the training data.
# note the period after the tilde: it means all the other variables in the dataset will be used as predictive variables
mushroom.model <- naiveBayes(classes ~ . , data = train)
# we can explore the detailed conditional probabilities for each variable by printing the object mushroom.model
mushroom.model
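Internally, the fitted object stores the class frequencies used for the priors and one conditional probability table per predictor. If you only want to inspect a single predictor (for example odor, one of the columns in Mushroom.csv), you can index into the object; this is an optional exploration, not a required lab step:
# class frequency table used to form the prior probabilities
mushroom.model$apriori
# conditional probability table for a single predictor, e.g. odor
mushroom.model$tables$odor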
After fitting, run the test data through the model to get the predicted class for each observation.
# predict() returns a vector containing the predicted type (class) for each mushroom in the test set
mushroom.predict <- predict(mushroom.model, test, type = 'class')
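predict() can also return posterior probabilities instead of hard class labels, which is useful if you want to see how confident the model is about each test observation. This is optional; type = 'raw' is the other option supported by e1071's predict method for naiveBayes objects:
# posterior probability of each class for every test observation
mushroom.posterior <- predict(mushroom.model, test, type = 'raw')
head(mushroom.posterior)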
Show the performance metrics of the model:
# put the predicted and actual values together in a data frame called results
results <- data.frame(predicted = mushroom.predict, actual = test[,'classes'])
# we can build a confusion matrix via the table() function to evaluate the performance of our prediction
table(results)
# columns indicate the actual type of each mushroom; rows indicate the predicted type
# for example, we correctly predicted 1067 mushrooms as edible and 580 as poisonous; however, we mistook 46 poisonous mushrooms for edible
          actual
predicted          e          p
        e XXXXXXXXXX XXXXXXXXXX
        p XXXXXXXXXX XXXXXXXXXX
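From the confusion matrix we can also compute a simple overall accuracy. The sketch below stores the table first; the variable name cm is ours, not part of the lab:
# overall accuracy: correct predictions (the diagonal) divided by all predictions
cm <- table(results)
sum(diag(cm)) / sum(cm)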
Lab 3 Source.R
# Set working directory, please change the code below
# according to your own situation
setwd("Input your working directory")
# load (and install if needed) e1071 and caret
if(!require(e1071)){
  install.packages("e1071")
  library(e1071)
}
if(!require(caret)){
  install.packages("caret")
  library(caret)
}
# read in dataset
mushroom <- read.csv('./data/Mushroom.csv'
, na.strings = '?'
)
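# note (not part of the original lab): in R 4.0 and later, read.csv() no longer
# converts strings to factors by default, so the categorical columns arrive as
# character vectors and levels() below would return NULL; one option is to
# convert every column to a factor (or pass stringsAsFactors = TRUE above)
mushroom[] <- lapply(mushroom, factor)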
# total number of mushroom
nrow(mushroom)
### 8123
# number of mushroom with na value
nrow(mushroom[!complete.cases(mushroom),])
### 2480
### we can delete observations with missing value
mushroom = mushroom[complete.cases(mushroom),]
# data should be clean now
summary(mushroom)
########################
### mushroom type ###
########################
# types of mushrooms
levels(mushroom$classes)
# distribution of types
summary(mushroom$classes)
### take 70% as training set
sample_size <- floor(0.7 * nrow(mushroom))
### randomly decide which ones are training data
training_index <- sample(nrow(mushroom), size = sample_size, replace = FALSE)
train <- mushroom[training_index,]
test <- mushroom[-training_index,]
# take all explanatory variables to predict
mushroom.model <- naiveBayes(classes ~ .
, data = train
)
# printing the model shows the conditional probabilities for each predictor
mushroom.model
# run the test data
# the result of prediction is a vector with the predicted class
# for each observation in the test set
mushroom.predict <- predict(mushroom.model
, test
, type = 'class'
)
# put the predicted and actual values together in a data frame called results
results <- data.frame(predicted = mushroom.predict, actual = test[,'classes'])
# build a confusion matrix via the table() function
# to evaluate the performance of our prediction
table(results)
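# optional extension (not part of the original lab): caret's confusionMatrix()
# reports accuracy and per-class statistics in one call; both arguments must be
# factors with the same levels
confusionMatrix(mushroom.predict, test$classes)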
Mushroom.csv
classes,cap_shape,cap_surface,cap_color,if_bruises,odor,gill_attachment,gill_spacing,gill_size,gill_color,stalk_shape,stalk_root,stalk_surface_above_ring,stalk_surface_below_ring,stalk_color_above_ring,stalk_color_below_ring,veil_type,veil_color,ring_number,ring_type,spore_print_color,population,habitat
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
e,x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,k,n,g
e,b,s,w,t,a,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,n,m
e,b,y,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,s,m
p,x,y,w,t,p,f,c,n,p,e,e,s,s,w,w,p,w,o,p,k,v,g
e,b,s,y,t,a,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,s,m
e,x,y,y,t,l,f,c,b,g,e,c,s,s,w,w,p,w,o,p,n,n,g
e,x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,k,s,m
e,b,s,y,t,a,f,c,b,w,e,c,s,s,w,w,p,w,o,p,n,s,g
p,x,y,w,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,n,v,u
e,x,f,n,f,n,f,w,b,n,t,e,s,f,w,w,p,w,o,e,k,a,g
e,s,f,g,f,n,f,c,n,k,e,e,s,s,w,w,p,w,o,p,n,y,u
e,f,f,w,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
p,x,s,n,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,g
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,n,s,u
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,n,s,u
e,b,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,s,m
p,x,y,n,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,n,v,g
e,b,y,y,t,l,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,s,m
e,b,y,w,t,a,f,c,b,w,e,c,s,s,w,w,p,w,o,p,n,n,m
e,b,s,w,t,l,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,s,m
p,f,s,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,n,v,g
e,x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
e,x,y,w,t,l,f,c
Solution

Subhanbasha answered on Jul 26 2021
# Installing the required package
install.packages("e1071")
# Calling the package
library(e1071)
# Question 1
# Reading data into R
Balance_Scale <- read.csv('Balance_Scale.csv', na.strings = '?')
# Summary of the data
summary(Balance_Scale)
# Checking completion
nrow(Balance_Scale[!complete.cases(Balance_Scale),])
# We are taking 70% of original data as training data set
sample_size <- floor(0.7 * nrow(Balance_Scale))
#Randomly select index of observations for training
training_index <- sample(nrow(Balance_Scale), size = sample_size, replace = FALSE)
train <- Balance_Scale[training_index,]
test <-...