
In this lab, we will work with the K-means clustering algorithm. The data file, crime_data.csv, can be found under the Lab 4 module on D2L. Save it in your working directory. The dataset is about crime rate or occurrence for 50 states in the US. Crimes include murder, assault, and rape. The urban population (in millions) of the states and the predefined cluster are also provided. As you can see, this is a 5-dimensional dataset with 4 predefined clusters.
Answered Same Day Aug 08, 2021

Solution

Subhanbasha answered on Aug 09 2021
We need two packages, ggplot2 and animation, to run the clustering and visualize the results.
# Installing required packages
install.packages('ggplot2')
install.packages("animation")
# Calling required packages
library(ggplot2)
library(animation)
# Reading data set
iris_data <- read.csv('iris.csv')
Before clustering, check whether the features need to be normalized. K-means is based on Euclidean distances, so features measured on different scales should be rescaled to a common range first.
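To see why scale matters for a distance-based method such as K-means, here is a minimal sketch. The data frame `toy` and the helper `rescale01` are hypothetical names, and the values are made up for illustration; none of them come from the lab data.

```r
# Two made-up features on very different scales (illustrative values only)
toy <- data.frame(
  small = c(0.1, 0.9, 0.15),
  large = c(100, 105, 130)
)

# Euclidean distances on the raw values are driven almost entirely by `large`
raw_dist <- as.matrix(dist(toy))

# Hypothetical helper: min-max rescale a vector to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# After rescaling, both columns span the same range and contribute comparably
scaled <- as.data.frame(lapply(toy, rescale01))
scaled_dist <- as.matrix(dist(scaled))

print(round(raw_dist, 2))
print(round(scaled_dist, 2))
```

On the raw values, the first point looks far closer to the second point than to the third, purely because of the `large` column. After rescaling, the two distances are nearly equal.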
# Data overview and distribution
plot(iris_data$Petal.Length)
plot(iris_data$Petal.Width)
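The two plots give an overview of the values each feature takes. Because the features are not on a common scale, we rescale them to the range [0, 1] with min-max normalization. Note that this changes the range of the data, not the shape of its distribution. Here we have selected only two features for the analysis.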
# Normalizing the data
normIt <- function(feature) {
  normalized <- (feature - min(feature)) / (max(feature) - min(feature))
  return(normalized)
}
nor_iris <- apply(iris_data[,c(4,5)], 2, FUN = normIt)
nor_iris <- as.data.frame(nor_iris)
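As a quick check of what the min-max formula does, the sketch below applies it to a short vector. The function is restated so the snippet runs on its own, and the input values are illustrative, not taken from the data set.

```r
# Same min-max formula as normIt above, restated so this snippet is self-contained
normIt <- function(feature) {
  (feature - min(feature)) / (max(feature) - min(feature))
}

x <- c(2, 4, 6, 10)   # illustrative values only
normIt(x)             # 0.00 0.25 0.50 1.00

# Caveat: a constant column makes the denominator zero, so the result is NaN
normIt(c(5, 5, 5))
```

The smallest value maps to 0, the largest to 1, and everything else falls proportionally in between.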
After normalizing the data, we need to find the optimal number of clusters to get a good grouping.
# Finding optimal cluster value
set.seed(200)
k.max <- 10
# Total within-cluster sum of squares for k = 1..k.max, computed on the normalized features
wss <- sapply(1:k.max, function(k) {
  kmeans(nor_iris, k, nstart = 20, iter.max = 20)$tot.withinss
})
wss
plot(1:k.max, wss, type = "b", xlab = "Number of clusters (k)", ylab = "Within-cluster sum of squares")
The plot above helps identify the optimal number of clusters: look for the point where the within-cluster sum of squares stops dropping sharply. From the plot, the optimal number of clusters is 3, so we can use 3...
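The rest of the answer is cut off here. As a rough sketch of what a fit with three clusters could look like, the example below assumes that R's built-in `iris` data set stands in for iris.csv and uses a hypothetical helper, `rescale01`, for the min-max step. It is not the original author's code.

```r
# Assumption: R's built-in `iris` data set stands in for iris.csv
petals <- iris[, c("Petal.Length", "Petal.Width")]

# Hypothetical helper: min-max rescale a vector to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
petals_norm <- as.data.frame(lapply(petals, rescale01))

# Fit K-means with 3 clusters; nstart = 20 tries 20 random starts and keeps the best
set.seed(200)
fit <- kmeans(petals_norm, centers = 3, nstart = 20, iter.max = 20)

fit$centers                        # cluster centres in rescaled units
table(fit$cluster, iris$Species)   # clusters cross-tabulated with the known species
```

The cross-tabulation is only available because iris ships with known species labels. For the crime data, the predefined cluster column could play the same role.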