
Major Project Jan Batch
Major Project
Take any dataset of your choice, perform EDA (Exploratory Data Analysis), apply a
suitable classifier, regressor, or clusterer, and calculate the accuracy of the model.
Answered 1 day after Jan 05, 2023

Solution

Pratyush answered on Jan 07 2023
1/7/23, 9:08 AM Iris Data Analysis.ipynb - Colaboratory
https://colab.research.google.com/drive/12z-t9UQzBFiqjno8iNWVVCpYakVCE1sm#scrollTo=OrsRUgvdP6JZ&printMode=true
import numpy as np
import seaborn as sns
import warnings
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
warnings.filterwarnings("ignore", category = FutureWarning)
sns.set(style = "white" , color_codes = True)
import pandas as pd
dataset = pd.read_csv('Iris.csv')
dataset.head(10)
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
5   6            5.4           3.9            1.7           0.4  Iris-setosa
6   7            4.6           3.4            1.4           0.3  Iris-setosa
7   8            5.0           3.4            1.5           0.2  Iris-setosa
8   9            4.4           2.9            1.4           0.2  Iris-setosa
9  10            4.9           3.1            1.5           0.1  Iris-setosa
# Rows 0-49 are setosa, 50-99 versicolor, 100-149 virginica (the iloc stop index is exclusive)
iris_setosa_data = dataset.iloc[0:50, :]
iris_versicolor_data = dataset.iloc[50:100, :]
iris_virginica_data = dataset.iloc[100:150, :]
Mean, Variance, Standard Deviation
Mean
Mean is the CENTRAL TENDENCY or the AVERAGE VALUE of a set of given observations. The mathematical definition of the mean is:
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
print("MEANS")
print(np.mean(iris_setosa_data['PetalLengthCm']) , "--setosa")
print(np.mean(iris_versicolor_data['PetalLengthCm']) , "--versicolor")
print(np.mean(iris_virginica_data['PetalLengthCm']) , "--virginica")
MEANS
1.464 --setosa
4.26 --versicolor
5.552 --virginica
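The definition of the mean can be checked by hand on a few values. The sketch below is only an illustration: it uses the ten setosa petal lengths visible in the dataset.head(10) output, and it uses the standard-library statistics module as a stand-in for np.mean so that it runs without the CSV file.

```python
import statistics

# Petal lengths (cm) of the ten setosa rows printed by dataset.head(10)
petal_length = [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5]


def mean_by_definition(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


manual_mean = mean_by_definition(petal_length)
library_mean = statistics.fmean(petal_length)  # stand-in for np.mean

print(round(manual_mean, 3), round(library_mean, 3))
```

Both values should agree up to floating-point rounding, which is a quick sanity check that the formula and the library call describe the same quantity.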
MEAN
The mean already supports a first round of EDA. For example, just by looking at the mean petal length of each species, we can tell
that Iris Setosa has a much smaller petal length on average than Iris Versicolor and Iris Virginica.
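One way to make this comparison without relying on row positions is to group by the Species column. This is a sketch, not the notebook's own method: the six-row frame below is a hypothetical stand-in for Iris.csv with illustrative values, and only the column names are taken from the dataset.

```python
import pandas as pd

# Hypothetical miniature of Iris.csv; the values are illustrative only
sample = pd.DataFrame(
    {
        "Species": [
            "Iris-setosa", "Iris-setosa",
            "Iris-versicolor", "Iris-versicolor",
            "Iris-virginica", "Iris-virginica",
        ],
        "PetalLengthCm": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    }
)

# Mean petal length per species, independent of how the rows are ordered
species_means = sample.groupby("Species")["PetalLengthCm"].mean()

# The same label-based idea for pulling out a single species
setosa_rows = sample[sample["Species"] == "Iris-setosa"]

print(species_means)
```

Selecting by label avoids off-by-one mistakes in positional slices and keeps working if the rows are shuffled.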
Observations
# Mean with an outlier
print(np.mean(np.append(iris_setosa_data['PetalLengthCm'] , 50)))
2.4156862745098038
Even though all 50 values say that the petal length of setosa flowers is around 1.464 cm, a single wrong data point can shift the
mean wildly.
These errors can happen because of human mistakes, data corruption, or other reasons. Such data points are called "OUTLIERS".
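The size of the shift follows directly from the definition of the mean. The sketch below reproduces the arithmetic using only the numbers reported in this notebook (50 setosa values with mean 1.464, plus one value of 50); the helper name mean_after_adding is hypothetical.

```python
def mean_after_adding(old_mean, n, new_value):
    """Mean of n + 1 values, given the mean of the first n values and one new value."""
    return (old_mean * n + new_value) / (n + 1)


setosa_mean = 1.464  # mean petal length of the 50 setosa rows
outlier = 50         # the single erroneous value appended in the notebook

shifted_mean = mean_after_adding(setosa_mean, 50, outlier)
shift = shifted_mean - setosa_mean

print(round(shifted_mean, 4), round(shift, 4))
```

The shift equals (outlier - old_mean) / (n + 1), so one value far from the rest moves the mean by roughly its distance divided by the sample size.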
Variance
Variance represents the spread of the given observations. It is the average squared distance of the observations from the mean.
The formula for the variance of a population is given by:
$$s^2 = \frac{SS}{N} = \frac{\sum (x_i - \bar{x})^2}{N}$$
Where
$SS$ is the sum of squared errors
$N$ is the number of observations in the group
$x_i$ is the $i$-th observation in the group
$\bar{x}$ is the mean of the group
...
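The population variance formula can be traced step by step on a small sample. This is a minimal sketch, assuming the ten setosa petal lengths from the dataset.head(10) output as input; statistics.pvariance from the standard library is used as a cross-check and divides by N, as does np.var with its default settings.

```python
import statistics

# Petal lengths (cm) of the ten setosa rows printed by dataset.head(10)
petal_length = [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5]

n = len(petal_length)
mean = sum(petal_length) / n

# SS: sum of squared deviations of each observation from the mean
ss = sum((x - mean) ** 2 for x in petal_length)

# Population variance divides SS by N (not N - 1)
variance = ss / n

print(round(variance, 4), round(statistics.pvariance(petal_length), 4))
```

Dividing by N matches the formula above; a sample variance would divide by N - 1 instead.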