

Create a Python Notebook with your response to the following questions. Ensure that your work is presented neatly. Explain all results and all visualizations should have clearly labeled axes and a meaningful title.

Question 1: Loading Data (50 pts)

Write Python code to answer the questions below and ensure that you round all numeric calculations to 2 decimal places.

  1. (0 pts) Load the attached data into a pandas dataframe: diamonds.csv
    This dataset contains the price of diamonds based on various attributes. For more information about the variables, read the description on kaggle.com.
  2. (5 pts) Select any two variables of your choice and explain their statistical summary, e.g. mean, median, min, max, etc.
    Note: you can use the .describe() method from the dataframe to obtain the descriptive statistics, or any suitable approach.
  3. (5 pts) Create a bar chart that shows the frequency of diamonds grouped by cut. Explain the chart.
  4. (5 pts) Create a scatterplot that shows the relationship between carat and price. Explain the chart and comment on the relationship between the variables.
  5. (10 pts) Calculate the Pearson correlation coefficient of carat and price. Explain the results and discuss the strength of the correlation.
  6. (15 pts) Create a histogram (or boxplot) that shows the distribution of prices based on the quality of the cut. Explain the charts and skew.
    Note: this question is asking you to show the distribution of diamond prices for each cut. There should be a separate histogram/boxplot for each cut.
  7. (10 pts) Using a 2-sample t-test**, determine if there is a statistical difference between the price of diamonds with a cut that is considered:
    1. 'Good' vs 'Very Good'
    2. 'Premium' vs 'Ideal'
      **set alpha to 5% i.e. 0.05
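
The comparison asked for in item 7 could be sketched roughly as follows. This is only an illustration, not part of the posted answer. It assumes SciPy is installed, it uses Welch's variant of the t-test (`equal_var=False`) because the assignment does not say whether equal variances may be assumed, and it runs on synthetic stand-in prices rather than diamonds.csv. The helper name `compare_prices` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05  # significance level stated in the assignment


def compare_prices(df, cut_a, cut_b, alpha=ALPHA):
    """Two-sample t-test on price for two cut grades (hypothetical helper)."""
    a = df.loc[df["cut"] == cut_a, "price"]
    b = df.loc[df["cut"] == cut_b, "price"]
    # Welch's t-test: does not assume equal variances (an assumption of this sketch)
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
    return {
        "t": round(float(t_stat), 2),
        "p": round(float(p_value), 2),
        # The decision uses the unrounded p-value
        "reject_h0": bool(p_value < alpha),
    }


# Synthetic stand-in data, NOT rows from diamonds.csv
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "cut": ["Good"] * 200 + ["Very Good"] * 200,
    "price": np.concatenate([
        rng.normal(3900, 500, 200),
        rng.normal(4000, 500, 200),
    ]),
})

result = compare_prices(demo, "Good", "Very Good")
print(result)
```

With the real data loaded into a dataframe named `diamonds`, the same helper would be called as `compare_prices(diamonds, 'Good', 'Very Good')` and `compare_prices(diamonds, 'Premium', 'Ideal')`. If `reject_h0` is true, the difference in mean price is statistically significant at the 5% level.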

Here is the Kaggle URL: Diamonds | Kaggle
Answered 3 days After Feb 16, 2023

Solution

Baljit answered on Feb 20 2023
2/20/23, 10:24 AM diamonds - Jupyter Notebook
Question 1
In [1]:
1. Loading Of Data
In [2]:
2. Statistical summary of two variables
We will compute the statistical summary of the carat and price columns.
In [3]:
Explanation:-
As the Out[3] summary shows, the mean of carat is 0.80, its standard deviation is 0.47, and its values range from 0.20 to 5.01. The mean of price is
3932.80, its standard deviation is 3989.44, and its values range from 326.00 to 18823.00. Because the standard deviation of price is larger than its mean,
prices are widely spread around the mean. The median price (2401.00) is also well below the mean, which points to a right-skewed distribution.
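
The spread described in the explanation above can be checked numerically, along with the skew. The sketch below is a minimal illustration, assuming a numeric pandas Series of prices. The values are made-up stand-ins, not rows from diamonds.csv, and `spread_summary` is a hypothetical helper name.

```python
import pandas as pd


def spread_summary(series):
    """Mean, median, std, coefficient of variation and skewness, rounded to 2 decimals."""
    mean = series.mean()
    std = series.std()  # sample standard deviation (ddof=1), the same as describe()
    return {
        "mean": round(float(mean), 2),
        "median": round(float(series.median()), 2),
        "std": round(float(std), 2),
        "cv": round(float(std / mean), 2),        # spread relative to the mean
        "skew": round(float(series.skew()), 2),   # > 0 means a longer right tail
    }


# Made-up stand-in prices, NOT taken from diamonds.csv
prices = pd.Series([400, 650, 900, 1500, 2400, 5200, 18000], name="price")
summary = spread_summary(prices)
print(summary)
```

A mean well above the median together with a positive skew value indicates a long right tail, which is the pattern the summary table suggests for price. On the real data this would be called as `spread_summary(diamonds['price'])`.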
3. Bar chart that shows the frequency of diamonds grouped by cut.
In [4]:
Explanation
As the bar chart shows, the largest group of diamonds has an Ideal cut (more than 20,000), followed by Premium, Very Good, Good and Fair.
The Fair cut has the lowest number of diamonds.
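
The ranking read off the chart can also be confirmed with a frequency table. This is a small sketch, assuming a dataframe with a `cut` column. The toy data below is invented for illustration and `cut_frequencies` is a hypothetical helper name.

```python
import pandas as pd


def cut_frequencies(df, column="cut"):
    """Count and percentage share per category, largest group first."""
    counts = df[column].value_counts()  # sorted by count, descending
    return pd.DataFrame({
        "count": counts,
        "percent": (counts / counts.sum() * 100).round(2),
    })


# Invented toy data, NOT the real cut distribution
demo = pd.DataFrame({"cut": ["Ideal"] * 5 + ["Premium"] * 3 + ["Fair"] * 2})
table = cut_frequencies(demo)
print(table)
```

On the real data, `cut_frequencies(diamonds)` would give the exact count behind each bar, so the explanation can quote numbers instead of estimating them from the axis.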
Out[3]:
          carat     price
count  53940.00  53940.00
mean       0.80   3932.80
std        0.47   3989.44
min        0.20    326.00
25%        0.40    950.00
50%        0.70   2401.00
75%        1.04   5324.25
max        5.01  18823.00
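# Library imports (cell In [1])
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sns

# Loading of data (cell In [2])
diamonds = pd.read_csv('diamonds.csv')

# Statistical summary of carat and price, rounded to 2 decimal places (cell In [3])
diamonds[["carat", "price"]].describe().round(2)

# Bar chart of the number of diamonds per cut, with labeled axes and a title (cell In [4])
ax = diamonds.groupby('cut').size().plot(kind='bar')
ax.set_xlabel('Cut')
ax.set_ylabel('Number of diamonds')
ax.set_title('Frequency of diamonds by cut')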
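
The notebook excerpt ends with the bar chart. For the scatterplot and Pearson correlation asked for in items 4 and 5, one possible approach is sketched below. It is not taken from the posted answer. It uses pandas' built-in `Series.corr` for Pearson's r, it runs on synthetic stand-in data rather than diamonds.csv, and the helper name `carat_price_scatter` is hypothetical.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def carat_price_scatter(df):
    """Scatterplot of price against carat plus Pearson's r (hypothetical helper)."""
    r = df["carat"].corr(df["price"], method="pearson")
    fig, ax = plt.subplots()
    ax.scatter(df["carat"], df["price"], s=8, alpha=0.4)
    ax.set_xlabel("Carat")
    ax.set_ylabel("Price")
    ax.set_title(f"Price vs. carat (Pearson r = {r:.2f})")
    return ax, round(float(r), 2)


# Synthetic stand-in data, NOT rows from diamonds.csv
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 2.0, 300)
demo = pd.DataFrame({
    "carat": carat,
    "price": 4000 * carat + rng.normal(0, 500, 300),
})

ax, r = carat_price_scatter(demo)
print(r)
```

With the real dataframe, `carat_price_scatter(diamonds)` would return the labeled axes and the rounded coefficient. A value of r close to 1 indicates a strong positive linear relationship between carat and price.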