Homework 5
of
STAT 3355 Data Analysis for Statisticians & Actuaries
Due: 2:30pm
October 26 (Tuesday), 2021
Let’s work with the diamonds dataset in the ggplot2 package. You can use the following
code to load the data. Use the necessary code to read the description of the dataset, which
contains 53,940 observations and 10 variables.
# Install the package if you have not already done so
install.packages("ggplot2")
# Load the package
library(ggplot2)
# Load the diamonds dataset
data("diamonds")
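One possible way to read the description and check the stated dimensions is sketched below. It assumes ggplot2 is already installed; the help page opens only in an interactive session.

```r
library(ggplot2)
data("diamonds")

# Open the documentation page that describes each variable.
# Guarded so that the script also runs non-interactively.
if (interactive()) {
  help("diamonds", package = "ggplot2")
}

# Number of rows and columns; the handout states 53940 and 10
dim(diamonds)

# Type and first few values of each variable
str(diamonds)

# Basic summary statistics for each variable
summary(diamonds)
```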
Problem 1 (1 × 5 = 5 points)
Use ggplot2 to visualize the data. You need to paste the resulting plots and related code
in order to get the full points. For each ggplot2 plot:
• make it complete and readable; in other words, it should include axis label(s), a title, and
a legend if necessary;
• write 1–2 sentences about what the chart tells you about the data.
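As an illustration of what a complete and readable plot might look like, here is a minimal sketch using labs(). The variables depth and table and the wording of the labels are chosen only for illustration and are not part of the required plots.

```r
library(ggplot2)
data("diamonds")

# Illustrative only: depth and table are not among the required plots
p <- ggplot(diamonds, aes(x = table, y = depth, color = cut)) +
  geom_point(alpha = 0.3) +
  labs(
    title = "Depth versus table width, by cut",          # plot title
    x = "Table (width of top relative to widest point)", # x-axis label
    y = "Total depth percentage",                        # y-axis label
    color = "Cut quality"                                # legend title
  )

print(p)
```

Mapping a variable to color produces the legend automatically; labs() only renames it.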
(a) Choose a number of bins or a binwidth (Hint: see page 11 of lecture 04c.pdf), explain
your choice, and create a histogram of carat.
(b) Make a scatter plot of y = price against x = carat and set the color to clarity.
(c) Make a scatter plot of y = price against x = carat and add a smooth line to each
group of points defined by clarity.
(d) Make a scatter plot of y = price against x = carat and facet it by clarity.
(e) Show carat versus cut using a point plot, a jitter plot, a box plot, and a violin plot, respectively.
Which one is the best for visualization?
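For the binwidth choice in (a), one common rule of thumb is the Freedman-Diaconis rule, binwidth = 2 × IQR / n^(1/3). The sketch below is only an assumption about a reasonable approach: it may differ from the method on page 11 of lecture 04c.pdf, the helper fd_binwidth is hypothetical, and price is used instead of carat purely for illustration.

```r
library(ggplot2)
data("diamonds")

# Hypothetical helper: Freedman-Diaconis bin width, 2 * IQR / n^(1/3)
fd_binwidth <- function(x) {
  x <- x[!is.na(x)]               # drop missing values before computing
  2 * IQR(x) / length(x)^(1 / 3)
}

# Bin width suggested by the rule for price (illustrative variable)
bw <- fd_binwidth(diamonds$price)

# For comparison: number of bins suggested by Sturges' rule (base R)
n_sturges <- nclass.Sturges(diamonds$price)

p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = bw) +
  labs(
    title = "Distribution of diamond price",
    x = "Price (US dollars)",
    y = "Count"
  )

print(p)
```

A rule like this gives a starting point only; it is still worth trying a few nearby values and explaining which one shows the shape of the data most clearly.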
Problem 2 (1 × 5 = 5 points)
Use ggplot2 to recreate the following plots, each with a title. You need to paste the new plots and
related code in order to get full points.
(a) Recreate the following two plots, add a short title, and comment on the merits of each
one compared to the other.
[Figure 1: count (y-axis, 0 to 5000) by clarity (x-axis: I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF), with legend cut (Fair, Good, Very Good, Premium, Ideal)]
[Figure 2: count (y-axis, 0 to 5000 in each panel) by clarity (x-axis: I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF), one panel per cut (Fair, Good, Very Good, Premium, Ideal), with legend cut]
(b) Recreate the following plot and add a short title
[Figure: price (y-axis, 0 to 20000) against carat (x-axis), with legend clarity (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF)]
(c) Recreate the following plot and add a short title
[Figure: price (y-axis, 0 to 15000) by clarity (x-axis: I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF), with legend cut (Fair, Good, Very Good, Premium, Ideal)]
(d) Recreate the following plot and add a short title
[Figure: price (y-axis, 0 to 15000) against carat (x-axis, 0 to 3), with legend cut (Fair, Good, Very Good, Premium, Ideal)]
(e) Recreate the following plot and add a short title (Hint: Choose binwidth = 0.1)
[Figure: density (y-axis, 0.0 to 0.8 in each panel) of depth (x-axis), one panel per cut (Fair, Good, Very Good, Premium, Ideal)]