Faculty of Science, Applied Science & Engineering
Department of Mathematics and Statistics
ASSIGNMENT 5
STAT 4203: INTRODUCTION TO MULTIVARIATE STATISTICAL ANALYSIS
WINTER • 2023
Due Date: Wednesday, March 1st @ 12:30 pm
Total Marks: 48
Part I: Questions requiring detailed solutions
Instructions: A paper copy with solutions to these questions is due at 12:30 pm, in class,
on March 1st, XXXXXXXXXX Show all of your work.
Part I: Question 1
Marks: 9
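Let $\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3$, and $\mathbf{X}_4$ be independent and identically distributed random vectors with
$$\boldsymbol{\mu} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 5 & 4 \\ 2 & 4 & 3 \end{bmatrix}.$$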
a. Find the mean vector and covariance matrix for the linear combination
$\frac{1}{4}\mathbf{X}_1 + \frac{1}{4}\mathbf{X}_2 + \frac{1}{4}\mathbf{X}_3 + \frac{1}{4}\mathbf{X}_4$. (3)
b. Find the mean vector and covariance matrix for the linear combination
$\mathbf{X}_1 - \mathbf{X}_2 + \mathbf{X}_3 - \mathbf{X}_4$. (3)
c. Obtain the covariance between the two linear combinations of random vectors in
parts a. and b. (3)
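Question 1 rests on a standard result. For independent random vectors with common mean vector μ and covariance matrix Σ, the combination c_1 X_1 + ... + c_n X_n has mean (c_1 + ... + c_n) μ and covariance (c_1^2 + ... + c_n^2) Σ. Two such combinations with coefficients b_j and c_j have covariance (b_1 c_1 + ... + b_n c_n) Σ. The following is a minimal R sketch of that result using the μ and Σ given above. The helper names lin_comb_moments and lin_comb_cov are hypothetical and do not come from any package.

```r
# Mean vector and covariance matrix given in Question 1.
mu    <- c(2, 1, 0)
Sigma <- matrix(c(1, 0, 2,
                  0, 5, 4,
                  2, 4, 3), nrow = 3, byrow = TRUE)

# Hypothetical helper: moments of sum_j coef[j] * X_j for i.i.d. vectors X_j.
lin_comb_moments <- function(coef, mu, Sigma) {
  list(mean = sum(coef) * mu,      # (sum of coefficients) * mu
       cov  = sum(coef^2) * Sigma) # (sum of squared coefficients) * Sigma
}

# Hypothetical helper: covariance between sum_j b[j] X_j and sum_j c[j] X_j.
lin_comb_cov <- function(b, c, Sigma) {
  sum(b * c) * Sigma
}

coef_a <- rep(1 / 4, 4)     # coefficients in part a.
coef_b <- c(1, -1, 1, -1)   # coefficients in part b.

res_a  <- lin_comb_moments(coef_a, mu, Sigma)
res_b  <- lin_comb_moments(coef_b, mu, Sigma)
cov_ab <- lin_comb_cov(coef_a, coef_b, Sigma)

print(res_a)
print(res_b)
print(cov_ab)
```

A written solution should still show the algebra behind each step; the code only checks the arithmetic.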
Part I: Question 2
Marks: 19
Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_{60}$ be a random sample of size 60 from a bivariate normal distribution having
mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. Completely specify each of the following:
a. The distribution of X . (3)
b. The distribution of XXXXXXXXXX $(\mathbf{X} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{X} - \boldsymbol{\mu})$. (2)
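c. The distribution of XXXXXXXXXX $(\mathbf{X} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{X} - \boldsymbol{\mu})$. (2)
d. The approximate distribution of XXXXXXXXXX $(\mathbf{X} - \boldsymbol{\mu})' \mathbf{S}^{-1} (\mathbf{X} - \boldsymbol{\mu})$. (2)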
e. The distribution of 59S . (1)
f. The distribution of $\mathbf{A}(59\mathbf{S})\mathbf{A}'$, where $\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$.
(Note: You need NOT simplify/expand your solution.) (2)
g. The distribution of 60X . (3)
h. The distribution of XXXXXXXXXXX X . (4)
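Distributional answers of this kind can be sanity-checked by simulation. The sketch below draws repeated bivariate normal samples of size 60 and compares two Monte Carlo averages with their known theoretical values: the covariance of the sample mean vector (Σ/60) and the expectation of 59S (59Σ). The values of mu and Sigma in the code are hypothetical, since the question leaves both general, and a simulation of this kind does not replace the derivation.

```r
set.seed(4203)

n    <- 60     # sample size from the question
reps <- 2000   # number of simulated samples (arbitrary choice)

# Hypothetical parameter values; the question leaves mu and Sigma general.
mu    <- c(1, -1)
Sigma <- matrix(c(2, 1,
                  1, 3), nrow = 2, byrow = TRUE)
A     <- matrix(c(2, 1,
                  1, 1), nrow = 2, byrow = TRUE)  # matrix A from part f.

# chol() returns an upper-triangular R with t(R) %*% R equal to Sigma, so the
# rows of Z %*% R have covariance Sigma when the rows of Z are N(0, I).
R <- chol(Sigma)

sim_once <- function() {
  Z <- matrix(rnorm(n * 2), nrow = n)
  X <- sweep(Z %*% R, 2, mu, "+")        # add mu to every row
  list(xbar = colMeans(X), W = (n - 1) * cov(X))
}

sims  <- replicate(reps, sim_once(), simplify = FALSE)
xbars <- t(sapply(sims, `[[`, "xbar"))               # reps x 2 matrix of means
W_bar <- Reduce(`+`, lapply(sims, `[[`, "W")) / reps # average of 59 * S

cov_xbar <- cov(xbars)

print(cov_xbar * n)             # should be close to Sigma
print(W_bar / (n - 1))          # should be close to Sigma
print(A %*% W_bar %*% t(A))     # should be close to 59 * A Sigma A'
print((n - 1) * A %*% Sigma %*% t(A))
```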
Part II: Questions requiring the use of R
The R code solutions to these questions are to be submitted as a .txt document on the
due date through D2L.
When you are ready to submit your assignment in D2L:
1. Go to Assessments -> Assignments (at top of page).
2. Click “Assignment #5 R Code”
3. Go to “Add a file”. Upload your assignment. Multiple files will not be accepted.
Note: Your R script will be pasted into R and run. In order to earn full marks, the output
must be correct and your R script must be properly commented.
Part II: Question 1
Marks: 20
a. Import the file AP.dat (provided in D2L) into R. Call your data AP and add the
column names Wind, Solar Radiation, CO, NO, NO_2, O_3 and HC to columns 1
through 7 respectively.
This data comes from Table 1.5 on p. 39 in your textbook. The description is as
follows: “The data in Table 1.5 are 42 measurements on air-pollution variables
recorded at 12:00 noon in the Los Angeles area on different days.”
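One possible way to carry out the import in part a. is sketched below. It assumes that AP.dat is whitespace-delimited with no header row and sits in the working directory; neither detail is stated in this handout, so check the file first. The helper name read_ap is hypothetical.

```r
# Hypothetical helper; assumes a whitespace-delimited file with no header row.
read_ap <- function(path) {
  AP <- read.table(path, header = FALSE)
  stopifnot(ncol(AP) == 7)  # the assignment describes 7 variables
  colnames(AP) <- c("Wind", "Solar Radiation", "CO", "NO", "NO_2", "O_3", "HC")
  AP
}

# Only attempt the import when the file is present in the working directory.
if (file.exists("AP.dat")) {
  AP <- read_ap("AP.dat")
  str(AP)
}
```

Because "Solar Radiation" contains a space, that column has to be referenced with backticks or double brackets, for example AP[["Solar Radiation"]].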
b. Construct QQ plots for each of the 7 variables and, for each, test for univariate
normality using shapiro.test() as we did in class. Based on the plots, do you
think we can assume multivariate normality? Explain. (5)
c. Repeat part b. but plot the log transformed data (use log()) and calculate the
p-values for the transformed data. What do you notice? (5)
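Parts b. and c. apply the same steps to the raw and the log-transformed columns, so a single function can serve both. The sketch below is one way to organise this, not necessarily the approach used in class. AP_demo is simulated placeholder data so that the code runs on its own; it is not the Table 1.5 data and should be replaced by the imported AP. The helper name qq_and_shapiro is hypothetical.

```r
set.seed(1)

# Placeholder data so the sketch runs on its own; NOT the Table 1.5 values.
AP_demo <- as.data.frame(matrix(rlnorm(42 * 7), nrow = 42))
colnames(AP_demo) <- c("Wind", "Solar Radiation", "CO", "NO", "NO_2", "O_3", "HC")

# Hypothetical helper: draws one normal QQ plot per column and returns the
# Shapiro-Wilk p-value for each column.
qq_and_shapiro <- function(dat, label = "") {
  old_par <- par(mfrow = c(2, 4))   # room for 7 panels on one page
  on.exit(par(old_par))             # restore the plotting layout afterwards
  sapply(colnames(dat), function(v) {
    qqnorm(dat[[v]], main = paste(label, v))
    qqline(dat[[v]])
    shapiro.test(dat[[v]])$p.value
  })
}

p_raw <- qq_and_shapiro(AP_demo, "Raw:")
p_log <- qq_and_shapiro(log(AP_demo), "Log:")

# Side-by-side p-values for the raw and log-transformed variables.
print(round(cbind(raw = p_raw, log = p_log), 4))
```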
d. Examine the pairs $\log(X_5)$ and $\log(X_6)$ for bivariate normality by calculating and
displaying the squared statistical distances $(\mathbf{x}_j - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x}_j - \bar{\mathbf{x}})$, $j = 1, 2, \ldots, 42$, and
then creating an appropriate QQ plot to assess the distribution of the squared
distances. (5)
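The squared statistical distances and the matching chi-square plot can be sketched with the base R function mahalanobis(). X_demo below is simulated placeholder data standing in for the two log-transformed columns, not the real measurements, and the helper name sq_distances is hypothetical.

```r
set.seed(2)

# Placeholder bivariate data standing in for the two log-transformed columns.
X_demo <- cbind(log_x5 = rnorm(42, mean = 2, sd = 0.3),
                log_x6 = rnorm(42, mean = 2, sd = 0.6))

# Hypothetical helper: (x_j - xbar)' S^{-1} (x_j - xbar) for each row x_j.
sq_distances <- function(X) {
  mahalanobis(X, center = colMeans(X), cov = cov(X))
}

d2 <- sq_distances(X_demo)
n  <- length(d2)
print(round(d2, 3))

# Chi-square plot: ordered distances against chi-square quantiles with
# 2 degrees of freedom (the number of variables).
q <- qchisq((seq_len(n) - 0.5) / n, df = 2)
plot(q, sort(d2),
     xlab = "Chi-square(2) quantiles",
     ylab = "Ordered squared distances",
     main = "Chi-square plot")
abline(0, 1, lty = 2)  # reference line through the origin with slope 1
```

Under bivariate normality the points should fall roughly along the reference line.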
e. Refer to part c. and determine the proportion of observations
$[\log(X_{j5}), \log(X_{j6})]$
that fall within the approximate 50% probability contour of a bivariate normal
distribution. (3)
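An observation lies inside the approximate 50% contour when its squared statistical distance is at most the 0.50 quantile of a chi-square distribution with 2 degrees of freedom. A minimal sketch follows; the helper name prop_within_contour is hypothetical and X_demo is simulated placeholder data, not the assignment data.

```r
# Hypothetical helper: share of rows whose squared statistical distance is at
# most the chi-square quantile for the requested probability level.
prop_within_contour <- function(X, level = 0.5) {
  d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
  mean(d2 <= qchisq(level, df = ncol(X)))
}

set.seed(3)
X_demo <- cbind(rnorm(42), rnorm(42))  # placeholder for the two log columns

prop_50 <- prop_within_contour(X_demo, level = 0.5)
print(prop_50)
```

For data that are close to bivariate normal, this proportion should be roughly one half.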
f. Based on your findings in parts b.-d., do you think we can reasonably assume that
the random vector with components $\log(X_5)$ and $\log(X_6)$ is bivariate normally
distributed? Explain. (2)
Chapter 1 Aspects of Multivariate Analysis
1.2. A morning newspaper lists the following used-car prices for a foreign compact with age
$x_1$ measured in years and selling price $x_2$ measured in thousands of dollars:
XXXXXXXXXX
XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
(a) Construct a scatter plot of the data and marginal dot diagrams.
(b) Infer the sign of the sample covariance $s_{12}$ from the scatter plot.
(c) Compute the sample means $\bar{x}_1$ and $\bar{x}_2$ and the sample variances $s_{11}$ and $s_{22}$. Compute
the sample covariance $s_{12}$ and the sample correlation coefficient $r_{12}$. Interpret
these quantities.
(d) Display the sample mean array $\bar{\mathbf{x}}$, the sample variance-covariance array $\mathbf{S}_n$, and the
sample correlation array $\mathbf{R}$ using (1-8).
1.3. The following are five measurements on the variables $x_1$, $x_2$, and $x_3$:
$x_1$ XXXXXXXXXX
$x_2$ XXXXXXXXXX
$x_3$ XXXXXXXXXX
Find the arrays $\bar{\mathbf{x}}$, $\mathbf{S}_n$, and $\mathbf{R}$.
1.4. The world's 10 largest companies yield the following data:
The World's 10 Largest Companies¹

Company               x1 = sales    x2 = profits    x3 = assets
                      (billions)    (billions)      (billions)
Citigroup                 108.28           17.05       1,484.10
General Electric          152.36           16.59         750.33
American Intl Group        95.04           10.91         766.42
Bank of America            65.45           14.14       1,110.46
HSBC Group                 62.97            9.52       1,031.29
ExxonMobil                263.99           25.33         195.26
Royal Dutch/Shell         265.19           18.54         193.83
BP                        285.06           15.73         191.11
ING Group                  92.01            8.10       1,175.16
Toyota Motor              165.68           11.13         211.15

¹From www.Forbes.com, partially based on Forbes, The Forbes Global 2000,
April 18, 2005.
(a) Plot the scatter diagram and marginal dot diagrams for variables $x_1$ and $x_2$. Comment
on the appearance of the diagrams.
(b) Compute $\bar{x}_1$, $\bar{x}_2$, $s_{11}$, $s_{22}$, $s_{12}$, and $r_{12}$. Interpret $r_{12}$.
1.5. Use the data in Exercise 1.4.
(a) Plot the scatter diagrams and dot diagrams for $(x_2, x_3)$ and $(x_1, x_3)$. Comment on
the patterns.
(b) Compute the $\bar{\mathbf{x}}$, $\mathbf{S}_n$, and $\mathbf{R}$ arrays for $(x_1, x_2, x_3)$.
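The three arrays requested in Exercises 1.4 and 1.5 can be computed in R from the company data listed in Exercise 1.4. The sketch below assumes that S_n uses the divisor n rather than n - 1, so the result of cov() is rescaled; confirm that convention against the textbook definition before relying on it.

```r
# Data from the table in Exercise 1.4 (all values in billions).
companies <- data.frame(
  sales   = c(108.28, 152.36, 95.04, 65.45, 62.97,
              263.99, 265.19, 285.06, 92.01, 165.68),
  profits = c(17.05, 16.59, 10.91, 14.14, 9.52,
              25.33, 18.54, 15.73, 8.10, 11.13),
  assets  = c(1484.10, 750.33, 766.42, 1110.46, 1031.29,
              195.26, 193.83, 191.11, 1175.16, 211.15),
  row.names = c("Citigroup", "General Electric", "American Intl Group",
                "Bank of America", "HSBC Group", "ExxonMobil",
                "Royal Dutch/Shell", "BP", "ING Group", "Toyota Motor")
)

n    <- nrow(companies)
xbar <- colMeans(companies)           # sample mean array
S_n  <- (n - 1) / n * cov(companies)  # ASSUMPTION: divisor n, not n - 1
R    <- cor(companies)                # sample correlation array

print(xbar)
print(S_n)
print(R)

# Scatter diagrams for every pair of variables, as asked in parts (a).
pairs(companies)
```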
1.6. The data in Table 1.5 are 42 measurements on air-pollution variables recorded at 12:00
noon in the Los Angeles area on different days. (See also the air-pollution data on the
web at www.prenhall.com/statistics.)
(a) Plot the marginal dot diagrams for all the variables.
(b) Construct the $\bar{\mathbf{x}}$, $\mathbf{S}_n$, and $\mathbf{R}$ arrays, and interpret the entries in $\mathbf{R}$.
Table 1.5 Air-Pollution Data
Wind (x1)   Solar radiation (x2)   CO (x3)   NO (x4)   NO2 (x5)   O3 (x6)   HC (x7)
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
Source: Data courtesy of Professor O. C. Tiao.