SIT743 Bayesian Learning and Graphical Models
Assignment-1
Total Marks = 120, Weighting - 25%
Due date: 26 April 2020 by 11.30 PM
---------------------------------------------------------------------------------------------------------------
INSTRUCTIONS:
• For this assignment, you need to submit the following THREE files.

1. A written document (a single PDF only) covering all of the items described in the
questions. All answers to the questions must be written in this document, i.e., not in
the other files (code files) that you will be submitting. All the relevant results
(outputs, figures) obtained by executing your R code must be included in this
document.
For questions that involve mathematical formulas, you may write the answers
by hand, scan them to PDF and combine them with your answer
document. Submit a single combined PDF of your answer document.

2. A separate “.R” or “.txt” file containing the code (R script) that you
implemented to produce the results. Name the file “name-StudentID-Ass1-Code.R”
(where ‘name’ is replaced with your surname or first name, and StudentID with your
student ID).
3. A data file named “name-StudentID-LzMyData.txt” (where ‘name’ is replaced with
your surname or first name, and StudentID with your student ID).
• All the documents and files should be submitted (uploaded) via the SIT743 CloudDeakin
Assignment Dropbox by the due date and time.
• Zip files are NOT accepted. All three files should be uploaded separately to
CloudDeakin.
• E-mail or manual submissions are NOT allowed. Photos of the document are NOT
allowed.
• The questions Q2 and Q3 do not require any R programming.
=================================================================
Some of the questions in this assignment require you to use the “Lizard Island” dataset. This
dataset is given as a CSV file, named “LZIsData.csv”. You can download this from the
Assignment folder in CloudDeakin. Below is the description of this dataset.
Lizard Island dataset:
This dataset gives the weather measurements collected at Lizard Island, which is an island in
the Great Barrier Reef (North Queensland, Australia)
[http://weather.aims.gov.au/#/station/1166].
The data consist of measurements sampled every 10 minutes over a one-month period between
May 2019 and June 2019.
The data contain the following 4 variables (listed in the order in which the columns appear in
the file LZIsData.csv):
Air Temperature: Air temperature in degrees Celsius.
Humidity: Humidity as a percentage.
Wind Speed: Maximum wind speed in kilometres per hour.
Air Pressure: Pressure measurements expressed in hectopascals (hPa).
Q1) [19 Marks]:
• Download the data file “LZIsData.csv” and save it to your R working directory.
• Assign the data to a matrix, e.g. using

the.data <- as.matrix(read.csv("LZIsData.csv", header = FALSE, sep = ","))
• Generate a sample of 1500 rows using the following:

my.data <- the.data[sample(1:4464, 1500), c(1:4)]
Save “my.data” to a text file titled “name-StudentID-LzMyData.txt” using the
following R code (NOTE: you ‘must’ upload this data text file and the R code along
with your submission. If not, ZERO marks will be given for this whole question).

write.table(my.data,"name-StudentID-LzMyData.txt")
Use the sampled data (“my.data”) to answer the following questions.
1.1) Draw histograms of the ‘Air Temperature’ and ‘Air Pressure’ values, and comment on
them. [2 Marks]
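Here is a minimal R sketch for 1.1. It assumes `my.data` has the four columns in the order listed above, so Air Temperature is column 1 and Air Pressure is column 4. The function name `plot_temp_pressure_hist` is hypothetical. Call it as `plot_temp_pressure_hist(my.data)` after the sampling step.

```r
# Sketch for Q1.1 (hypothetical helper): histograms of Air Temperature
# (column 1) and Air Pressure (column 4), drawn side by side.
plot_temp_pressure_hist <- function(my.data) {
  old.par <- par(mfrow = c(1, 2))  # two panels in one figure
  on.exit(par(old.par))            # restore the previous layout afterwards
  h.temp <- hist(my.data[, 1], main = "Air Temperature",
                 xlab = "Air temperature (degrees Celsius)")
  h.pres <- hist(my.data[, 4], main = "Air Pressure",
                 xlab = "Air pressure (hPa)")
  invisible(list(temperature = h.temp, pressure = h.pres))
}
```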
1.2) Draw parallel boxplots of the two variables ‘Air Temperature’ and
‘Wind Speed’.
Find the five-number summaries of these two variables.
Use both the five-number summaries and the boxplots to compare and comment on
them. [5 Marks]
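The following is one possible way to produce the parallel boxplots and five-number summaries for 1.2 in R. It again assumes the column order above, with Wind Speed in column 3. `compare_temp_wind` is a hypothetical helper name. Base R's `fivenum()` returns Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum). Its hinges can differ slightly from the quartiles that `summary()` reports.

```r
# Sketch for Q1.2 (hypothetical helper): parallel boxplots and
# five-number summaries of Air Temperature (column 1) and Wind Speed (column 3).
compare_temp_wind <- function(my.data) {
  temp <- my.data[, 1]
  wind <- my.data[, 3]
  boxplot(list("Air Temperature (C)" = temp, "Wind Speed (km/h)" = wind),
          main = "Air Temperature and Wind Speed")
  # fivenum(): minimum, lower hinge, median, upper hinge, maximum
  summaries <- rbind(AirTemperature = fivenum(temp),
                     WindSpeed      = fivenum(wind))
  colnames(summaries) <- c("Min", "LowerHinge", "Median", "UpperHinge", "Max")
  summaries
}
```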
1.3) Which summary statistics would you choose to summarize the center and spread
of the ‘Humidity’ data? Why? (Support your answer with appropriate plot(s).) Find
those summary statistics for the ‘Humidity’ data.
[4 Marks]
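For 1.3, a common rule of thumb applies. Skewed data, or data with outliers, are better summarised by the median and interquartile range (IQR). Roughly symmetric data are better summarised by the mean and standard deviation. The sketch below uses a hypothetical helper, `humidity_summary`, and assumes Humidity is column 2. It computes both pairs of statistics and draws a histogram and a boxplot, so the choice can be justified from the plots.

```r
# Sketch for Q1.3 (hypothetical helper): Humidity is assumed to be column 2.
humidity_summary <- function(my.data) {
  hum <- my.data[, 2]
  old.par <- par(mfrow = c(1, 2))
  on.exit(par(old.par))
  hist(hum, main = "Humidity", xlab = "Humidity (%)")     # shape / skewness
  boxplot(hum, main = "Humidity", ylab = "Humidity (%)")  # outliers
  c(Mean = mean(hum), SD = sd(hum),          # suited to roughly symmetric data
    Median = median(hum), IQR = IQR(hum))    # robust to skew and outliers
}
```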
1.4) Draw a scatterplot of ‘Air Temperature’ (as x) and ‘Humidity’ (as y) for the first
1000 data vectors selected from “my.data” (label the axes).
Fit a linear regression model to these two variables and plot the regression
line on the same scatterplot.
Write down the linear regression equation.
Compute the correlation coefficient and the coefficient of determination.
Explain what these results reveal. [8 Marks]
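Here is a sketch of 1.4 in R. It assumes Air Temperature and Humidity are columns 1 and 2 of `my.data`, and `temp_humidity_fit` is a hypothetical name. The sketch uses `lm()` for the least-squares line, `abline()` to overlay it and `cor()` for Pearson's r. For simple linear regression, the coefficient of determination equals r squared.

```r
# Sketch for Q1.4 (hypothetical helper): Air Temperature (column 1) as x,
# Humidity (column 2) as y, using the first n rows (1000 by default).
temp_humidity_fit <- function(my.data, n = 1000) {
  rows <- my.data[seq_len(min(n, nrow(my.data))), , drop = FALSE]
  x <- rows[, 1]
  y <- rows[, 2]
  fit <- lm(y ~ x)                       # least-squares line y = b0 + b1 * x
  plot(x, y, xlab = "Air Temperature (degrees Celsius)",
       ylab = "Humidity (%)", main = "Humidity vs Air Temperature")
  abline(fit, col = "red")               # overlay the fitted line
  r <- cor(x, y)                         # Pearson correlation coefficient
  list(intercept = unname(coef(fit)[1]), slope = unname(coef(fit)[2]),
       r = r, r.squared = r^2)           # R^2 = r^2 for simple regression
}
```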
Q2) [21 Marks]
2.1) The table below shows the results of a survey about favorite sports, conducted in different
states over some period in 2020.
                    State
Sports              New South Wales (N)   Victoria (V)   Queensland (Q)
Footy (F)           XXXXXXXXXX
Basketball (B)      XXXXXXXXXX
Cricket (C)         XXXXXXXXXX
Suppose we select a person at random,
a) What is the probability that the person is from Victoria (V)? [1 mark]

b) What is the probability that the person likes cricket (C) and is from New South Wales
(N)? [1 Mark]
c) What is the probability that the person likes Footy (F) given that he/she is from
Queensland (Q)? [2 Marks]
d) What is the probability that a person who likes Basketball (B) is from Victoria
(V)? [2 Marks]
e) What is the probability that the person is from Victoria (V) or likes cricket (C)?
[2 Marks]
f) Find the marginal distribution of sports. [3 marks]

g) Are sports and state mutually exclusive? Explain. [2 Marks]

h) Are sports and state independent? Explain. [3 marks]
2.2) The weather in Victoria can be summarised as follows:
If it rains one day, there is a 75% chance that it will rain the following day. If it is sunny one
day, there is a 30% chance that it will be sunny the following day. Assuming that the prior
probability that it rained yesterday is 0.6, what is the probability that it was sunny yesterday
given that it is rainy today? [5 Marks]
Q3) [5 Marks]
3.1) State two differences between the frequentist and the Bayesian ways of estimating
a parameter. [2 marks]
3.2) Why are conjugate priors useful in Bayesian statistics? [1 mark]

3.3) Give two examples of conjugate pairs (i.e., two pairs of distributions that
can be used for the prior and the likelihood). [2 marks]
Q4) Frequentist and Bayesian estimations [31 Marks]
An Artificial Intelligence solutions provider, BigSecAI Ltd., houses several computing servers
to perform computationally intensive processing, such as deep learning, on sensitive (secure)
data for customers, including government agencies. In order to provide a reliable service,
BigSecAI wants to improve the monitoring and maintenance of its computer
servers. As part of this planning, BigSecAI wants to model the lifetime pattern of its
servers. BigSecAI assumes that the length of time $X_i$ (in years) that a computer server $i$ lasts follows
a form of exponential distribution with an unknown parameter $\theta$, as shown below. Here, the
quantity $1/\theta$ represents, on average, how long a server lasts.
$$X_i \sim \mathrm{Exp}(\theta), \qquad \mathrm{Exp}(\theta) = p(x_i \mid \theta) = \theta \exp(-\theta x_i)$$
Assume that $n$ servers are used, and that their lifetimes are independently and
identically distributed (iid).
4.1) BigSecAI first decided to use a frequentist approach to arrive at an estimate for $\theta$.
Answer the following questions.
a) Show that the joint distribution of the lifetimes of $n$ servers is given by the
equation below (show the steps clearly).
$$p(\mathbf{x} \mid \theta) = \theta^{n} e^{-\theta s}, \quad \text{where } s = \sum_{i=1}^{n} x_i$$

[3 marks]
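The factorisation in a) can be checked numerically. Under the iid assumption, the joint density is the product of the individual densities $\theta e^{-\theta x_i}$, which R's `dexp(x, rate = theta)` evaluates. The helper names below are hypothetical. The check is only a sanity test, not a substitute for the written derivation.

```r
# Joint density of iid exponential lifetimes: product of theta * exp(-theta * x_i)
joint_density <- function(x, theta) prod(dexp(x, rate = theta))
# Closed form from a): theta^n * exp(-theta * s), with s = sum(x)
closed_form <- function(x, theta) theta^length(x) * exp(-theta * sum(x))

x <- c(2, 7, 6, 10, 8, 3)
all.equal(joint_density(x, 0.2), closed_form(x, 0.2))  # TRUE
```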
b) Find a simplified expression for the log-likelihood function $\ell(\theta) = \ln\big(p(\mathbf{x} \mid \theta)\big)$.
[3 marks]
c) Show that the maximum likelihood estimate ($\hat{\theta}$) of the parameter $\theta$ is given by:
$$\hat{\theta} = \frac{1}{\bar{x}}, \quad \text{where } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
[4 Marks]
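The sketch below compares the closed-form MLE in c) with a numerical maximisation of the log-likelihood $\ell(\theta) = n \ln\theta - \theta s$, using base R's `optimize()`. The search interval `c(1e-6, 100)` and the tolerance are arbitrary assumptions. For lifetimes measured in years, the maximiser should lie well inside that interval.

```r
# Log-likelihood: l(theta) = n * log(theta) - theta * s
loglik <- function(theta, x) length(x) * log(theta) - theta * sum(x)

# Numerical maximisation over an assumed search interval
mle_numeric <- function(x) {
  optimize(loglik, interval = c(1e-6, 100), x = x,
           maximum = TRUE, tol = 1e-8)$maximum
}

# Closed-form MLE from c): 1 / sample mean
mle_closed <- function(x) 1 / mean(x)
```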
d) Suppose that the lifetimes of six of their servers are
{2, 7, 6, 10, 8, 3}. What is the maximum likelihood estimate $\hat{\theta}$ (MLE) of the
parameter $\theta$ given this data? [2 Marks]
e) Hence, on average, how long would 7 servers last if they are used one after
another? [2 Marks]
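The arithmetic for d) and e) can be sketched in R as follows. It assumes the MLE from c). It also uses the fact that the expected total lifetime of servers used one after another is the sum of their individual expected lifetimes, each estimated by $1/\hat{\theta}$.

```r
lifetimes <- c(2, 7, 6, 10, 8, 3)   # observed lifetimes (years) from d)
theta.hat <- 1 / mean(lifetimes)    # MLE: 1 / (36 / 6) = 1/6
mean.life <- 1 / theta.hat          # estimated mean lifetime of one server: 6 years
total.7   <- 7 * mean.life          # expected total for 7 servers in sequence: 42 years
```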
f) What is the probability that a server lasts between six and twelve years?

Hint: Use the cumulative distribution function (cdf) of the exponential distribution. The
cdf of the exponential distribution is given by $F(x) = 1 - e^{-\theta x}$.

[4 marks]
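Here is a sketch for f). It assumes the MLE $\hat{\theta} = 1/6$ from d) is plugged in for $\theta$, because the question does not say which value to use. `pexp()` is base R's exponential cdf and uses the same rate parameterisation, $F(x) = 1 - e^{-\theta x}$.

```r
theta.hat <- 1 / mean(c(2, 7, 6, 10, 8, 3))  # assumed plug-in value: MLE = 1/6
# P(6 < X < 12) = F(12) - F(6), with F(x) = 1 - exp(-theta * x)
p.cdf  <- pexp(12, rate = theta.hat) - pexp(6, rate = theta.hat)
p.hand <- exp(-6 * theta.hat) - exp(-12 * theta.hat)  # = exp(-1) - exp(-2)
round(p.cdf, 4)                                      # 0.2325
```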
4.2) BigSecAI has now consulted an overseas computer hardware vendor,
HardwareExpert, which has more experience working with large servers, and obtained
some prior information about the lifetime of servers of similar capacity and processing
capabilities. HardwareExpert mentioned that their $\theta$ value follows a pattern that
can be described using a form of Gamma distribution, Gamma(a, b), where $a$ and $b$ are
the hyper-parameters of the Gamma distribution, with $a = 0.1$ and $b = 0.1$.
$$\mathrm{Gamma}(\theta \mid a, b) = c\, b^{a} \theta^{a-1} e^{-b\theta}, \quad \text{where } c \text{ is a constant.}$$

a) BigSecAI has decided to use this prior information from HardwareExpert for its
estimation. If it uses the Gamma distribution prior, Gamma(a, b), obtain an
expression for the posterior distribution (show all the steps).
Show that the posterior distribution is also a Gamma distribution, Gamma(a′, b′),
with different hyper-parameters $a'$ and $b'$. Express $a'$ and $b'$ in terms of $a$, $b$, $n$ and $s$. [5 Marks]
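Here is a sketch of the conjugate update in 4.2. It assumes `b` acts as a rate, as in the density $b^{a}\theta^{a-1}e^{-b\theta}$ above. Multiplying that prior by $\theta^{n} e^{-\theta s}$ gives a posterior proportional to $\theta^{a+n-1} e^{-(b+s)\theta}$, that is, Gamma(a + n, b + s), with mean $a'/b'$. The helper name `posterior_gamma` is hypothetical. The example uses the hyper-parameters a = b = 0.1 and the six observed lifetimes.

```r
# Conjugate update (hypothetical helper): exponential likelihood with a
# Gamma(a, b) prior in the rate parameterisation gives Gamma(a + n, b + s).
posterior_gamma <- function(x, a, b) {
  a.post <- a + length(x)   # a' = a + n
  b.post <- b + sum(x)      # b' = b + s
  c(a.post = a.post, b.post = b.post, mean = a.post / b.post)
}

post <- posterior_gamma(c(2, 7, 6, 10, 8, 3), a = 0.1, b = 0.1)
post   # a.post = 6.1, b.post = 36.1, mean = 6.1 / 36.1 (about 0.169)
```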

b) Use the values of the a and b hyper-parameters suggested by HardwareExpert,
and the server lifetimes that have been observed from 6 servers: {2, 7, 6, 10, 8, 3},
to find the values of $a'$ and $b'$. What is the posterior mean
Answered Same Day Apr 24, 2021 SIT743 Deakin University

Solution

Pushpendra answered on Apr 28 2021
Ans:
$$P(V) = \frac{2000 + 500 + 1000}{\text{Total}} = \frac{3500}{3900 + 3500 + 2600} = 0.35$$

Ans:
$$P(C \cap N) = \frac{1400}{10000} = 0.14$$

Ans:
$$P(F \mid Q) = \frac{P(F \cap Q)}{P(Q)} = \frac{1300/\text{Total}}{(1300 + 500 + 800)/\text{Total}} = \frac{1300}{2600} = 0.5$$

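Ans:
$$P(B \cap V) = \frac{500}{10000} = 0.05, \qquad P(B) = 1 - P(F) - P(C) = 1 - \frac{4300}{10000} - \frac{3200}{10000} = 0.25$$
$$P(V \mid B) = \frac{P(B \cap V)}{P(B)} = \frac{0.05}{0.25} = 0.2$$
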
Ans:
$$P(V \cup C) = P(V) + P(C) - P(V \cap C) = \frac{2000 + 500 + 1000}{10000} + \frac{1400 + 1000 + 800}{10000} - \frac{1000}{10000} = 0.35 + 0.32 - 0.10 = 0.57$$

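Ans: The column totals give the marginal distribution of state:
$$P(X = x) = \begin{cases} 3900/10000, & x = N \\ 3500/10000, & x = V \\ 2600/10000, & x = Q \end{cases}$$
The marginal distribution of sports is obtained by summing each sport over the states, with $P(B) = 1 - P(F) - P(C)$:
$$P(Y = y) = \begin{cases} 4300/10000, & y = F \\ 2500/10000, & y = B \\ 3200/10000, & y = C \end{cases}$$
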
Ans: Two events are said to be mutually exclusive if they cannot occur at the same time.
$$P(F \cap N) = \frac{1000}{10000} = 0.1 \neq 0$$
No, they are not mutually exclusive.
Ans: Two events are said to be independent when the occurrence of one does not depend on the
occurrence of the other, i.e. $P(A \cap B) = P(A)\,P(B)$.
$$P(F \cap N) = \frac{1000}{10000}, \quad P(F) = \frac{4300}{10000}, \quad P(N) = \frac{3900}{10000}$$
$$P(F)\,P(N) = 0.43 \times 0.39 = 0.1677 \neq 0.1 = P(F \cap N)$$
Therefore, they are not independent.
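The 2.1 answers can be reproduced in R from a table of counts. The counts below are the ones used in the worked solution above, because the question's own table is masked on this page. One count, Basketball in New South Wales (1500), does not appear anywhere on the page. It is inferred from the New South Wales total as 3900 - 1000 - 1400, so treat the whole matrix as an assumption.

```r
# Counts as used in the worked solution (rows = sports, columns = states);
# the Basketball/NSW cell (1500) is inferred from the NSW total of 3900.
counts <- matrix(c(1000, 2000, 1300,
                   1500,  500,  500,
                   1400, 1000,  800),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(sport = c("F", "B", "C"),
                                 state = c("N", "V", "Q")))
p <- counts / sum(counts)   # joint probabilities
p.state <- colSums(p)       # marginal distribution of state
p.sport <- rowSums(p)       # marginal distribution of sports
answers <- c(
  a = p.state[["V"]],                                # P(V)
  b = p["C", "N"],                                   # P(C and N)
  c = p["F", "Q"] / p.state[["Q"]],                  # P(F | Q)
  d = p["B", "V"] / p.sport[["B"]],                  # P(V | B)
  e = p.state[["V"]] + p.sport[["C"]] - p["C", "V"]  # P(V or C)
)
# a = 0.35, b = 0.14, c = 0.5, d = 0.2, e = 0.57
indep.gap <- p["F", "N"] - p.sport[["F"]] * p.state[["N"]]  # 0 only if independent
```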
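Ans:
$$P(\text{rain yesterday}) = 0.6, \quad P(\text{sunny yesterday}) = 0.4$$
$$P(\text{rain today} \mid \text{rain yesterday}) = 0.75, \quad P(\text{sunny today} \mid \text{sunny yesterday}) = 0.3$$
so $P(\text{rain today} \mid \text{sunny yesterday}) = 1 - 0.3 = 0.7$.
Using Bayes' theorem of conditional probability:
$$P(\text{sunny yesterday} \mid \text{rain today}) = \frac{0.7 \times 0.4}{0.7 \times 0.4 + 0.75 \times 0.6} = \frac{0.28}{0.73} \approx 0.3836$$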
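The 2.2 calculation can be written as a short R sketch. It assumes every day is either rainy or sunny, so that $P(\text{rain today} \mid \text{sunny yesterday}) = 1 - 0.3$.

```r
p.rain.yday  <- 0.6                  # prior: it rained yesterday
p.sunny.yday <- 1 - p.rain.yday      # = 0.4
p.rain.given.rain  <- 0.75           # P(rain today | rain yesterday)
p.rain.given.sunny <- 1 - 0.3        # P(rain today | sunny yesterday) = 0.7
# Law of total probability for the evidence "rain today"
p.rain.today <- p.rain.given.rain * p.rain.yday +
                p.rain.given.sunny * p.sunny.yday
# Bayes' theorem: P(sunny yesterday | rain today)
posterior <- p.rain.given.sunny * p.sunny.yday / p.rain.today   # about 0.3836
```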


  

Ans: In a Bayesian framework, we model the data probabilistically as well as the parameters
that govern the distribution of the data. In a frequentist framework, the data is modeled
probabilistically, but the...