Page 1 of 8
SIT743 Bayesian Learning and Graphical Models
Assignment-2
Total Marks = 100, Weighting - 40%
Due date: 24th May 2020 by 11.30 PM
---------------------------------------------------------------------------------------------------------------
INSTRUCTIONS:
• For this assignment, you need to submit the following TWO files.
1. A written document (a single PDF only) covering all of the items described in the
questions. All answers to the questions must be written in this document, i.e., not in
the other files (code files) that you will be submitting. All relevant results
(outputs, figures) obtained by executing your R code must be included in this
document.
For questions that involve mathematical formulas, you may write the answers
by hand, scan them to PDF and combine them with your answer document. Submit
the combined answer document as a single PDF.
2. A separate ".R" or ".txt" file containing the code (R script) that you
implemented to produce the results. Name the file "name-StudentID-Ass2-Code.R"
(where "name" is replaced with your name, either your surname or your first name, and
"StudentID" with your student ID).
• All documents and files should be submitted (uploaded) via the SIT743 CloudDeakin
Assignment Dropbox by the due date and time.
• Zip files are NOT accepted. Both files should be uploaded separately to
CloudDeakin.
• E-mail or manual submissions are NOT allowed. Photos of the document are NOT
allowed.
=================================================================
Assignment tasks
Q1) [31 Marks]
Weather conditions influence the production of good-quality coffee in a region. A list
of factors that influence coffee cultivation, along with their possible values, and a
Bayesian network that represents the relationships between these factors (variables) are
given below.
M (Maximum Temperature) ∈ { < 20, 20-30, 30-40, > 40 }
N (Minimum Temperature) ∈ { < 0, 0-10, 10-20, > 20}
W (Wind speed) ∈ {Low, Medium, High}
H (Relative humidity) ∈ {< 50, 50-60, > 60}
R (Precipitation) ∈ {Low, High}
S (Solar radiation) ∈ {Low, Medium, High}
1.1) Write down the joint distribution P(M, N, W, H, R, S) for the above
network.
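For reference, any Bayesian network over variables X1, ..., Xn factorises into one conditional distribution per node given its parents Pa(Xi). The general form is shown below; the parent sets of M, N, W, H, R and S come from the network figure.

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)
```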
1.2) Find the minimum number of parameters required to fully specify the
distribution according to the above network.
1.3)
a) Write down the joint probability distribution if no independencies among the
variables are assumed.
b) How many parameters are required, at a minimum, if no independencies among
the variables are assumed?
c) Compare with the result of the previous question (Q1.2) and comment.
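A small helper can be used to check parameter counts of the kind asked for in Q1.2 and Q1.3. This is a sketch assuming discrete nodes: a node with k states and q parent configurations needs (k - 1) * q free parameters, while an unrestricted joint table needs (product of all state counts) - 1. The function names and the four-node binary chain below are hypothetical and are not the coffee network.

```r
# Free parameters of a discrete Bayesian network:
# sum over nodes of (number of states - 1) * (number of parent configurations)
count_bn_params <- function(card, parents) {
  total <- 0
  for (v in names(card)) {
    pa <- parents[[v]]
    # a node without parents has a single (empty) parent configuration
    n_configs <- if (length(pa) == 0) 1 else prod(card[pa])
    total <- total + (card[[v]] - 1) * n_configs
  }
  total
}

# Free parameters of an unrestricted joint table: every cell except one
count_full_joint_params <- function(card) prod(card) - 1

# Hypothetical binary chain A -> B -> C -> D (not the coffee network)
chain_card <- c(A = 2, B = 2, C = 2, D = 2)
chain_parents <- list(A = character(0), B = "A", C = "B", D = "C")

print(count_bn_params(chain_card, chain_parents))   # 1 + 2 + 2 + 2
print(count_full_joint_params(chain_card))          # 2^4 - 1
```

The gap between the two counts grows quickly with the number of variables, which is the point of comparison in Q1.3 (c).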
1.4) The d-separation method can be used to determine whether two sets of variables are
independent or conditionally independent in a Bayesian network. For each of the
statements (a) to (c) given below, perform the following:
• List all the possible paths from the first (set of) node/s to the second (set
of) node/s.
• State if each of those paths is blocking or non-blocking with reasons.
• Hence, mention if the statement is true or false.
a) M ⊥ S | ∅ (M is marginally independent of S)
b) W ⊥ R | {N, H} (W is conditionally independent of R given {N, H})
c) �
, �� ⊥ W | H
1.5) Write an R program to construct the above Bayesian network and perform the
d-separation tests for all of the cases in Q1.4 (a) to (c). Show the plot of the
network you obtained and the output of the d-separation tests from your program.
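As a sketch of the kind of output Q1.5 asks for, the bnlearn function dsep(bn, x, y, z) tests whether nodes x and y are d-separated by the set z. The four-node network below (a collider A -> C <- B with a child D of C) is a hypothetical example, not the network in the figure; the model string for Q1.5 has to be built from that figure.

```r
library(bnlearn)

# Hypothetical toy DAG: A -> C <- B (collider at C) and C -> D
toy_dag <- model2network("[A][B][C|A:B][D|C]")

# graphviz.plot(toy_dag)   # plot the DAG (requires Rgraphviz)

# No conditioning set: the only path A -> C <- B is blocked
# by the unobserved collider C, so A and B are d-separated
sep_marginal <- dsep(toy_dag, "A", "B")

# Conditioning on the collider C opens the path
sep_given_c <- dsep(toy_dag, "A", "B", "C")

# Conditioning on D, a descendant of the collider, also opens the path
sep_given_d <- dsep(toy_dag, "A", "B", "D")

# The chain A -> C -> D is blocked by observing the middle node C
sep_a_d_given_c <- dsep(toy_dag, "A", "D", "C")

print(c(A_B = sep_marginal, A_B_given_C = sep_given_c,
        A_B_given_D = sep_given_d, A_D_given_C = sep_a_d_given_c))
```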
1.6)
a) Show the step-by-step process of performing variable elimination to
compute ��� | � � ���, � � ����. Use the following variable ordering
for the elimination process:
N, H, M.
b) What is the treewidth of the network, given the above elimination ordering?
[Marks XXXXXXXXXX = 31]
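The mechanics of variable elimination (restrict factors to the evidence, sum out one variable at a time, then multiply and normalise) can be sketched in base R on a hypothetical three-node chain A -> B -> C with made-up binary CPTs. The sketch computes P(A | C = c1) by eliminating B; the numbers are illustrative only, and the Q1.6 network, query and ordering (N, H, M) are different.

```r
# Hypothetical chain A -> B -> C with made-up CPTs (binary variables)
p_a <- c(a0 = 0.6, a1 = 0.4)
p_b_given_a <- matrix(c(0.7, 0.3,
                        0.2, 0.8), nrow = 2, byrow = TRUE,
                      dimnames = list(A = c("a0", "a1"), B = c("b0", "b1")))
p_c_given_b <- matrix(c(0.9, 0.1,
                        0.4, 0.6), nrow = 2, byrow = TRUE,
                      dimnames = list(B = c("b0", "b1"), C = c("c0", "c1")))

# Step 1: restrict P(C | B) to the evidence C = c1, giving a factor f(B)
f_b <- p_c_given_b[, "c1"]

# Step 2: eliminate B by summing it out: tau(A) = sum_b P(b | A) * f(b)
tau_a <- as.vector(p_b_given_a %*% f_b)
names(tau_a) <- names(p_a)

# Step 3: multiply the remaining factors over A and normalise
unnormalised <- p_a * tau_a
posterior_a <- unnormalised / sum(unnormalised)
print(posterior_a)
```

Each intermediate factor here involves at most two variables; tracking the size of the largest factor created during elimination is what determines the induced width for a given ordering.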
Q2) [16 Marks] Implementing a Bayesian network in R and performing inference
A belief network models the relations between the variables A, B, C, D and E, which
represent the season, river flow rate, fish species, colour and size, respectively. Each
variable takes the states given below.
A (Season) ∈ {wet, dry}
B (River flow rate) ∈ {low, high}
C (Fish species) ∈ {Bass, Cod}
D (Colour) ∈ {light, medium, dark}
E (Size) ∈ {wide, thin}
The belief network that models these variables has the probability tables shown below.
2.1) Use the libraries below in R to create this belief network, along with the
probability values shown in the above tables.
You may use the following libraries for this:
# https://www.bioconductor.org/install/
# BiocManager::install(c("gRain", "RBGL", "gRbase"))
# BiocManager::install(c("Rgraphviz"))
library("Rgraphviz")
library(RBGL)
library(gRbase)
library(gRain)
# Define the appropriate network and use the compileCPT() function to
# compile the list of conditional probability tables and create the network.
a) Show the obtained belief network for this distribution.
b) Show the probability tables obtained from the R output (and verify them against
the above tables).
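A minimal sketch of the compileCPT() workflow using the gRain functions cptable(), compileCPT() and grain(). It uses a hypothetical two-node network X -> Y with placeholder probabilities, not the Q2 structure or tables. Note that in cptable() the child's states vary fastest within the values vector.

```r
library(gRain)

# Hypothetical network X -> Y with placeholder probabilities
lv_x <- c("x1", "x2")
lv_y <- c("y1", "y2")

# Root node: P(X = x1) = 0.3, P(X = x2) = 0.7
cpt_x <- cptable(~X, values = c(0.3, 0.7), levels = lv_x)

# Child node: the states of Y vary fastest, so the values are
# P(y1 | x1), P(y2 | x1), P(y1 | x2), P(y2 | x2)
cpt_y <- cptable(~Y | X, values = c(0.9, 0.1, 0.2, 0.8), levels = lv_y)

# Compile the list of CPTs and build the network object
cpt_list <- compileCPT(list(cpt_x, cpt_y))
net <- grain(cpt_list)

# Print the compiled CPTs to compare against the given tables
print(cpt_list)
```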
2.2) Use R program to compute the following probabilities:
a) Given that the river flow rate is low, what is the probability that the size is thin?
b) Given that the colour is dark and the season is dry, what is the probability
that the fish species is Cod?
c) Find the joint distribution of colour and fish species.
d) Find the marginal distribution of fish species.
[Marks: XXXXXXXXXX) = 16]
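Queries like those in Q2.2 can be answered with querygrain() (marginal or joint distributions) and setEvidence() (conditioning on observed states). The sketch below builds a hypothetical two-node network X -> Y with placeholder numbers so that it runs on its own; it is not the Q2 network.

```r
library(gRain)

# Hypothetical network X -> Y with placeholder probabilities
cpt_x <- cptable(~X, values = c(0.3, 0.7), levels = c("x1", "x2"))
cpt_y <- cptable(~Y | X, values = c(0.9, 0.1, 0.2, 0.8), levels = c("y1", "y2"))
net <- grain(compileCPT(list(cpt_x, cpt_y)))

# Marginal distribution of a single node (the kind of query in Q2.2 d)
p_y <- querygrain(net, nodes = "Y", type = "marginal")$Y

# Joint distribution of two nodes (the kind of query in Q2.2 c)
joint_xy <- querygrain(net, nodes = c("X", "Y"), type = "joint")

# Conditional query P(X | Y = y2) (the kind of query in Q2.2 a and b)
net_ev <- setEvidence(net, nodes = "Y", states = "y2")
post_x <- querygrain(net_ev, nodes = "X", type = "marginal")$X

print(p_y)
print(joint_xy)
print(post_x)
```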
Q3) [15 Marks]
Consider four binary variables A, B, C, D. The Directed Acyclic Graph (DAG) shown
below describes the relationships between these variables, along with their conditional
probability tables (CPTs).
3.1) In the above network, explain, with reasons, why A is independent of B, i.e., A ⊥ B.
3.2) Hence, obtain an expression (in a simplified form) for ��6 � >|� � >, * � >� in
XXXXXXXXXXterms of ? only.
3.3) The table shown below provides 20 simulated data samples generated from the above Bayesian
network. Use these data to find the maximum likelihood estimates of @, ?, A and
B.
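Maximum likelihood estimates of CPT parameters are relative frequencies: a root-node probability is the proportion of rows in that state, and a conditional probability is the proportion within the rows that match the parent configuration. The data frame below is hypothetical (it is not the 20-sample table in the assignment), and treating A and B as parents of D is an assumption made only for illustration.

```r
# Hypothetical binary data (NOT the 20 samples given in the assignment)
sim <- data.frame(
  A = c(1, 0, 1, 1, 0, 1, 0, 1),
  B = c(0, 0, 1, 1, 1, 0, 0, 1),
  D = c(1, 0, 1, 0, 1, 1, 0, 1)
)

# MLE for a root node: P(A = 1) is the fraction of rows with A = 1
p_a1 <- mean(sim$A == 1)

# MLE for a conditional probability, treating A and B as parents of D
# purely for illustration: P(D = 1 | A = 1, B = 1) is the fraction of
# rows with D = 1 among the rows where A = 1 and B = 1
parent_rows <- sim$A == 1 & sim$B == 1
p_d1_given_a1_b1 <- sum(sim$D[parent_rows] == 1) / sum(parent_rows)

# All conditional frequencies of D given each (A, B) configuration at once
cond_table <- prop.table(table(A = sim$A, B = sim$B, D = sim$D), margin = c(1, 2))

print(p_a1)
print(p_d1_given_a1_b1)
print(cond_table)
```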
3.4) Find the value of ��6 � >|� � >, * � >� using the values obtained for ? from
the above question Q3.3.
[Marks XXXXXXXXXX = 15]
Q4) Bayesian Structure Learning [27 Marks]
For this question, you will use a dataset called "hailfinder", available in the
'bnlearn' R package, which contains meteorological data on 56 variables.
Use the following R code to load the hailfinder dataset:
library(bnlearn)
# load the data.
data(hailfinder)
summary(hailfinder)
The true network structure of this dataset can be plotted using the following R
code.
library(bnlearn)
# create and plot the network structure.
modelstring = paste0("[N07muVerMo][SubjVertMo][QGVertMotion][SatContMoist][RaoContMoist]",
"[VISCloudCov][IRCloudCover][AMInstabMt][WndHodograph][MorningBound][LoLevMoistAd][Date]",
"[MorningCIN][LIfr12ZDENSd][AMDewptCalPl][LatestCIN][LLIW]",
"[CombVerMo|N07muVerMo:SubjVertMo:QGVertMotion][CombMoisture|SatContMoist:RaoContMoist]",
"[CombClouds|VISCloudCov:IRCloudCover][Scenario|Date][CurPropConv|LatestCIN:LLIW]",
"[AreaMesoALS|CombVerMo][ScenRelAMCIN|Scenario][ScenRelAMIns|Scenario][ScenRel34|Scenario]",
"[ScnRelPlFcst|Scenario][Dewpoints|Scenario][LowLLapse|Scenario][MeanRH|Scenario]",
"[MidLLapse|Scenario][MvmtFeatures|Scenario][RHRatio|Scenario][SfcWndShfDis|Scenario]",
"[SynForcng|Scenario][TempDis|Scenario][WindAloft|Scenario][WindFieldMt|Scenario]",
"[WindFieldPln|Scenario][AreaMoDryAir|AreaMesoALS:CombMoisture]",
"[AMCINInScen|ScenRelAMCIN:MorningCIN][AMInsWliScen|ScenRelAMIns:LIfr12ZDENSd:AMDewptCalPl]",
"[CldShadeOth|AreaMesoALS:AreaMoDryAir:CombClouds][InsInMt|CldShadeOth:AMInstabMt]",
"[OutflowFrMt|InsInMt:WndHodograph][CldShadeConv|InsInMt:WndHodograph][MountainFcst|InsInMt]",
"[Boundaries|WndHodograph:OutflowFrMt:MorningBound][N34StarFcst|ScenRel34:PlainsFcst]",
"[CompPlFcst|AreaMesoALS:CldShadeOth:Boundaries:CldShadeConv][CapChange|CompPlFcst]",
"[InsChange|CompPlFcst:LoLevMoistAd][CapInScen|CapChange:AMCINInScen]",
"[InsSclInScen|InsChange:AMInsWliScen][R5Fcst|MountainFcst:N34StarFcst]",
"[PlainsFcst|CapInScen:InsSclInScen:CurPropConv:ScnRelPlFcst]")
dag = model2network(modelstring)
par(mfrow = c(1,1))
#BiocManager::install(c("Rgraphviz"))
graphviz.plot(dag)
Use R programming, as appropriate, to answer the following questions.
4.1) Use the hailfinder dataset to learn Bayesian network structures with the hill-climbing
(hc) algorithm, using two different scoring methods, namely the Bayesian Information
Criterion (BIC) score and the Bayesian Dirichlet equivalent (BDe) score, for each of the
following sample sizes of the data:
a) 100 (first 100 data)
XXXXXXXXXXfirst 1000 data)
c XXXXXXXXXXfirst 10000 data)
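A sketch of the structure-learning step with bnlearn's hc(), where the score argument switches between "bic" and "bde" (the BDe score uses bnlearn's default imaginary sample size here), and score() evaluates each learned network on the same rows. Only the first 100 rows are used; the value of n is a placeholder to change for the other sample sizes.

```r
library(bnlearn)

data(hailfinder)

# Sample size: the first n rows (change to 1000 or 10000 for the other cases)
n <- 100
sub_data <- hailfinder[1:n, ]

# Hill-climbing with the BIC score and with the BDe score
dag_bic <- hc(sub_data, score = "bic")
dag_bde <- hc(sub_data, score = "bde")

# Network score of each learned structure on the same data
bic_value <- score(dag_bic, sub_data, type = "bic")
bde_value <- score(dag_bde, sub_data, type = "bde")

# Number of arcs learned under each score, and the scores themselves
print(c(bic_arcs = narcs(dag_bic), bde_arcs = narcs(dag_bde)))
print(c(bic = bic_value, bde = bde_value))

# graphviz.plot(dag_bic)   # plot a learned network (requires Rgraphviz)
```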
For each of the above cases,
• provide