ENG3104 Engineering Simulations and Computations Semester 2, 2018
Assessment: Assignment 2
Due: 8 October 2018 (deadline is two weeks after date in course spec)
Marks: 400
Value: 40%
Question 1 (worth 100 marks)
Introduction
To do something useful with big data, models are devised from the large numbers of observations
in order to predict what will occur for some other observation(s). A simple linear model (see
footnote 1) is of the form:
$$y_i = \sum_{j=1}^{N} x_{ij} a_j \tag{1}$$
where $y_i$ is the dependent variable, $i$ is the observation number (there are a total of $M$
observations), $x_{ij}$ is the set of independent variables, $N$ is the number of independent
variables (for big data, $M \gg N$), and $a_j$ are the set of model coefficients. Equation (1)
lends itself to a matrix formulation:
$$Y = XA \tag{2}$$
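As a minimal sketch of how Eq. (2) might be solved in MATLAB, the example below uses the backslash operator, which returns the least-squares solution when there are more observations than coefficients. The matrix `X`, the vector `A_true` and all numbers are made up for illustration and are not part of the assignment data.

```matlab
% Synthetic example: M = 6 observations, N = 2 independent variables.
% Every number here is invented purely for illustration.
X = [1 2; 2 1; 3 4; 4 3; 5 6; 6 5];   % M-by-N matrix of x_ij
A_true = [0.5; 2.0];                   % coefficients used to generate Y
Y = X * A_true;                        % M-by-1 vector of y_i (noise-free)

% With M > N the system is overdetermined; backslash (mldivide)
% returns the least-squares solution of X*A = Y.
A = X \ Y;

fprintf('a_1 = %.4f, a_2 = %.4f\n', A(1), A(2));
```

Because the synthetic `Y` is noise-free, the recovered coefficients should match `A_true` to within rounding error; with measured data they would only be a best fit.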
The model coefficients $a_j$ are determined by measuring $y_i$ and $x_{ij}$. One of the dangers
of developing such a model is “over-fitting” the data. This is where $a_j$ are tuned for the $M$
observations so that the model is excellent for $y_i$, $i \le M$, but poor for $y_i$, $i > M$.
Good practice is therefore to split the $M$ observations into a “training dataset” (with $M_1$
observations, $M_1 \ge N$ and typically $M_1 \gg N$) and a “test dataset” (with $M_2$ observations,
$M_1 + M_2 = M$, and typically $M_2 < M_1$). The values of $a_j$ are determined from Eq. (1) using
the training dataset (with $M_1$ observations). The values can then be validated using the test
dataset by calculating $y_i$ using Eq. (1) and calculating the error from the actual values $\hat{y}_i$.
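The training/test procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the test dataset is taken to be the last `M2` rows, and relative error is used as the error measure, none of which is prescribed by the text.

```matlab
% Synthetic data: M = 10 observations, N = 2 variables (illustrative only).
X = [1 2; 2 1; 3 4; 4 3; 5 6; 6 5; 7 8; 8 7; 9 10; 10 9];
noise = 0.01 * [1; -1; 2; -2; 1; -1; 2; -2; 1; -1];   % fixed made-up "measurement" noise
Y = X * [0.5; 2.0] + noise;

M  = size(X, 1);
M2 = 3;            % size of the test dataset (assumed value)
M1 = M - M2;       % size of the training dataset

% Fit the coefficients on the first M1 observations only.
A = X(1:M1, :) \ Y(1:M1);

% Predict the held-out observations and compare with the measured values.
Y_model = X(M1+1:end, :) * A;
rel_err = abs(Y_model - Y(M1+1:end)) ./ abs(Y(M1+1:end));

fprintf('Largest relative error on the test dataset: %.3e\n', max(rel_err));
```

Keeping the test rows out of the fit is the point of the exercise: an error computed on rows that were also used for fitting would understate how the model behaves on new observations.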
In this question, you are going to apply this methodology to determine if it is possible to
estimate the mean pressure for the year based on temperature readings from each month. The
ideal gas law is:
$$p = \rho R T \tag{3}$$
where p is the pressure, ρ is the density, R the ideal gas constant and T the temperature.
Your computational task is to use Eq. (1), written for this problem as
$$\bar{p}_i = \sum_{j=1}^{12} T_{ij} a_j \tag{4}$$
in the form of Eq. (2):
$$P = TA \tag{5}$$
for the 9:00 readings. Here $\bar{p}_i$ is the average pressure across all months for day $i$,
$T_{ij}$ is the temperature on day $i$ and month $j$ and $a_j$ is the average coefficient for
month $j$. You will use the entire 12 months’ worth of data ($N = 12$) to calculate the average
pressure for each calendar date ($M = 28$ since there are only 28 days in February), i.e.
$\bar{p}_1$ is the average pressure calculated using the 1st of July, 1st of August, 1st of
September, etc. For your assignment, the following value is to be used:
$$M_2 = 2 + \frac{2.9244}{2},$$
where $M_2$ is to be rounded to the nearest integer. Because $M > N$ (we don’t have $M \gg N$), we
will work with $M_2 \le N$, which is not ideal, but is pragmatic, since it guarantees that $M_1 > N$
to produce statistically-good estimates of $a_j$.

Footnote 1: Examples of non-linear models are:
1. having $x_{ij}$ raised to some power other than 1
2. having $x_{ij} x_{i(j-k)}$ where $k$ is some integer
3. having $x_{ij}$ inside some function, e.g. $\ln x_{ij}$, $\sin x_{ij}$
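The dataset sizes can be checked with a few lines of MATLAB. This sketch assumes the expression for M2 reads 2 + 2.9244/2, and it assumes (the text does not say) that the training dataset is the first `M1` calendar dates and the test dataset the remainder.

```matlab
N = 12;                        % number of months (independent variables)
M = 28;                        % calendar dates common to every month

M2_raw = 2 + 2.9244 / 2;       % value before rounding (assumed reading of the formula)
M2 = round(M2_raw);            % size of the test dataset
M1 = M - M2;                   % size of the training dataset

fprintf('M2 before rounding = %.4f, M2 = %d, M1 = %d\n', M2_raw, M2, M1);

% Assumed split: first M1 dates for training, last M2 dates for testing.
train_rows = 1:M1;
test_rows  = (M1 + 1):M;
```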
Requirements
For this assessment item, you must perform hand calculations using Eq. (5):
1. Calculate $a_1$ using only the 1st of July (i.e. $M = 1$, $N = 1$).
2. Calculate $a_1$ and $a_2$ using only the 1st and 2nd of both July and August (i.e. $M = 2$,
$N = 2$).
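A hand calculation of this kind can be cross-checked numerically. The sketch below uses placeholder readings, not the real Dalby data, and the variable names are illustrative assumptions. With $M = N$ the system in Eq. (5) is square, so it is solved exactly rather than in a least-squares sense.

```matlab
% Placeholder readings (NOT the real data): temperature in degrees C,
% mean pressure in hPa.
T = [20 25; 22 24];    % rows: 1st and 2nd of the month; columns: July, August
P = [1015; 1016];      % mean pressure for each calendar date

% M = 1, N = 1: Eq. (5) reduces to a scalar division.
a1_scalar = P(1) / T(1, 1);

% M = 2, N = 2: a square system, solved exactly by backslash.
A = T \ P;

% Verify by substituting the coefficients back into Eq. (5).
residual = T * A - P;
fprintf('a_1 (1x1 case) = %.4f\n', a1_scalar);
fprintf('a_1 = %.4f, a_2 = %.4f, max residual = %.2e\n', A(1), A(2), max(abs(residual)));
```

Substituting the result back into the original equation is a simple verification that works for any square system, whatever method produced the coefficients.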
You must also produce MATLAB code which uses Eq. (5):
3. Repeats Requirements 1 and 2. Reports and verifies the results.
4. *Successfully loads all the relevant data.
5. *Repeats Requirement 2 using the loaded data. Reports and verifies the results.
6. **Reports the value of $M_2$ before it is rounded, to confirm the values of $M_1$ and $M_2$ you
are to use. Calculates all the $a_j$ using the training dataset of $M_1$ values and reports $a_j$.
7. **Uses the test dataset of $M_2$ values to assess the quality of the modelled values of $\bar{p}_i$.
8. ***The accuracy of the results is limited because the variability in the temperature and
pressure data is in the 3rd or 4th significant figure, and also because we do not have
big data. To remedy the problem of significant figures, the data should be normalised.
The first normalisation technique to use in this circumstance is to “centre” the data in
the matrix $T$ (subtract a constant value, sometimes the mean, from all the data), which
will put the variability in the 1st or 2nd significant figure. Use 15 °C to centre the
temperature data, produce new $a_j$ from your training dataset and test the coefficients.
See if you achieve some further numerical improvement in this case by “scaling” the data
in $T$ (non-dimensionalising, normally by dividing by the standard deviation) so that all
the quantities are of the same order of magnitude (see footnote 2).
9. Discusses the results.
10. Has appropriate comments throughout.
The projected difficulty of a Requirement is indicated by the number of * at the start. All
students are expected to be able to complete Requirements which do not have an *.
Footnote 2: Scaling the centred data in this case may not do much, since the quantities are
already of a similar order of magnitude. If you had different types of variables in $T$ with some
much bigger than others (e.g. temperature, pressure, the size of grains of sand), then scaling
would vastly improve the outcome.
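Centring and scaling as described in Requirement 8 might look like the sketch below. The temperatures are synthetic, and using a single standard deviation over the whole matrix (rather than one per month) is an assumption made here for simplicity.

```matlab
% Synthetic temperatures (degrees C): 5 dates by 3 months, illustrative only.
T = [14.2 16.1 18.3;
     15.0 15.8 19.1;
     13.7 16.4 18.8;
     14.9 15.5 19.4;
     14.4 16.0 18.6];

T_centred = T - 15;                 % centre the data about 15 degrees C
sigma     = std(T_centred(:));      % one overall standard deviation (assumed choice)
T_scaled  = T_centred / sigma;      % non-dimensional values of order one

fprintf('Std before scaling: %.4f, after scaling: %.4f\n', sigma, std(T_scaled(:)));
```

Whatever constants are chosen, the same centring value and the same scale factor would need to be applied to both the training and the test rows, otherwise the coefficients fitted on one set do not apply to the other.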
Assessment Criteria
Your code will be assessed using the following scheme. Note that you are marked based on how
well you perform for each category, so the correct answer determined in a basic way will receive
half marks and the correct answer determined using an excellent method/code will receive full
marks.
Quality of hand calculations 20 marks
Quality of Requirement 3 20 marks
Quality of Requirement 5 10 marks
Quality of Requirement 6 15 marks
Quality of Requirement 7 10 marks
Quality of normalisation(s) 10 marks
Quality of discussion(s) 5 marks
Quality of header(s) and comments 5 marks
Quality of code 5 marks
Question 2 (worth 100 marks)
Introduction
When data is being measured, it is common for some of it to be missing, which could be due
to a fault in the measuring equipment, or the variable being unmeasurable at that moment. In
the weather data for Dalby, the maximum temperature was not recorded on 29th October 2017,
presumably because not all the temperature readings were recorded for that day, so
it is impossible to know whether the highest recorded temperature was actually the maximum.
Leaving unknown/unreliable readings blank is the best option, since an inserted value (such as
zero) could be mistaken for a valid reading and would therefore pollute the data (this is why my
preferred option is to fill an empty slot in an array with NaN, since it is unlikely to have
occurred from a calculation).
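A small sketch of why NaN works well as a marker for missing readings: it propagates visibly through calculations, and the built-in `isfinite` makes it easy to filter out. The values and variable names below are made up for illustration.

```matlab
% Made-up readings; NaN marks a value that was not recorded.
Tmax = [31.2; NaN; 29.8; 30.5];
T3   = [29.9; 28.4; NaN; 29.1];

% A NaN anywhere in the input makes the mean NaN, so the gap cannot go unnoticed.
mean_with_gap = mean(Tmax);

% Keep only the dates on which BOTH variables were recorded.
keep = isfinite(Tmax) & isfinite(T3);
Tmax_valid = Tmax(keep);
T3_valid   = T3(keep);

fprintf('%d of %d dates have both readings\n', nnz(keep), numel(keep));
```

Had the gaps been filled with zero instead, `mean(Tmax)` would have returned a plausible-looking but wrong number.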
If you need to be able to use a value where there is one missing, then you need to use
some method of including an intelligent guess. In this question, you will use a global curve-fit
to provide the guess. All of you will use $T_3$ (the temperature measured at 3:00 pm) as the
independent variable to model the maximum daily temperature, $T_{\max}$. You will also compare
the outcome for this modelling to using another variable, $V$, as the independent variable. For
your assignment, the following value is to be used:
$$Q2 = 1.7949.$$
The independent variable (besides $T_3$) you are to use is based on your value of Q2:
$$V \equiv \begin{cases} T_{\min}, & Q2 \le 5 \\ T_9, & Q2 > 5 \end{cases}$$
where $T$ is temperature and the subscript refers to either the daily minimum or the particular
time of day.
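The selection rule for $V$ can be written as a simple branch. The variable name `V_name` is an illustrative assumption; in a full script it might be used to pick the matching column of the loaded data.

```matlab
Q2 = 1.7949;               % value given for this assignment

% Choose the second independent variable according to the rule above.
if Q2 <= 5
    V_name = 'Tmin';       % daily minimum temperature
else
    V_name = 'T9';         % temperature at 9:00 am
end

fprintf('Q2 = %.4f, so V is %s\n', Q2, V_name);
```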
Your task is to estimate the value of $T_{\max}$ on 29th October 2017 using both $T_3$ and $V$ as
the independent variable.
Requirements
For this assessment item, you must perform hand calculations using $T_{\max}$ and $T_3$:
1. Take the values from 28th and 30th October 2017 and estimate the coefficients of the
three standard curve-fitting functions. These data points will provide a qualitative
representation of the overall trend.
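With only two data points, each two-parameter curve passes exactly through both, so the coefficients follow from a slope in suitably transformed coordinates. The sketch below assumes the "three standard curve-fitting functions" are the linear, exponential and power-law forms; the text does not name them, so treat that as an assumption. The two points are placeholders, not the real readings.

```matlab
% Placeholder points (x = T3, y = Tmax), NOT the real 28th/30th October values.
x = [28.0; 30.0];
y = [29.5; 31.0];

% Linear: y = a*x + b
a_lin = (y(2) - y(1)) / (x(2) - x(1));
b_lin = y(1) - a_lin * x(1);

% Exponential: y = a*exp(b*x)  ->  ln(y) = ln(a) + b*x
b_exp = (log(y(2)) - log(y(1))) / (x(2) - x(1));
a_exp = y(1) / exp(b_exp * x(1));

% Power law: y = a*x^b  ->  ln(y) = ln(a) + b*ln(x)
b_pow = (log(y(2)) - log(y(1))) / (log(x(2)) - log(x(1)));
a_pow = y(1) / x(1)^b_pow;

fprintf('linear:      a = %.4f, b = %.4f\n', a_lin, b_lin);
fprintf('exponential: a = %.4f, b = %.4f\n', a_exp, b_exp);
fprintf('power law:   a = %.4f, b = %.4f\n', a_pow, b_pow);
```

A quick verification is to substitute the second point into each fitted form and confirm it reproduces the second y value.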
You must also produce MATLAB code which:
2. Repeats Requirement 1 and verifies the results.
3. *Performs curve-fits of all the data for $T_{\max}$ and $T_3$. Use the MATLAB function isfinite
to filter the dataset so that only those dates with recordings of both $T_{\max}$ and $T_3$ are
included.
4. Validates the three standard curve-fitting functions obtained in Requirement 3 by
comparing with the parameters obtained in Requirement 1. Given the limited data used in
Requirement 1 and the overall scatter in data, don’t expect the values to be very close.
5. Determines which curve-fit is the best.
6. Demonstrates that the chosen curve-fit is the best both graphically and numerically,
showing both the data and the relevant curve-fit.
7. Displays a message in the Command Window stating which type of curve-fit was chosen,
stating the parameters of the curve-fit and the result of the numerical test of the curve-fit.
8. Plots the best curve-fit along with the data in a separate figure with normal-scale axes.
9. Uses the best curve-fit to estimate $T_{\max}$ for 29th October 2017.
10. *Reports the value of Q2, leading to the selection of $V$. Repeats Requirements 3, 8 and 9
using only a linear curve-fit with $V$ the independent variable instead of $T_3$. Plots the
curve-fit along with the data. Compares and discusses the two estimates for $T_{\max}$.
11. Has appropriate comments throughout.
The projected difficulty of a Requirement is indicated by the number of * at the start. All
students are expected to be able to complete Requirements which do not have an *.
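The overall workflow in the Requirements above (filter, fit, compare, estimate) can be sketched on synthetic data as follows. Several choices here are assumptions rather than anything the text prescribes: the three candidate forms are taken to be linear, exponential and power law, the coefficient of determination $r^2$ is used as the numerical test, and the query value `x_query` is invented.

```matlab
% Synthetic data standing in for (T3, Tmax); NaN marks a missing reading.
x = [18.5; 21.0; 24.2; 26.8; 29.1; 31.5; 33.0];
y = [20.1; 22.4; 25.9;  NaN; 30.8; 33.0; 34.9];

keep = isfinite(x) & isfinite(y);      % dates with both readings
x = x(keep);
y = y(keep);

% Fit each candidate form with polyfit on suitably transformed data.
p_lin = polyfit(x, y, 1);              % y    = p(1)*x     + p(2)
p_exp = polyfit(x, log(y), 1);         % ln y = p(1)*x     + p(2)
p_pow = polyfit(log(x), log(y), 1);    % ln y = p(1)*ln(x) + p(2)

% Predictions in the original (untransformed) variables, one column per model.
y_fit = [polyval(p_lin, x), exp(polyval(p_exp, x)), exp(polyval(p_pow, log(x)))];

% Coefficient of determination for each model, computed on y itself.
St = sum((y - mean(y)).^2);
Sr = sum((y - y_fit).^2, 1);           % implicit expansion (R2016b or later)
r2 = 1 - Sr / St;

names = {'linear', 'exponential', 'power law'};
[r2_best, k] = max(r2);
fprintf('Best fit: %s (r^2 = %.4f)\n', names{k}, r2_best);

% Estimate the dependent variable at a hypothetical query value.
x_query = 30.0;
y_query = polyval(p_lin, x_query);     % shown for the linear model only
fprintf('Linear estimate at x = %.1f: %.2f\n', x_query, y_query);
```

Computing $r^2$ on the untransformed values keeps the three models comparable; an $r^2$ computed in log space for one model and linear space for another would not measure the same thing.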
Assessment Criteria
Your code will be assessed using the following scheme. Note that you are marked based on how
well you perform for each category, so the correct answer determined in a basic way will receive
half marks and the correct answer determined using an excellent method/code will receive full
marks.
Quality of hand calculations 20 marks
Quality of determination of appropriate curve-fit 30 marks
Quality of verifications/validations 10 marks
Quality of reporting of curve-fit 5 marks
Quality of plots (e.g. axis labels, titles) 5 marks
Quality of Requirement 10 10 marks
Quality of 29th October 2017 estimations 10 marks
Quality of header(s) and comments 5 marks
Quality of code 5 marks
Question 3 (worth 100 marks)
Introduction
This question provides an alternative methodology to