Overleaf Example
Western Sydney University
Programming for Data Science (COMP7024)
Assignment
2023 Q1
Due 12th of March 2023 [1]
1 Introduction
This assignment consists of four questions, each of equal value, giving a total contribution of 40%
to this subject. The beginning of each question provides a breakdown of marks for each part in
that question. For example, a breakdown of XXXXXXXXXX = 10) implies a question consisting of three
parts, where the first, second and third parts are worth 1, 3 and 6 marks respectively.
Important
Only R and packages (that is, R libraries) described in the lectures and tutorials for this subject can
be used for generating answers for this assignment! In addition, R Markdown must not be used [2].
Penalties may apply for non-conformance.
2 Answer structure
In doing this assignment, you should not seek to use the maximum word limit declared in the
Learning Guide. Note that marks are not awarded for using many words, rather, for using an
economy of words and only stating what is relevant to the question being answered. Consider the
old adage – “less can be more”. Therefore, show you know what is relevant and have mastered
the ability to get to the point using few, simple words and clear sentences. Seek to also apply this
philosophy to the code you write.
Your answers to this assignment are to be provided in a single R script file. All material in your
script file should be logically organised, so that related material can be easily and quickly located.
Clearly identify yourself in this file, as a minimum: full name and student ID, as comments at the
beginning of the file.
2.1 R script file
Textual answers should be included as comments in your script file; refer to Listing 1 for examples
of including comments. The comments in your script file should be:
• Brief and to the point
• Stating a high level perspective
• Stating what is not immediately obvious, but worth mentioning
[1] You are welcome to submit early. Refer to “4.1 Early submission” section for further details.
[2] In short, only use R and R packages described in the lectures and tutorials for this subject. You are also not
allowed to use R Markdown for this assignment. All these requirements will also apply for the exam.
# In an R script file, comments are prefixed with the hash symbol
# A line with just a comment on it
# Generate a distribution of mean values from a sequence of digits
d <- replicate(1000,
{
  s <- sample(0:9, replace = TRUE) # Generate a sequence of 10 numeric digits
  mean(s) # A comment to end a line with R code on it
})
hist(d, main = 'Distribution of means') # Show distribution of means
# Of course you would use smarter comments than those used here
# Only state what is not immediately obvious
Listing 1: Some R code with comments
Be brief and to the point with respect to comments. The approach described in this section
should be the same as used for the exam. Make judicious use of comments a priority in
doing this assignment. After all, comments are meant to communicate important and useful
details. Make sure you also communicate well through wise choices in variable and function names.
Also make wise decisions regarding the layout of everything inside your R script file. Note if things
go wrong, good organisation and comments can help you, since they can show if appropriate logic
was intended.
3 Plagiarism
This is an individual effort assignment, therefore the answers you provide must be your own. You
may learn from others, but the understanding claimed by your assignment must be yours. If you
include any material in this assignment that is not your own, you must acknowledge that fact and
declare the source of that material. Be warned, your answers will be checked for plagiarism
and if caught, significant penalties may apply.
4 Submission
Once you have completed the assignment, you must upload your R script file via Turnitin; if you
wish, you can also e-mail your R script file directly to me [3]. This may be wise if you are having
trouble with Turnitin or vUWS and are at risk of submitting late. Once you have e-mailed, seek
to successfully submit via Turnitin. Be aware that you may need to rename your R script file by
adding the extension “.txt”, otherwise you may not be successful in submitting via Turnitin.
On a Windows machine you can easily add a “.txt” extension via file explorer. Select the “View”
tab and tick “File name extensions”; refer to Figure 1. Then select the file to be renamed, press F2 to
enter edit mode and add “.txt” to the very end of the file name; do not remove the “.R” portion of
the file name.
Hopefully a similar process is available on other platforms. Determine the method you will use
and test it prior to submission.
You must submit your assignment no later than the due date declared on the first page of this
assignment, otherwise late submission penalties will apply, as described in the section titled “Late
submission penalties”. Prior to the due date, you may replace a previously submitted version, but
only the last submitted version will be marked!
[3] XXXXXXXXXX
Figure 1: How to add “.txt” to the file extension on Windows
4.1 Early submission
Two “early submission” options are available, but you must choose which option
1. One or more submissions
2. Only one submission and priority marking
and follow the relevant instructions.
4.1.1 One or more submissions
You are free to make as many submissions as you wish, but only the last submission will be marked.
NB, if your last submission is after the due date, then late submission penalties will apply.
4.1.2 Only one submission and priority marking
You can only provide one submission and you must declare your request for this option by sending
me an email after completing your submission to Turnitin. To use this option, send an email to
XXXXXXXXXX and use the subject “One submission and priority marking”.
I will endeavour to start marking submissions as they arrive and in the order of their arrival.
However, the release of marks is subject to a caveat declared in the following section.
4.1.3 Caveat
The marking order will be determined by order of submission. However marks cannot be released
until everyone has submitted, or a minimum of one week has passed since the declared due date.
4.2 Late submission penalties
Late submission penalties exist. The contribution value of the assignment will reduce by 10% per
day, for each day after the submission date; that is, four marks per day. For example, if your
assignment is four days late, the maximum possible mark you can score for the assignment is 24
out of 40.
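The penalty arithmetic can be sketched in R as follows. This is only an illustration: the function name maxPossibleMark and its arguments are hypothetical, and flooring the result at zero is an assumption not stated in the assignment.

```r
# Hypothetical helper: maximum achievable mark after a late submission.
# Assumes a 40-mark assignment that loses 10% of its value (4 marks) per day late.
maxPossibleMark <- function(daysLate, totalMarks = 40, dailyPenalty = 0.10) {
  penalty <- totalMarks * dailyPenalty * daysLate
  max(0, totalMarks - penalty)  # Assumption: the mark cannot drop below zero
}

maxPossibleMark(4)  # Four days late leaves at most 24 out of 40
```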
Question XXXXXXXXXX = 10)
You are to develop a simple gambling game and test what the average outcome is if you always bet
$50.
(i)
Write the code necessary to perform a single turn of the game. The algorithm for the game is as
follows
• Randomly choose a bet that is one of the following values
10, 15, 20, 25 . . . , 90, 95, 100
• Simulate the roll of a pair of fair dice
• Determine the outcome of the roll as follows
– Any of the following results in losing your bet
11, 33, 55
– You receive twice your bet for any of the following
22, 44
– You receive five times your bet for rolling a 66
– Any other roll outcome results in losing half your bet
• Tell the user what the bet and the return values are
• If the return value is twice the bet, also print the following message on a new line
You won money!
• If the return value is five times the bet, then also print the following message on a new line
Jackpot win!!!
Make sure your code is well organised and has sensible documentation in the form of comments;
you will be expanding the capability of your code in the rest of this question. Also seek to make
wise choices regarding variable names and code layout.
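One possible reading of the algorithm in (i) is sketched below; it is not a model answer. The helper name calcReturn is hypothetical, and the sign convention is an assumption: a loss is reported as a negative return, while "twice" and "five times" the bet are reported as positive returns.

```r
# Hypothetical helper: return for a given bet and a pair of dice.
# Assumed convention: losses are negative, wins are positive.
calcReturn <- function(bet, dice) {
  if (dice[1] == dice[2]) {
    face <- dice[1]
    if (face %in% c(1, 3, 5)) return(-bet)     # 11, 33, 55: lose the bet
    if (face %in% c(2, 4))    return(2 * bet)  # 22, 44: receive twice the bet
    return(5 * bet)                            # 66: receive five times the bet
  }
  -bet / 2                                     # Any other roll: lose half the bet
}

bet  <- sample(seq(10, 100, by = 5), 1)  # Random bet from 10, 15, ..., 100
dice <- sample(1:6, 2, replace = TRUE)   # Roll a pair of fair dice
betReturn <- calcReturn(bet, dice)

cat("Bet =", bet, ", Return =", betReturn, "\n")
if (betReturn == 2 * bet) cat("You won money!\n")
if (betReturn == 5 * bet) cat("Jackpot win!!!\n")
```

Keeping the return calculation in its own function makes it straightforward to test each dice outcome separately from the random parts.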
(ii)
Wrap the dice simulation and return calculator you developed in (i), within a function that looks like
betResult <- function(bet = 50)
{
  # bet = bet to be made
  #
  # Simulate the roll of a pair of fair dice
  .
  .
  .
  # Determine the return from the bet
  .
  .
  .
  return(betReturn)
}
Insert within another function the code you wrote in (i) to randomly determine a bet; make
use of the following function template
betGenerator <- function()
{
  # Randomly choose a bet within the following sequence
  # {10, 15, 20, XXXXXXXXXX, 95, 100}
  .
  .
  .
  return(betAmount)
}
(iii)
Making use of the functions created in (ii), create the following function
playGame <- function(turns = 10)
{
# turns = number of bets to be made
.
.
.
  return(betReturn)
}
in order to complete the entire functionality of your game as devised in (i), except using a specified
number of turns. However, in this case, the function playGame() provides the following user output
• A single line of output for each iteration of the game, which looks as follows
Bet = 25, Dice outcome = 14, Winnings [4] = 50 [5]
• A single line stating the final position for the player, for example, “lost $200”
(iv)
What is the overall position for the player after one hundred turns of the game, where every turn
consists of a $50 bet? The position should consist of
• Total outlay
• Total winnings
• Overall profit
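As a sanity check on any simulated result, the expected return of a single $50 bet can be worked out from the 36 equally likely outcomes of two fair dice. The sketch below assumes a signed convention that the assignment does not spell out: losing the bet counts as -50, losing half as -25, and the two winning cases as +100 and +250. All variable names are hypothetical.

```r
bet <- 50

# Number of dice outcomes (out of 36) for each case, converted to probabilities
prob <- c(loseBet = 3, double = 2, jackpot = 1, loseHalf = 30) / 36

# Assumed signed return for each case
ret <- c(loseBet = -bet, double = 2 * bet, jackpot = 5 * bet, loseHalf = -bet / 2)

expectedPerTurn <- sum(prob * ret)      # Expected return of one turn
expected100 <- 100 * expectedPerTurn    # Expected return over one hundred turns

expectedPerTurn  # -12.5 under the assumed convention
expected100      # -1250 under the assumed convention
```

Under a different reading of "receive twice your bet" (for example, a gross payout rather than a signed return) the figures change, so the convention should be stated in the comments of the script.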
[4] Winnings is the amount won
[5] Use a negative value to indicate a loss
Question XXXXXXXXXX = 10)
For this exercise, you are to make use of the built-in dataset called iris. The first six lines of the
dataset can be viewed as follows
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
More information on this dataset can be obtained within R by executing ?iris in the console or in
Wikipedia - Iris flower data set. In answering this question, only use functions provided within the R
base package, hence do not install any other package.
(i)
Using just functional programming, determine the mean Sepal.Length for each species of iris flower.
Hint, you only need a single and simple line of code. Using only one or two sentences, explain how
your code works.
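One possible functional approach, given only as a sketch, uses the base function tapply() to apply mean() to Sepal.Length within each level of Species. The variable name meanSepalLength is hypothetical.

```r
# Apply mean() to Sepal.Length separately for each level of Species
meanSepalLength <- tapply(iris$Sepal.Length, iris$Species, mean)
meanSepalLength
```

tapply() splits its first argument into groups defined by the second argument and calls the supplied function on each group, so no explicit loop is needed.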
(ii)
Using two different methods, repeat the exercise in (i), but without using functional programming.
Using only one or two sentences, explain how your code works.
(iii)
Only using functional programming, determine the mean for each numeric column of the iris dataset,
but according to each species. Therefore produce the following
           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa            5.006       3.428        1.462       0.246
versicolor        5.936       2.770        4.260       1.326
virginica         6.588       2.974        5.552       2.026
Explain your code using no more than three simple sentences.
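A sketch of one way to obtain a species-by-column table of means with base R only is shown below; other functional approaches are equally valid. The name speciesMeans is hypothetical.

```r
# For each of the four numeric columns, compute the mean within each species.
# sapply() collects the per-column results into a 3 x 4 matrix.
speciesMeans <- sapply(iris[1:4], function(column) {
  tapply(column, iris$Species, mean)
})
speciesMeans
```

The outer sapply() iterates over columns and the inner tapply() groups by species, so the rows are named by species and the columns by measurement.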
(iv)
Using the output of (iii), write code to build a tree structure [6], which contains the above output
data. The tree structure is described as follows
• There are three branches off the root and each represents a particular species
• Each species branch breaks into the following two branches: Sepal and Petal
• The Sepal branch breaks into two branches consisting of: Length and Width
• Similarly, the Petal branch breaks into two branches consisting of: Length and Width
• The root of the tree consists of just the node, while the other end consists of 12 branches
You do not need to visualise the tree structure, just write code to create it.
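A nested list is one built-in structure that can represent such a tree; whether it is the structure the question intends is an assumption. The sketch below recomputes the species means so that it is self-contained, and the names speciesMeans, buildBranch and irisTree are hypothetical.

```r
# Species-by-column means (3 x 4 matrix), recomputed here for self-containment
speciesMeans <- sapply(iris[1:4], function(column) {
  tapply(column, iris$Species, mean)
})

# Hypothetical helper: build the Sepal/Petal sub-tree for one species
buildBranch <- function(species) {
  list(
    Sepal = list(Length = speciesMeans[species, "Sepal.Length"],
                 Width  = speciesMeans[species, "Sepal.Width"]),
    Petal = list(Length = speciesMeans[species, "Petal.Length"],
                 Width  = speciesMeans[species, "Petal.Width"])
  )
}

# Root list with one named branch per species
irisTree <- sapply(rownames(speciesMeans), buildBranch, simplify = FALSE)

irisTree$setosa$Petal$Width  # Follow a path from the root to one leaf
```

Each leaf is reached by naming the species, then the flower part, then the dimension, which mirrors the branch description above.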
[6] Use the most appropriate built-in R data structure to build this tree structure
https://en.wikipedia.org/wiki/Iris_flower_data_set
Question XXXXXXXXXX = 10)
Here we will perform some simple analysis of data regarding the quality of different red wines. The
data is located on vUWS in the file called “wineQuality-red.csv”. Further details for this dataset
can be found at UCI - Wine Quality Data Set. The goal is not to become a wine expert, rather to
do some simple intuitive investigation.
Load the dataset and do some basic exploration to familiarise yourself with it.
(i)
Write code to produce a single box plot that shows alcohol versus each wine quality. Give the plot
a reasonable appearance, hence having a title, axis labels and using colours. Repeat for residual
sugar versus quality and density versus quality. Using two simple sentences, state which plot shows the
strongest connection with quality and which shows the weakest.
(ii)
Using the coding method described in lecture 6, write code to reproduce the visualisation shown in
figure 2.
Figure 2: Various mean wine variables versus quality
Note that your visualisation does not have to match exactly; in essence, just show the same
information.
http://archive.ics.uci.edu/ml/datasets/Wine+Quality
(iii)
There is a built-in function in R called cor(), which determines the correlation between two variables.
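A minimal illustration of cor() is given below. Because the wine file is not reproduced here, the sketch uses two columns of the built-in iris dataset as a stand-in; the choice of columns and the variable name lengthCorrelation are assumptions made purely for illustration.

```r
# Pearson correlation (the default method) between two numeric vectors
lengthCorrelation <- cor(iris$Sepal.Length, iris$Petal.Length)
lengthCorrelation  # A value between -1 and 1; values near 1 indicate a strong positive linear relationship
```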