
Master of Business Analytics assignment involving coding in R

Western Sydney University
The Nature of Data (MATH 7016)
Assignment
2022 Q2
Due 11:59pm Sunday 22nd of May 2022
Introduction
This assignment consists of four questions, each of equal value, giving a total contribution of 40%
to this subject. The beginning of each question provides a breakdown of marks for each part in
that question. For example, a breakdown of (1 + 3 + 6 = 10) implies a question consisting of three
parts, where the first, second and third parts are worth 1, 3 and 6 marks respectively.
Important
Only R and the packages (that is, R libraries) described in the lectures and tutorials for this subject can
be used for generating answers for this assignment! In addition, RMarkdown must not be used [1].
Penalties may apply for non-conformance.
Answer structure
In doing this assignment, you should not need to use the maximum word limit declared in the
Learning Guide. Note that marks are not awarded for using many words, rather, for using an
economy of words and only stating what is relevant to the question being answered. Consider the
old adage – “less can be more”. Therefore, show you know what is relevant and have mastered
the ability to get to the point using few, simple words and clear sentences. Seek to also apply this
philosophy to the code you write.
Your answers to this assignment are to be provided in a single R script file. All material in your
script file should be logically organised, so that related material can be easily and quickly located.
Clearly identify yourself in this file; as a minimum, give your full name and student ID as comments at the
beginning of the file.
R script file
Textual answers should be included as comments in your script file; refer to Listing 1 for examples
of including comments. The comments in your script file should be:
• Brief and to the point
• Stating a high level perspective
• Stating what is not immediately obvious, but worth mentioning
[1] In short, only use R and R packages described in the lectures and tutorials for this subject. You are also not
allowed to use RMarkdown for this assignment. All these requirements will also apply for the exam.
# In an R script file, comments are prefixed with the hash symbol
# A line with just a comment on it
# Generate a distribution of mean values from a sequence of digits
d <- replicate(1000,
{
  s <- sample(0:9, replace = TRUE) # Generate a sequence of 10 numeric digits
  mean(s) # A comment to end a line with R code on it
})
hist(d, main = 'Distribution of means') # Show distribution of means
# Of course you would use smarter comments than those used here
# Only state what is not immediately obvious
Listing 1: Some R code with comments (shown in green)
Be brief and to the point with respect to comments. The approach described in this section
should be the same as used for the exam. Make judicious use of comments a priority in
doing this assignment. After all, comments are meant to communicate important and useful
details. Make sure you also communicate well through wise choices in variable and function names.
Also make wise decisions regarding the layout of everything inside your R script file. Note if things
go wrong, good organisation and comments can help you, since they can show if appropriate logic
was intended.
Plagiarism
This is an individual effort assignment, therefore the answers you provide must be your own. You
may learn from others, but the understanding claimed by your assignment must be yours. If you
include any material in this assignment that is not your own, you must acknowledge that fact and
declare the source of that material. Be warned, your answers will be checked for plagiarism
and if caught, significant penalties may apply.
Submission
Once you have completed the assignment, you must upload your R script file via Turnitin; if you
wish, you can also e-mail your R script file directly to me [2]. This may be wise if you are having
trouble with Turnitin or vUWS and are at risk of submitting late. Once you have e-mailed, seek
to successfully submit via Turnitin. Be aware that you may need to rename your R script file by
adding the extension “.txt”, otherwise you may not be successful in submitting via Turnitin.
On a Windows machine you can easily add a “.txt” extension via file explorer. Select the “View”
tab and tick “File name extensions”; refer to Figure 1. Then select the file to be renamed, press F2 to
enter edit mode and add “.txt” to the very end of the file name; do not remove the “.R” portion of
the file name.
Hopefully a similar process is available on other platforms. Determine the method you will use
and test it prior to submission.
You must submit your assignment no later than the due date declared on the first page of this
assignment, otherwise late submission penalties will apply, as described in the section titled “Late
submission penalties”. Prior to the due date, you may replace a previously submitted version, but
only the last submitted version will be marked!
XXXXXXXXXX
Figure 1: How to add “.txt” to the file extension on Windows
Late submission penalties
Late submission penalties exist. The contribution value of the assignment will reduce by 10% per
day, for each day after the submission date; that is, four marks per day. For example, if your
assignment is four days late, the maximum possible mark you can score for the assignment is 24
out of 40.
Question XXXXXXXXXX2 = 10)
Are the distributions the same?
Table 1 contains the distribution of students across two faculties and levels of degrees. The table
contains the number of students successfully completing a degree in the specified faculty. Does a
statistically significant difference exist between the faculties?
Bachelor Masters Doctorate
Engineering XXXXXXXXXX
Science XXXXXXXXXX
Table 1: Distribution of degrees awarded
(i)
Using R code, load the above data within an appropriate data structure(s). Include labels in your
data structure(s). Include brief descriptions regarding the key parts of your code as comments
within your R code.
(ii)
Using R, produce what you believe is the most useful visualisation that shows how the distributions
of faculty and degrees vary. Make the visualisation worthy of inclusion in a report. Briefly
describe any key observations seen in the visualisation and make a prediction whether a statistically
significant difference exists.
(iii)
Perform a hypothesis test in order to determine whether a statistically significant difference exists
between the distributions. You are free to use the simplest method available, as used in lectures or
tutorials. But make sure you include the following:
• The Null and Alternative Hypotheses used
• Any assumptions, or important details / parameters used
• Declare the result of the hypothesis test
• What does the hypothesis test result mean with respect to the distributions
(iv)
Repeat (iii) in its entirety, except for the following: develop your own code from scratch, hence
make use of the R replicate() function and other basic functions as used in lectures or tutorials.
(v)
Finally, compare and contrast the results from (iii) and (iv) and briefly comment on what you found.
Question XXXXXXXXXX = 10)
Is there a statistically significant difference?
The provided dataset, contained in the file called “question2 dataset.csv”, reports on the performance
of two new drugs. The dataset contains two columns labelled Val and Grp. Val is a measure
of the drug’s performance; assume performance / effectiveness is proportional to the magnitude of
Val. Consider Grp to simply state the particular drug trialled.
(i)
Produce a single but useful visualisation of the dataset and briefly interpret what you see. Make
the visualisation worthy of a report. Briefly state why you chose that visualisation.
(ii)
Perform a hypothesis test using the most appropriate statistical method. Note that you are free to
use a single line of code, as used in lectures or tutorials. Make sure you clearly include the following:
• Given what you saw in (i), state how you will compare the drugs and why
• The Null and Alternative Hypotheses used
• Any assumptions, or important details / parameters used
• Declare the results and interpretation of the results of the hypothesis test
(iii)
Repeat (ii) in its entirety, developing R code from scratch, but using a permutation test approach, as
used in lectures or tutorials.
(iv)
Briefly compare and contrast the results of (ii) and (iii).
Question XXXXXXXXXX = 10)
Predicting demand
A particular help desk always has seventeen operators on duty. On average, only fourteen operators
are simultaneously busy helping customers.
(i)
What is the probability that all operators will be simultaneously busy? Briefly explain the approach
used.
(ii)
What is the probability that one or more callers will have to wait for an operator to become
available? Briefly explain the approach used.
(iii)
Draw a nicely presented plot showing operator demand for zero to twenty-five operators. Make this
plot appropriate for a report to management.
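One possible way to approach parts (i) to (iii), offered as a sketch rather than a model answer, is to assume that the number of operators demanded at any instant follows a Poisson distribution with mean 14. That distributional choice is an assumption and is not stated in the question; the variable names are illustrative only. Part (i) can be read two ways, so both are shown.

```r
# Assumption: instantaneous operator demand ~ Poisson(mean = 14)
mean_demand <- 14
operators   <- 17

# (i) Demand is exactly 17: every operator is busy and nobody waits
p_exactly_full <- dpois(operators, lambda = mean_demand)

# Alternative reading of (i): demand is 17 or more, so all operators are busy
p_all_busy <- ppois(operators - 1, lambda = mean_demand, lower.tail = FALSE)

# (ii) Demand exceeds 17, so at least one caller has to wait
p_wait <- ppois(operators, lambda = mean_demand, lower.tail = FALSE)

# (iii) Demand distribution for 0 to 25 operators;
#       bars beyond the 17 rostered operators are highlighted
demand <- 0:25
prob   <- dpois(demand, lambda = mean_demand)
barplot(prob, names.arg = demand,
        col = ifelse(demand > operators, "firebrick", "steelblue"),
        xlab = "Operators required", ylab = "Probability",
        main = "Help desk operator demand (Poisson, mean 14)")
```

Under this model the "all busy" probability is the sum of the "exactly 17" and "callers waiting" probabilities, which gives a quick consistency check.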
Estimating probability
Imagine an exam that consists of seven multiple choice questions and each question has five possible
answers, but only one is correct. Also imagine that answers to the questions are to be randomly
selected.
(iv)
Write R code to determine the theoretic probability of getting five or more questions correct by
luck. Make sure you briefly include code comment(s) stating the logic of how you calculated the
answer.
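A minimal sketch of part (iv), assuming each question is answered independently and at random, so that the number of correct answers is binomial with 7 trials and success probability 1/5. The variable names are illustrative.

```r
# Each of the 7 questions is an independent guess with P(correct) = 1/5,
# so the number correct follows Binomial(size = 7, prob = 0.2)
n_questions <- 7
p_correct   <- 1 / 5

# P(X >= 5) = P(X = 5) + P(X = 6) + P(X = 7)
p_five_or_more <- sum(dbinom(5:7, size = n_questions, prob = p_correct))

# Equivalent upper-tail form: P(X > 4)
p_upper_tail <- pbinom(4, size = n_questions, prob = p_correct,
                       lower.tail = FALSE)

print(p_five_or_more)  # approximately 0.004672
```

The two forms should agree; computing both is a cheap guard against an off-by-one error in the tail boundary.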
Question XXXXXXXXXX = 10)
Interval estimation
Using a 90% confidence interval approach, what are the fewest and most heads expected
when a biased coin (75% probability of heads) is tossed 50 times?
(i)
Using a simulation method as shown in the lectures and tutorials of this subject, generate a distribution
suitable for determining a confidence interval; use code you developed from scratch [3]. Briefly
explain the key steps involved as simple comments within your source code.
(ii)
For the distribution obtained above in (i):
• Determine the mean number of heads
• Determine the confidence interval
(iii)
Produce an appropriate plot showing the distribution obtained in (i) and include the details obtained
in (ii). Make the plot worthy of a report.
[3] From scratch means making use of R replicate() and associated functions.
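The three parts of this question can be sketched with replicate() and sample() along the following lines. This is one reasonable reading, not a prescribed solution: the seed, the 10,000 repetitions and the use of the 5% and 95% quantiles as the 90% interval are all choices made here for illustration.

```r
set.seed(2022)  # arbitrary seed, for reproducibility only

n_tosses <- 50
p_heads  <- 0.75

# (i) Simulate 10,000 experiments; each counts heads in 50 biased tosses
heads <- replicate(10000, {
  tosses <- sample(c("H", "T"), size = n_tosses, replace = TRUE,
                   prob = c(p_heads, 1 - p_heads))
  sum(tosses == "H")
})

# (ii) Mean and central 90% interval (5% cut from each tail)
mean_heads <- mean(heads)
ci <- quantile(heads, probs = c(0.05, 0.95))

# (iii) Histogram with one bar per integer count, annotated with (ii)
hist(heads, breaks = seq(min(heads) - 0.5, max(heads) + 0.5, by = 1),
     col = "grey85", xlab = "Number of heads in 50 tosses",
     main = "Simulated heads for a biased coin (P(heads) = 0.75)")
abline(v = mean_heads, col = "blue", lwd = 2)
abline(v = ci, col = "red", lty = 2, lwd = 2)
legend("topleft", legend = c("Mean", "90% interval"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2, bty = "n")
```

The simulated mean should sit close to the theoretical value of 50 × 0.75 = 37.5, which is a useful sanity check on the simulation.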
Answered 2 days After May 19, 2022

Solution

Mohd answered on May 21 2022
108 Votes
5/19/2022
library(readr)
library(magrittr)
library(dplyr)
library(ggplot2)
library(rmarkdown)
library(tidyr)
            Bachelor  Masters  Doctorate
Engineering      360      141         14
Science          616      309         32
# Enter the counts column by column (Bachelor, Masters, Doctorate)
mytab <- matrix(c(360,616, 141,309, 14,32), ncol=3, byrow=FALSE)
colnames(mytab) <- c('Bachelor','Masters','Doctorate')
rownames(mytab) <- c('Engineering','Science')
mytab <- as.table(mytab)
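The posted solution does not include a visualisation for part (ii). One option, sketched here as a suggestion rather than the author's answer, is a grouped bar chart of within-faculty proportions, since the two faculties differ in size and raw counts would hide the comparison. The object names `degrees` and `degree_share` are illustrative.

```r
# Counts as entered in the solution above
degrees <- matrix(c(360, 616, 141, 309, 14, 32), ncol = 3,
                  dimnames = list(Faculty = c("Engineering", "Science"),
                                  Degree  = c("Bachelor", "Masters", "Doctorate")))

# Convert counts to within-faculty proportions (each row sums to 1)
degree_share <- prop.table(degrees, margin = 1)

# One group of bars per degree level, one bar per faculty
barplot(degree_share, beside = TRUE, legend.text = TRUE,
        col = c("steelblue", "darkorange"), ylim = c(0, 1),
        xlab = "Degree level", ylab = "Proportion of faculty's graduates",
        main = "Degree mix by faculty")
```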
Null and Alternative Hypotheses used
Null Hypothesis: There is no association between faculty and degree level.
Alternative Hypothesis: There is an association between faculty and degree level.
From the chi-square test output below, X-squared = 4.6062 with 2 degrees of freedom, and the p-value is 0.09995, which is greater than 0.05. We therefore fail to reject the null hypothesis: at the five percent significance level there is no statistically significant evidence of an association between faculty and degree level. At a ten percent significance level the p-value falls just below 0.1, so the null hypothesis would be rejected, but only marginally.
chisq.test(mytab)
##
## Pearson's Chi-squared test
##
## data:  mytab
## X-squared = 4.6062, df = 2, p-value = 0.09995
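The chi-squared result above can be cross-checked with a from-scratch simulation built on replicate(), as part (iv) asks. The following is a sketch of one such approach, a label-shuffling (permutation) test; it is not taken from the posted solution, and the helper `chi_sq_stat`, the seed and the 5,000 repetitions are choices made here for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

degrees <- matrix(c(360, 616, 141, 309, 14, 32), ncol = 3,
                  dimnames = list(c("Engineering", "Science"),
                                  c("Bachelor", "Masters", "Doctorate")))

# Chi-squared statistic computed by hand from a table of counts
chi_sq_stat <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# Expand the table into one (faculty, degree) pair per student
counts  <- as.vector(degrees)
faculty <- rep(rownames(degrees)[row(degrees)], times = counts)
degree  <- rep(colnames(degrees)[col(degrees)], times = counts)

observed <- chi_sq_stat(table(faculty, degree))

# Under H0 the degree labels are exchangeable across faculties,
# so shuffle them and recompute the statistic each time
simulated <- replicate(5000, chi_sq_stat(table(faculty, sample(degree))))

# Proportion of shuffles at least as extreme as the observed statistic
p_value <- mean(simulated >= observed)
```

The hand-computed `observed` statistic should reproduce the X-squared value reported by chisq.test(), and the simulated p-value can then be compared with the theoretical one for part (v).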
library(readr)
question2dataset <- read_csv("New folder (3)/question2dataset.csv")
View(question2dataset)
ggplot(data=question2dataset)+
  geom_boxplot(mapping = aes(x=Grp,y=Val),outlier.colour = "red",
               outlier.shape = 2,
               outlier.size = 3)+
  geom_point(mapping = aes(x=Grp,y=Val))
hist(question2dataset$Val,main="Histogram of values")
t.test(Val~Grp, data=question2dataset)
##
## Welch Two Sample t-test
##
## data: Val by Grp
## t = -1.9016, df = 47.286, p-value = 0.06333
## alternative hypothesis: true difference in means between group a and group b is not equal to 0
## 95 percent confidence interval:
## -4.2206415 0.1185016
## sample estimates:
## mean in group a mean in group b
## 100.3086 102.3596
mean(question2dataset$Val[question2dataset$Grp == "a"])
## [1] 100.3086
mean(question2dataset$Val[question2dataset$Grp == "b"])
## [1] 102.3596
test.stat1 <- abs(mean(question2dataset$Val[question2dataset$Grp == "a"]) -
mean(question2dataset$Val[question2dataset$Grp == "b"]))
test.stat1
## [1] 2.05107
median(question2dataset$Val[question2dataset$Grp == "a"])
## [1] 100.2253
median(question2dataset$Val[question2dataset$Grp == "b"])
## [1] 102.1962
test.stat2 <- abs(median(question2dataset$Val[question2dataset$Grp == "a"]) -median(question2dataset$Val[question2dataset$Grp == "b"]))
test.stat2
## [1] 1.970877
Permutation Test
set.seed(1979)
n <- length(question2dataset$Grp)
P <- 100000
variable <- question2dataset$Val
PermSamples <- matrix(0,...
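The listing above is cut off in the source. Independently of it, a permutation test built on replicate() might be sketched as follows. Note that `drug_data` is synthetic stand-in data generated with rnorm() purely so the example runs: it is not the assignment dataset, and its group sizes, means and standard deviations are invented. The helper `mean_diff` is likewise hypothetical.

```r
set.seed(1979)

# Hypothetical stand-in data: NOT the assignment dataset
drug_data <- data.frame(
  Val = c(rnorm(25, mean = 100, sd = 4), rnorm(25, mean = 102, sd = 4)),
  Grp = rep(c("a", "b"), each = 25)
)

# Test statistic: absolute difference in group means
mean_diff <- function(val, grp) {
  abs(mean(val[grp == "a"]) - mean(val[grp == "b"]))
}

observed <- mean_diff(drug_data$Val, drug_data$Grp)

# Under H0 the group labels are exchangeable: shuffle them and
# recompute the statistic to build the null distribution
perm_stats <- replicate(10000,
                        mean_diff(drug_data$Val, sample(drug_data$Grp)))

# Two-sided p-value: share of shuffles at least as extreme as observed
p_value <- mean(perm_stats >= observed)

hist(perm_stats, main = "Permutation distribution of |mean difference|",
     xlab = "Absolute difference in means")
abline(v = observed, col = "red", lwd = 2)
```

To apply this to the real data, `drug_data` would be replaced by the data frame read from the CSV file; the rest of the code depends only on the `Val` and `Grp` columns.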