Western Sydney University, The Nature of Data (MATH 7016), Assignment, 2022 Q2, Due 11:59pm Sunday 22nd of...

Question

Western Sydney University
The Nature of Data (MATH 7016)
Assignment
2022 Q2
Due 11:59pm Sunday 22nd of May 2022

Introduction

This assignment consists of four questions, each of equal value, giving a total contribution of 40% to this subject. The beginning of each question provides a breakdown of marks for each part in that question. For example, a breakdown of XXXXXXXXXX = 10) implies a question consisting of three parts, where the first, second and third parts are worth 1, 3 and 6 marks respectively.

Important

R and only packages (that is, R libraries) described in the lectures and tutorials for this subject can be used for generating answers for this assignment! In addition, RMarkdown must not be used (footnote 1). Penalties may apply for non-conformance.

Answer structure

In doing this assignment, you should not need to use the maximum word limit declared in the Learning Guide. Note that marks are not awarded for using many words, rather, for using an economy of words and only stating what is relevant to the question being answered. Consider the old adage: “less can be more”. Therefore, show you know what is relevant and have mastered the ability to get to the point using few, simple words and clear sentences. Seek to also apply this philosophy to the code you write.

Your answers to this assignment are to be provided in a single R script file. All material in your script file should be logically organised, so that related material can be easily and quickly located. Clearly identify yourself in this file, as a minimum: full name and student ID, as comments at the beginning of the file.

R script file

Textual answers should be included as comments in your script file, refer to listing 1 for examples on including comments. The comments in your script file should be:
• Brief and to the point
• Stating a high level perspective
• Stating what is not immediately obvious, but worth mentioning

Footnote 1: In short, only use R and R packages described in the lectures and tutorials for this subject. You are also not allowed to use RMarkdown for this assignment. All these requirements will also apply for the exam.

    # In an R script file, comments are prefixed with the hash symbol

    # A line with just a comment on it

    # Generate a distribution of mean values from a sequence of digits
    # (the replicate() and sample() arguments were lost from this listing;
    # the values below are assumed so that the code runs)
    d <- replicate(1000, {
      s <- sample(0:9, 10, replace = TRUE)
      mean(s) # A comment to end a line with R code on it
    })
    hist(d, main = 'Distribution of means') # Show distribution of means

    # Of course you would use smarter comments than those used here
    # Only state what is not immediately obvious

Listing 1: Some R code with comments (shown in green)

Be brief and to-the-point with respect to comments. The approach described in this section should be the same as used for the exam. Make judicious use of comments a priority in doing this assignment. After all, comments are meant to communicate important and useful details. Make sure you also communicate well through wise choices in variable and function names. Also make wise decisions regarding the layout of everything inside your R script file. Note if things go wrong, good organisation and comments can help you, since they can show if appropriate logic was intended.

Plagiarism

This is an individual effort assignment, therefore the answers you provide must be your own. You may learn from others, but the understanding claimed by your assignment must be yours. If you include any material in this assignment that is not your own, you must acknowledge that fact and declare the source of that material. Be warned, your answers will be checked for plagiarism and if caught, significant penalties may apply.

Submission

Once you have completed the assignment, you must upload your R script file via Turnitin; if you wish, you can also e-mail your R script file directly to me (footnote 2). This may be wise if you are having trouble with Turnitin or vUWS and are at risk of submitting late. Once you have e-mailed, seek to successfully submit via Turnitin.
Be aware that you may need to rename your R script file by adding the extension “.txt”, otherwise you may not be successful in submitting via Turnitin.

On a Windows machine you can easily add a “.txt” extension via file explorer. Select the “View” tab and tick “File name extensions”, refer to figure 1. Then select the file to be renamed, press F2 to enter edit mode and add “.txt” to the very end of the file name; do not remove the “.R” portion of the file name.

Hopefully a similar process is available on other platforms. Determine the method you will use and test it prior to submission.

You must submit your assignment no later than the due date declared on the first page of this assignment, otherwise late submission penalties will apply, as described in the section titled “Late submission penalties”. Prior to the due date, you may replace a previously submitted version, but only the last submitted version will be marked!

XXXXXXXXXX

Figure 1: How to add “.txt” to the file extension on Windows

Late submission penalties

Late submission penalties exist. The contribution value of the assignment will reduce by 10% per day, for each day after the submission date; therefore four marks per day. For example, if your assignment is four days late, the maximum possible mark you can score for the assignment is 24 out of 40.

Question XXXXXXXXXX2 = 10)

Are the distributions the same?

Table 1 contains the distribution of students across two faculties and levels of degrees. The table contains the number of students successfully completing a degree in the specified faculty. Does a statistically significant difference exist between the faculties?

                Bachelor  Masters  Doctorate
    Engineering XXXXXXXXXX
    Science     XXXXXXXXXX

Table 1: Distribution of degrees awarded

(i)
Using R code, load the above data within an appropriate data structure(s). Include labels in your data structure(s). Include brief descriptions regarding the key parts of your code as comments within your R code.

(ii)
Using R, produce what you believe is the most useful visualisation that shows how the distributions of faculty and degrees vary. Make the visualisation worthy of inclusion in a report. Briefly describe any key observations seen in the visualisation and make a prediction whether a statistically significant difference exists.

(iii)
Perform a hypothesis test in order to determine whether a statistically significant difference exists between the distributions. You are free to use the simplest method available, as used in lectures or tutorials. But make sure you include the following:
• The Null and Alternative Hypotheses used
• Any assumptions, or important details / parameters used
• Declare the result of the hypothesis test
• What does the hypothesis test result mean with respect to the distributions

(iv)
Repeat (iii) in its entirety, except for the following: develop your own code from scratch, hence make use of the R replicate() function and other basic functions as used in lectures or tutorials.

(v)
Finally, compare and contrast the results from (iii) and (iv) and briefly comment on what you found.

Question XXXXXXXXXX = 10)

Is there a statistically significant difference?

The provided dataset, contained in the file called “question2 dataset.csv”, reports on the performance of two new drugs. The dataset contains two columns labeled Val and Grp. Val is a measure of the drug’s performance, assume performance / effectiveness is proportional to the magnitude of Val. Consider Grp to simply state the particular drug trialled.

(i)
Produce a single but useful visualisation of the dataset and briefly interpret what you see. Make the visualisation worthy of a report. Briefly state why you chose that visualisation.

(ii)
Perform a hypothesis test using the most appropriate statistical method. Note that you are free to use a single line of code, as used in lectures or tutorials.
Make sure you clearly include the following:
• Given what you saw in (i), state how you will compare the drugs and why
• The Null and Alternative Hypotheses used
• Any assumptions, or important details / parameters used
• Declare the results and interpretation of the results of the hypothesis test

(iii)
Repeat (ii) in its entirety, developing R code from scratch, but using a permutation test approach, as used in lectures or tutorials.

(iv)
Briefly compare and contrast the results of (ii) and (iii).

Question XXXXXXXXXX = 10)

Predicting demand

A particular help desk always has seventeen operators on duty. On average only fourteen operators are simultaneously busy helping customers.

(i)
What is the probability that all operators will be simultaneously busy? Briefly explain the approach used.

(ii)
What is the probability that one or more callers will have to wait for an operator to become available? Briefly explain the approach used.

(iii)
Draw a nicely presented plot showing operator demand for zero to twenty-five operators. Make this plot appropriate for a report to management.

Estimating probability

Imagine an exam that consists of seven multiple choice questions and each question has five possible answers, but only one is correct. Also imagine that answers to the questions are to be randomly selected.

(iv)
Write R code to determine the theoretical probability of getting five or more questions correct by luck. Make sure you briefly include code comment(s) stating the logic of how you calculated the answer.

Question XXXXXXXXXX = 10)

Interval estimation

Using a 90% confidence interval approach, what is the fewest and largest number of heads expected when a biased coin (75% probability of heads) is tossed 50 times?

(i)
Using a simulation method as shown in the lectures and tutorials of this subject, generate a distribution suitable for determining a confidence interval; use code you developed from scratch (footnote 3). Briefly explain the key steps involved as simple comments within your source code.

(ii)
For the distribution obtained above in (i):
• Determine the mean number of heads
• Determine the confidence interval

(iii)
Produce an appropriate plot showing the distribution obtained in (i) and include the details obtained in (ii). Make the plot worthy of a report.

Footnote 3: From scratch means making use of R replicate() and associated functions.
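As an illustration of the last two computational ideas in the brief (the multiple choice exam and the biased coin), here is a minimal sketch in base R. It assumes the number of correct guesses can be modelled as a binomial count, and that a "confidence interval approach" means taking the central 90% of simulated head counts. The variable names, the seed and the number of simulations are illustrative choices, not values from the assignment.

```r
# Exam: 7 questions, 5 options each, answers picked at random.
# The number correct is modelled as Binomial(size = 7, prob = 1/5),
# so P(5 or more correct) is the sum of the probabilities for 5, 6 and 7.
p_five_or_more <- sum(dbinom(5:7, size = 7, prob = 1 / 5))
p_five_or_more  # 0.004672

# Coin: simulate 50 tosses of a coin with P(heads) = 0.75, many times over.
# n_sims and the seed are arbitrary (assumed) choices.
set.seed(2022)
n_sims <- 10000
n_tosses <- 50
p_heads <- 0.75

heads <- replicate(n_sims, {
  tosses <- sample(c("H", "T"), size = n_tosses, replace = TRUE,
                   prob = c(p_heads, 1 - p_heads))
  sum(tosses == "H")  # number of heads in this simulated run
})

mean_heads <- mean(heads)
ci_90 <- quantile(heads, probs = c(0.05, 0.95))  # central 90% of simulated counts

# Histogram with one bar per integer count, marking the mean and interval
hist(heads,
     breaks = seq(min(heads) - 0.5, max(heads) + 0.5, by = 1),
     main = "Simulated number of heads in 50 tosses",
     xlab = "Number of heads")
abline(v = mean_heads, lwd = 2)
abline(v = ci_90, lty = 2)
```

The simulated mean should sit close to the theoretical value of 50 × 0.75 = 37.5; the interval end points vary slightly with the seed and the number of simulations.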

Mohd · Accepted Answer

5/19/2022
library(readr)
library(magrittr)
library(dplyr)
library(ggplot2)
library(rmarkdown)
library(tidyr)
                Bachelor  Masters  Doctorate
    Engineering      360      141         14
    Science          616      309         32
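One possible way to hold these counts in R with labels, and to compare the two faculties visually, is sketched below. It assumes a plain labelled matrix is an acceptable data structure and uses only base R; the names degrees and degree_props are illustrative and do not come from the original answer.

```r
# Counts from the table above (rows: faculty, columns: degree level)
degrees <- matrix(c(360, 141, 14,
                    616, 309, 32),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Faculty = c("Engineering", "Science"),
                                  Degree  = c("Bachelor", "Masters", "Doctorate")))

# Row-wise proportions make the faculties comparable despite different totals
degree_props <- prop.table(degrees, margin = 1)

# Grouped bars: one group per faculty, one bar per degree level
barplot(t(degree_props), beside = TRUE, legend.text = TRUE,
        main = "Share of degrees awarded by faculty",
        xlab = "Faculty", ylab = "Proportion of degrees")
```

Plotting proportions rather than raw counts is a design choice: Science awards more degrees overall, so raw counts would mostly show the difference in faculty size rather than the difference in distribution.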
mytab ... Since the p-value is greater than 0.05, we fail to reject the null hypothesis. Hence there is no evidence of an association between faculty and degree level at the five percent significance level.
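Because the code behind this conclusion did not survive, the following is a hedged sketch of how such a test could be run on the table above, not a reconstruction of the original answer. It assumes a chi-squared test of independence, first with the built-in chisq.test() and then as a permutation test built with replicate(). The helper chi_sq_stat, the seed and the number of permutations are illustrative choices.

```r
# Counts from the answer's table (rows: faculty, columns: degree level)
degrees <- matrix(c(360, 141, 14,
                    616, 309, 32),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Faculty = c("Engineering", "Science"),
                                  Degree  = c("Bachelor", "Masters", "Doctorate")))

# H0: degree level is independent of faculty; H1: the two are associated
builtin <- chisq.test(degrees)

# Chi-squared statistic computed from observed and expected counts
chi_sq_stat <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}
observed_stat <- chi_sq_stat(degrees)

# Expand the table to one faculty label and one degree label per student
faculty <- rep(rownames(degrees), times = rowSums(degrees))
degree  <- rep(rep(colnames(degrees), times = nrow(degrees)),
               times = as.vector(t(degrees)))

# Permutation test: shuffling the degree labels breaks any association,
# so the shuffled statistics approximate the distribution under H0
set.seed(1)  # arbitrary seed, for repeatability only
perm_stats <- replicate(2000, chi_sq_stat(table(faculty, sample(degree))))
perm_p <- mean(perm_stats >= observed_stat)

builtin$p.value
perm_p
```

Both p-values should come out above 0.05, in line with the conclusion stated above; the permutation p-value varies slightly with the seed and the number of shuffles.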

Master of Business Analytics assignment involving coding in R.
