401077 Introduction to Biostatistics, Autumn 2018
Due Sunday April 1, 2018
Please answer all 7 questions. Record your answers in the template document provided and submit via Turnitin before 11:59pm on the due date. The marks allocated to each question are shown in the assignment. A total of 30 marks are available and this assignment is worth 30% of your overall grade.
Some of the questions require you to analyse the unique assignment data set which I have created for you. This is labelled ‘dataforxxxxxxxx.RData’ where xxxxxxxx represents your Student ID number. You can find this with your Assessment materials in vUWS. Please also locate and read ‘Description of your data set.docx’.
Note: Each student will get different answers as the data sets differ.
Question 1 (5 marks)
Consider the University of Eastern Sydney data set assigned to you.
a. Explain why course of study (course) is a categorical variable. (1 mark)
b. Explain why mental well-being score (WEMWBS) is a numeric variable. (1 mark)
c. Using the University of Eastern Sydney data set assigned to you and R Commander, produce a boxplot of the relationship between course of study (course) and mental well-being score (WEMWBS). Your chart should include descriptive axis labels. (1 mark)
d. Does the chart in c. show any evidence that mental well-being (WEMWBS) score differs according to course of study? Explain why or why not. (2 marks)
Question 2 (5 marks)
a. Using the University of Eastern Sydney data set assigned to you and R Commander, tabulate the relationship between gender (sex) and the course of study (course). Your table should include descriptive labels. (1 mark)
b. Using row or column percentages, describe the relationship between gender and course of study. (2 marks)
c. Using conditional probabilities, explain why gender and course of study are not independent. Hint: You only need to show independence in one of the four courses of study. (2 marks)
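As a hedged illustration of the comparison behind part c., the sketch below uses a small table of hypothetical counts (invented for illustration, not drawn from any assignment data set) and plain Python rather than R Commander. The function names `marginal_prob` and `conditional_prob` are assumptions made for this example. Two variables are independent only if the conditional probability equals the marginal probability for every combination, so a single clear mismatch is enough to show dependence.

```python
# Hypothetical two-way table of counts (rows: gender, columns: course).
# These numbers are invented for illustration only.
counts = {
    "Female": {"Arts": 30, "Science": 20},
    "Male": {"Arts": 10, "Science": 40},
}


def marginal_prob(table, column):
    """P(column): column total divided by the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    column_total = sum(row[column] for row in table.values())
    return column_total / grand_total


def conditional_prob(table, column, given_row):
    """P(column | row): cell count divided by the row total."""
    row = table[given_row]
    return row[column] / sum(row.values())


p_arts = marginal_prob(counts, "Arts")
p_arts_given_female = conditional_prob(counts, "Arts", "Female")

# Under independence these two probabilities would be equal.
print(f"P(Arts) = {p_arts:.2f}")
print(f"P(Arts | Female) = {p_arts_given_female:.2f}")
```

With real data the two probabilities will rarely match exactly, so the size of the difference matters when interpreting the result.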
Question 3 (5 marks)
a. Using the University of Eastern Sydney data set assigned to you and R Commander, draw an appropriate graph of self-reported alcohol consumption per week (alc). (Don’t forget to provide meaningful labels on your axes). (1 mark)
b. Describe the shape of this distribution of self-reported alcohol consumption. Include appropriate numerical measures (statistics) of shape in your description. (2 marks)
c. Using appropriate statistics, describe the centre and spread of the distribution of the students’ self-reported average alcohol consumption. You must differentiate which result(s) apply to centre and which apply to spread. (2 marks)
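The kinds of statistics referred to in parts b. and c. can be sketched as follows. This is a minimal illustration in plain Python, not the R Commander workflow, and the data values are hypothetical. The helper `moment_skewness` is an assumed name; it computes the simple moment-based skewness, whereas statistical software may apply a small-sample adjustment and report a slightly different value. Quartiles also depend on the interpolation method, so an interquartile range computed here may not match other software exactly.

```python
import statistics

# Hypothetical right-skewed data, invented for illustration only.
values = [0, 1, 1, 2, 3, 5, 8, 20]


def moment_skewness(data):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Measures of centre.
mean_value = statistics.fmean(values)
median_value = statistics.median(values)

# Measures of spread.
sd_value = statistics.stdev(values)  # sample standard deviation
q1, _, q3 = statistics.quantiles(values, n=4)
iqr_value = q3 - q1

# Measure of shape: positive values indicate a longer right tail.
skew_value = moment_skewness(values)

print(f"centre: mean={mean_value:.2f}, median={median_value:.2f}")
print(f"spread: sd={sd_value:.2f}, IQR={iqr_value:.2f}")
print(f"shape: skewness={skew_value:.2f}")
```

For a skewed distribution the median and interquartile range are usually the more representative pair, while the mean and standard deviation suit roughly symmetric data.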
Question 4 (4 marks)
a. Using the University of Eastern Sydney data set assigned to you and R Commander, graph the relationship between the log-transformed self-reported alcohol consumption (logalc) and mental well-being (WEMWBS) score. (1 mark)
b. Describe in words the relationship between log alcohol consumption per week and mental well-being (WEMWBS) score in this data set. (3 marks)
Question 5 (3 marks)
In Ireland, Davoren et al. XXXXXXXXXX report that 65.2% of male university students self-reported alcohol consumption levels which are classified as hazardous.
A random sample of 30 Australian male university students were interviewed about their alcohol consumption.
a. If Australian male university students do not differ from the Irish male university students, what is the probability that our random sample of 30 Australian male students will contain 15 or fewer students with hazardous alcohol consumption levels? (1 mark)
b. If Australian male university students do not differ from the Irish male university students, we would predict 25% of all samples to contain fewer than how many students with hazardous alcohol consumption? (1 mark)
c. If Australian male university students do not differ from the Irish male university students, estimate the mean number of students with hazardous alcohol consumption per sample of 30 males. Show any working. (1 mark)
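The three kinds of calculation in this question (a cumulative probability, a quantile, and a mean) all follow from the binomial distribution. The sketch below shows the general technique in plain Python with hypothetical parameters (n = 20, p = 0.5), not the values given in the question and not the R Commander menus. The function names are assumptions made for this example.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k): sum of the probabilities for 0, 1, ..., k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def binom_quantile(q, n, p):
    """Smallest k such that P(X <= k) >= q."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        if cumulative >= q:
            return k
    return n


def binom_mean(n, p):
    """Expected number of successes: n * p."""
    return n * p


# Hypothetical parameters, chosen only to demonstrate the functions.
n_trials, p_success = 20, 0.5
print(binom_cdf(10, n_trials, p_success))
print(binom_quantile(0.25, n_trials, p_success))
print(binom_mean(n_trials, p_success))
```

Because the binomial distribution is discrete, a quantile is read as the smallest count whose cumulative probability reaches the requested level, so wording such as "fewer than" versus "at most" changes which count is reported.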
Question 6 (4 marks)
a. If the average WEMWBS mental well-being score for University students is Normally distributed with a mean of 44 and standard deviation of 4, what is the range of WEMWBS values which contains the middle 95% of University students? (1 mark)
b. Using the information about the distribution of WEMWBS scores in a. and R Commander, what percentage of University students score greater than 50 on the WEMWBS? (1 mark)
c. Suppose the mental well-being of four randomly chosen students was measured using the WEMWBS. Use the Central Limit Theorem and the information in part a. to estimate the probability that the mean score from these 4 students is greater than 60. Show any working. (2 marks)
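The Normal-distribution calculations in this question can be sketched in general form as below. This is a minimal illustration using Python's standard library with hypothetical parameters (mean 100, standard deviation 15, sample size 9), not the values from the question and not R Commander. The function names are assumptions made for this example. The key step for part c. is that the mean of n independent observations has the same mean as the population but a standard deviation of sd / sqrt(n).

```python
from math import sqrt
from statistics import NormalDist


def middle_interval(mean, sd, coverage):
    """Range containing the central `coverage` proportion of a Normal distribution."""
    tail = (1 - coverage) / 2
    dist = NormalDist(mean, sd)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


def upper_tail(mean, sd, x):
    """P(X > x) for X ~ Normal(mean, sd)."""
    return 1 - NormalDist(mean, sd).cdf(x)


def sample_mean_upper_tail(mean, sd, n, x):
    """P(sample mean > x) for a sample of size n, using standard error sd / sqrt(n)."""
    return upper_tail(mean, sd / sqrt(n), x)


# Hypothetical parameters, chosen only to demonstrate the functions.
low, high = middle_interval(100, 15, 0.95)
print(f"middle 95%: {low:.1f} to {high:.1f}")
print(f"P(X > 115) = {upper_tail(100, 15, 115):.4f}")
print(f"P(mean of 9 > 110) = {sample_mean_upper_tail(100, 15, 9, 110):.4f}")
```

The same threshold is far less likely to be exceeded by a sample mean than by a single observation, because averaging shrinks the spread.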
Question 7 (4 marks)
a. A report claims that 12% of Western Sydney University students are aged less than 20 years and 16% are 20 or more years of age. Is this information sufficient for you to determine the probability that a randomly selected Western Sydney University student will be less than 20 years of age? Explain why or why not. (2 marks)
b. Suppose particular measures of depression and anxiety are both Normally distributed, with high scores indicating more severe disease. Suppose a student has a Z-score of -0.2 on the measure of depression and a Z-score of 0.4 on the measure of anxiety. Based on this information, do you expect this student is likely to need treatment for depression, anxiety, both anxiety and depression, or neither anxiety nor depression? Explain your answer. (2 marks)