This assignment has several parts; make sure you complete it part by part.

Question

Harshal · Accepted Answer

**ALERT: PLEASE, READ THE “CW2_Brief.docx” file CAREFULLY BEFORE YOU START**
**Data**
For this exam, you will need a range of datasets named 1. "heart", 2. "kidney", 3. "ovarian", and 4. the data files from the genomics practical. 
Data 1-3 are provided in two formats, 1) as a combined ".RDS" file and 2) as individual files in ".csv" files (provided as a ".zip"). You will find that using the ".RDS" is the easiest way to get all the data in one go. However, if you want to try JASP for some of the questions, you will find the ".csv" files inside the ".zip" file the easiest way to go.
The details of these datasets are provided with the corresponding questions.
Data 4 is provided as part of the practical sessions, and you should use those.
ANALYSIS: Only three questions from the exam require analysis inside RStudio. You are provided with template notebook “CW2_analysis_template.Rmd” with the relevant chunks already inserted to help you.
Good Luck
Please, attempt all questions below.
(IMPORTANT NOTE: There are no mistakes in the question text, so answer the questions as you see them)
Question 1 (10 marks)
You are an intern in a neurology research laboratory tasked with studying the effect of lifestyle on the risk of dementia. You have been given a data table containing three columns as follows: 
Column 1) weekly use of bleach (value range = 1 – infinite),
Column 2) length of fingernails (value range = short, medium, and long), and
Column 3) the risk of dementia (value range = Low and high). 
Please, answer the below questions using the above data:
1a - What type of variables are 1) weekly use of bleach, 2) waist size, and 3) the risk of dementia? (3 marks)
Ans:
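1) Weekly use of bleach (value range = 1 – infinite): quantitative (numeric) variable; discrete if recorded as a count of uses, continuous if recorded as an amount.
2) Length of fingernails (value range = short, medium, and long): categorical variable, ordinal because the levels have a natural order.
3) Risk of dementia (value range = low and high): categorical variable with two levels (binary).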
1b - What is the appropriate plot for visualising the relationship between “Column 2” and "Column 3"? (0.5 mark)
Ans: Grouped bar chart or stacked bar chart, since both columns are categorical.
1c - What is the suitable plot for visualising the relationship between "Column 3" and "Column 1"? (0.5 mark)
Ans: Box plot or violin plot, since Column 1 is numeric and Column 3 is categorical.
1d - Name the statistical tests to confirm observations made in 1b and 1c above and list all their assumptions? (2 marks)
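Ans: Chi-square test of independence for 1b.
Assumptions:
1. Observations are independent.
2. The sample size is sufficient (commonly, expected counts of at least 5 in each cell).
Independent-samples t-test for 1c, or the Mann-Whitney U test if normality does not hold.
Assumptions of the t-test:
1. Observations are independent.
2. The outcome is approximately normally distributed within each group.
3. The two groups have equal variances (homogeneity of variance).
The Mann-Whitney U test also requires independent observations but does not require normality.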
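The tests named in 1d can be sketched in R as follows. This is only an illustration on simulated data: the data frame `lifestyle`, its column names, and all values are hypothetical and are not part of the assignment data.

```r
# Hypothetical data: column names and values are invented for illustration.
set.seed(1)
n <- 200
lifestyle <- data.frame(
  bleach_use    = rpois(n, lambda = 4) + 1,
  nail_length   = factor(sample(c("short", "medium", "long"), n, replace = TRUE),
                         levels = c("short", "medium", "long")),
  dementia_risk = factor(sample(c("Low", "High"), n, replace = TRUE),
                         levels = c("Low", "High"))
)

# 1b: two categorical columns -> chi-square test of independence
counts  <- table(lifestyle$nail_length, lifestyle$dementia_risk)
chi_res <- chisq.test(counts)
chi_res$expected   # inspect expected counts to check the sample-size assumption

# 1c: numeric column compared between two groups
# Parametric option (assumes normality and equal variances)
t_res  <- t.test(bleach_use ~ dementia_risk, data = lifestyle, var.equal = TRUE)
# Non-parametric option (exact = FALSE because count data contain ties)
mw_res <- wilcox.test(bleach_use ~ dementia_risk, data = lifestyle, exact = FALSE)

chi_res$p.value
t_res$p.value
mw_res$p.value
```

Checking `chi_res$expected` before reading the p-value is one way to verify the sample-size assumption listed above.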
1e – What type of plot is shown below (1 mark)? And interpret the plot in 50 words (2 marks). What is the median survival time for the “Low Score” Group? (1 mark)
Ans:
Figure 1e:
The plot above is a Kaplan-Meier plot.
A Kaplan-Meier plot displays the estimated probability of survival over time, stepping down each time an event occurs. It allows survival to be compared between groups, in this case the low score and high score groups. Here the p-value shown with the plot is less than 0.05, which suggests a statistically significant difference in survival between the two groups; a difference this large would be unlikely to arise by chance alone if the groups truly had the same survival.
Median survival time for the “Low Score” group:
The survival curve for the "Low Score" group does not cross the 0.5 probability mark, so the median survival time cannot be determined from the plot; it was not reached during the follow-up period.
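The point about an undefined median can be illustrated with the `survival` package. This is a minimal sketch on invented follow-up data (the data frame `km_data` and its values are hypothetical, not the data behind Figure 1e): the "low" group has few events, so its curve stays above 0.5 and its median is reported as `NA`.

```r
library(survival)

# Hypothetical follow-up data, invented for illustration only.
# status: 1 = event observed, 0 = censored
km_data <- data.frame(
  time   = c(1:10, 1:9),
  status = c(1, 1, 1, rep(0, 7), rep(1, 9)),
  group  = c(rep("low", 10), rep("high", 9))
)

# Kaplan-Meier estimate per group
fit <- survfit(Surv(time, status) ~ group, data = km_data)

# The "median" column is NA for a group whose curve never drops to 0.5
medians <- summary(fit)$table[, "median"]
print(medians)
```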
Question 2 (20 marks):
The kidney data file has columns called "region" and "KRT19", representing the kidney region of the sample and the expression of the gene KRT19 as TPM, respectively. 
Before attempting the question under this section, use online search to understand how the kidney regions in the data map to the anatomy of the kidney. Also, use online search to understand the full name of the KRT19 genes.
Please, answer the below questions using the above data:
2a - Using the correct statistical test, is the expression of KRT19 different across the kidney regions and between pairs of kidney regions? (10 marks)
(You MUST perform and comment on all the assumption checks and pre-tests to get any mark for question 2a)
Ans: We use a one-way ANOVA to test whether KRT19 expression differs across kidney regions, followed by Tukey's HSD test to compare pairs of kidney regions.
> library(readxl)
> kidney <- read_excel(...)  # file path missing in the source
> View(kidney)
> length(kidney)
[1] 3
> region = kidney$region
> KRT10 = kidney$KRT19  # note: the variable is named KRT10, but it holds the KRT19 column
> GGT1 = kidney$GGT1
> 
> library(stats)
> #perform one way anova
> result = aov(KRT10 ~ region, data = kidney)
> result
Call:
   aov(formula = KRT10 ~ region, data = kidney)
Terms:
                   region Residuals
Sum of Squares  187.38708  60.43356
Deg. of Freedom         3        36
Residual standard error: 1.29565
Estimated effects may be unbalanced
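Printing the `aov` object shows sums of squares but no F statistic or p-value. As a rough check, these can be derived by hand from the numbers in the printout above; the variable names below are illustrative only.

```r
# Values copied from the aov() printout above
ss_region <- 187.38708; df_region <- 3
ss_resid  <- 60.43356;  df_resid  <- 36

ms_region <- ss_region / df_region   # mean square between regions
ms_resid  <- ss_resid / df_resid     # mean square within regions (residual)
f_stat    <- ms_region / ms_resid    # F statistic of the one-way ANOVA
p_value   <- pf(f_stat, df_region, df_resid, lower.tail = FALSE)

c(F = f_stat, p = p_value)
sqrt(ms_resid)   # should agree with the reported residual standard error (1.29565)
```

In practice `summary(result)` prints the same F statistic and p-value directly.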
> 
> # Check ANOVA assumptions
> # 1. Normality assumption
> shapiro.test(residuals(result))
	Shapiro-Wilk normality test
data:  residuals(result)
W = 0.94001, p-value = 0.03461
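> # Here the p-value (0.03461) is less than 0.05, so the residuals deviate from normality; the ANOVA result should be interpreted with caution, and a non-parametric test such as Kruskal-Wallis is a reasonable cross-check.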
> # 2. Homogeneity of variances assumption (Levene's test)
> library(car)
> leveneTest(KRT19 ~ region, data = kidney)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3  2.1461 0.1114
      36               
Warning message:
In leveneTest.default(y = y, group = group, ...) : group coerced to factor.
> # Here the p-value (0.1114) is greater than 0.05, so the homogeneity of variance assumption is satisfied.
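Because the Shapiro-Wilk p-value above is below 0.05, a non-parametric cross-check may be worth running alongside the ANOVA. The sketch below uses simulated values (the data frame `sim` is hypothetical and is not the kidney data) to show the Kruskal-Wallis test with pairwise Wilcoxon comparisons; Holm correction is one possible choice of adjustment, not something the assignment specifies.

```r
# Hypothetical expression values, simulated for illustration; not the kidney data.
set.seed(42)
regions <- c("cortex", "glomeruli", "medulla", "pelvis")
sim <- data.frame(
  region = factor(rep(regions, each = 10), levels = regions),
  expr   = c(rexp(10, 1), rexp(10, 1), rexp(10, 1) + 3, rexp(10, 1) + 5)
)

# Overall test across all regions (no normality assumption)
kw <- kruskal.test(expr ~ region, data = sim)

# Pairwise comparisons with a multiple-testing correction
pw <- pairwise.wilcox.test(sim$expr, sim$region, p.adjust.method = "holm")

kw$p.value
pw$p.value
```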
> # Perform Tukey's HSD test
> tukey_result <- TukeyHSD(result)
> # View the pairwise comparison results
> tukey_result
  Tukey multiple comparisons of means
    95% family-wise confidence level
Fit: aov(formula = KRT10 ~ region, data = kidney)
$region
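                     diff        lwr       upr     p adj
glomeruli-cortex   -0.572 -2.1325433 0.9885433 0.7575930
medulla-cortex      2.710  1.1494567 4.2705433 0.0002266
pelvis-cortex       4.810  3.2494567 6.3705433 0.0000000
medulla-glomeruli   3.282  1.7214567 4.8425433 0.0000113
pelvis-glomeruli    5.382  3.8214567 6.9425433 0.0000000
pelvis-medulla      2.100  0.5394567 3.6605433 0.0047128
All pairwise differences are significant at the 0.05 level except glomeruli vs cortex (p adj = 0.758). Judging by the signs of the differences, KRT19 expression is highest in the pelvis, followed by the medulla, with cortex and glomeruli lowest.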
2b - Produce and interpret a publication-ready plot decorated with the correct statistical test for each pair.
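One possible way to approach 2b is sketched below. It assumes the `ggpubr` package is installed and uses simulated values (the data frame `sim` is hypothetical, not the kidney data). Note that `stat_compare_means` here runs unadjusted pairwise t-tests, which is an assumption of this sketch; the brackets would need to be replaced with Tukey-adjusted p-values to match the analysis in 2a.

```r
library(ggplot2)
library(ggpubr)

# Hypothetical expression values, simulated for illustration; not the kidney data.
set.seed(7)
regions <- c("cortex", "glomeruli", "medulla", "pelvis")
sim <- data.frame(
  region = factor(rep(regions, each = 10), levels = regions),
  KRT19  = rnorm(40, mean = rep(c(2, 1.5, 4.5, 7), each = 10), sd = 1.3)
)

# Every pair of regions, as a list of length-2 character vectors
pairs <- combn(regions, 2, simplify = FALSE)

# Box plot with individual points and a significance bracket for each pair
p <- ggboxplot(sim, x = "region", y = "KRT19", fill = "region", add = "jitter") +
  stat_compare_means(comparisons = pairs, method = "t.test", label = "p.signif") +
  labs(x = "Kidney region", y = "KRT19 expression (TPM)") +
  theme(legend.position = "none")

print(p)
```

With the real data, `sim` would be replaced by the `kidney` data frame and the axis labels kept as they are.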
