


INSTRUCTIONS

The purpose of this exercise is to obtain a better understanding of your technical skills and capabilities in the areas of data processing, R programming and data visualization / reporting.

Please use R (https://www.r-project.org/) to complete these tasks.

The R package “NanoStringNorm” (https://cran.r-project.org/web/packages/NanoStringNorm/index.html) can be used to read in RCC files.
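RCC files are plain text, and the probe counts sit in a `<Code_Summary>` section. NanoStringNorm provides its own directory-level reader (e.g. `read.markup.RCC()`), and the solution further down uses `nanostringr::read_rcc()`. To show what those readers are parsing, here is a minimal base R sketch for a single file. It assumes the usual layout: a `CodeClass,Name,Accession,Count` header line followed by one comma-separated row per probe, closed by `</Code_Summary>`. The function name `read_code_summary` is hypothetical.

```r
# Minimal sketch: parse the <Code_Summary> block of one RCC file into a data frame.
# Assumes a header line "CodeClass,Name,Accession,Count" directly after the tag
# and one comma-separated row per probe until </Code_Summary>.
read_code_summary <- function(lines) {
  start <- grep("^<Code_Summary>", lines)
  end <- grep("^</Code_Summary>", lines)
  if (length(start) != 1 || length(end) != 1 || end <= start + 1) {
    stop("no single <Code_Summary> block found")
  }
  body <- lines[(start + 1):(end - 1)]
  body <- body[nzchar(trimws(body))]   # drop blank lines inside the block
  df <- read.csv(text = body, header = TRUE, stringsAsFactors = FALSE)
  df$Count <- as.numeric(df$Count)
  df
}

# Usage (file name taken from the annotation table; adjust the path):
# counts <- read_code_summary(readLines("GSM2055832_10_4050_PD_mRNA.RCC"))
```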

Please provide the following in your response to this assessment:

1. R code used to complete the tasks for quality control (Section 3.1), data analysis (Section 3.2) and reporting (Section 3.3) as indicated below. R code should include comments that allow the reviewer to follow the process used to complete this case study.

2. Brief report that summarizes what you did and shows all figures and tables created for this case study. You may choose the format for reporting (e.g., PowerPoint, HTML report).

3 CASE STUDY

A pharmaceutical company generated gene expression data using the NanoString nCounter assay for five subjects across two timepoints (i.e. 10 samples total). Raw data from these 10 samples in NanoString’s Reporter Code Count (RCC) format have been made available along with an annotation file connecting RCC files with subjects and timepoints as shown below:

RCC File                      Subject  Timepoint
GSM2055823_01_4353_PD_mRNA    1        Baseline
GSM2055824_02_4355_PD_mRNA    1        Post-Treatment
GSM2055825_03_3366_PD_mRNA    2        Baseline
GSM2055826_04_4078_PD_mRNA    2        Post-Treatment
GSM2055827_05_4846_PD_mRNA    3        Baseline
GSM2055828_06_3746_PD_mRNA    3        Post-Treatment
GSM2055829_07_3760_PD_mRNA    4        Baseline
GSM2055830_08_3790_PD_mRNA    4        Post-Treatment
GSM2055831_09_4436_PD_mRNA    5        Baseline
GSM2055832_10_4050_PD_mRNA    5        Post-Treatment
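The annotation table can be carried into R as a data frame so that timepoints can later be attached to the count columns. The sketch below transcribes the table as given. The helper `gsm_id()` is hypothetical: it assumes each sample name contains "GSM" followed by a 7-digit accession, which lets the annotation be matched even when a reader lower-cases the names or strips underscores.

```r
# Sample annotation transcribed from the table above (two rows per subject).
annotation <- data.frame(
  rcc_file = c("GSM2055823_01_4353_PD_mRNA", "GSM2055824_02_4355_PD_mRNA",
               "GSM2055825_03_3366_PD_mRNA", "GSM2055826_04_4078_PD_mRNA",
               "GSM2055827_05_4846_PD_mRNA", "GSM2055828_06_3746_PD_mRNA",
               "GSM2055829_07_3760_PD_mRNA", "GSM2055830_08_3790_PD_mRNA",
               "GSM2055831_09_4436_PD_mRNA", "GSM2055832_10_4050_PD_mRNA"),
  subject = rep(1:5, each = 2),
  timepoint = factor(rep(c("Baseline", "Post-Treatment"), times = 5),
                     levels = c("Baseline", "Post-Treatment")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: extract "gsm" + 7 digits from each name (NA if absent).
gsm_id <- function(x) {
  low <- tolower(x)
  hit <- regexpr("gsm[0-9]{7}", low)
  out <- rep(NA_character_, length(x))
  out[hit > 0] <- regmatches(low, hit)
  out
}

# Usage: reorder the annotation to follow the columns of a count matrix `counts`.
# ann <- annotation[match(gsm_id(colnames(counts)), gsm_id(annotation$rcc_file)), ]
```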

3.1 Quality Control

3.1.1 Overview

The NanoString assay contains a set of negative and positive control genes indicated by code class “positive” or “negative” in the raw files as shown in the following example extracted from file GSM2055832_10_4050_PD_mRNA.RCC:

These positive and negative control genes are used to assess quality by evaluating signal and noise levels.
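Selecting the controls amounts to filtering on the code-class column. The sketch below assumes a counts table with a code-class column (assumed to be named `Code.Class`, as in nanostringr's `$raw`) plus one numeric column per sample. It also computes one common noise heuristic, the mean plus two standard deviations of the negative controls in each sample. Both function names are hypothetical.

```r
# Keep only the positive and negative control probes (matching is case-insensitive).
control_rows <- function(counts, class_col = "Code.Class") {
  cls <- tolower(counts[[class_col]])
  counts[cls %in% c("positive", "negative"), , drop = FALSE]
}

# Per-sample background estimate: mean + 2 SD of the negative-control counts.
background_threshold <- function(counts, sample_cols, class_col = "Code.Class") {
  neg <- counts[tolower(counts[[class_col]]) == "negative", sample_cols, drop = FALSE]
  vapply(neg, function(v) mean(v) + 2 * sd(v), numeric(1))
}

# Usage with the reader output used in the solution:
# controls <- control_rows(rcc_file$raw)
# background_threshold(rcc_file$raw, 4:13)
```

Endogenous counts close to this threshold are hard to distinguish from noise, which is why the negatives are inspected alongside the positives.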

3.1.2 Task

Please generate a heatmap showing positive and negative control genes in columns and samples in rows. Please consider the potentially different scales of the data and transform the data appropriately if needed.
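The positive controls form a dilution series, so their counts span several orders of magnitude while the negatives sit near zero. On a raw scale the colour range is dominated by the highest positive control. A log2(count + 1) transform puts both groups on one usable scale, and the +1 keeps zero counts finite. The sketch below builds the matrix in the requested orientation (samples in rows, control probes in columns). The column names `Code.Class` and `Name` are assumptions about the reader's output, and the function name is hypothetical.

```r
# Build a samples x control-probes matrix of log2(count + 1) values for a heatmap.
control_heatmap_matrix <- function(counts, sample_cols,
                                   class_col = "Code.Class", name_col = "Name") {
  keep <- tolower(counts[[class_col]]) %in% c("positive", "negative")
  ctrl <- counts[keep, , drop = FALSE]
  m <- as.matrix(ctrl[, sample_cols, drop = FALSE])
  rownames(m) <- ctrl[[name_col]]
  t(log2(m + 1))   # transpose: samples become rows, control probes become columns
}

# Usage with the objects from the solution:
# mat <- control_heatmap_matrix(rcc_file$raw, 4:13)
# ComplexHeatmap::Heatmap(mat, name = "log2(count + 1)", cluster_columns = FALSE)
```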

Technical Assessment for Data Scientist Position

3.2 Data Analysis

3.2.1 Overview

The Pharma client is interested in showing differences between the baseline and post-treatment timepoints for two genes of interest: MCL1 and CXCL1.

3.2.2 Task

Please generate a figure showing boxplots of summary statistics (minimum, 25th percentile, mean, median, 75th percentile, and maximum) for each timepoint by gene. If possible, please also overlay the individual data points for each sample.
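A standard boxplot shows the median and quartiles but not the mean, and by default its whiskers stop at 1.5 × IQR rather than at the minimum and maximum. The base R sketch below sets `range = 0` so the whiskers reach the extremes, overlays each sample as a jittered point, and marks the mean with a red cross. It assumes a hypothetical long data frame with columns `gene`, `timepoint`, and a numeric value column. Which values to plot (for example, normalized log2 counts for MCL1 and CXCL1) is left to the analysis.

```r
# One box per gene x timepoint, whiskers at min/max, sample points and mean overlaid.
plot_gene_boxplots <- function(long, value_col = "value") {
  y <- long[[value_col]]
  # lex.order = TRUE groups the boxes by gene, then timepoint
  grp <- interaction(long$gene, long$timepoint, sep = ": ", lex.order = TRUE)
  boxplot(y ~ grp, range = 0, las = 2, xlab = "", ylab = value_col)
  stripchart(y ~ grp, vertical = TRUE, method = "jitter", add = TRUE, pch = 19)
  means <- tapply(y, grp, mean)   # boxplot() draws the median only
  points(seq_along(means), means, pch = 4, cex = 1.5, col = "red")
  invisible(means)
}
```

The box hinges come from `fivenum()`. With five values per group, as here, they coincide with the 25th and 75th percentiles from `quantile()`'s default method. For other group sizes they can differ slightly.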

In addition, please generate a table that lists the numeric values of the summary statistics shown in the boxplot.
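The table can be computed directly from the same long data used for the plot, one row per gene and timepoint. This sketch assumes columns `gene`, `timepoint`, and a numeric value column. It uses `quantile()`'s default method for the 25th and 75th percentiles. The function name `summary_table` is hypothetical.

```r
# Summary statistics (n, min, 25th pct, mean, median, 75th pct, max) per gene x timepoint.
summary_table <- function(long, value_col = "value") {
  groups <- split(long, list(long$gene, long$timepoint), drop = TRUE)
  rows <- lapply(groups, function(d) {
    v <- d[[value_col]]
    data.frame(gene = d$gene[1], timepoint = d$timepoint[1], n = length(v),
               min = min(v), q25 = unname(quantile(v, 0.25)), mean = mean(v),
               median = median(v), q75 = unname(quantile(v, 0.75)), max = max(v),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$gene, out$timepoint), ]
  rownames(out) <- NULL
  out
}

# Usage: print with rounding, e.g. in an R Markdown report
# knitr::kable(summary_table(long), digits = 2)
```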

3.3 Reporting

3.3.1 Overview

The Pharma client asked for a report that contains all figures and tables generated for this project.

3.3.2 Task

Please provide a brief report in a format of your choosing per the instructions for this case study.
Answered Same Day Aug 03, 2021

Solution

Mohd answered on Aug 04 2021
Untitled
8/4/2021
knitr::opts_chunk$set(echo = TRUE, cache = TRUE, warning = FALSE, message = FALSE,
                      dpi = 180, fig.width = 8, fig.height = 5)
# One-time installation (uncomment if the packages are missing):
# if (!requireNamespace("BiocManager", quietly = TRUE))
#   install.packages("BiocManager")
# BiocManager::install("ComplexHeatmap")
library(ComplexHeatmap)   # Heatmap()
library(grid)             # gpar()
# BiocManager::install("dendextend")
library(dendextend)
# install.packages("corrplot")
library(corrplot)
library(rcc)
# install.packages("nanostringr")
library(nanostringr)
# Read all RCC files in the folder; read_rcc() returns a list with $raw (probe counts)
# and $exp (per-sample run attributes)
rcc_file <- read_rcc(path = "/cloud/project/files")
Heatmap
# Columns 4:13 of $raw hold the 10 sample count columns (1-3 are probe annotation)
my_matrix <- as.matrix(rcc_file$raw[, 4:13])
class(rcc_file)
## [1] "list"
class(my_matrix)
## [1] "matrix" "array"
fontsize <- 0.5
#Gene_symbol<-data.frame(Gene=mmc$Gene.Symbol)
Heatmap(my_matrix)
Heatmap(my_matrix, cluster_columns = FALSE,
        row_names_side = "left",
        row_names_gp = gpar(cex = fontsize))
COL.OVD <- "#66C2A5"
COL.OVO <- "#A6D854"
COL.OVCL <- "#FC8D62"
COL.HLD <- "#8DA0CB"
COL.HLO <- "#E78AC3"
# Return the second "_"-separated field of each string
# (e.g. the sample number "01" in "GSM2055823_01_4353_PD_mRNA")
getNum <- function(str.vect) {
  sapply(strsplit(str.vect, "[_]"), "[[", 2)
}
#boxplot(perFOV ~ fove.counted, ylab = "% fov", main = "% FOV by fove.counted", data = rcc_file$exp, pch = 20,
#col = c(COL.HLD, COL.OVD, COL.OVCL, COL.HLO, COL.OVO))
#abline(h = 75, lty = 2, col = "red")
#grid(NULL, NULL, lwd = 1)
# Correlation matrix: Pearson correlation between the 10 samples on raw counts
res <- cor(rcc_file$raw[4:13])
round(res, 2)
## gsm2055823014353pdmrna-zxlba3e
## gsm2055823014353pdmrna-zxlba3er 1.00
## gsm2055824024355pdmrna-cplirsoi 0.90
## gsm2055825033366pdmrna-vdzuy1ic 0.91
## gsm2055826044078pdmrna-ir4cfdoi 0.98
## gsm2055827054846pdmrna-1og2mkza 0.90
## gsm2055828063746pdmrna-owqwv5us 0.77
## gsm2055829073760pdmrna-hgjac45r 0.90
## gsm2055830083790pdmrna-av1oifdi 0.94
## gsm2055831094436pdmrna-3ubwxwbn 0.91
## gsm2055832104050pdmrna-xftlqjmo 0.95
## gsm2055824024355pdmrna-cplirsoi
## gsm2055823014353pdmrna-zxlba3er 0.90
## gsm2055824024355pdmrna-cplirsoi 1.00
## gsm2055825033366pdmrna-vdzuy1ic 0.71
## gsm2055826044078pdmrna-ir4cfdoi 0.88
## gsm2055827054846pdmrna-1og2mkza 1.00
## gsm2055828063746pdmrna-owqwv5us 0.57
## gsm2055829073760pdmrna-hgjac45r 0.99
##...
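Pearson correlation on raw counts is driven largely by the few highest-count probes. The correlation matrix above can be recomputed on log2(count + 1) so that low- and mid-expression genes contribute too. Alternatively, `method = "spearman"` is rank-based and needs no transform. This is a sketch; `log_correlation` is a hypothetical name, and the `corrplot()` call is one way to visualize the result with the package already loaded above.

```r
# Sample-by-sample correlation on log2(count + 1), rounded to two decimals.
log_correlation <- function(counts, method = "pearson") {
  m <- log2(as.matrix(counts) + 1)
  round(cor(m, method = method), 2)
}

# Usage with the objects above:
# res_log <- log_correlation(rcc_file$raw[, 4:13])
# corrplot(res_log, method = "color")
```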