A file combining clinical data and genetic data is provided and has been preprocessed...

Question

A file combining clinical data and genetic data is provided and has been preprocessed (BRCAMerged.csv).

A number of functions are provided in cells #1-12 of the attached notebook (AssignmentWeek3.pdf). The corresponding script is provided as a Jupyter notebook and as an R script, both called AssignmentWeek3.

The question asked in this assignment is to determine a selection of genes, called a genetic signature, that best separates normal and cancer samples. You can use a trial-and-error approach and visualize the result on a heatmap.

1) Download the attached files and place them in the same folder:

BRCAMerged.csv
AssignmentWeek3.ipynb
AssignmentWeek3.R

2) Run the script either as AssignmentWeek3.ipynb (Jupyter Notebook installation) or as AssignmentWeek3.R (RStudio installation).

3) Through trial and error, for example by adjusting the number of genes in cell #12 (fixed at 100), find a better set of genes that optimally separates normal from cancer samples. You can judge this on the heatmap: most red samples should be grouped together. You can use any method of your choice to determine an optimal genetic signature.

You need to find a set of genes whose heatmap groups more red samples together than the original heatmap does.
Columns 1-40 of the input data contain clinical features. As explained in the assignment description, we are selecting features that represent genes expressed differently between the two groups, so you cannot select the clinical features.
If you want to use the BSS/WSS method, you can change the number of genes selected for the original heatmap in the R script, that is, choose a subset of the genes selected for the original heatmap.
You can also use any feature selection method other than BSS/WSS if you wish.
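
As a rough illustration of the BSS/WSS approach mentioned above, the sketch below ranks genes by the ratio of between-group to within-group sum of squares, then checks how a two-cluster hierarchical clustering lines up with the sample labels. It runs on synthetic data so that it is self-contained. The names `bss_wss`, `expr`, `labels`, and `n_keep` are hypothetical and are not taken from AssignmentWeek3.R, whose functions may compute the ratio differently. With the real data, `expr` would hold the gene columns of BRCAMerged.csv (everything after the 40 clinical columns) and `labels` the normal/cancer indicator.

```r
# Synthetic stand-in for the expression matrix: samples as rows, genes as columns.
set.seed(1)
n_normal <- 20
n_cancer <- 40
n_genes  <- 200
labels <- factor(c(rep("normal", n_normal), rep("cancer", n_cancer)))
expr <- matrix(rnorm((n_normal + n_cancer) * n_genes),
               ncol = n_genes,
               dimnames = list(NULL, paste0("gene", seq_len(n_genes))))
# Assumption for the demo only: the first 10 genes are shifted in cancer samples.
expr[labels == "cancer", 1:10] <- expr[labels == "cancer", 1:10] + 3

# BSS/WSS for one gene: between-group over within-group sum of squares.
bss_wss <- function(x, groups) {
  overall_mean <- mean(x)
  group_means  <- tapply(x, groups, mean)
  group_sizes  <- tapply(x, groups, length)
  bss <- sum(group_sizes * (group_means - overall_mean)^2)
  wss <- sum((x - group_means[as.character(groups)])^2)
  bss / wss
}

# Score every gene and keep the n_keep highest-scoring ones as the signature.
scores <- apply(expr, 2, bss_wss, groups = labels)
n_keep <- 10
signature <- names(sort(scores, decreasing = TRUE))[seq_len(n_keep)]

# Rough check of the grouping: cluster the samples on the signature genes,
# cut the dendrogram into two clusters, and cross-tabulate with the labels.
clusters <- cutree(hclust(dist(expr[, signature])), k = 2)
print(table(cluster = clusters, label = labels))
```

Varying `n_keep` and re-running the cross-tabulation is one way to make the trial-and-error step less dependent on reading the heatmap by eye; a table with most samples of each label in a single cluster suggests a cleaner separation.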

4) Turn in the assignment as a plain R script file (not a Jupyter Notebook file), together with the original heatmap image and a heatmap image of the selected genes, attached to your submission.
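
The two heatmap images could be written to disk with base R graphics devices, roughly as sketched below. This is only an illustration on synthetic data: the helper `save_heatmap`, the file names `heatmap_original.png` and `heatmap_selected.png`, and the choice of red for cancer samples are assumptions, and the plotting function in the provided notebook may look different. The files go to a temporary directory here; for a submission you would point `out_dir` at your working folder.

```r
# Synthetic data: 30 samples (rows) by 50 genes (columns).
set.seed(2)
labels <- factor(c(rep("normal", 10), rep("cancer", 20)))
expr <- matrix(rnorm(30 * 50), nrow = 30,
               dimnames = list(paste0("s", 1:30), paste0("gene", 1:50)))

# Hypothetical helper: draw a heatmap (genes as rows, samples as columns)
# into a PNG file, with a colour bar marking the sample group.
save_heatmap <- function(mat, groups, file) {
  png(file, width = 900, height = 700)
  on.exit(dev.off())                       # always close the device
  side_cols <- ifelse(groups == "cancer", "red", "blue")  # assumed colour coding
  heatmap(t(mat), ColSideColors = side_cols, scale = "row")
  invisible(file)
}

out_dir <- tempdir()
f_original <- save_heatmap(expr, labels,
                           file.path(out_dir, "heatmap_original.png"))
# Placeholder "selection": the first 10 genes stand in for a chosen signature.
f_selected <- save_heatmap(expr[, 1:10], labels,
                           file.path(out_dir, "heatmap_selected.png"))
```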

Note: the file BRCAMerged.csv can be downloaded from Google Drive: https://drive.google.com/file/d/1I8yySge8gTfKR2WlpQ_Q1SSAR-O8dtwn/view?usp=sharing

Attachments

Data Dictionary.pdf
AssignmentWeek3.pdf
AssignmentWeek3.ipynb
AssignmentWeek3.r
BRCAMerged.csv: https://drive.google.com/file/d/1I8yySge8gTfKR2WlpQ_Q1SSAR-O8dtwn/view?usp=sharing


Subhanbasha · Accepted Answer

# cell #1 loading the data set, which has patients as rows and variables as columns
mrnaNorm
