Question

The file used in this assignment is the one combining clinical data and genetic data provided in the previous module (BRCAMerged.csv).

A number of functions have been provided in cells #1-15 of the attached notebook (AssignmentWeek4.pdf). The corresponding script is provided as a Jupyter notebook and an R script, both called AssignmentWeek4.

The task in this assignment is to compare classification models, along the lines of the worksheet.

1) Download the attached files and place them in the same folder:

BRCAMerged.csv
AssignmentWeek4.ipynb
AssignmentWeek4.R

2) Run the script either as AssignmentWeek4.ipynb (Jupyter notebook installation) or as AssignmentWeek4.R (RStudio installation).

3) At the end of the script, add code that appends some clinical variables to the genetic variables, which are the only variables used in AssignmentWeek4 as provided. Specifically, the goal is to add numeric variables we have not analyzed yet, such as stage (column #5), Diagnosis.Age (column #6), Birth.from.Initial.Pathologic.Diagnosis.Date (column #12), Death.from.Initial.Pathologic.Diagnosis.Date (column #14), Last.Alive.Less.Initial.Pathologic.Diagnosis.Date.Calculated.Day.Value (column #15), Days.to.Last.Followup (column #16), Disease.Free.Months (column #17), Fraction.Genome.Altered (column #22), HER2.ihc.score (column #25), Overall.Survival..Months. (column #32). To do this, create a modified version of Cell #6 that adds these variables to the genetic variables. You should now have a dataset for analysis containing 20541 variables.
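
One way to approach step 3 is sketched below in base R. Cell #6 is not reproduced on this page, so this is not the notebook's code: the helper `add_clinical` and the objects `toy` and `toy_genes` are hypothetical names, and the toy data frame only stands in for BRCAMerged.csv. The column positions are the ones listed in step 3.

```r
# Column positions of the clinical variables listed in step 3.
clinical_cols <- c(5, 6, 12, 14, 15, 16, 17, 22, 25, 32)

# Hypothetical helper (not from the notebook): append the selected clinical
# columns of `merged` to a data frame of genetic variables with the same rows.
add_clinical <- function(merged, genetic, cols = clinical_cols) {
  stopifnot(nrow(merged) == nrow(genetic), max(cols) <= ncol(merged))
  clinical <- merged[, cols, drop = FALSE]
  # Coerce each column to numeric; values that cannot be converted become NA.
  clinical[] <- lapply(clinical, function(x) {
    suppressWarnings(as.numeric(as.character(x)))
  })
  cbind(genetic, clinical)
}

# Toy stand-in for BRCAMerged.csv: 6 rows, 35 columns (not the real data).
set.seed(1)
toy <- as.data.frame(matrix(rnorm(6 * 35), nrow = 6))
toy_genes <- toy[, 33:35]   # pretend the last 3 columns are the genetic variables

combined <- add_clinical(toy, toy_genes)
dim(combined)               # 6 rows, 3 + 10 = 13 columns
```

With the real file, `merged` would be the data frame read from BRCAMerged.csv and `genetic` whatever Cell #6 currently builds. Clinical columns such as the death and follow-up dates may contain missing values, which `randomForest()` and `class::knn()` do not accept by default, so some filtering or imputation may be needed before rerunning the models.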

4) Rerun the analyses in cells [9] to [15], which you can duplicate below the previous code, to see whether the classification results differ. Enter the R code in the next cells.
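
The comparison asked for in step 4 might look roughly like the sketch below. Cells [9] to [15] are not shown on this page, so this is only an assumption about their general shape: it uses the two packages the notebook installs (randomForest and class) on simulated data, and the names `compare_models`, `g1`, `g2`, and `c1` are made up for illustration.

```r
library(randomForest)
library(class)

# Simulated data (not BRCAMerged.csv): two "genetic" predictors, one "clinical".
set.seed(42)
n <- 100
x <- data.frame(g1 = rnorm(n), g2 = rnorm(n), c1 = rnorm(n))
y <- factor(ifelse(x$g1 + x$c1 > 0, "A", "B"))

# Hold out 30 rows for testing.
train <- sample(n, 70)

accuracy <- function(pred, truth) mean(pred == truth)

# Hypothetical helper: fit both classifiers on the chosen columns and
# return their test-set accuracies.
compare_models <- function(cols) {
  x_train <- x[train, cols, drop = FALSE]
  x_test  <- x[-train, cols, drop = FALSE]
  rf <- randomForest(x_train, y[train], ntree = 200)
  rf_pred  <- predict(rf, x_test)
  knn_pred <- knn(x_train, x_test, cl = y[train], k = 5)
  c(randomForest = accuracy(rf_pred, y[-train]),
    knn          = accuracy(knn_pred, y[-train]))
}

results <- rbind(
  genetic_only          = compare_models(c("g1", "g2")),
  genetic_plus_clinical = compare_models(c("g1", "g2", "c1"))
)
print(results)
```

Putting the two runs side by side in one matrix makes it easy to see whether adding the clinical variables changes each model's accuracy.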

5) Turn in the assignment as a plain R script file (not a Jupyter notebook file), attached to your submission.

Note: the file BRCAMerged.csv can also be downloaded from Google Drive: https://drive.google.com/file/d/1I8yySge8gTfKR2WlpQ_Q1SSAR-O8dtwn/view?usp=sharing

assignmentweek4bhi557-ffompm1n.pdf

Mohd · Accepted Answer

Untitled
8/2/2021
cell #1
#install.packages("randomForest")
#install.packages("class")
cell #2
memory.limit(size=3500)
## Warning in memory.limit(size = 3500): cannot decrease memory limit: ignored
## [1] 8036
library(randomForest)
## randomForest 4.
