· The main idea of the project is to take two articles/papers: take the dataset from one article/paper, apply the analysis methods from the other article/paper to that dataset, and obtain the results. You must submit a report on what you have done.
· You must get an idea from those two articles/papers and build a project from them. (You can take any two articles/papers from the bioinformatics field published after 2017, do the analysis, and submit the results and report.)
· Use Python as the programming language for the analysis in your project.

Colorectal cancer stages transcriptome analysis
RESEARCH ARTICLE
Tianyao Huo1, Ronald Canepa2, Andrei Sura1, François Modave1, Yan Gong3,4*
1 Department of Health Outcomes & Policy, College of Medicine, University of Florida, Gainesville, Florida,
United States of America, 2 Information Technology and Services, University of Florida, Gainesville, Florida,
United States of America, 3 Department of Pharmacotherapy and Translational Research and Center for
Pharmacogenomics, College of Pharmacy, University of Florida, Gainesville, Florida, United States of
America, 4 University of Florida Health Cancer Center, Gainesville, Florida, United States of America
* XXXXXXXXXX
Abstract
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of
cancer-related deaths in the United States. The purpose of this study was to evaluate the
gene expression differences in different stages of CRC. Gene expression data on 433 CRC
patient samples were obtained from The Cancer Genome Atlas (TCGA). Gene expression
differences were evaluated across CRC stages using linear regression. Genes with
p ≤ 0.001 in expression differences were evaluated further in principal component analysis
and genes with p ≤ 0.0001 were evaluated further in gene set enrichment analysis. A total of
377 patients with gene expression data in 20,532 genes were included in the final analysis.
The numbers of patients in stage I through IV were 59, 147, 116 and 55, respectively. NEK4
gene, which encodes for NIMA related kinase 4, was differentially expressed across the four
stages of CRC. The stage I patients had the highest expression of NEK4 genes, while the
stage IV patients had the lowest expressions (p = 9*10−6). Ten other genes (RNF34,
HIST3H2BB, NUDT6, LRCH4, GLB1L, HIST2H4A, TMEM79, AMIGO2, C20orf135 and
SPSB3) had a p value of XXXXXXXXXX in the differential expression analysis. Principal component
analysis indicated that the patients from the 4 clinical stages do not appear to have distinct
gene expression pattern. Network-based and pathway-based gene set enrichment analyses
showed that these 11 genes map to multiple pathways such as meiotic synapsis and pack-
aging of telomere ends, etc. Ten of these 11 genes were linked to Gene Ontology terms
such as nucleosome, DNA packaging complex and protein-DNA interactions. The protein
complex-based gene set analysis showed that four genes were involved in H2AX complex
II. This study identified a small number of genes that might be associated with clinical stages
of CRC. Our analysis was not able to find a molecular basis for the current clinical staging
for CRC based on the gene expression patterns.
OPEN ACCESS
Citation: Huo T, Canepa R, Sura A, Modave F, Gong Y XXXXXXXXXX Colorectal cancer stages
transcriptome analysis. PLoS ONE 12(11): e XXXXXXXXXX
https://doi.org/10.1371/journal.pone XXXXXXXXXX
Editor: Hiromu Suzuki, Sapporo Ika Daigaku, JAPAN
Received: June 2, 2017
Accepted: November 10, 2017
Published: November 28, 2017
Copyright: © 2017 Huo et al. This is an open access article distributed under the terms of the
Creative Commons Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: TCGA clinical data and expression data were manually downloaded
from the Broad Institute (TCGA data version 2016_01_28) via the firebrowse.org website
(http://firebrowse.org/?cohort=COADREAD&download_dialog=true). The code used to download
the data can be accessed here: https://github.com/indera/crc_transcriptome_analysis.
Funding: The authors received no specific funding for this work.
Introduction
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of
cancer-related deaths in the United States [1]. Among the five subtypes of CRC (adenocarcinomas,
carcinoid tumors, gastrointestinal stromal tumors, lymphomas and sarcomas), adenocarcinomas
are the most common (95% of all CRCs). Currently the staging of CRC, referred to as clinical
staging, is based on results of physical exams, biopsies, and imaging tests (CT or MRI scan,
X-rays, PET scan, etc.). The criteria of staging are based on: 1) how far the cancer has grown
into the wall of the intestine; 2) whether it has reached nearby structures; and 3) whether it has
spread to the nearby lymph nodes or to distant organs. The results of surgery can be combined
with clinical staging to determine the pathologic stages. The most often used CRC staging
system is the AJCC cancer staging manual developed by American Joint Committee on Cancer
(AJCC), based on conditions of primary tumor (T), regional lymph nodes (N) and distant
metastasis (M) [2]. The earliest stage cancers are called stage 0, then range from stage I through
IV, with additional sub-stages identified with the letters A, B and C [3].
Several genes, such as WNT, MAPK/PI3K, TGF-β and TP53, have been associated with CRC. For
instance, mutations in the adenomatous polyposis coli (APC) gene, a tumor suppressor gene, were
found to be responsible for familial adenomatous polyposis, which can then develop further into
CRC [4]. Mismatch repair system genes such as MLH1 and MSH2 were found to be associated
with Lynch syndrome, the most frequent form of hereditary CRC [5, 6]. Further, a
12-gene recurrence score assay has been developed as a prognostic factor in stage II-III colon
or rectal carcinoma [7–9]. Even though many genes have been associated with an increased
risk of CRC, the genetic differences across different stages of CRC have not been clearly
identified. So far, only one study has assessed the gene expression levels of three candidate genes
(MMP9, MMP28 and TIMP1) across CRC stages, and it found no statistically significant
differences based on the stage of CRC [10]. There have been no studies in the literature comparing
the gene expression levels in the entire transcriptome across CRC stages. The purpose of this
study is to explore transcriptome-wide gene expression differences across different stages of
CRC, followed by gene ontology and gene set network analyses, based on the publicly
available RNAseq dataset in The Cancer Genome Atlas (TCGA) [11].
Materials and methods
Data acquisition
The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/) is a joint effort between
the National Cancer Institute (NCI) and the National Human Genome Research Institute
(NHGRI) to facilitate the sharing of data and speed up cancer research [11, 12]. The Eli and
Edythe L. Broad (Broad) Institute of MIT and Harvard is a joint venture between both
institutions and several area hospitals (https://www.broadinstitute.org/about-us). Their “FireHose”
project ingests, aggregates, standardizes, and processes TCGA data via automated pipelines in
an attempt to accelerate analysis and discoveries
(https://confluence.broadinstitute.org/display/GDAC/Rationale).
The Broad Institute has established pipelines for processing each TCGA dataset and the
outputs from each stage of the pipeline are made available as a versioned set. Illumina HiSeq
expression data was processed by Broad Institute to output both reads per kilobase per million
mapped reads (RPKM) expression values [13] and RNA-seq by Expectation-Maximization
(RSEM) values [14] normalized to “upper quartile count at 1000”. TCGA clinical data and
expression data were manually downloaded from the Broad Institute (TCGA data version
2016_01_28) via the firebrowse.org website
(http://firebrowse.org/?cohort=COADREAD&download_dialog=true). The code used to
download the data can be accessed here: https://github.com/indera/crc_transcriptome_analysis.
Competing interests: The authors have declared
that no competing interests exist.
Data merging
Using Python XXXXXXXXXX and version XXXXXXXXXX of the Pandas module, the expression data from the
Broad Institute was read into a Pandas dataframe, transposed, and re-saved. The clinical data
were also transposed in the same manner. Additionally, in order to cut down on the size of the
data and number of components of interest, only a subset of the columns from the clinical
data were kept for the analysis. These included common demographic data such as patient
gender, race, ethnicity, and age; clinical data such as cancer stage, associated International
Classification of Diseases (ICD) 10 codes, presence of polyps, whether analysis had been done
for common mutations such as KRAS and BRAF; and finally, approximately 85 different ali-
quot identifiers from the TCGA dataset itself.
Matching of clinical data with expression data was performed using TCGA’s "hybridization
REF" identifier from the expression data and searching against the aliquot identifiers present
in the clinical data. Eventually, 377 patients with gene expression data from 20,532 genes were
included in the final analysis.
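The transpose-and-match step described above can be sketched with small made-up tables. This is only an illustration under stated assumptions: the column names (`aliquot_id`, `stage`, `age`), the sample labels and the expression values are hypothetical and do not reflect the actual Firehose file layout, and matching is simplified to a single identifier column rather than a search across many aliquot identifiers.

```python
import pandas as pd

# Toy expression table as distributed: genes in rows, samples in columns.
# Sample labels and values are made up for illustration.
expr = pd.DataFrame(
    {"S1": [5.0, 1.2], "S2": [4.1, 0.9], "S3": [3.3, 1.1]},
    index=["NEK4", "RNF34"],
)

# Transpose so that each row is one sample and each column is one gene.
expr_t = expr.T
expr_t.index.name = "aliquot_id"

# Toy clinical table with one row per patient (hypothetical columns).
clinical = pd.DataFrame(
    {"aliquot_id": ["S1", "S2", "S4"], "stage": [1, 2, 4], "age": [61, 70, 55]}
)

# An inner join keeps only the samples present in both tables.
merged = clinical.merge(expr_t.reset_index(), on="aliquot_id", how="inner")
print(merged)
```

An inner join is used here so that patients without expression data, and samples without clinical data, drop out of the merged table.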
Differential expression analysis
Gene expression differences were evaluated across the disease stages using linear regression.
The standard deviation of the gene expression level for each gene was computed. The genes
with standard deviation of zero, which indicates no change in the gene expression, were
removed from further analysis. To select top genes that are differentially expressed across
cancer stages, a linear regression model was fitted for each gene to test the trend in gene
expression with increasing cancer stage. The analyses adjusted for age, gender and
race/ethnicity of the patients. Genes with p ≤ 0.0001 were considered suggestive and the expression
levels by cancer stage were presented for these genes. Analyses were performed using R version
3.3.1 and SAS 9.4 (Cary, NC).
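Since the assignment asks for Python, the per-gene trend test could be approximated as follows. This is a rough sketch on synthetic data, not the authors' R/SAS code: the function name `stage_trend_pvalue`, the numeric coding of stage as 1 to 4, and the two covariates are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def stage_trend_pvalue(expression, stage, covariates):
    """Two-sided p-value for the stage slope in an ordinary least squares fit.

    expression: (n,) expression values of one gene
    stage:      (n,) numeric stage, e.g. 1 to 4
    covariates: (n, k) adjustment variables, already numerically coded
    """
    n = len(expression)
    # Design matrix: intercept, stage, then the adjustment covariates.
    X = np.column_stack([np.ones(n), stage, covariates])
    beta, _, _, _ = np.linalg.lstsq(X, expression, rcond=None)
    residuals = expression - X @ beta
    dof = n - X.shape[1]
    sigma2 = residuals @ residuals / dof
    # Standard error of the stage coefficient (column 1 of X).
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    t_stat = beta[1] / np.sqrt(cov_beta[1, 1])
    return 2 * stats.t.sf(abs(t_stat), dof)


# Synthetic data: gene 0 falls with stage, gene 1 is constant, gene 2 is noise.
rng = np.random.default_rng(0)
n = 200
stage = rng.integers(1, 5, size=n).astype(float)
age = rng.normal(65, 10, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
covariates = np.column_stack([age, sex])
expr = np.column_stack([
    10 - 1.0 * stage + rng.normal(0, 1, size=n),
    np.full(n, 3.0),
    rng.normal(5, 1, size=n),
])

# Drop genes with zero standard deviation before fitting.
keep = expr.std(axis=0) > 0
pvals = {int(j): stage_trend_pvalue(expr[:, j], stage, covariates)
         for j in np.flatnonzero(keep)}
suggestive = [j for j, p in pvals.items() if p <= 1e-4]
print(suggestive)
```

Treating stage as a single numeric predictor tests for a linear trend across stages, which matches the description of testing "the trend in gene expression with increasing cancer stages"; a categorical coding would test for any difference instead.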
Principal component analysis
In order to identify gene expression patterns of the selected CRC samples across different
stages, all the genes with p ≤ 0.001 in the linear model analysis were included in the principal
component analysis using SAS. Ten principal components (PCs) were identified and the first
two PCs were plotted according to the staging status of the CRC patients.
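A Python counterpart of this step might look like the sketch below, which computes principal component scores with a singular value decomposition. It assumes a covariance-based PCA (genes centred but not scaled), which may differ from the SAS settings used in the paper; the data and stage labels are synthetic, and `pca_scores` is a hypothetical helper name.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components."""
    Xc = X - X.mean(axis=0)  # centre each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per PC
    return scores, explained[:n_components]


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))         # 40 samples, 6 selected genes (synthetic)
stage = rng.integers(1, 5, size=40)  # hypothetical stage labels

scores, explained = pca_scores(X, n_components=2)
# Mean PC1/PC2 position per stage; similar means suggest no clear separation.
for s in np.unique(stage):
    print(s, scores[stage == s].mean(axis=0))
```

The two columns of `scores` are what would be plotted against each other, coloured by stage, to check visually whether the stages separate.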
Gene annotation and gene set enrichment analysis
Genes with expression difference of p ≤ XXXXXXXXXX were evaluated further in gene annotation
using DAVID [15]. Then the gene IDs and official gene names were used for further analysis.
ConsensusPathDB tool [16, 17] was then used to perform network-based and pathway-based
analyses on

Solution

Pushpendra answered on Apr 27 2021
Age Of Diagnosis For TCGA Patient by Cancer Type
Abstract
Cancer represents a significant challenge for humankind, as early diagnosis and treatment
are difficult to achieve. BMI was used to categorize each person as underweight, normal
weight, overweight or obese. Two- and five-year survival rates were applied to estimate the
prognosis for each cancer type. All data were statistically analyzed. We identified that males
were more susceptible to lung, liver and skin cancer when compared with females, whereas
females were more susceptible to thyroid, breast and adrenal cortex cancer. High BMI (>25)
was positively associated with the occurrence of cancer, although patients with high BMI at
the time of initial diagnosis had higher two/five-year survival rates. The survival rates for
cancer were positively correlated with the age at initial pathologic diagnosis. Some types of
cancer were associated with particularly young ages of onset, including adrenocortical
carcinoma, cervical and endocervical cancers, brain lower grade glioma, pheochromocytoma
and paraganglioma, testicular germ cell tumors and thyroid carcinoma. Hence, the early
diagnosis and prognosis for these cancers need to be improved. In conclusion, sex, BMI and
age are associated with the incidence and survival rates for cancers. These results could be
used to supplement precision and personalized medicine.
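The BMI grouping mentioned in the abstract can be illustrated with a short helper. The cut-offs used below (18.5, 25 and 30 kg/m²) are the commonly used adult thresholds and are an assumption here, since the text only states that BMI > 25 was treated as high; `bmi_category` is a hypothetical function name.

```python
def bmi_category(weight_kg, height_m):
    """Return (BMI, category) using the conventional adult cut-offs."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        label = "underweight"
    elif bmi < 25:
        label = "normal weight"
    elif bmi < 30:
        label = "overweight"
    else:
        label = "obese"
    return bmi, label


bmi, label = bmi_category(80.0, 1.75)
print(round(bmi, 1), label)
```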
Introduction
Cancers are diseases involving the uncontrollable growth of abnormal cells that overcome
the usual limitations to cell division. Cancer is now a relatively common disease. For
example, there were about 90.5 million individuals diagnosed with cancer in 2015, and
more than 14.1 million new cases of cancer occur each year. Cancer is a leading cause of
death worldwide, accounting for 8.8 million deaths in 2015; 15.7% of all deaths. The most
common causes for cancer-related death are lung, liver, colorectal, stomach and breast
cancers. Cancer is becoming an enormous burden on society in high- and...