08/04/2021 QuBi/modules/biol303 - EvoBioLabatHunter
diverge.hunter.cuny.edu/labwiki/QuBi/modules/biol303
Figure 1. Development of Dictyostelium (M. Grimson, R. Blanton, Texas Tech University)
QuBi/modules/biol303
Bioinformatics Lab: Exploration of Gene Expression in Dictyostelium species
Contents
1 Objectives
2 Lab Report Grading Policy
3 Introduction
4 Procedures
4.1 Understand the design of an RNA-SEQ experiment using NCBI GEO database
4.2 Search for gene information using DictyBase
4.3 Explore expression profiles of individual genes using dictyExpress
4.4 Identify co-regulated genes using correlational distances and cluster analysis
5 Discussion Questions
6 References & Resources
Objectives
1. Understand the RNA-SEQ technology and its use in genome-wide identification of gene functions.
2. Be able to identify co-expressed and co-repressed genes based on time-course gene expression data.
Lab Report Grading Policy
Figure 2. a) Results from four individual Northern blots examining four different genes and measuring mRNA production over time, as indicated. b) Results from a series of microarrays for the same four genes of interest. Note the color scale at the bottom of b), where bright green indicates a 20-fold repression and bright red indicates a 20-fold induction. Black indicates no change in transcription. (Source: Campbell & Heyer. (2003). Discovering Genomics, Proteomics, & Bioinformatics. Pearson Education, Inc.)
Introduction (1 pt)
Define the transcriptome. List the key steps in RNA-SEQ technology. Describe the advantages of high-throughput
technologies compared with traditional gene-by-gene approaches to studying gene function. Your
statements must not be copied from the Lab Manual.
Materials and Methods (1 pt)
Describe the experimental procedures of the study that produced these gene expression data by reading
this paper (http://genomebiology.com/2010/11/3/R35) and this experimental report
(http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE XXXXXXXXXX). Answer the following questions:
1. Names of the two species used in the experiments
2. How many genes were measured for their expression levels?
3. How many time points, developmental stages, and cell types have been tested?
Results (5 pts)
1. Table 2 (annotation for 15 genes)
2. Expression profiles (screen capture for 15 genes)
3. Table 4 (correlation coefficients)
4. Heat map of 15 genes (screen capture)
Discussion (2 pts)
Answer the four discussion questions.
Summary/Conclusion (1 pt)
A sentence or two will suffice.
References (1 pt)
Credit is given for pertinent references obtained from sources other than the Lab Manual.
Introduction
Gene expression is the transcription of a DNA template
into RNA molecules, some of which are eventually
translated into proteins. In a multicellular organism, the
subset of genes that are expressed defines and gives rise to
a specific tissue or cell type. In this laboratory exercise, we
will use bioinformatics techniques to identify genes up- and
down-regulated in Dictyostelium during its development
from a unicellular stage to a multi-cellular stage.
Due to its unique mode of development (Figure 1),
Dictyostelium is an important model organism for the study
of how multicellular organisms evolved from unicellular
ones. It is also a key disease model for understanding
cancer, especially regarding the mechanism of cell
migration, chemotaxis, and metastasis.
Traditionally, gene expression is studied one gene at a
time using blotting techniques. For example, in a Northern
Blot experiment (Figure 2a), the whole messenger RNA
(mRNA) content of a cell is extracted and loaded on a solid
gel slab. The different mRNA molecules are then separated
by electrophoresis and transferred to a nitrocellulose
sheet. To identify whether a gene is expressed, a radioactively (or fluorescently) labeled oligonucleotide probe that is
specific to the gene sequence is applied to the sheet. If the gene is expressed, the probe will hybridize with its
specific mRNA molecule and a black band will appear on an X-ray film.

Figure 3. A typical RNA-Seq experiment. Briefly, long RNAs are first converted into a library of cDNA fragments through either RNA fragmentation or DNA fragmentation. Sequencing adaptors (blue) are subsequently added to each cDNA fragment, and a short sequence is obtained from each cDNA using high-throughput sequencing technology. The resulting sequence reads are aligned with the reference genome or transcriptome and classified into three types: exonic reads, junction reads, and poly(A) end-reads. These three types are used to generate a base-resolution expression profile for each gene, as illustrated at the bottom; a yeast ORF with one intron is shown. (Source: Wang, Gerstein, and Snyder (2009), http://www.ncbi.nlm.nih.gov/pubmed/ XXXXXXXXXX)

The expression level of a gene is measured by its FPKM, which stands for fragments per kilobase of gene length per million mapped reads. In essence, FPKM is the number of short reads (fragments) mapped to a gene, normalized by the gene length and by the total number of mapped reads generated in an experiment. Normalizing by gene length and total reads makes it possible to compare expression levels across genes as well as among experiments.
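The FPKM normalization described above can be written as FPKM = fragments / (gene length in kilobases × total mapped fragments in millions), or equivalently fragments × 10^9 / (gene length in bp × total mapped fragments). The short Python sketch below applies that formula; the function name and all counts are hypothetical, chosen only to show how the two normalizations behave.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of gene length per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped <= 0:
        raise ValueError("gene length and total mapped fragments must be positive")
    kilobases = gene_length_bp / 1_000          # normalize for gene length
    millions_mapped = total_mapped / 1_000_000  # normalize for sequencing depth
    return fragments / (kilobases * millions_mapped)


# Hypothetical counts, for illustration only.
print(fpkm(500, 1_000, 10_000_000))    # 50.0: a 1 kb gene
print(fpkm(500, 2_000, 10_000_000))    # 25.0: same reads, twice the length
print(fpkm(1_000, 1_000, 20_000_000))  # 50.0: twice the depth, twice the reads
```

The last two calls illustrate the point of the normalization: a longer gene collects more reads at the same expression level, and a deeper sequencing run collects more reads for every gene, so raw counts alone are not comparable.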
Other blotting techniques include the Southern Blot, which detects DNA rather than RNA; to use it for
expression studies, the mRNAs in a cell are first reverse transcribed into their complementary DNA (cDNA),
which is then hybridized with gene-specific oligonucleotide probes. In a Western Blot experiment, the protein product
(instead of the mRNA intermediate) of a gene is probed using antibodies
(instead of oligonucleotide probes).
Since the genomic revolution of the 1990s, it has become possible to study
the expression of all genes in a cell at once using high-throughput
techniques. Whole-genome expression profiling was made possible by the
availability of complete genome sequences for bacteria, yeasts, and humans.
The DNA microarray (Figure 2b) is one such high-throughput technique. In
contrast to the Northern Blot technique, in which the mRNA sample is fixed
on a membrane, in a microarray nucleotide probes for all genes are fixed on
a glass slide, creating a “gene chip”. The cellular mRNAs are reverse
transcribed into cDNAs labeled with fluorescent dyes, which are then
hybridized to the gene chip. After the unattached cDNAs are washed away, the
fluorescence intensity remaining at each probe location is measured as an
indication of the amount of mRNA transcribed from the corresponding gene.
The entire cellular RNA content transcribed from a genome is called a
transcriptome. Each DNA microarray reading is therefore essentially a
snapshot of the whole-genome expression profile of a cell at a particular
physiological stage. Unlike the traditional blotting techniques, it is no
longer necessary to know or choose candidate genes in advance.
Most recently, direct sequencing of the whole mRNA content of a cell
using the so-called RNA-SEQ technology (Figure 3) has provided an
alternative and even more accurate way of obtaining the transcriptome of a
cell. Unlike microarray technology, RNA-SEQ allows de novo discovery of
transcribed genes because it does not rely on predefined DNA probes.
Another major advantage of RNA-SEQ is its ability to detect splice
variants, i.e., alternative transcripts of the same gene that differ in
which exons they include.
These high-throughput technologies, however, create new technical
challenges of their own. The main challenge is the analysis of the huge
amount of data resulting from each microarray or sequencing experiment.
First, data from high-throughput experiments require computer-assisted
processing and analysis. Second, statistical analysis and testing become
essential tools for the discovery and exploration of gene functions, e.g.,
finding co-expressed genes.
Procedures
HINT: Start a Word or PowerPoint file as your personal lab notebook.
In this file, you can copy and paste gathered information as well as
write notes to yourself.
Understand the design of an RNA-SEQ experiment using NCBI GEO database
(http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17637)
1. Name and describe the two species tested in the experiments.
2. How many genes were measured for their expression levels in each species?
3. How many time points, developmental stages, and cell types were tested for expression differences?
4. How many replicates were performed for each developmental stage?
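If you copy the sample titles from the GEO series page into a list, counting replicates per condition is a one-line tally. The Python sketch below assumes a hypothetical naming convention of "<species>_<stage>_rep<N>"; the real GEO sample titles will almost certainly differ, so the splitting rule would need to be adapted to them.

```python
from collections import Counter


def replicates_per_condition(sample_titles):
    """Count samples per condition, assuming titles end in '_rep<N>'."""
    # Hypothetical convention: everything before the last "_rep" names the condition.
    return Counter(title.rsplit("_rep", 1)[0] for title in sample_titles)


# Hypothetical titles; replace them with the ones listed on the GEO page.
titles = ["Dd_0h_rep1", "Dd_0h_rep2", "Dd_4h_rep1", "Dd_4h_rep2", "Dp_0h_rep1"]
print(replicates_per_condition(titles))
```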
Search for gene information using DictyBase (http://dictybase.org/)
1. Select at least five genes from each of the three gene groups in Table 1.
2. For each selected gene, search for its annotation in DictyBase (http://dictybase.org/) by copying and
pasting the ID into the search box (top right) and clicking "Search All".
3. Collect the gene information and make a table following the example in Table 2.
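When IDs are copied from a web page they often arrive fused or duplicated. The Python sketch below pulls clean IDs out of pasted text and writes an empty CSV skeleton with the Table 2 columns, which can be opened in Excel and filled in from DictyBase. It assumes that DictyBase gene IDs are "DDB_G" followed by seven digits, as in the Table 1 entries; the function names are hypothetical.

```python
import csv
import io
import re

# Assumption: gene IDs look like DDB_G0285425 ("DDB_G" plus seven digits).
DDB_ID = re.compile(r"DDB_G\d{7}")

TABLE2_COLUMNS = [
    "DictyBase ID", "Gene Name", "Gene Product", "Description",
    "GO Molecular Function (MF)", "GO Biological Process (BP)",
    "GO Cellular Component (CC)", "Curator Notes",
]


def extract_ids(text):
    """Return the unique DictyBase IDs in the order they first appear."""
    return list(dict.fromkeys(DDB_ID.findall(text)))


def table2_template(ids):
    """Build CSV text with a header row and one empty Table 2 row per ID."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(TABLE2_COLUMNS)
    for gene_id in ids:
        writer.writerow([gene_id] + [""] * (len(TABLE2_COLUMNS) - 1))
    return buffer.getvalue()


ids = extract_ids("DDB_G0285425, DDB_G0269124; DDB_G0285425")
print(ids)
print(table2_template(ids))
```

Deduplicating with `dict.fromkeys` keeps the original order, so the rows appear in the same order as the IDs in Table 1.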
Table 1. Gene lists
Gene
Group DictyBase IDs
Group
A
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0285425
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0269124
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0289329
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0280961
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0292266
DDB_G0281387
Group
B
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0288273
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0290141
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0268302
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0267604
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0276871
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0292388
DDB_G0293742
Group
C
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0280049
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0274211
DDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G XXXXXXXXXXDDB_G0269222
DDB_G XXXXXXXXXXDDB_G0271806
Table 2. Gene annotations
DictyBase ID | Gene Name | Gene Product | Description | GO Molecular Function (MF) (pick one) | GO Biological Process (BP) (pick one) | GO Cellular Component (CC) (pick one) | Curator Notes (brief quote)
DDB_G XXXXXXXXXX | acrA | adenylate cyclase | contains a cyclase domain, 7 transmembrane helices, a histidine kinase domain, and two receiver domains | adenylate cyclase activity | sporulation resulting in formation of a cellular spore | integral component of membrane | The acrA gene encodes the late developmental stage adenylate cyclase, which is essential for spore encapsulation.
Explore expression profiles of individual genes using dictyExpress
(http://www.dictyExpress.org)
Figure 4. Pearson's
Figure 5. Calculate r using the Excel function CORREL()
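Excel's CORREL() returns Pearson's correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), where x and y are the expression values of two genes at the same time points. As a cross-check on the spreadsheet, the Python sketch below computes the same statistic from first principles; the function name and the four-point profiles are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient, the statistic Excel's CORREL() reports."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]  # deviations from each gene's mean
    dy = [b - mean_y for b in y]
    covariation = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("r is undefined when a profile is constant")
    return covariation / spread


# Hypothetical expression profiles over four time points.
gene_a = [1, 2, 3, 4]
gene_b = [2, 4, 6, 8]  # rises with gene_a (co-expressed)
gene_c = [8, 6, 4, 2]  # falls as gene_a rises (inversely regulated)
print(pearson_r(gene_a, gene_b))  # 1.0
print(pearson_r(gene_a, gene_c))  # -1.0
```

Values near +1 suggest co-expression and values near −1 suggest opposite regulation. On Python 3.10 or later, `statistics.correlation(x, y)` from the standard library gives the same result.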
1. Open the dictyExpress website and click "Run dictyExpress (RNA-seq)."
2. A tutorial may start if this is your first time using the website. You may complete the tutorial or simply
close the tutorial box.
3. In the "Experiment and Gene Selection" panel, select "1. D. discoideum vs D. purpurem, Parikh A et.al.,
D. discoideum." This will select the experiment you read about. Make sure it is highlighted before you do
the next steps.
4. In the "Experiment and Gene Selection" panel, type in a Gene Name from Group A (e.g., acrA) in the
area under “Genes.”
5. Click "Update Selection." A plot should appear in the "Expression Time Course" panel. Take a screenshot of
this plot and paste it into your notebook (Word or PowerPoint file). (Hints: Check the "Legend" box to show the
gene’s name on the plot. Click the arrow at the lower right to expand the plot to full screen as needed. You can
also move the windows around by dragging them by their titles. For
example, you can drag the “Expression Time Courses” window to the center of the screen if you
wish.)
6. Is this gene up- or down-regulated during development?
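One rough way to answer this from the numbers behind a plot is to compare the last time point with the first on a log2 scale. The Python sketch below does that; the function name, the pseudocount of 1, and the 2-fold (log2 = 1) threshold are arbitrary assumptions for illustration, not rules from this lab, and the FPKM values are hypothetical.

```python
import math


def regulation_call(profile, pseudocount=1.0, min_log2_change=1.0):
    """Classify a time course as 'up', 'down', or 'unchanged'.

    Compares the last time point with the first using a log2 fold change;
    the pseudocount avoids division by zero for unexpressed genes.
    """
    first, last = profile[0], profile[-1]
    log2_fc = math.log2((last + pseudocount) / (first + pseudocount))
    if log2_fc >= min_log2_change:
        return "up", log2_fc
    if log2_fc <= -min_log2_change:
        return "down", log2_fc
    return "unchanged", log2_fc


# Hypothetical FPKM values read off an expression plot.
print(regulation_call([3, 10, 40, 63]))   # ('up', 4.0)
print(regulation_call([127, 60, 10, 7]))  # ('down', -4.0)
```

Because only the endpoints are compared, a gene that peaks in mid-development and then returns to its starting level would be called "unchanged"; always check the plotted profile as well.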