Background
Microarrays are a tool that can be used for measuring the transcriptional
activity of many genes simultaneously. A single microarray chip contains a
substrate on which is placed a number of spots. Each spot contains a DNA
sequence, usually of a specific gene, and when a transcriptionally active
biological sample is pipetted onto the microarray, the mRNA in the sample
that corresponds to the DNA in the spot binds to the spot. The amount of
mRNA bound to a spot provides an indication of the level of activity of the
corresponding gene, and the amount can be measured using luminescence. A
single microarray chip can contain tens of thousands of spots, providing a
cheap and effective way of measuring the transcriptional activity of tens of
thousands of genes simultaneously.

For more information about microarrays and how they work, see
http://en.wikipedia.org/wiki/DNA_microarray
http://unsolvedmysteries.oregonstate.edu/microarray_07
http://www.bio.davidson.edu/genomics/chip/chip.html

Microarrays are not a perfect technology. There is a limit to their detection
accuracy and there can be systematic variations in the sensitivity of whole
chips from one to the next due to manufacturing issues. To compensate for
the systematic variation, we have to normalize the measurements of
microarray chips against each other so that we can make a fair comparison
between chips. We can also apply normalization between genes so that we
can better compare the measurements of one gene to another.

A microarray often includes spots for a specific set of genes called
housekeeping genes. These are genes related to the core functionality of the
cell such as energy metabolism and cell maintenance that we would expect
not to change. These spots are useful for correcting for the systematic
variation between chips. Because we expect measurements for the
housekeeping genes to be unchanging, we can add or subtract values to all
the measurements taken on the same chip until the housekeeping gene
measurements match across all the chips. For more details about
housekeeping genes see http://en.wikipedia.org/wiki/Housekeeping_gene
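
The chip-to-chip correction described above can be illustrated with a minimal Octave sketch. The numbers are hypothetical toy data (3 chips, 4 spots, with the last spot acting as the housekeeping gene), and using the mean housekeeping value as the common target is an assumption made for illustration; any shared target would make the chips match.

```octave
% Hypothetical toy data: 3 chips (rows) x 4 spots (columns).
% The last column plays the role of the housekeeping gene.
chips = [10 20 30 5;
         13 23 33 8;
         10 20 30 5];

hk     = chips(:, end);       % housekeeping measurement on each chip
offset = hk - mean(hk);       % how far each chip sits from the common target
corrected = chips - offset;   % broadcasting: offset(i) is subtracted from row i

disp(corrected);
```

In this toy case the second chip reads 3 units higher than the others on every spot, so after the correction all three rows are identical and the housekeeping column is constant.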

In this project, you are going to analyse data from an experiment that was
recorded when a dish of cells was infected by a virus. Samples of the cells
were taken from the dish at the moment of infection and every hour for 4
hours afterwards and measured for transcriptional activity by microarray.

You will normalize the data to correct for systematic chip variations and to
enable genes to be compared before plotting the data to identify which genes
are up-regulated as a result of infection, which genes are down-regulated and
which genes are not affected by infection.

Here we will only look at a subset of the spots on an array (not tens of
thousands). We will consider 25 spots and write the corresponding
measurements as a 25-component vector.

We have the following five vectors describing transcriptional activity:-

M0 = [ 1, 4, 3, 100, 12, 1000, 25, 32, 500, 700, 20,
9000, 25, 650, 40, 3, 67, 2, 305, 80, 200, 50, 2,
15, 4];

M1 = [ 1, 6, 2, 120, 14, 900, 25, 35, 550, 650, 23,
7500, 25, 700, 43, 6, 72, 8, 360, 70, 220, 60, 1,
14, 8];

M2 = [ 1.5, 5, 7, 150, 13, 550, 1, 40, 520, 600, 27,
5500, 27, 900, 44, 12, 75, 9, 360, 50, 190, 50,
4, 13, 6];

M3 = [ 122, 120, 120, 170, 140, 700, 100.5, 134, 570, 600, 125,
6100, 125, 1050, 122, 130, 187, 109, 460, 145, 330,
160, 105, 113, 104];

M4 = [ 25, 25, 30, 40, 50, 600, 6, 33, 500, 600, 26,
5500, 26, 1150, 14, 32, 126, 8, 360, 28, 230, 60,
2, 13, 5];

M0 is the set of microarray measurements taken at the point of infection, 0hrs
PI (Post Infection). M1 is the set of microarray measurements taken one hour
after infection, 1hr PI. M2 is the set of microarray measurements taken two
hours after infection, 2hrs PI. M3 is the set of microarray measurements taken
three hours after infection, 3hrs PI. M4 is the set of microarray measurements
taken four hours after infection, 4hrs PI.

Spots for a given gene are located in the same place on each microarray and
so the measurements for each gene at different time points are located in the
same place within each vector. Hence M0[12], M1[12], M2[12], M3[12] and
M4[12] describe the measurements for the same gene across the five time
points.
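
As a small illustration of this layout, the sketch below reads out the time course of the 12th gene. Note that the bracket notation M0[12] above is shorthand: Octave indexes with parentheses and counts from 1, so the element is written M0(12). Stacking the vectors as rows of a matrix named M is an assumption made here for convenience.

```octave
% The five measurement vectors, copied from the listing above.
M0 = [1, 4, 3, 100, 12, 1000, 25, 32, 500, 700, 20, 9000, 25, 650, 40, 3, 67, 2, 305, 80, 200, 50, 2, 15, 4];
M1 = [1, 6, 2, 120, 14, 900, 25, 35, 550, 650, 23, 7500, 25, 700, 43, 6, 72, 8, 360, 70, 220, 60, 1, 14, 8];
M2 = [1.5, 5, 7, 150, 13, 550, 1, 40, 520, 600, 27, 5500, 27, 900, 44, 12, 75, 9, 360, 50, 190, 50, 4, 13, 6];
M3 = [122, 120, 120, 170, 140, 700, 100.5, 134, 570, 600, 125, 6100, 125, 1050, 122, 130, 187, 109, 460, 145, 330, 160, 105, 113, 104];
M4 = [25, 25, 30, 40, 50, 600, 6, 33, 500, 600, 26, 5500, 26, 1150, 14, 32, 126, 8, 360, 28, 230, 60, 2, 13, 5];

% One element per vector: the same gene at each of the five time points.
gene12_separate = [M0(12); M1(12); M2(12); M3(12); M4(12)];

% Equivalent: stack the vectors as rows and take one column.
M = [M0; M1; M2; M3; M4];     % 5x25, one row per time point
gene12 = M(:, 12);            % 5x1 time course of the 12th gene

disp(gene12');
```

Both forms give the same five numbers; the matrix form simply makes it easier to operate on all genes at once.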

The last component (component number 25) contains the measurements for
the housekeeping gene that we will use to normalize the data.

Please submit your code by midnight on the night of the 16th November, 2021
(UK time) to the submission dropbox in the week 7 folder of Blackboard Learn.
This exercise is weighted at 20% of your final mark for the module.

The goal of this project is to write code that will cluster the above data in order
to identify up-regulated, down-regulated and unregulated genes.

1) Write a script that creates the five vectors above. 10%

2) In the same script, combine these vectors into a 5x25 matrix in time
order such that the first row contains the measurements at 0hr PI and
the fifth row contains the measurements at 4hr PI. 10%

3) In the same script, use the housekeeping gene to correct for
systematic variations between microarrays. In the data above, you can
see that the measurements in M3 are systematically overestimated.
This step should eliminate this overestimation. Your script should
a. Calculate the mean value of the housekeeping gene across the
five time points XXXXXXXXXX%
b. Calculate the difference between each measurement of the
housekeeping gene and the mean of the housekeeping gene.
XXXXXXXXXX%
c. Subtract this difference from the measurements of all the genes
at the same time point. This allows us to correct for the
sensitivity variation between microarray chips. This should yield
a 5x25 matrix in which the housekeeping gene measurements
are constant across all time points and M3 is no longer
systematically overestimated XXXXXXXXXX%

4) In the same script, normalize the genes against each other so that they
can be easily compared. Using the 5x25 matrix from 3), your script
should
a. calculate the mean measurement for each gene and the
minimum measurement for each gene and rescale each
measurement according to the formula:-
RescaledMeasurement = (Measurement - MinimumMeasurement) / MeanMeasurement 15%
b. adjust the measurements for each gene at all the time points
according to the formula:-
AdjustedRescaledMeasurement = RescaledMeasurement + (1 - RescaledMeasurementAt0Hrs)
so that all the time courses start from the same point, which we
will arbitrarily choose to be XXXXXXXXXX%

5) In the same script, produce a colour-coded plot of the time course of
activity for each gene such that any gene with
AdjustedRescaledMeasurementAt4Hrs >1.3 is drawn in red.
AdjustedRescaledMeasurementAt4Hrs <0.7 is drawn in blue.
AdjustedRescaledMeasurementAt4Hrs otherwise is drawn in green.

You can plot a vector, a, with the command
plot(a);
and you can plot the vector, a, in red, blue or green with the commands
plot(a, 'r');
plot(a, 'b');
plot(a, 'g');
respectively.

We want to plot all 25 genes on the same set of axes and this
essentially means using the plot command 25 times in a loop.
However, by default, Octave wipes the axes clean before running a plot
command, which stops us putting 25 plots on the same axes.
The solution is to use the command
hold on;
immediately after the plot command as this tells Octave not to wipe the
figure in future. For example
plot(a, 'r');
hold on;

Be aware that the first time you use the plot command Octave will
produce the plot very slowly as it has to load the plotting plug-ins, but
thereafter it should run reasonably quickly.

The final figure you should obtain is [figure not reproduced]. 20%
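
Steps 3) to 5) above can be sketched end to end as follows. This is one possible reading of the brief, not a model answer: the variable names (M, C, R, A, colour, final4h), the use of broadcasting in place of explicit loops, and the axis labels are all assumptions made for illustration. The time axis 0:4 reflects the hourly sampling described in the background.

```octave
% Raw measurements copied from the vectors listed above (0 to 4 hrs PI).
M0 = [1, 4, 3, 100, 12, 1000, 25, 32, 500, 700, 20, 9000, 25, 650, 40, 3, 67, 2, 305, 80, 200, 50, 2, 15, 4];
M1 = [1, 6, 2, 120, 14, 900, 25, 35, 550, 650, 23, 7500, 25, 700, 43, 6, 72, 8, 360, 70, 220, 60, 1, 14, 8];
M2 = [1.5, 5, 7, 150, 13, 550, 1, 40, 520, 600, 27, 5500, 27, 900, 44, 12, 75, 9, 360, 50, 190, 50, 4, 13, 6];
M3 = [122, 120, 120, 170, 140, 700, 100.5, 134, 570, 600, 125, 6100, 125, 1050, 122, 130, 187, 109, 460, 145, 330, 160, 105, 113, 104];
M4 = [25, 25, 30, 40, 50, 600, 6, 33, 500, 600, 26, 5500, 26, 1150, 14, 32, 126, 8, 360, 28, 230, 60, 2, 13, 5];
M  = [M0; M1; M2; M3; M4];          % 5x25, rows in time order

% Step 3: shift each chip so the housekeeping gene (column 25) is constant.
hk     = M(:, 25);                  % 5x1 housekeeping measurements
offset = hk - mean(hk);             % deviation of each chip from the mean
C      = M - offset;                % broadcasting: offset(i) leaves row i

% Step 4a: rescale each gene by its own minimum and mean over time.
geneMean = mean(C, 1);              % 1x25
geneMin  = min(C, [], 1);           % 1x25
R = (C - geneMin) ./ geneMean;      % 5x25 rescaled measurements

% Step 4b: shift every time course so that its 0hr value equals 1,
% which follows directly from the adjustment formula.
A = R + (1 - R(1, :));

% Step 5: colour each gene by its adjusted value at 4 hrs PI.
t = 0:4;
figure;
for g = 1:size(A, 2)
  final4h = A(end, g);
  if final4h > 1.3
    colour = 'r';                   % up-regulated
  elseif final4h < 0.7
    colour = 'b';                   % down-regulated
  else
    colour = 'g';                   % not affected
  end
  plot(t, A(:, g), colour);
  hold on;                          % keep earlier curves on the axes
end
hold off;
xlabel('Hours post infection');
ylabel('Adjusted rescaled measurement');
```

After step 3 the housekeeping column of C equals 25.4 at every time point (the mean of 4, 8, 6, 104 and 5), so its adjusted time course is flat at 1 and it is drawn in green.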
