
Faculty of Engineering, Environment and Computing
EEC 220CT

Assignment Brief 2018/19

Module Title: Data and Information Retrieval
Individual
Cohort: Sep
Module Code: 220CT
Coursework Title (e.g. CWK1): CW 1
Hand out date: 29 October 2018
Lecturer: Rachid Anane
Due date: 7 December 2018
Estimated Time (hrs): 20
Coursework type: CW
% of Module Mark: 50
Submission arrangement online via CUMoodle:
File types and method of recording:
Mark and Feedback date:
Mark and Feedback method: feedback file
Module Learning Outcomes Assessed:
1. Explain the difference between data and information and its significance as a business
resource.
2. Identify the main advantages and disadvantages of using database and information
retrieval systems.
3. Analyse, design, implement and manage a database solution for a specified commercial
or scientific objective.
4. Demonstrate understanding of Big Data as a concept and as a business tool through the
application of data analysis techniques.
Task and Mark distribution:
1. Normalisation (25%)
2. Database design (25%)
3. MapReduce (25%)
4. Recommendation Systems (25%)
Notes:
1. You are expected to use the CUHarvard referencing format. For support and advice on this,
students can contact the Centre for Academic Writing (CAW).
2. Please notify your registry course support team and module leader for disability support.
3. Any student requiring an extension or deferral should follow the university process as outlined
here.
4. The University cannot take responsibility for any coursework lost or corrupted on disks, laptops
or personal computer. Students should therefore regularly back-up any work and are advised to
save it on the University system.
5. If there are technical or performance issues that prevent students submitting coursework
through the online coursework submission system on the day of a coursework deadline, an
appropriate extension to the coursework submission deadline will be agreed. This extension
will normally be 24 hours or the next working day if the deadline falls on a Friday or over the
weekend period. This will be communicated via email and as a CUMoodle announcement.
220CT – Data and Information retrieval

This assignment is made up of four parts:
- Part 1 deals with normalisation and E-R modelling.
- Part 2 covers database design.
- Part 3 involves the application of MapReduce
- Part 4 concerns recommendation systems

Part 1: Normalisation (This task is worth 25 marks)

The International Space Station (ISS) is a habitable artificial satellite in low Earth orbit. It is
the ninth space station to be inhabited by crews following previous orbital stations that were
launched by the US, the former Soviet Union and later Russia. The ISS is intended to be a
laboratory, observatory and factory in space as well as to provide transportation,
maintenance, and act as a staging base for possible future missions to the Moon, Mars and
beyond. In order to support the crew and overall operation of ISS the space agencies in
charge of running the station conduct regular missions to launch spacecraft carrying
payloads of essential or replacement equipment up to ISS. A payload inventory, see table
below, is recorded of each mission, consisting of the space agency leading the mission and
the equipment payload to be sent up to ISS.

Mission No.  Agency No.  Lead Agency  Country  Mission Date  Equipment                Qty  Equipment Weight
ISS-2237     178         JAXA         Japan    14/12/2016    Potable water dispenser  2    100kg
                                                             Flexible air duct        6    0.5kg
                                                             Small storage rack       4    2kg
ISS-3664     526         ESA          EU       16/01/2017    Bio Filter               6    0.20kg
ISS-2356     167         NASA         USA      12/04/2017    Small storage rack       3    2kg
                                                             Battery pack             2    5kg
                                                             Urine transfer tubing    2    1.5kg
                                                             O2 scrubber              1    50kg
ISS-1234     032         Roskosmos    Russia   16/04/2018    Small storage rack       1    2kg
                                                             Flexible air duct        2    0.5kg

1. Explain why the table is not normalised
2. Identify and state the functional dependencies in the table
3. Generate 1NF, 2NF and 3NF normalised relations.
- Justify clearly every step
- Produce the corresponding tables
4. Produce SQL statements to create the 3NF relations (tables), and include SQL insert
statements for each of the tables.
5. Comment critically on the normalisation process.
6. Generate the ER diagram corresponding to the table.
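
One possible shape for the 3NF relations asked for in task 4 is sketched below, using Python's built-in sqlite3 module so that the SQL can be run as written. The table and column names (agency, mission, equipment, payload, weight_kg) are assumptions made for illustration, weights are assumed to be stored as numbers in kilograms and dates in ISO form, and only the first mission from the inventory is inserted.

```python
import sqlite3

# Hypothetical 3NF schema; every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE agency (
    agency_no   TEXT PRIMARY KEY,          -- text, so '032' keeps its leading zero
    lead_agency TEXT NOT NULL,
    country     TEXT NOT NULL
);
CREATE TABLE mission (
    mission_no   TEXT PRIMARY KEY,
    agency_no    TEXT NOT NULL REFERENCES agency(agency_no),
    mission_date TEXT NOT NULL             -- ISO date, e.g. 2016-12-14
);
CREATE TABLE equipment (
    equipment TEXT PRIMARY KEY,
    weight_kg REAL NOT NULL
);
CREATE TABLE payload (
    mission_no TEXT NOT NULL REFERENCES mission(mission_no),
    equipment  TEXT NOT NULL REFERENCES equipment(equipment),
    qty        INTEGER NOT NULL,
    PRIMARY KEY (mission_no, equipment)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces REFERENCES when enabled
conn.executescript(SCHEMA)

# Insert statements for mission ISS-2237, taken from the payload inventory.
conn.execute("INSERT INTO agency VALUES ('178', 'JAXA', 'Japan')")
conn.execute("INSERT INTO mission VALUES ('ISS-2237', '178', '2016-12-14')")
conn.executemany(
    "INSERT INTO equipment VALUES (?, ?)",
    [("Potable water dispenser", 100.0),
     ("Flexible air duct", 0.5),
     ("Small storage rack", 2.0)],
)
conn.executemany(
    "INSERT INTO payload VALUES (?, ?, ?)",
    [("ISS-2237", "Potable water dispenser", 2),
     ("ISS-2237", "Flexible air duct", 6),
     ("ISS-2237", "Small storage rack", 4)],
)

# Joining the relations back together recovers the original inventory lines.
total = conn.execute(
    """
    SELECT SUM(p.qty * e.weight_kg)
    FROM payload AS p
    JOIN equipment AS e ON e.equipment = p.equipment
    WHERE p.mission_no = 'ISS-2237'
    """
).fetchone()[0]
print(total)
```

Storing each equipment weight once in its own relation means a weight correction touches a single row, which is the kind of update anomaly that 3NF is meant to remove.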



Part 2: Database Design (This task is worth 25 marks)

The NASA exoplanet dataset archive can be found here:
https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=planets

In the context of Big Data, you are asked to design a database solution for the exoplanet
data set above. Your solution must include the following:

1. The database solution of your choice.
2. Justification for the choice of the database.
3. A detailed explanation of how the data will be stored and accessed in the database
you choose.
4. The benefits and drawbacks of this solution in relation to the type of data above and
the size of the data set.
5. The quality of service (QoS), such as scalability, that should be provided to the user
should this solution be adopted.

Part 3: Sequential and parallel processing (This task is worth 25 marks)

Consider a flight data store with the following data structure, where all times are in GMT.
Each record consists of the 13 attributes; the set of allowable values of the attributes and
format are specified in the description (metadata).

    Data Value              Description
1   Year                    XXXXXXXXXX
2   Month                   XXXXXXXXXX
3   Day of Month            XXXXXXXXXX
4   Day of the Week         1 (Monday) – 7 (Sunday)
5   Departure Time          Recorded Departure time (hhmm)
6   Actual Departure time   Scheduled Departure time (hhmm)
7   Arrival Time            Recorded Arrival time (hhmm)
8   Carrier                 Carrier code (unique)
9   Flight Number           Flight Number
10  Departure Delay         minutes
11  Arrival Delay           minutes
12  Cancellation            Yes or No
13  Weather Delay           minutes

An example record would have the following values:
(2015, 4, 20, 5, 1430, 1400, 1820, 131, JL729, 30, 15, No, 0)

Flight monitors would like to determine the number of flights which were delayed for each
carrier.
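
For the relational case, the per-carrier count could be sketched as follows, again through Python's sqlite3 module. The column names, the composite primary key and the definition of "delayed" (a departure or arrival delay greater than zero minutes) are assumptions, not part of the brief. The first row is the example record above; the other three rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flight (
        year                INTEGER NOT NULL,
        month               INTEGER NOT NULL CHECK (month BETWEEN 1 AND 12),
        day_of_month        INTEGER NOT NULL CHECK (day_of_month BETWEEN 1 AND 31),
        day_of_week         INTEGER NOT NULL CHECK (day_of_week BETWEEN 1 AND 7),
        recorded_departure  TEXT NOT NULL,     -- hhmm, text keeps leading zeros
        scheduled_departure TEXT NOT NULL,     -- hhmm
        recorded_arrival    TEXT NOT NULL,     -- hhmm
        carrier             TEXT NOT NULL,
        flight_number       TEXT NOT NULL,
        departure_delay     INTEGER NOT NULL,  -- minutes
        arrival_delay       INTEGER NOT NULL,  -- minutes
        cancelled           TEXT NOT NULL CHECK (cancelled IN ('Yes', 'No')),
        weather_delay       INTEGER NOT NULL,  -- minutes
        -- Assumed key: one scheduled departure per carrier flight per day.
        PRIMARY KEY (year, month, day_of_month, carrier, flight_number,
                     scheduled_departure)
    )
""")

sample = [
    (2015, 4, 20, 5, "1430", "1400", "1820", "131", "JL729", 30, 15, "No", 0),
    # The rows below are invented purely to exercise the query.
    (2015, 4, 21, 6, "0900", "0900", "1200", "131", "JL730", 0, 0, "No", 0),
    (2015, 4, 21, 6, "1015", "1000", "1310", "202", "XX100", 15, 20, "No", 5),
    (2015, 5, 2, 3, "0700", "0700", "0940", "202", "XX101", 0, 10, "No", 0),
]
conn.executemany(
    "INSERT INTO flight VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", sample
)

# Count delayed flights per carrier (assumed definition of "delayed").
delayed = conn.execute("""
    SELECT carrier, COUNT(*) AS delayed_flights
    FROM flight
    WHERE departure_delay > 0 OR arrival_delay > 0
    GROUP BY carrier
    ORDER BY carrier
""").fetchall()
print(delayed)
```

Restricting the count to one year or month would only need an extra condition such as `AND year = 2015` in the WHERE clause.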

1. Assuming that the data is stored in a relational database, produce, with justification,
the SQL statement to create the table and the SQL statement to determine the
number of flights which were delayed for each carrier.
2. Assuming that the data is too large to be processed in a centralised manner, and that
it is stored in an ordinary file, produce a distributed solution which applies
MapReduce to the data processing.
a) Justify your decisions and all the steps of your solution, and specify clearly the
map and reduce functions.
b) Identify the advantages and drawbacks of this solution.
c) Use diagrams if required.
3. Assuming that the monitors wish to determine the number of delayed flights for a
specific year or month for example, comment on the general applicability of your
solution.
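
The map and reduce functions for this part could be sketched as below. This is a single-process simulation of the map, shuffle and reduce phases in plain Python, not a real Hadoop job, and the function names (map_fn, shuffle, reduce_fn, run) are hypothetical. It assumes each line of the file has the same layout as the example record and that a flight counts as delayed when its departure or arrival delay is greater than zero. The optional year and month filters illustrate one way of handling item 3. Only the first sample line comes from the brief; the rest are invented.

```python
from collections import defaultdict
from itertools import chain


def map_fn(line, year=None, month=None):
    """Emit (carrier, 1) for each delayed flight that passes the optional filters."""
    fields = [f.strip() for f in line.strip("() \n").split(",")]
    rec_year, rec_month = int(fields[0]), int(fields[1])
    carrier = fields[7]
    dep_delay, arr_delay = int(fields[9]), int(fields[10])
    if year is not None and rec_year != year:
        return
    if month is not None and rec_month != month:
        return
    if dep_delay > 0 or arr_delay > 0:  # assumed definition of "delayed"
        yield carrier, 1


def shuffle(pairs):
    """Group mapped values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(carrier, counts):
    """Sum the 1s emitted for a single carrier."""
    return carrier, sum(counts)


def run(lines, **filters):
    mapped = chain.from_iterable(map_fn(line, **filters) for line in lines)
    return dict(reduce_fn(k, v) for k, v in shuffle(mapped).items())


lines = [
    "(2015, 4, 20, 5, 1430, 1400, 1820, 131, JL729, 30, 15, No, 0)",
    # Invented records, used only to exercise the functions.
    "(2015, 4, 21, 6, 0900, 0900, 1200, 131, JL730, 0, 0, No, 0)",
    "(2015, 4, 21, 6, 1015, 1000, 1310, 202, XX100, 15, 20, No, 5)",
    "(2015, 5, 2, 3, 0700, 0700, 0940, 202, XX101, 0, 10, No, 0)",
]

print(run(lines))
print(run(lines, month=5))
```

Because the filter lives in the map function, the reduce step is unchanged whether the monitors ask for all flights or for a single year or month.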
Part 4: Big Data and recommendation systems (This task is worth 25 marks)

Research and comment critically on the structure and the use of recommendation systems.

a) You should pay particular attention to the rationale, the architecture, the processes,
the effectiveness, the implications of recommendation systems and relevant issues
within a Big Data context.
Your arguments should be supported by specific examples and case studies and
should be properly referenced.
Use suitable diagrams if required.

b) Produce in your own words a well-structured and adequately referenced report that
should be no more than 1000 words.
Mark Scheme

Q1
Achieve 40% Achieve 70%

• Evidence of partially correct applicable and correctly identified database.
• Evidence of reasoning behind database choice.
• For each activity a brief explanation of design decisions should be provided.
• Models providing detail about the design decisions and database design provided.
• A complete and correct design, including all elements.
• A complete explanation of the reasons behind the choice of Database.
• A complete and fully implemented database.
• For each step an explanation and justification of how and why it was applied.








Q2
Achieve 40% Achieve 70%
• Basic definition of what data mining is with a few references.
• Basic understanding of sequential and parallel processing.
• Basic application of a partially correct SQL query.
• Partial understanding of parallel processing.
• Partially correct MapReduce solution.
• Basic rationale for the solution presented.
• Excellent definition of what data mining is with a diverse set of
Answered Same Day Nov 11, 2020

Solution

Ankit answered on Nov 19 2020
135 Votes
Student_Name
                    Student_Id    
                    Title of assignment
Part 1: Normalisation
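1. The table is not normalised because each mission row holds a repeating group (Equipment, Qty and Equipment Weight), so its cells are not single-valued. Once that group is flattened, neither Mission No. nor Agency No. alone is a key (both repeat across rows), so the key must be Mission No. together with the equipment. The agency and date columns then depend only on Mission No., and Equipment Weight depends only on the equipment, not on the whole key.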
2. The functional dependencies in the table are listed below:
Mission_no -> Agency_no, Mission_date
Agency_no -> Lead_agency, Country
Equipment -> Equipment_weight
Mission_no, Equipment -> Qty
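
A candidate functional dependency can be tested against the flattened inventory rows with a few lines of Python. The sketch below uses a hypothetical helper, holds, and four rows copied from the payload inventory; the dictionary keys are illustrative names. A check like this can only refute a dependency on the data at hand, it cannot prove that the dependency holds in general.

```python
def holds(rows, lhs, rhs):
    """Return True if every lhs value maps to a single rhs value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Four flattened lines from the payload inventory.
rows = [
    {"mission_no": "ISS-2237", "agency_no": "178",
     "equipment": "Small storage rack", "qty": 4, "weight": "2kg"},
    {"mission_no": "ISS-2237", "agency_no": "178",
     "equipment": "Flexible air duct", "qty": 6, "weight": "0.5kg"},
    {"mission_no": "ISS-2356", "agency_no": "167",
     "equipment": "Small storage rack", "qty": 3, "weight": "2kg"},
    {"mission_no": "ISS-1234", "agency_no": "032",
     "equipment": "Flexible air duct", "qty": 2, "weight": "0.5kg"},
]

print(holds(rows, ["mission_no"], ["agency_no"]))        # mission determines agency
print(holds(rows, ["equipment"], ["weight"]))            # equipment determines weight
print(holds(rows, ["equipment"], ["qty"]))               # quantity varies by mission
print(holds(rows, ["mission_no", "equipment"], ["qty"]))
```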
3. 1-3 Normal forms
1 NF
· Each table cell should contain a single value.
· Data must not contain repeating groups.
    Mission No.  Agency No.  Lead Agency  Country  Mission Date
    ISS-2237     178         JAXA         Japan    14/12/2016
    ISS-3664     526         ESA          EU       16/01/2017
    ISS-2356     167         NASA         USA      12/04/2017
    ISS-1234     032         Roskosmos    Russia   16/04/2018

    Mission No.  Equipment                Qty  Equipment Weight
    ISS-2237     Potable water dispenser  2    100kg
    ISS-2237     Flexible air duct        6    0.5kg
    ISS-2237     Small storage rack       4    2kg
    ISS-3664     Bio Filter               6    0.20kg
    ISS-2356     Small storage rack       3    2kg
    ISS-2356     Battery pack             2    5kg
    ISS-2356     Urine transfer tubing    2    1.5kg
    ISS-2356     O2 scrubber              1    50kg
    ISS-1234     Small storage rack       1    2kg
    ISS-1234     Flexible air duct        2    0.5kg
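
Removing the repeating group can also be shown as a small transformation. The sketch below, with a hypothetical to_1nf function and illustrative field names, takes two missions from the inventory in their nested form and emits one row per mission and equipment pair, so that every cell holds a single value.

```python
# Two missions from the inventory, each with its payload as a repeating group.
missions = [
    {"mission_no": "ISS-3664", "agency_no": "526",
     "payload": [("Bio Filter", 6, "0.20kg")]},
    {"mission_no": "ISS-1234", "agency_no": "032",
     "payload": [("Small storage rack", 1, "2kg"),
                 ("Flexible air duct", 2, "0.5kg")]},
]


def to_1nf(missions):
    """Flatten each payload list into one row per (mission, equipment) pair."""
    return [
        {"mission_no": m["mission_no"], "equipment": name,
         "qty": qty, "weight": weight}
        for m in missions
        for name, qty, weight in m["payload"]
    ]


flat = to_1nf(missions)
for row in flat:
    print(row)
```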
2 NF
“The data is said to be in second normalized form If,
1. It is in First normal form
2. There should...