Faculty of Engineering, Environment and Computing EEC 220CT Assignment...

Question

Faculty of Engineering, Environment and Computing (EEC)
220CT Assignment Brief 2018/19

Module Title: Data and Information Retrieval (individual)
Cohort: (Sep)
Module Code: 220CT
Coursework Title (e.g. CWK1): CW 1
Hand out date: 29 October 2018
Lecturer: Rachid Anane
Due date: 7 December 2018
Estimated Time (hrs): 20
Coursework type: CW
% of Module Mark: 50
Submission arrangement online via CUMoodle:
File types and method of recording:
Mark and Feedback date:
Mark and Feedback method: feedback file

Module Learning Outcomes Assessed:
1. Explain the difference between data and information and its significance as a business resource.
2. Identify the main advantages and disadvantages of using database and information retrieval systems.
3. Analyse, design, implement and manage a database solution for a specified commercial or scientific objective.
4. Demonstrate understanding of Big Data as a concept and as a business tool through the application of data analysis techniques.

Task and Mark distribution:
1. Normalisation (25%)
2. Database design (25%)
3. MapReduce (25%)
4. Recommendation Systems (25%)

Notes:
1. You are expected to use the CUHarvard referencing format. For support and advice on this, students can contact the Centre for Academic Writing (CAW).
2. Please notify your registry course support team and module leader for disability support.
3. Any student requiring an extension or deferral should follow the university process as outlined here.
4. The University cannot take responsibility for any coursework lost or corrupted on disks, laptops or personal computers. Students should therefore regularly back up any work and are advised to save it on the University system.
5. If there are technical or performance issues that prevent students submitting coursework through the online coursework submission system on the day of a coursework deadline, an appropriate extension to the coursework submission deadline will be agreed. This extension will normally be 24 hours, or the next working day if the deadline falls on a Friday or over the weekend period. This will be communicated via email and as a CUMoodle announcement.

220CT: Data and Information Retrieval

This assignment is made up of four parts:
- Part 1 deals with normalisation and E-R modelling.
- Part 2 covers database design.
- Part 3 involves the application of MapReduce.
- Part 4 concerns recommendation systems.

Part 1: Normalisation (This task is worth 25 marks)

The International Space Station (ISS) is a habitable artificial satellite in low Earth orbit. It is the ninth space station to be inhabited by crews, following previous orbital stations that were launched by the US, the former Soviet Union and later Russia. The ISS is intended to be a laboratory, observatory and factory in space as well as to provide transportation, maintenance, and act as a staging base for possible future missions to the Moon, Mars and beyond. In order to support the crew and overall operation of the ISS, the space agencies in charge of running the station conduct regular missions to launch spacecraft carrying payloads of essential or replacement equipment up to the ISS. A payload inventory, see table below, is recorded for each mission, consisting of the space agency leading the mission and the equipment payload to be sent up to the ISS.

Mission No.  Agency No.  Lead Agency  Country  Mission Date  Equipment                Qty  Equipment Weight
ISS-2237     178         JAXA         Japan    14/12/2016    Potable water dispenser  2    100kg
                                                             Flexible air duct        6    0.5kg
                                                             Small storage rack       4    2kg
ISS-3664     526         ESA          EU       16/01/2017    Bio Filter               6    0.20kg
ISS-2356     167         NASA         USA      12/04/2017    Small storage rack       3    2kg
                                                             Battery pack             2    5kg
                                                             Urine transfer tubing    2    1.5kg
                                                             O2 scrubber              1    50kg
ISS-1234     032         Roskosmos    Russia   16/04/2018    Small storage rack       1    2kg
                                                             Flexible air duct        2    0.5kg

1. Explain why the table is not normalised.
2. Identify and state the functional dependencies in the table.
3. Generate 1NF, 2NF and 3NF normalised relations.
   - Justify clearly every step
   - Produce the corresponding tables
4. Produce SQL statements to create the 3NF relations (tables), and include SQL insert statements for each of the tables.
5. Comment critically on the normalisation process.
6. Generate the ER diagram corresponding to the table.

Part 2: Database Design (This task is worth 25 marks)

The NASA exoplanet dataset archive can be found here:
https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=planets

In the context of Big Data, you are asked to design a database solution for the exoplanet data set above. Your solution must include the following:
1. The database solution of your choice.
2. Justification for the choice of the database.
3. A detailed explanation of how the data will be stored and accessed in the database you choose.
4. The benefits and drawbacks of this solution in relation to the type of data above and the size of the data set.
5. The quality of service (QoS), such as scalability, that should be provided to the user should this solution be adopted.

Part 3: Sequential and parallel processing (This task is worth 25 marks)

Consider a flight data store with the following data structure, where all times are in GMT. Each record consists of 13 attributes; the set of allowable values of the attributes and their format are specified in the description (metadata).
XXXXXXXXXX  Data                   Value Description
1           Year                   XXXXXXXXXX
2           Month                  XXXXXXXXXX
3           Day of Month           XXXXXXXXXX
4           Day of the Week        1 (Monday) – 7 (Sunday)
5           Departure Time         Recorded Departure time (hhmm)
6           Actual Departure time  Scheduled Departure time (hhmm)
7           Arrival Time           Recorded Arrival time (hhmm)
8           Carrier                Carrier code (unique)
9           Flight Number          Flight Number
10          Departure Delay        minutes
11          Arrival Delay          minutes
12          Cancellation           Yes or No
13          Weather Delay          minutes

An example record would have the following values:
(2015, 4, 20, 5, 1430, 1400, 1820, 131, JL729, 30, 15, No, 0)

Flight monitors would like to determine the number of flights which were delayed for each carrier.

1. Assuming that the data is stored in a relational database, produce, with justification, the SQL statement to create the table and the SQL statement to determine the number of flights which were delayed for each carrier.
2. Assuming that the data is too large to be processed in a centralised manner, and that it is stored in an ordinary file, produce a distributed solution which applies MapReduce to the data processing.
   a) Justify your decisions and all the steps of your solution, and specify clearly the map and reduce functions.
   b) Identify the advantages and drawbacks of this solution.
   c) Use diagrams if required.
3. Assuming that the monitors wish to determine the number of delayed flights for a specific year or month, for example, comment on the general applicability of your solution.

Part 4: Big Data and recommendation systems (This task is worth 25 marks)

Research and comment critically on the structure and the use of recommendation systems.
a) You should pay particular attention to the rationale, the architecture, the processes, the effectiveness, the implications of recommendation systems and relevant issues within a Big Data context. Your arguments should be supported by specific examples and case studies and should be properly referenced.
   Use suitable diagrams if required.
b) Produce in your own words a well-structured and adequately referenced report that should be no more than 1000 words.

Mark Scheme

Q1
Achieve 40%:
• Evidence of a partially correct, applicable and correctly identified database.
• Evidence of reasoning behind database choice.
• For each activity a brief explanation of design decisions should be provided.
• Models providing detail about the design decisions and database design provided.
Achieve 70%:
• A complete and correct design, including all elements.
• A complete explanation of the reasons behind the choice of database.
• A complete and fully implemented database.
• For each step an explanation and justification of how and why it was applied.

Q2
Achieve 40%:
• Basic definition of what data mining is with a few references.
• Basic understanding of sequential and parallel processing.
• Basic application of a partially correct SQL query.
• Partial understanding of parallel processing.
• Partially correct MapReduce solution.
• Basic rationale for the solution presented.
Achieve 70%:
• Excellent definition of what data mining is with a diverse set of

Ankit · Accepted Answer

Student_Name
					Student_Id	
					Title of assignment
Part 1: Normalisation
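1. The table is not normalised because each mission row holds a repeating group of equipment entries, so neither Mission No. nor Agency No. identifies a single equipment line on its own. The key must therefore be Mission No. together with an equipment id. The mission details (Agency No., Lead Agency, Country, Mission Date) then depend only on Mission No., and the equipment name and Equipment Weight depend only on the equipment id; only Qty depends on the whole key.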
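2. The functional dependencies in the table are listed below (Equipment_id is an identifier introduced here for each equipment type; it is not a column of the original table):
Equipment_id → Equipment, Equipment_weight
Mission_no → Agency_no, Mission_date
Agency_no → Lead_agency, Country
Mission_no, Equipment_id → Qty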
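A minimal sketch of how such dependencies could be sanity-checked against the sample data. It assumes the equipment name stands in for Equipment_id (the brief's table has no id column), leaves out Mission Date, and uses a hypothetical helper called `holds`. A check on ten sample rows can only refute a dependency; it cannot prove that one holds in general.

```python
# Flattened payload inventory: one row per (mission, equipment) pair.
# Values are copied from the table in the brief; the mission date is omitted.
COLUMNS = ("mission_no", "agency_no", "lead_agency", "country",
           "equipment", "qty", "weight")
ROWS = [
    ("ISS-2237", "178", "JAXA", "Japan", "Potable water dispenser", 2, "100kg"),
    ("ISS-2237", "178", "JAXA", "Japan", "Flexible air duct", 6, "0.5kg"),
    ("ISS-2237", "178", "JAXA", "Japan", "Small storage rack", 4, "2kg"),
    ("ISS-3664", "526", "ESA", "EU", "Bio Filter", 6, "0.20kg"),
    ("ISS-2356", "167", "NASA", "USA", "Small storage rack", 3, "2kg"),
    ("ISS-2356", "167", "NASA", "USA", "Battery pack", 2, "5kg"),
    ("ISS-2356", "167", "NASA", "USA", "Urine transfer tubing", 2, "1.5kg"),
    ("ISS-2356", "167", "NASA", "USA", "O2 scrubber", 1, "50kg"),
    ("ISS-1234", "032", "Roskosmos", "Russia", "Small storage rack", 1, "2kg"),
    ("ISS-1234", "032", "Roskosmos", "Russia", "Flexible air duct", 2, "0.5kg"),
]


def holds(lhs, rhs, rows=ROWS):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        record = dict(zip(COLUMNS, row))
        key = tuple(record[c] for c in lhs)
        value = tuple(record[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a later, different value means the dependency is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(holds(("mission_no",), ("agency_no",)))          # mission determines agency
print(holds(("agency_no",), ("lead_agency", "country")))
print(holds(("equipment",), ("weight",)))
print(holds(("mission_no", "equipment"), ("qty",)))
print(holds(("equipment",), ("mission_no",)))          # refuted by the sample rows
```

The last call is refuted because the same equipment type (for example the small storage rack) is carried on several missions, so equipment alone cannot determine the mission.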
3. Normal forms 1NF to 3NF
1NF
· Each table cell should contain a single value.
· Data must not contain repeating groups.
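The two rules above can be sketched as a flattening step. This is only an illustration under assumptions: the nested `INVENTORY` structure and the function name `to_1nf` are hypothetical, and only two of the four missions are included to keep it short.

```python
# Un-normalised form: each mission carries a repeating group of equipment.
INVENTORY = {
    "ISS-2237": [
        ("Potable water dispenser", 2, "100kg"),
        ("Flexible air duct", 6, "0.5kg"),
        ("Small storage rack", 4, "2kg"),
    ],
    "ISS-3664": [
        ("Bio Filter", 6, "0.20kg"),
    ],
}


def to_1nf(inventory):
    """Emit one row of atomic values per (mission, equipment) pair."""
    return [
        (mission_no, equipment, qty, weight)
        for mission_no, items in inventory.items()
        for equipment, qty, weight in items
    ]


for row in to_1nf(INVENTORY):
    print(row)
```

Each output row holds single values only, and the mission number is repeated on every equipment line instead of being implied by position.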
Mission No.  Agency No.  Lead Agency  Country  Mission Date
ISS-2237     178         JAXA         Japan    14/12/2016
ISS-3664     526         ESA          EU       16/01/2017
ISS-2356     167         NASA         USA      12/04/2017
ISS-1234     032         Roskosmos    Russia   16/04/2018

Mission No.  Equipment                Qty  Equipment Weight
ISS-2237     Potable water dispenser  2    100kg
ISS-2237     Flexible air duct        6    0.5kg
ISS-2237     Small storage rack       4    2kg
ISS-3664     Bio Filter               6    0.20kg
ISS-2356     Small storage rack       3    2kg
ISS-2356     Battery pack             2    5kg
ISS-2356     Urine transfer tubing    2    1.5kg
ISS-2356     O2 scrubber              1    50kg
ISS-1234     Small storage rack       1    2kg
ISS-1234     Flexible air duct        2    0.5kg
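As a rough check that the two tables above can be loaded and joined back together, here is a sketch using Python's built-in sqlite3 module with an in-memory database. The table names `mission` and `payload`, the column names and the function `items_per_agency` are assumptions made for illustration; Mission Date is left out to keep the sketch short, and the equipment name is used as part of the key because the tables carry no equipment id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE mission (
    mission_no  TEXT PRIMARY KEY,
    agency_no   TEXT NOT NULL,
    lead_agency TEXT NOT NULL,
    country     TEXT NOT NULL
);
CREATE TABLE payload (
    mission_no TEXT NOT NULL REFERENCES mission(mission_no),
    equipment  TEXT NOT NULL,
    qty        INTEGER NOT NULL,
    weight     TEXT NOT NULL,
    PRIMARY KEY (mission_no, equipment)  -- composite key: one line per item per mission
);
""")

MISSIONS = [
    ("ISS-2237", "178", "JAXA", "Japan"),
    ("ISS-3664", "526", "ESA", "EU"),
    ("ISS-2356", "167", "NASA", "USA"),
    ("ISS-1234", "032", "Roskosmos", "Russia"),
]
PAYLOAD = [
    ("ISS-2237", "Potable water dispenser", 2, "100kg"),
    ("ISS-2237", "Flexible air duct", 6, "0.5kg"),
    ("ISS-2237", "Small storage rack", 4, "2kg"),
    ("ISS-3664", "Bio Filter", 6, "0.20kg"),
    ("ISS-2356", "Small storage rack", 3, "2kg"),
    ("ISS-2356", "Battery pack", 2, "5kg"),
    ("ISS-2356", "Urine transfer tubing", 2, "1.5kg"),
    ("ISS-2356", "O2 scrubber", 1, "50kg"),
    ("ISS-1234", "Small storage rack", 1, "2kg"),
    ("ISS-1234", "Flexible air duct", 2, "0.5kg"),
]
conn.executemany("INSERT INTO mission VALUES (?, ?, ?, ?)", MISSIONS)
conn.executemany("INSERT INTO payload VALUES (?, ?, ?, ?)", PAYLOAD)


def items_per_agency(connection):
    """Total quantity of items sent per lead agency, joined on mission_no."""
    return connection.execute("""
        SELECT m.lead_agency, SUM(p.qty)
        FROM mission AS m
        JOIN payload AS p ON p.mission_no = m.mission_no
        GROUP BY m.lead_agency
        ORDER BY m.lead_agency
    """).fetchall()


for agency, total in items_per_agency(conn):
    print(agency, total)
```

The composite primary key on `payload` rejects a second line for the same equipment on the same mission, which is the property the 1NF key relies on.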
2NF
“The data is said to be in second normal form if:
1. It is in first normal form
2.
