Microsoft Word - CM XXXXXXXXXXCoursework.docx
1
Cardiff School of Computer Science and Informatics
Coursework Assessment for CM3104 Large Scale Databases
Module Code: CM3104
Module Title: Large Scale Databases
Lecturer: C.B. Jones, A.I. Abdelmoty, J. Shao
Assessment Title: Coursework 1
Assessment Number: 1
Date Set: Thursday 1st November 2018
Submission Date and Time: Wednesday 5th December 2018 at 9:30am.
Return Date: Week 12, Friday 11th January 2019.
This assignment is worth 30 % of the total marks available for this module. The penalty for
late or non-submission is an award of zero marks.
Your submission must include the official Coursework Submission Cover sheet, which can be
found here:
https:
docs.cs.cf.ac.uk/downloads/coursework/Coversheet.pdf
Submission Instructions
Description Type Name
Cover
sheet
Compulsory One PDF (.pdf) file [student number].pdf
PART A Compulsory One PDF (.pdf) file comprising your
answer to all questions with
snapshots of the mongodb shell as
explained below.
PartA_[student number].pdf
Compulsory The javascript file with your answer
to all the questions in Task 2.
PartA_[student number].js
PART B Compulsory One PDF (.pdf) file that contains
answers to all questions.
PartB_[student number].pdf
PART C Compulsory One PDF (.pdf) file that includes for
each of the five answers to Part C:
1 - The Oracle SQL query;
2 – The answer to the query;
3 - A screen shot of:
the query in Oracle followed by
the Oracle output from the query.
PartC_[student number].pdf
Any deviation from the submission instructions above (including the number and types of
files submitted) may result in a mark of zero for the assessment or question part.
2
Coursework PART A: NoSQL Databases [worth 10 marks]
In this part of the coursework you will make use of two data sets: a restaurants data set (in
the file, restaurants.js) and a zipcodes data set (in the file zipcodes.js). You are building an
application with MongoDB that will use both data sets to find information about restaurants
in different cities.
An example record from the restaurants dataset is:
{
"_id" : ObjectId("55cba2476c522cafdb053ae8"),
"location" : {
"coordinates" : [
XXXXXXXXXX,
XXXXXXXXXX
],
"type" : "Point"
},
"name" : "C & C Catering Service"
}
An example record from the zipcodes dataset is:
{
"_id" : "01002",
"city" : "CUSHMAN",
"loc" : [
XXXXXXXXXX,
XXXXXXXXXX
],
"pop" : 36963,
"state" : "MA"
}
Here is a map of a sample from the restaurants dataset.
3
Task 1
Import both datasets into MongoDB in ONE database. Each dataset should be stored in a
separate collection. For example, you could build a “services” database with two
collections: “restaurants” and “zipcodes”.
(1 mark)
Task 2
1. Find all the Pizza restaurants in the dataset and count their number.
A Pizza restaurant is any restaurant which has the word “Pizza” in its title.
(1 mark)
2. For each Pizza restaurant from the above set, find the city that it is located in.
You can assume that a restaurant is located in the nearest city to its location.
(2 marks)
3. In your application you will need to retrieve the restaurants and their associated
cities frequently.
Create a new collection (“geodb”) that contains all the data in the restaurant
collection, but for every restaurant document in the new collection, add an
additional field “city” whose value is the closest city to the restaurant’s location. Use
the same method of finding the city as you have done in question 2.
Documents in your new “geodb” collection should be similar to the following:
{
"_id" : ObjectId("55cba2476c522cafdb053ae8"),
"location" : {
"coordinates" : [
XXXXXXXXXX,
XXXXXXXXXX
],
"type" : "Point"
},
"name" : "C & C Catering Service",
"city" : "BROOKLYN"
}
(3 marks)
4
4. Query the new “geodb” collection to find restaurants, grouped by city. Your answer
should include: city name, number of restaurants in the city and a list of all the
names of the restaurants in the city.
An example document of the results of this query is as follows:
{
"City": " MOBILE",
"No of Restaurants": 3,
"Restaurants": [
"Island Soft Pretzel Stop",
"Dairy Queen Grill & Chill",
"Statue Of Liberty Deli"
]
}
(2 marks)
5. A different method of storing the restaurant and city information together is by
modifying the original “zipcodes” collection by adding a field “restaurantsInCity”
whose value is an a
ay of all the restaurants that are located in the city. An
example of a document in the modified “zipcodes” collection is as follows:
{
"_id" : "36607",
"city" : "MOBILE",
"loc" : [
XXXXXXXXXX,
XXXXXXXXXX
],
"pop" : 8610,
"state" : "AL"
“restaurantsInCity” : [“Island Soft Pretzel Stop”, “Dairy Queen Grill & Chill”, "Statue
Of Liberty Deli"]
}
Explain whether this is a sensible modelling option in MongoDB, by refe
ing to its
effectiveness for storing and retrieving the information about restaurants and
associated cities.
(1 mark)
5
UPLOADS FOR PART A
1. Save your answer to the questions in Task 2 in a javascript file with the name:
PartA_[student number].js
2. Save your answer to all questions in Task 1 and Task 2 in a pdf file:
PartA_[student number].pdf. You can take a snapshot of the question being
executed in the MongoDB shell, clearly demonstrating the answer to the question
(include snapshots of all intermediate steps as appropriate). A sample of the results
is sufficient – this should be the first 5 documents. An example of a sufficient
snapshot of the answer to a query to find all records in the “zipcodes” collection is as
follows.