COMP 3380 Winter 2023 Project Overview
This project is an opportunity for you to design and implement your own database based on publicly
available data. You will model the database using the ER and EER concepts learned in class. You will
design an interface for an analyst with no programming experience to query your database. The project
is worth 20% of your final grade.
Project Teams
The project is to be completed in teams of two to five people. You are free to select your own teams by
assigning yourself to a group in UMLearn. Anyone who is not in a team by January 26, 2023 will be
assigned a team by the course instructor.
A note about teams: some teams will have communication problems or work distribution problems. ALL
team members are responsible for effective team functioning. Only in extreme cases should you contact
your instructor to mediate a group meeting. Waiting until just before a deadline to solve the problem is too late.
Project Deliverables
The project is divided into three parts. Parts of the project build on each other. When starting the
project, read through every part’s description to have an idea of what’s coming. For example, in Part 3,
you will have to explain decisions you made for Part 1, so taking good notes while doing Part 1 will help
with Part 3. Each part will be submitted to UMLearn.
• Part 1 (35%): Finding data and designing a database. Due February 28th XXXXXXXXXX at 8pm.
• Part 2 (50%): Creating and populating a database and implementing an interface. Due March 28th,
2023 at 8pm.
• Part 3 (15%): Writing a report. Due April 6, 2023 at 8pm.
Project Marking
Please see the marking rubrics in the individual part descriptions to help prioritize your work. In general,
we will be looking for ambitious and creative projects that are well executed. Details listed in the
descriptions are only minimum requirements; an excellent project deserving of an A+ will seek to
surpass what is described in this document.
All rubrics use a non-linear scale as follows to represent the quality of the work submitted. Each item
will be graded /5. Furthermore, some items might carry more weight and are denoted by multipliers.
• 5 – top of the class, amazing
• 4 – great work, all required components covered and done well
• 3 – OK work, some components missing, some things done well
• 2 – poor work, some components missing, few components done well
• 1 – very poor work, many components missing, components not done well
• 0 – not submitted or does not demonstrate understanding of the deliverable
Part 1: Designing a Database
A) Data Discovery
The first part of the project involves finding some data to analyze. One viable source is the Winnipeg
Open Data portal (https://data.winnipeg.ca), though you are free to use other sources (e.g., wikis). Data
from any public source is acceptable but remember to acknowledge your source(s)! (Be aware of any
copyright or licensing issues if you want to use this data beyond this course.)
You can aggregate data from multiple sources, but all data must ultimately be connected. In other
words, the ER diagram you draw with this data for Part 1.B must be a connected graph (even better, try
to have a tightly connected graph, i.e., the graph remains connected if you start removing tables). Aim
to find and/or create a dataset that will ultimately break down to more than 10 tables and 1000 rows
with relatively few support tables compared to main tables after completing Part 1.B. A support table is
a table that is usually small in both arity and cardinality, and is mostly used for lookup purposes (e.g., a
table that just has Rank and Salary, where you can look up a Salary based on Rank).
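The connectedness requirement can be checked mechanically once the tables and their foreign-key links are listed. The following is a minimal Python sketch, assuming a hypothetical set of table names and links (none of them come from the project description). It tests whether the graph is connected, and whether it stays connected after removing any single table.

```python
from collections import deque

# Hypothetical tables and foreign-key links, for illustration only.
TABLES = {"Artist", "Song", "Album", "Label"}
LINKS = [("Artist", "Song"), ("Song", "Album"), ("Album", "Label"), ("Label", "Artist")]


def is_connected(tables, links):
    """Return True if every table is reachable from every other via links."""
    tables = set(tables)
    if not tables:
        return True
    neighbours = {t: set() for t in tables}
    for a, b in links:
        # Ignore links that touch a table outside the current set.
        if a in tables and b in tables:
            neighbours[a].add(b)
            neighbours[b].add(a)
    start = next(iter(tables))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary table
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == tables


def stays_connected_without_any_one(tables, links):
    """Return True if removing any single table leaves the rest connected."""
    return all(is_connected(set(tables) - {t}, links) for t in tables)


print(is_connected(TABLES, LINKS))
print(stays_connected_without_any_one(TABLES, LINKS))
```

A chain of tables (A to B to C) passes the first check but fails the second, because removing the middle table splits the graph.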
B) Database Design
You will draw an ER model (including EER components, if appropriate) which represents the database
you have chosen to create. The model must include participation and cardinality constraints, as well as a
brief justification for each. Justifications should explain the “why” of constraints, not merely putting
them into words (e.g., “not every Song is written by an Artist” = bad, “some Songs are written by
unknown Artists, and so aren’t in the Wrote table” = better).
You will then convert your ER model to a relational model and normalize it as much as possible using the
rules and standards discussed in class and in the lectures.
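As an illustration of the Song/Artist example above, here is a minimal sketch of how a many-to-many Wrote relationship might translate into tables. The column names, the many-to-many cardinality, and the use of in-memory SQLite are all assumptions made so the sketch runs anywhere; the project itself targets Microsoft SQL Server.

```python
import sqlite3

# In-memory SQLite is an assumption for portability, not the course platform.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Artist (artist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Song (song_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- A many-to-many relationship becomes its own table.
CREATE TABLE Wrote (
    artist_id INTEGER NOT NULL REFERENCES Artist(artist_id),
    song_id INTEGER NOT NULL REFERENCES Song(song_id),
    PRIMARY KEY (artist_id, song_id)
);
""")
conn.execute("INSERT INTO Artist VALUES (1, 'Example Artist')")
conn.executemany(
    "INSERT INTO Song VALUES (?, ?)",
    [(1, "Known Song"), (2, "Anonymous Song")],
)
conn.execute("INSERT INTO Wrote VALUES (1, 1)")

# Songs with no row in Wrote show the partial participation of Song in Wrote.
unattributed = conn.execute("""
    SELECT s.title
    FROM Song s
    LEFT JOIN Wrote w ON w.song_id = s.song_id
    WHERE w.song_id IS NULL
""").fetchall()
conn.close()
print(unattributed)
```

Because a Song row can exist without any matching Wrote row, the schema itself encodes the justification "some Songs are written by unknown Artists".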
Part 1: Deliverable
A PDF containing a brief summary of your dataset (e.g., a short description of what it is, how much
data), your ER model, and the final relational model (post-merge and post-normalization). Include your
justifications for participation/cardinality constraints and the steps you took for translating your ER
model to your final relational model (including steps for merging and normalizing).
Part 1: Marking
• Quality of dataset (e.g., size, connectedness)
• Ratio of support tables to main tables
• ER/EER diagram
• Justification of participation and cardinality constraints
• Translating and merging ER/EER diagram to relational model
• Normalization
Part 2: The Database and its Interface
A) Database Creation and Population
You will implement the database you designed in Part 1. If you were given feedback from Part 1, it
should be incorporated. You will be given a Microsoft SQL Server schema on which you must create your
database.
Once your database is created, you will populate it with your data. You must use a code-based
method to add records, and you will submit this code with the project.
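A code-based population step might look like the following minimal sketch. It assumes a hypothetical RankSalary lookup table and inline CSV text standing in for a downloaded file, and it uses in-memory SQLite only so that it runs without a server; against SQL Server the same pattern would go through a driver of your choice.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a downloaded open-data file.
CSV_TEXT = "rank_title,salary\nAssistant,70000\nAssociate,90000\nFull,120000\n"


def load_rows(conn, csv_file):
    """Insert every CSV row into RankSalary using a parameterized statement."""
    reader = csv.DictReader(csv_file)
    rows = [(r["rank_title"], int(r["salary"])) for r in reader]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO RankSalary (rank_title, salary) VALUES (?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RankSalary (rank_title TEXT PRIMARY KEY, salary INTEGER NOT NULL)"
)
inserted = load_rows(conn, io.StringIO(CSV_TEXT))
total = conn.execute("SELECT COUNT(*) FROM RankSalary").fetchone()[0]
conn.close()
print(inserted, total)
```

Loading inside a single transaction means a malformed row leaves the table unchanged rather than half populated.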
B) Implementing an Interface
You will create a front-end interface which allows a person (say, an analyst) to access and use your
database. It can be as simple (e.g., command line interface) or as feature-rich (e.g., complete GUI) as you
want, and can be implemented using any programming language you want (e.g., Java, Python, as a
website). However, you must consider the following requirements:
• When using the interface, the database should be relatively secure according to what was
discussed in class (e.g., to prevent SQL injection, users must not be able to enter SQL commands freely).
• Your interface should support an analyst trying to answer interesting questions they might have
of the data, which are not easily answerable. You might consider taking some time to come up
with interesting questions an analyst might have and allow your interface to execute the
relevant queries. You should support at least one query which includes GROUP BY, one that
includes ORDER BY, and one that includes an aggregate function. Note that your interface does
not necessarily have to make these components explicit; at a minimum, it should simply allow
someone to run those queries. Some tips:
o GROUP BY and aggregate functions are hard for humans to do on the fly, so they are an easy
way to create interesting queries.
o Queries should be relevant and potentially useful to an analyst. Complex queries that
are hundreds of words long, nested four layers deep could be interesting for you to
implement but might be too convoluted for an analyst to ever reasonably run.
o Don’t worry too much about optimizing your queries, as long as they run in a reasonable
amount of time. However, consider informing the user with a message of some kind if a
query is currently being executed (so that they don’t think your interface has crashed).
• Content from all tables should be accessible one way or another.
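One way to meet the security and query requirements together is a fixed menu of pre-written queries, with user input bound only as parameters. The sketch below is an assumption-laden illustration: the Ticket table, its columns, and the query names are hypothetical, and in-memory SQLite stands in for the course's SQL Server schema.

```python
import sqlite3

# Hypothetical menu of vetted queries; users pick a key and supply values,
# never SQL text.
QUERIES = {
    # Uses GROUP BY, ORDER BY, and the aggregates COUNT and AVG.
    "tickets_per_street": (
        "SELECT street, COUNT(*) AS tickets, AVG(fine) AS avg_fine "
        "FROM Ticket WHERE fine >= ? "
        "GROUP BY street ORDER BY tickets DESC, street"
    ),
    "tickets_on_street": "SELECT COUNT(*) FROM Ticket WHERE street = ?",
}


def run_query(conn, name, params):
    """Run a pre-written query by name; user input is bound as parameters only."""
    if name not in QUERIES:
        raise ValueError(f"unknown query: {name}")
    return conn.execute(QUERIES[name], params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ticket (ticket_id INTEGER PRIMARY KEY, street TEXT, fine REAL)"
)
conn.executemany(
    "INSERT INTO Ticket (street, fine) VALUES (?, ?)",
    [("Main St", 70), ("Main St", 90), ("Portage Ave", 60), ("Portage Ave", 300)],
)
result = run_query(conn, "tickets_per_street", (70,))
# A hostile string is compared as a plain value; it is never parsed as SQL.
hostile = run_query(conn, "tickets_on_street", ("Main St'; DROP TABLE Ticket; --",))
remaining = conn.execute("SELECT COUNT(*) FROM Ticket").fetchone()[0]
conn.close()
print(result, hostile, remaining)
```

The analyst never sees GROUP BY or ORDER BY; they choose a named question and enter a value, which keeps the interface usable without programming experience.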
Part 2: Deliverable
Everything required to create the database tables and relationships and to populate the tables, as well as
the program you wrote for interacting with your database. Include a readme.md file with instructions on
how to create and populate the database and run your program.
Part 2: Marking
• Database created and populated
• Queries – correctness
• Queries – complexity and interestingness
• Interface – functionality
• Interface – level of abstraction and ease-of-use
Part 3: Project Report
For this last part, you will write a short report (XXXXXXXXXX pages) detailing your progress through the project. If
you took good notes when working through Parts 1 and 2, this should be relatively quick and easy to put
together. The report should at least include the following information, but you should consider adding
anything else that you find interesting or that might help someone reading your report understand what
you did.
• A summary of the data: Why was it chosen? What does it consist of? How large is it (file size,
number of records)? Don’t forget to acknowledge the sources of your data! Include an ER
diagram.
• A discussion of the data model
o Why was it broken down into those tables?
o Did you face any difficult choices when designing the model (e.g., tricky
participation/cardinality ratio decisions)?
o Did the data model cleanly fit into the relational database?
o Do you regret any decisions you made in your model? Did you change your model from
Part 1 when implementing it in Part 2? What changed, and why?
• Could the data be modelled in a different way? Why or why not? Given the work completed,
would you choose this model?
• A list of interesting queries you can run using the interface: Explain what the queries return; you
don’t have to include the SQL code.
• Does this dataset require a relational database? Would another database system be a better
choice for modelling this data? Why or why not? Would the “interesting queries” you wrote be
easier or harder to re-create if you were using an alternative database?
• Would this database be a good teaching tool for COMP 3380? Are there good problems for
future students to solve in this database? (Optional)
Part 3: Deliverable
A PDF of your report.
Part 3: Marking
• Summary of data
• Discussion of data model
• Summary of the database
• Summary of the interface
• List of interesting queries
• Other interesting discussions or summaries
• Writing quality

COMP 3380 Winter 2023 Assignment 3 - Due Thursday, March 30, 2023 at 8:30 am
This assignment is individual work. Be sure to fill out the honesty
declaration available on the course website. You will not be able to see
your grade without filling out the declaration.
Please submit your assignment in Crowdmark
Question 1 (10 marks)
(a) Briefly describe the 3 approaches to database programming, including at least one
advantage and one disadvantage of each approach (3 marks)
(b) Describe impedance mismatch issues and a possible solution (2 marks)
(c) Describe any 2 issues arising from SQL injection and any 2 protection techniques (2 marks)
(d) Describe at least 2 uses and 2 issues of indexing in a relational database (2 marks)
(e) Compare and contrast VIEW and TRIGGER in a relational database (1 mark)

Question 2 (20 marks)
Consider the Parking Winnipeg dataset provided in UMLearn. Answer the
following questions using a programming language (Python or Java), clearly showing
how you CONNECT-EXECUTE-CLOSE.
a. Create a database table for the dataset, introducing a SURROGATE key as the
primary key. Submit your programming code for the create table, showing the addition
of the surrogate key to the existing attributes (5 marks)
b. Write an embedded SQL statement within your programming language to classify the
dataset by a year and day-name hierarchy. See the expected sample output in Table 1 (5
marks)
Table 1: Expected sample output
Year  Day     Count
2010  Monday  2343
c. Write an embedded SQL statement to retrieve the street, year, month, day, and
hour when the parking authority issued the highest number of tickets. (5 marks)
d. Write an embedded SQL query to create a database procedure (classifier) to classify
the full fine into 3 categories (5 marks)
   a. “Common Ticket” if the FULLFINE is less than or equal to 70
   b. “Valued Ticket” if the FULLFINE is between 71 and 200 inclusive
   c. “Most valued Ticket” if the FULLFINE is greater than 200
   d. “Unknown” if the full fine is null
Bonus: Call and execute the procedure within your program (1 mark)
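The CONNECT-EXECUTE-CLOSE pattern, a surrogate key, the year and day-name grouping, and the fine categories can be sketched together as follows. This is a hedged illustration, not a model answer: the table name, every column except FULLFINE, and the sample rows are hypothetical, and in-memory SQLite is used so the code runs without a server. SQLite has no stored procedures, so the classification appears as a CASE expression; on SQL Server the same logic would sit inside a procedure, and the surrogate key would typically be an IDENTITY column.

```python
import sqlite3
from contextlib import closing

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

# Hypothetical sample rows: (issue_date, street, fullfine).
SAMPLE = [
    ("2010-01-04", "Main St", 70.0),       # a Monday
    ("2010-01-11", "Main St", 150.0),      # a Monday
    ("2010-01-05", "Portage Ave", 300.0),  # a Tuesday
    ("2011-03-02", "Portage Ave", None),   # a Wednesday, fine unknown
]


def run():
    # CONNECT
    with closing(sqlite3.connect(":memory:")) as conn:
        # EXECUTE: ticket_id is the surrogate key, generated by the database.
        conn.execute(
            "CREATE TABLE ParkingTicket ("
            "ticket_id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "issue_date TEXT NOT NULL, street TEXT NOT NULL, fullfine REAL)"
        )
        conn.executemany(
            "INSERT INTO ParkingTicket (issue_date, street, fullfine) "
            "VALUES (?, ?, ?)",
            SAMPLE,
        )
        # Year and day-of-week hierarchy; %w yields 0 for Sunday.
        by_day = conn.execute(
            "SELECT strftime('%Y', issue_date) AS yr, "
            "CAST(strftime('%w', issue_date) AS INTEGER) AS dow, COUNT(*) "
            "FROM ParkingTicket GROUP BY yr, dow ORDER BY yr, dow"
        ).fetchall()
        # Assumes whole-dollar fines, so 'above 70' and '71 or more' coincide.
        classes = conn.execute(
            "SELECT ticket_id, CASE "
            "WHEN fullfine IS NULL THEN 'Unknown' "
            "WHEN fullfine <= 70 THEN 'Common Ticket' "
            "WHEN fullfine <= 200 THEN 'Valued Ticket' "
            "ELSE 'Most valued Ticket' END "
            "FROM ParkingTicket ORDER BY ticket_id"
        ).fetchall()
    # CLOSE happens when the with-block exits.
    return [(yr, DAY_NAMES[dow], n) for yr, dow, n in by_day], classes


by_day, classes = run()
print(by_day)
print(classes)
```

Wrapping the connection in `closing` guarantees the CLOSE step runs even if a statement raises an error.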
Question 3 (5 marks)
Proof using Armstrong's inference rules
Given the following relation R and its functional dependencies, find the closure of C using
Armstrong’s axioms.
R(A, B, C, D, X, Y, W)
BDW -> C
Y -> W
DXY -> AB
C -> BW
BC -> Y
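A hand derivation of the closure can be cross-checked with the standard attribute-closure algorithm, which repeatedly adds the right-hand side of every dependency whose left-hand side is already covered. The sketch below is only a check on the result, not the axiom-by-axiom proof the question asks for; the function and variable names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) strings."""
    closure = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# The functional dependencies of R(A, B, C, D, X, Y, W) as given above.
FDS = [("BDW", "C"), ("Y", "W"), ("DXY", "AB"), ("C", "BW"), ("BC", "Y")]

closure_of_c = attribute_closure("C", FDS)
print("".join(sorted(closure_of_c)))  # prints BCWY
```

C -> BW contributes B and W, and BC -> Y then contributes Y. The dependencies BDW -> C and DXY -> AB never fire because D and X are not reachable from C.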
Question 4 (20 marks)
Use the Library Database for the following questions in Neo4j
a. Create the database including all relationships in Neo4j (5 marks)
Submit the queries and the outputs
b. Submit the query, graph, and text for the following
i. Write a Cypher
