
11/29/21, 6:03 PM mids-w200-fall21-Lucas-Charles/W200_Final_Exam.ipynb at main · UC-Berkeley-I-School/mids-w200-fall21-Lucas-Charles...


W200 Introduction to Data Science Programming, UC Berkeley MIDS
Instructions
The final exam is designed to evaluate your grasp of Python theory as well as Python
coding.
This is an individual exam.
You have 48 hours to complete the exam, starting from the point at which you first
access it.
You will be graded on the quality of your answers. Use clear, persuasive arguments
based on concepts we covered in class.
Please double-click the markdown cells where it says "Your answer here" to input
answers (if you need more cells please make them markdown cells)
Use only Python standard libraries, matplotlib, seaborn, NumPy and Pandas for this
exam
Please push the exam to your github repo in the folder /SUBMISSIONS/final_exam
YOUR NAME HERE
1: Short Answer Questions (25 pts - each question = 5 pts)
a) The following class Cart and method add_to_cart are parts of a larger program
used by a mobile phone company. The method add_to_cart will work when an object of
type MobileDevice or of type ServiceContract is passed to it. State whether the method
add_to_cart is a demonstration of the following items (yes/no) and the reasoning (1-2
sentences):
1. Inheritance
2. Polymorphism
3. Duck typing
4. Top-down design
5. Functional programming
In [ ]: # Method:
class Cart():

    def __init__(self):
        self.cart = []
        self.total = 0

    def add_to_cart(self, item):
        self.cart.append(item)
        self.total += item.price
a) Your answer here
1. Inheritance:
2. Polymorphism:
3. Duck typing:
4. Top-down design:
5. Functional programming:
b) Suppose you have a long list of digits (0-9) that you want to write to a file. From a
storage standpoint, would it be more efficient to use ASCII or UTF-8 as an encoding? What
is the most efficient way to create an even smaller file to store the information?
b) Your answer here
c) Why is it important to sanity-check your data before you begin your analysis? What could
happen if you don't?
c) Your answer here
d) How do you determine which variables in your dataset you should check for issues prior
to starting an analysis?
d) Your answer here
e1) Explain why the following code prints what it does.
In [ ]: def f(): pass
print(type(f))
e1) Your answer here
e2) Explain why the following code prints something different.
In [ ]: def f(): pass
print(type(f()))
e2) Your answer here
2: General Coding Questions (15 pts - each question = 5 pts)
a) Using a list comprehension: Make a list of the squared numbers greater than 25 that are
the square of a non-negative integer less than 10. Fill in a list comprehension below so that
we get this desired output.
In [ ]: # 2a) Your code here
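One possible reading of the 2a prompt is sketched below. The desired output itself did not survive in this copy of the exam, so the filter (squares strictly greater than 25, bases drawn from 0 through 9) is taken from the wording of the question alone, and the variable name squares_over_25 is a hypothetical choice.

```python
# Squares of the non-negative integers below 10, keeping only squares above 25.
squares_over_25 = [n ** 2 for n in range(10) if n ** 2 > 25]
print(squares_over_25)  # [36, 49, 64, 81]
```

The condition is applied to the square rather than to the base, which mirrors the prompt's phrasing; filtering on n > 5 would give the same list.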
b) Below is a data frame of customers that have different cooling systems. Your data science
team lead wants the column cooling_system to be labeled with the integer numbers 1-4
instead of the text as shown below:
1 = Air Conditioning / AC / Air Con
2 = Heat Pump / HP
3 = Evaporative Cooler / Evap Cooler
4 = Fan
Make a new column called cooling_type that maps the text values to the new numeric
values. Filter out the values that are not included in the mapping above. Print out/display
this new data frame. Be sure to list any assumptions also!
In [ ]: import pandas

# creating a data frame from scratch - list of lists

data = [ [101, 'AC'],
[102, 'Heat Pump'],
[103, 'Air Con'],
[104, 'Air Conditioning'],
[105, 'Fan'],
[106, 'None'],
[107, 'Evap Cooler'],
[108, None],
[109, 'AC'],
[110, 'Evaporative Cooler'],
[111, 'geothermal'],
[112, 1]
]

# create a data frame with column names - list of lists

col_names = ['Cust_Number', 'Cooling_System']
df = pandas.DataFrame(data, columns=col_names)
df
In [ ]: # 2b) Your code here
c) From the dataframe below, use groupby in Pandas to show how many total delegates
were obtained grouped by favorite color. Print this out.
In [ ]: import pandas

# creating a data frame from scratch - list of lists

data = [ ['marco', 165, 'blue', 'FL'],
['jeb', 0, 'red', 'FL'],
['chris', 0, 'white', 'NJ'],
['donald', 1543, 'white', 'NY'],
['ted', 559, 'blue', 'TX'],
['john', 161, 'red', 'OH']
]

# create a data frame with column names - list of lists

col_names = ['name', 'delegates', 'color', 'state']
df = pandas.DataFrame(data, columns=col_names)
df
In [ ]: # 2c) Your code here
3: Bitcoin coding problem (20 points):
Bitcoin
Consider a record of a one-time investment in bitcoin with the value of that investment tracked
monthly, provided as an (ordered) tuple of dictionaries, where each dictionary comprises
one key for the month and one corresponding value for the value of the investment, and
the first entry (Jan 2018) is the initial investment made on 01 Jan 2018, shown in data
below.
Write Python code to take such a record of any length (the below data is only a sample),
and output a table/dataframe comprising a row for each month with columns for date, start
balance, and return. Print out this table/dataframe.
Also, visualize the record as two vertically arranged plots.
The top plot should show a line plot of start balance vs. month
The bottom plot should show a bar plot of return vs. month, with a black horizontal line
at return=0, and bars color-coded such that positive returns are green and negative
returns are red.
The two plots' horizontal axes should align. Demonstrate that your code works by
applying it to data.
Notes:
The gain for each period is the end balance minus the start balance.
The growth factor for each period is the end balance divided by the start balance.
The return for each period is the growth factor minus 1.
In [ ]: data = ({"Jan 2018":1000},{"Feb 2018":1100},{"Mar 2018":1400},{"Apr 2018":700},{
data
In [ ]: # 3) Your code here
4: Clinical disease data (40 pts)
Your boss comes to you Monday morning and says “I figured out our next step; we are
going to pivot from an online craft store and become a data center for genetic disease
information! I found ClinVar which is a repository that contains expert curated data, and it is
free for the taking. This is a gold mine! Look at the file and tell me what gene and mutation
combinations are classified as dangerous.”
Make sure that you only give your boss the dangerous mutations and include:
1) Gene name
2) Mutation ID number
3) Mutation Position (chromosome & position)
4) Mutation value (reference & alternate bases)
5) Clinical significance (CLNSIG)
6) Disease that is implicated
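The six fields listed above could be pulled out of a single record roughly as follows. This is only a sketch under loudly stated assumptions: it presumes the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and the INFO keys GENEINFO and CLNDN, none of which the exam confirms, and the instructor-modified file may deviate from that layout. The names sample_line and parse_record and every value in the sample record are hypothetical.

```python
# Hypothetical tab-separated record in standard VCF column order
# (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); all values are made up.
sample_line = (
    "1\t12345\t99999\tG\tA\t.\t.\t"
    "GENEINFO=GENE1:111;CLNSIG=Pathogenic;CLNDN=Example_disease"
)


def parse_record(line):
    """Split one VCF-style data line into the six requested fields."""
    fields = line.rstrip("\n").split("\t")[:8]
    chrom, pos, mut_id, ref, alt, _qual, _filter, info = fields
    # INFO is assumed to be semicolon-separated KEY=VALUE pairs;
    # entries without "=" (flags) are skipped.
    info_map = dict(pair.split("=", 1) for pair in info.split(";") if "=" in pair)
    return {
        # GENEINFO is assumed to look like "SYMBOL:id"; keep the symbol only.
        "GENE": info_map.get("GENEINFO", "Not_Given").split(":")[0],
        "ID": mut_id,
        "CHROM": chrom,
        "POS": int(pos),
        "REF": ref,
        "ALT": alt,
        "CLNSIG": info_map.get("CLNSIG", "Not_Given"),
        "DISEASE": info_map.get("CLNDN", "Not_Given"),
    }


record = parse_record(sample_line)
print(record)
```

Missing INFO keys fall back to 'Not_Given' here, which lines up with the replacement value the requirements ask for; header lines and malformed rows in the real file would need separate handling.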
Requirements
1) The deliverables are the final result as a dataframe with a short discussion of any
specifics (that is, what data you would present to your boss with the explanation of your
results).
2) Limit your output to the first 100 harmful mutations and tell your boss how many total
harmful mutations were found in the file
3) Use the instructor-modified "clinvar_final.txt" at this link:
https://drive.google.com/file/d/1Zps0YssoJbZHrn6iLte2RDLlgruhAX1s/view?usp=sharing
This file was modified to be not exactly the same as 'standard' .vcf file to test your data
parsing skills. This is a large file so do NOT upload it into your github repo!
4) Replace missing values in the dataframe with: 'Not_Given'. Print or display this (including
the Not_Given count) for the column CLNSIG by using pandas value_counts() function
(https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html).
5) State in your answer how you define harmful mutations
6) Do your best on getting to above requirements and submit whatever you do before
the deadline. If your work is incomplete be sure to describe the blockers that got in
your way and how you might get past them (if given more time).
7) You can use as many code blocks as you need. Please clean-up your code and make it
readable for the graders!
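Requirements 2, 4 and 5 can be illustrated on a tiny stand-in table. The sketch below assumes the parsed data already sits in a dataframe with a CLNSIG column; the rows are hypothetical, and the definition of "harmful" used here (CLNSIG mentions "pathogenic") is only one assumption that an answer would have to state and justify.

```python
import pandas as pd

# Hypothetical miniature table; the real one would come from parsing clinvar_final.txt.
df = pd.DataFrame({
    "GENE": ["GENE1", "GENE2", "GENE3", "GENE4"],
    "CLNSIG": ["Pathogenic", None, "Benign", "Likely_pathogenic"],
})

# Requirement 4: replace missing values, then count the CLNSIG categories.
df = df.fillna("Not_Given")
clnsig_counts = df["CLNSIG"].value_counts()
print(clnsig_counts)

# Assumed definition of "harmful": CLNSIG mentions "pathogenic" (case-insensitive).
is_harmful = df["CLNSIG"].str.contains("pathogenic", case=False)
harmful = df[is_harmful]

# Requirement 2: report the total, but show only the first 100 rows.
total_harmful = len(harmful)
first_100 = harmful.head(100)
print(f"{total_harmful} harmful mutations found; showing {len(first_100)}")
```

A substring match like this may also catch labels that merely mention pathogenicity without asserting it, so the value_counts output is worth reading before settling on the definition.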
Hints
We do not expect you to have any medical knowledge to solve this problem; look at
the data, read the documentation provided, and write down your assumptions!
Answered 2 days After Nov 29, 2021

Solution

Sandeep Kumar answered on Dec 02 2021