Great Deal! Get Instant $10 FREE in Account on First Order + 10% Cashback on Every Order Order Now

Microsoft Word - homework4.docx Data Focused Python XXXXXXXXXXHomework 4 Due: 11:59 PM Sunday, November 27, 2022 This is an individual assignment. Problem 1 a. Read the file 'expenses.txt'...

1 answer below »
Microsoft Word - homework4.docx
Data Focused Python XXXXXXXXXXHomework 4
Due: 11:59 PM Sunday, November 27, 2022 This is an individual assignment.

Problem 1
a. Read the file 'expenses.txt' into a DataFrame named expenses – the data is separated by a colon ( : ). The first
line contains column headers. Display expenses

Read the file 'people.txt’, also colon-separated with column headers in the first line, into DataFrame people.
Display people.

Read the file ‘departments.txt’ into DataFrame departments – same formatting. Display departments.

. Clean people by removing any row with null data – display the rows removed (hint: use a variant of isna() ).
Then remove anyone whose department is not in departments (recall the isin( ) method from Lab 7) – again
display the rows removed. Display the cleaned version of people.

c. Clean expenses by removing any row with null data – display rows removed. Then remove rows whose id is
not in people – display rows removed. Then remove rows with a malformed date field (only check for the
co
ect number of digits, 8 – use a lambda and apply() ) – display rows removed. Display the cleaned version of
expenses.

d. Merge expenses and people on the ‘ID’ column using an inner join – name the result alldata. Sort alldata on
‘ID’. Create sums by grouping the rows (use groupby( ) ) on ‘ID’ and then use agg(‘sum’) to get the total of
each person’s expenses. Display sums.

Problem 2
Use the expenses and people DataFrames from Problem 1 to display the following graphs. Make sure you label
the graphs and axes co
ectly. Use the pandas functions groupby( ) with the column you need, together with
either count( ) or sum( ) – as in, groupby().sum(). Also, use the groupby parameter as_index=False to
store the grouping column as an actual column instead of as the index. The following graph shows the idea,
although the values may not be co
ect (it's sample data, and I was lazy about labels):








a. Bar chart containing the departments and the number of people employed in each.
. Bar chart containing the id's and the total expense amount for each id.
c. Bar chart containing the expense categories and the total expense amount for each category.
d. Pie chart containing the same information as #c.

Read the file “countryData.csv” into a DataFrame named countryData; this is a version of the GDP – Olympic
medals data from Lab 7. Use countryData to create regression plots for the following using Seaborn’s regplot.
Compute the co
elation coefficient and use it as an annotation on the graph.
e. Population against GDP
f. GDP against Weighted
g. Formula against Total
h. Formula against Weighted
JTW
Problem 3
Write a script by copying the relevant parts of the Lab 8 code and modify it to do these things:
a. Create a list with these topic strings: Python; Data Science; Data Analysis; Machine Learning; and Deep
Learning. Use these topics, one at a time, to query the Google Books API.
For each returned JSON string:
. Convert the JSON string to a dict using loads( ) (as in the lab), then use this to convert it to a DataFrame:
pd.io.json.json_normalize ( thedict['items'] )
c. Extract just the 'volumeInfo.title' and 'volumeInfo.authors' columns.
d. Relabel those two columns as 'Title' and 'Authors'.
After creating the five DataFrame objects, use concat( ) to create one table called bigTable (use
ignore_index=True). The function takes a list of the DataFrames to concatenate (i.e., in [ ]'s).
e. Display bigTable.
f. Re-display bigTable in the following way, using regular Python to display data extracted from bigTable.
Create the table headers (left justified), then use a for loop over bigTable.index, which will count on the index
number starting at 0. Display at most 25 characters of the title (just use [:25], even if the title has fewer
characters and only the first author. It should look something like this (your data may vary):




ID:Amount:Category:Date:Description
5:5.25:supply: XXXXXXXXXX:box of staples
7:79.81:meal: XXXXXXXXXX:lunch with ABC Corp. clients Al, Bob, and Cy
4:43.00:travel: XXXXXXXXXX:cab back to office
4:383.75:travel: XXXXXXXXXX:flight to Boston, to visit ABC Corp.
22:55.00:travel: XXXXXXXXXX:cab to ABC Corp. in Cam
idge, MA
17:23.25:meal: XXXXXXXXXX:dinner at Logan Airport
5:318.47:supply: XXXXXXXXXX:paper, toner, pens, paperclips, tape
22:142.12:meal: XXXXXXXXXX:host dinner with ABC clients, Al, Bob, Cy, Dave, Ellie
20:20.20::
20:20.20:::
49:303.94:util: XXXXXXXXXX:Peoples Gas
49:121.07:util: XXXXXXXXXX:Verizon Wireless
8:7.59:supply: XXXXXXXXXX:Python book (used)
8:79.99:supply: XXXXXXXXXX:spare 20" monito
13:49.86:supply: XXXXXXXXXX:Stoch Cal for Finance II
7:6.53:meal: XXXXXXXXXX:Dunkin Donuts, drive to Big Inc. near DC
7:127.23:meal: XXXXXXXXXX:dinner, Tavern64
22:33.07:meal: XXXXXXXXXX:dinner, Uncle Julio's
7:86.00:travel: XXXXXXXXXX:mileage, drive to/from Big Inc., Reston, VA
7::travel: XXXXXXXXXX:mileage, drive to/from Big Inc., Reston, VA
50:22.00:travel: XXXXXXXXXX:tolls
7:378.81:travel: XXXXXXXXXX:Hyatt Hotel, Reston VA, for Big Inc. meeting
8:1247.49:supply: XXXXXXXXXX:Dell 7000 laptop/workstation
40:6.99:supply: XXXXXXXXXX:HDMI cable
49:212.06:util: XXXXXXXXXX:Duquesne Light
8:23.86:supply: XXXXXXXXXX:Practical Guide to Quant Finance Interviews
5:195.89:supply: XXXXXXXXXX:black toner, HP 304A, 2-pack
5:195.89:supply: XXXXXXXXXX:
22:86.00:travel: XXXXXXXXXX:mileage, drive to/from Big Inc., Reston, VA
18:32.27:meal: XXXXXXXXXX:lunch at Clyde's with Fred and Gina, Big Inc.
7:22.00:travel: XXXXXXXXXX:tolls
5:119.56:util: XXXXXXXXXX:Verizon Wireless
5:284.23:util: XXXXXXXXXX:Peoples Gas
5:8.98:supply: XXXXXXXXXX:Flair pens
5:8.98:supply:202325:Flair pens
5:22.95:supply: XXXXXXXXXX:Bic pens
4:149.95:travel: XXXXXXXXXX:Car rental
28:2245.25:supply:2022512:party
28::supply:2022512:party
18:77.75:meal: XXXXXXXXXX:Lunch with investors
7:950.15:travel: XXXXXXXXXX:flight to Chicago
5:22.95:supply: XXXXXXXXXX:Bic pens
4:149.95:travel: XXXXXXXXXX:Car rental
18:77.75:meal: XXXXXXXXXX:Lunch with investors
7:950.15:travel: XXXXXXXXXX:flight to Chicago
7:950.15:travel:220418:flight to Chicago
5:22.95:supply: XXXXXXXXXX:Bic pens
4:5.19:meal: XXXXXXXXXX:McDonalds
4:149.95:travel: XXXXXXXXXX:Car rental
18:77.75:meal: XXXXXXXXXX:Lunch with investors
18:77.75:: XXXXXXXXXX:Lunch with investors
7:950.15:travel: XXXXXXXXXX:flight to Chicago

Department:Location
Sales:New York
Office:New York
Research:San Francisco
Manufacturing:Pittsburgh
Marketing:Chicago
Answered 1 days After Nov 21, 2022

Solution

Baljit answered on Nov 23 2022
38 Votes
In [1]:
In [29]:
In [30]:
In [31]:
In [33]:
In [35]:
In [38]:
import pandas as pd
import json
from urllib.request import urlopen
from pandas.io.json import json_normalize
import sys
api = "https:
www.googleapis.com
ooks/v1/volumes?q=isbn/"
list1=["Python",'DataScience','DataAnalysis','MachineLearning','DeepLearning']
sps= urlopen(api + list1[0])
items=rsps.read().decode('utf-8')
thedict= json.loads(items)
df1=pd.json_normalize (thedict['items'])
df1=df1[['volumeInfo.title','volumeInfo.authors']]
df1.rename(columns = {'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}

sps= urlopen(api + list1[1])
items=rsps.read().decode('utf-8')
thedict= json.loads(items)
df2=pd.json_normalize (thedict['items'])
df2=df2[['volumeInfo.title','volumeInfo.authors']]
df2.rename(columns = {'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}

sps= urlopen(api + list1[2])
items=rsps.read().decode('utf-8')
thedict= json.loads(items)
df3=pd.json_normalize (thedict['items'])
df3=df3[['volumeInfo.title','volumeInfo.authors']]
df3.rename(columns = {'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}

sps= urlopen(api + list1[3])
items=rsps.read().decode('utf-8')
thedict= json.loads(items)
df4=pd.json_normalize (thedict['items'])
df4=df4[['volumeInfo.title','volumeInfo.authors']]
df4.rename(columns = {'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}

sps= urlopen(api + list1[4])
items=rsps.read().decode('utf-8')
thedict= json.loads(items)
df5=pd.json_normalize (thedict['items'])
df5=df5[['volumeInfo.title','volumeInfo.authors']]
df5.rename(columns = {'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}

igTable=pd.concat([df1,df2,df3,df4,df5],ignore_index=True)
In [39]:
Out[39]:
Title Authors
0 Learning Python [Mark Lutz]
1 Regular Expressions Cookbook [Jan Goyvaerts, Steven Levithan]
2 The Python Book [Rob Mastrodomenico]
3 Python for Data Analysis [Wes McKinney]
4 Python Basics [Dan Bader, Joanna Jablonski, Fletcher Heisler]
5 The Quick Python Book [Vernon L. Ceder, Naomi R. Ceder]
6 Python Data Science Handbook [Jake VanderPlas]
7 Teaching and Learning in Further Education [Prue Huddleston, Lorna Unwin]
8 Deep Learning with Python [Francois Chollet]
9 Introduction to Programming Using Python, Stud... [David I. Schneider]
10 Data Science from Scratch [Joel Grus]
11 Python Data Science Handbook [Jake VanderPlas]
12 R for Data Science [Hadley Wickham, Ga
ett Grolemund]
13 Foundations of Data Science [Avrim Blum, John Hopcroft, Ravindran Kannan]
14 A Hands-On Introduction to Data Science [Chirag Shah]
15 Data Science [John D. Kelleher, Brendan Tierney]
16 Data Science for Business [Foster Provost, Tom Fawcett]
17 Introduction to Data Science and Machine Learning [Keshav Sud, Pakize Erdogmus, Seifedine Kadry]
18 Data Science Thinking [Longbing Cao]
19 Data Science on AWS [Chris Fregly, Antje Barth]
20 Python for Data Analysis [Wes McKinney]
21 Data Analysis [Siegmund Brandt]
22 Introduction to Statistics and Data Analysis [Christian Heumann, Michael Schomaker, Shalabh]
23 Data Analysis and Applications 1 [Christos H. Skiadas, James R. Bozeman]
24 Data Analysis for Business, Economics, and Policy [Gábor Békés, Gábor Kézdi]
25 ggplot2 [Hadley Wickham]
26 Bayesian Data Analysis, Third Edition [Andrew Gelman, John B. Carlin, Hal S. Stern, ...
27 Research Methods and Data Analysis for Busines... [James E. Sallis, Geir Gripsrud, Ulf Henning O...
28 Excel Data Analysis For...
SOLUTION.PDF

Answer To This Question Is Available To Download

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here