Data Focused Python XXXXXXXXXX Homework 4 Due: 11:59 PM Sunday,...

Question

Data Focused Python XXXXXXXXXX Homework 4
Due: 11:59 PM Sunday, November 27, 2022

This is an individual assignment.

Problem 1

a. Read the file 'expenses.txt' into a DataFrame named expenses. The data is separated by a colon ( : ). The first line contains column headers. Display expenses.
   Read the file 'people.txt', also colon-separated with column headers in the first line, into DataFrame people. Display people.
   Read the file 'departments.txt' into DataFrame departments (same formatting). Display departments.
b. Clean people by removing any row with null data, and display the rows removed (hint: use a variant of isna() ). Then remove anyone whose department is not in departments (recall the isin( ) method from Lab 7), and again display the rows removed. Display the cleaned version of people.
c. Clean expenses by removing any row with null data, and display the rows removed. Then remove rows whose id is not in people, and display the rows removed. Then remove rows with a malformed date field (only check for the correct number of digits, 8; use a lambda and apply() ), and display the rows removed. Display the cleaned version of expenses.
d. Merge expenses and people on the 'ID' column using an inner join, and name the result alldata. Sort alldata on 'ID'. Create sums by grouping the rows (use groupby( ) ) on 'ID' and then use agg('sum') to get the total of each person's expenses. Display sums.

Problem 2

Use the expenses and people DataFrames from Problem 1 to display the following graphs. Make sure you label the graphs and axes correctly. Use the pandas function groupby( ) with the column you need, together with either count( ) or sum( ), as in groupby().sum(). Also, use the groupby parameter as_index=False to store the grouping column as an actual column instead of as the index.

The following graph shows the idea, although the values may not be correct (it's sample data, and I was lazy about labels):

[sample graph not included]

a. Bar chart containing the departments and the number of people employed in each.
b. Bar chart containing the id's and the total expense amount for each id.
c. Bar chart containing the expense categories and the total expense amount for each category.
d. Pie chart containing the same information as #c.

Read the file "countryData.csv" into a DataFrame named countryData; this is a version of the GDP and Olympic medals data from Lab 7. Use countryData to create regression plots for the following using Seaborn's regplot. Compute the correlation coefficient and use it as an annotation on the graph.

e. Population against GDP
f. GDP against Weighted
g. Formula against Total
h. Formula against Weighted

Problem 3

Write a script by copying the relevant parts of the Lab 8 code and modify it to do these things:

a. Create a list with these topic strings: Python; Data Science; Data Analysis; Machine Learning; and Deep Learning. Use these topics, one at a time, to query the Google Books API. For each returned JSON string:
b. Convert the JSON string to a dict using loads( ) (as in the lab), then use this to convert it to a DataFrame: pd.io.json.json_normalize ( thedict['items'] )
c. Extract just the 'volumeInfo.title' and 'volumeInfo.authors' columns.
d. Relabel those two columns as 'Title' and 'Authors'.
   After creating the five DataFrame objects, use concat( ) to create one table called bigTable (use ignore_index=True). The function takes a list of the DataFrames to concatenate (i.e., in [ ]'s).
e. Display bigTable.
f. Re-display bigTable in the following way, using regular Python to display data extracted from bigTable. Create the table headers (left justified), then use a for loop over bigTable.index, which will count on the index number starting at 0. Display at most 25 characters of the title (just use [:25], even if the title has fewer characters) and only the first author. It should look something like this (your data may vary):

[sample output not included]

Contents of expenses.txt:

ID:Amount:Category:Date:Description
5:5.25:supply: XXXXXXXXXX:box of staples
7:79.81:meal: XXXXXXXXXX:lunch with ABC Corp. clients Al, Bob, and Cy
4:43.00:travel: XXXXXXXXXX:cab back to office
4:383.75:travel: XXXXXXXXXX:flight to Boston, to visit ABC Corp.
22:55.00:travel: XXXXXXXXXX:cab to ABC Corp. in Cambridge, MA
17:23.25:meal: XXXXXXXXXX:dinner at Logan Airport
5:318.47:supply: XXXXXXXXXX:paper, toner, pens, paperclips, tape
22:142.12:meal: XXXXXXXXXX:host dinner with ABC clients, Al, Bob, Cy, Dave, Ellie
20:20.20::20:20.20:::
49:303.94:util: XXXXXXXXXX:Peoples Gas
49:121.07:util: XXXXXXXXXX:Verizon Wireless
8:7.59:supply: XXXXXXXXXX:Python book (used)
8:79.99:supply: XXXXXXXXXX:spare 20" monitor
13:49.86:supply: XXXXXXXXXX:Stoch Cal for Finance II
7:6.53:meal: XXXXXXXXXX:Dunkin Donuts, drive to Big Inc. near DC
7:127.23:meal: XXXXXXXXXX:dinner, Tavern64
22:33.07:meal: XXXXXXXXXX:dinner, Uncle Julio's
7:86.00:travel: XXXXXXXXXX:mileage, drive to/from Big Inc., Reston, VA
7::travel: XXXXXXXXXX:mileage, drive to/from Big Inc., Reston, VA
50:22.00:travel: XXXXXXXXXX:tolls
7:378.81:travel: XXXXXXXXXX:Hyatt Hotel, Reston VA, for Big Inc. meeting
8:1247.49:supply: XXXXXXXXXX:Dell 7000 laptop/workstation
40:6.99:supply: XXXXXXXXXX:HDMI cable
49:212.06:util: XXXXXXXXXX:Duquesne Light
8:23.86:supply: XXXXXXXXXX:Practical Guide to Quant Finance Interviews
5:195.89:supply: XXXXXXXXXX:black toner, HP 304A, 2-pack
5:195.89:supply: XXXXXXXXXX:
22:86.00:travel: XXXXXXXXXX:mileage, drive to/from Big Inc., Reston, VA
18:32.27:meal: XXXXXXXXXX:lunch at Clyde's with Fred and Gina, Big Inc.
7:22.00:travel: XXXXXXXXXX:tolls
5:119.56:util: XXXXXXXXXX:Verizon Wireless
5:284.23:util: XXXXXXXXXX:Peoples Gas
5:8.98:supply: XXXXXXXXXX:Flair pens
5:8.98:supply:202325:Flair pens
5:22.95:supply: XXXXXXXXXX:Bic pens
4:149.95:travel: XXXXXXXXXX:Car rental
28:2245.25:supply:2022512:party
28::supply:2022512:party
18:77.75:meal: XXXXXXXXXX:Lunch with investors
7:950.15:travel: XXXXXXXXXX:flight to Chicago
5:22.95:supply: XXXXXXXXXX:Bic pens
4:149.95:travel: XXXXXXXXXX:Car rental
18:77.75:meal: XXXXXXXXXX:Lunch with investors
7:950.15:travel: XXXXXXXXXX:flight to Chicago
7:950.15:travel:220418:flight to Chicago
5:22.95:supply: XXXXXXXXXX:Bic pens
4:5.19:meal: XXXXXXXXXX:McDonalds
4:149.95:travel: XXXXXXXXXX:Car rental
18:77.75:meal: XXXXXXXXXX:Lunch with investors
18:77.75:: XXXXXXXXXX:Lunch with investors
7:950.15:travel: XXXXXXXXXX:flight to Chicago

Contents of departments.txt:

Department:Location
Sales:New York
Office:New York
Research:San Francisco
Manufacturing:Pittsburgh
Marketing:Chicago
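The cleaning and merging steps in Problem 1 (parts b to d) can be sketched as follows. This is a minimal sketch under stated assumptions, not a full solution: it reads from in-memory strings instead of the real files, the people rows (including the Name column) and all 8-digit dates are invented because people.txt is not shown and the dates are masked, and the helper split_rows is a hypothetical name.

```python
import io

import pandas as pd

# Stand-ins for the real files. The people rows and all dates are invented.
expenses_txt = """ID:Amount:Category:Date:Description
5:5.25:supply:20220103:box of staples
7:79.81:meal:20220104:lunch with clients
7::travel:20220105:mileage
5:8.98:supply:202325:Flair pens
99:10.00:meal:20220106:unknown employee
"""
people_txt = """ID:Name:Department
5:Ann:Sales
7:Bob:Research
8::Office
9:Cy:Legal
"""
departments_txt = """Department:Location
Sales:New York
Office:New York
Research:San Francisco
"""

# dtype=str for Date keeps the digits exactly as written in the file.
expenses = pd.read_csv(io.StringIO(expenses_txt), sep=":", dtype={"Date": str})
people = pd.read_csv(io.StringIO(people_txt), sep=":")
departments = pd.read_csv(io.StringIO(departments_txt), sep=":")


def split_rows(df, keep):
    """Return (kept rows, removed rows) for a boolean mask `keep`."""
    return df[keep], df[~keep]


# b. Drop people rows with nulls, then rows whose department is unknown.
people, removed = split_rows(people, ~people.isna().any(axis=1))
print(removed)
people, removed = split_rows(people, people["Department"].isin(departments["Department"]))
print(removed)

# c. Drop expense rows with nulls, unknown IDs, or dates that are not 8 digits.
expenses, removed = split_rows(expenses, ~expenses.isna().any(axis=1))
print(removed)
expenses, removed = split_rows(expenses, expenses["ID"].isin(people["ID"]))
print(removed)
good_date = expenses["Date"].apply(lambda d: len(d) == 8 and d.isdigit())
expenses, removed = split_rows(expenses, good_date)
print(removed)

# d. Inner join on ID, sort, and total each person's expenses.
# Only Amount is summed here, since the other columns are text.
alldata = expenses.merge(people, on="ID", how="inner").sort_values("ID")
sums = alldata.groupby("ID")[["Amount"]].agg("sum")
print(sums)
```

Returning the removed rows alongside the kept ones makes it easy to display what each cleaning step discarded, which the assignment asks for at every step.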

Baljit · Accepted Answer

import json
from urllib.parse import quote
from urllib.request import urlopen

import pandas as pd

# Google Books volumes endpoint; each topic is appended as the q search parameter.
api = "https://www.googleapis.com/books/v1/volumes?q="
topics = ["Python", "Data Science", "Data Analysis", "Machine Learning", "Deep Learning"]

frames = []
for topic in topics:
    # quote() percent-encodes the space in multi-word topics.
    with urlopen(api + quote(topic)) as rsps:
        items = rsps.read().decode("utf-8")
    thedict = json.loads(items)
    df = pd.json_normalize(thedict["items"])
    # reindex() keeps just these two columns and tolerates a missing authors field.
    df = df.reindex(columns=["volumeInfo.title", "volumeInfo.authors"])
    # rename() returns a new DataFrame, so the result must be assigned.
    df = df.rename(columns={"volumeInfo.title": "Title", "volumeInfo.authors": "Authors"})
    frames.append(df)

bigTable = pd.concat(frames, ignore_index=True)
Displaying bigTable (part e) gives:
Title Authors
0 Learning Python [Mark Lutz]
1 Regular Expressions Cookbook [Jan Goyvaerts, Steven Levithan]
2 The Python Book [Rob Mastrodomenico]
3 Python for Data Analysis [Wes McKinney]
4 Python Basics [Dan Bader, Joanna Jablonski, Fletcher Heisler]
5 The Quick Python Book [Vernon L. Ceder, Naomi R. Ceder]
6 Python Data Science Handbook [Jake VanderPlas]
7 Teaching and Learning in Further Education [Prue Huddleston, Lorna Unwin]
8 Deep Learning with Python [Francois Chollet]
9 Introduction to Programming Using Python, Stud... [David I. Schneider]
10 Data Science from Scratch [Joel Grus]
11 Python Data Science Handbook [Jake VanderPlas]
12 R for Data Science [Hadley Wickham, Garrett Grolemund]
13 Foundations of Data Science [Avrim Blum, John Hopcroft, Ravindran Kannan]
14 A Hands-On Introduction to Data Science [Chirag Shah]
15 Data Science [John D. Kelleher, Brendan Tierney]
16 Data Science for Business [Foster Provost, Tom Fawcett]
17 Introduction to Data Science and Machine Learning [Keshav Sud, Pakize Erdogmus, Seifedine Kadry]
18 Data Science Thinking [Longbing Cao]
19 Data Science on AWS [Chris Fregly, Antje Barth]
20 Python for Data Analysis [Wes McKinney]
21 Data Analysis [Siegmund Brandt]
22 Introduction to Statistics and Data Analysis [Christian Heumann, Michael Schomaker, Shalabh]
23 Data Analysis and Applications 1 [Christos H. Skiadas, James R. Bozeman]
24 Data Analysis for Business, Economics, and Policy [Gábor Békés, Gábor Kézdi]
25 ggplot2 [Hadley Wickham]
26 Bayesian Data Analysis, Third Edition [Andrew Gelman, John B. Carlin, Hal S. Stern, ...
27 Research Methods and Data Analysis for Busines... [James E. Sallis,
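
Part f of Problem 3 is not covered by the code above. A minimal sketch of the plain-Python display follows, assuming bigTable has the Title and Authors columns built in parts a to d. The three sample rows are copied from the output above, and the column widths of 27 and 20 are arbitrary choices.

```python
import pandas as pd

# Small stand-in for bigTable; rows copied from the displayed output.
bigTable = pd.DataFrame(
    {
        "Title": [
            "Learning Python",
            "Regular Expressions Cookbook",
            "Introduction to Statistics and Data Analysis",
        ],
        "Authors": [
            ["Mark Lutz"],
            ["Jan Goyvaerts", "Steven Levithan"],
            ["Christian Heumann", "Michael Schomaker", "Shalabh"],
        ],
    }
)

lines = [f"{'Title':<27}{'Author':<20}"]  # left-justified headers
for i in bigTable.index:
    title = bigTable.loc[i, "Title"][:25]  # at most 25 characters
    authors = bigTable.loc[i, "Authors"]
    # Authors is a list when present; json_normalize leaves NaN when it is absent.
    first_author = authors[0] if isinstance(authors, list) and authors else ""
    lines.append(f"{title:<27}{first_author:<20}")

print("\n".join(lines))
```

The isinstance check guards against books that come back without an authors field, which would otherwise raise an error when indexed.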

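For Problem 2, the grouping step behind each chart and the correlation coefficient for the regression plots can be sketched as follows. This is only a sketch of the data preparation: the plotting calls themselves are left out, and every table here is a hypothetical stand-in with invented values, since the cleaned DataFrames and countryData.csv are not available.

```python
import pandas as pd

# Hypothetical stand-ins for the cleaned people and expenses tables.
people = pd.DataFrame(
    {"ID": [4, 5, 7, 8], "Department": ["Sales", "Office", "Sales", "Research"]}
)
expenses = pd.DataFrame(
    {
        "ID": [4, 4, 5, 7],
        "Amount": [43.00, 383.75, 5.25, 79.81],
        "Category": ["travel", "travel", "supply", "meal"],
    }
)

# as_index=False keeps the grouping column as a regular column, so it can be
# passed directly as the x-axis of a bar chart.
dept_counts = people.groupby("Department", as_index=False)["ID"].count()
id_totals = expenses.groupby("ID", as_index=False)["Amount"].sum()
category_totals = expenses.groupby("Category", as_index=False)["Amount"].sum()

# Pearson correlation coefficient between two numeric columns, as needed for
# the regression-plot annotation. The numbers here are invented.
countryData = pd.DataFrame(
    {"Population": [1.0, 2.0, 3.0, 4.0], "GDP": [2.0, 4.1, 5.9, 8.0]}
)
r = countryData["Population"].corr(countryData["GDP"])

print(dept_counts)
print(id_totals)
print(category_totals)
print(f"r = {r:.3f}")
```

Each grouped table can then be handed to a plotting call such as dept_counts.plot.bar(x="Department", y="ID"), which requires matplotlib, and the formatted r string can be used as the annotation text on the regplot axes.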