Part B - MRJob and Hive with CSV (8 marks)
In Part B your task is to answer a question about the data in a CSV file, first using MRJob, and then using Hive. By using both to answer the same question about the same file you can more readily see how the two techniques compare.
When you click the panel on the right you'll get a connection to a server that has, in your home directory, a CSV file called "orders.csv", containing data about book orders (feel free to open the file and explore its contents).
Here are the fields in the file:
OrderDate (date)
ISBN (string)
Title (string)
Category (string)
PriceEach (decimal(5,2))
Quantity (integer)
FirstName (string)
LastName (string)
City (string)
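The field list above maps onto a typed record. This is a minimal sketch that assumes each line of orders.csv holds exactly these nine fields in this order. `Order`, `parse_order` and the sample line are hypothetical. OrderDate is kept as text because the file's date format isn't stated.

```python
import csv
import io
from decimal import Decimal
from typing import NamedTuple


class Order(NamedTuple):
    """One row of orders.csv (assumed: nine fields in the listed order)."""
    order_date: str  # left as text; the date format is not specified
    isbn: str
    title: str
    category: str
    price_each: Decimal  # decimal(5,2) in the spec, so Decimal, not float
    quantity: int
    first_name: str
    last_name: str
    city: str


def parse_order(line: str) -> Order:
    # csv.reader handles quoted fields (e.g. a Title containing a comma),
    # which a plain line.split(',') would break apart.
    fields = next(csv.reader(io.StringIO(line)))
    (order_date, isbn, title, category, price, qty,
     first, last, city) = fields
    return Order(order_date, isbn, title, category,
                 Decimal(price), int(qty), first, last, city)


# Hypothetical line, not taken from orders.csv.
order = parse_order(
    '2021-04-09,0000000000,"Cooking, Simply",COOKING,19.95,2,JANE,DOE,BOISE'
)
print(order.title, order.price_each * order.quantity, order.city)
# Cooking, Simply 39.90 BOISE
```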
Your task is to find the total dollar amount of orders for each city.
Your results should appear as the following:
ATLANTA XXXXXXXXXX
AUSTIN XXXXXXXXXX
BOISE XXXXXXXXXX
CHEYENNE XXXXXXXXXX
CHICAGO XXXXXXXXXX
CODY XXXXXXXXXX
EASTPOINT XXXXXXXXXX
KALMAZOO XXXXXXXXXX
MACON XXXXXXXXXX
MIAMI XXXXXXXXXX
MORRISTOWN XXXXXXXXXX
SEATTLE XXXXXXXXXX
TALLAHASSEE XXXXXXXXXX
TRENTON XXXXXXXXXX
(There is no need to sort the results or remove the quotation marks.)
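The total dollar amount for a city is the sum, over that city's orders, of PriceEach * Quantity. The sketch below shows that arithmetic on its own. It uses Decimal because PriceEach is declared decimal(5,2). `city_totals` and the sample rows are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def city_totals(rows):
    """Sum PriceEach * Quantity per City.

    `rows` is an iterable of (city, price_each, quantity) tuples,
    with price_each given as a string so Decimal keeps it exact.
    """
    totals = defaultdict(Decimal)  # missing cities start at Decimal('0')
    for city, price_each, quantity in rows:
        totals[city] += Decimal(price_each) * quantity
    return dict(totals)


# Hypothetical sample rows, not taken from orders.csv.
sample = [
    ("BOISE", "19.95", 2),
    ("BOISE", "5.50", 1),
    ("MACON", "12.00", 3),
]
print(city_totals(sample))
# {'BOISE': Decimal('45.40'), 'MACON': Decimal('36.00')}
```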
First (4 marks)
Write an MRJob job to do this. A file called "job.py" has been created for you; you just need to fill in the details. You should be able to adapt the MRJob jobs you have already seen in this week's content.
You can test your job by running the following command (it tells Python to execute job.py, using orders.csv as input):
$ python job.py orders.csv
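An MRJob job for this task has two parts. A mapper emits (City, PriceEach * Quantity) for each line, and a reducer sums the amounts for each city. The sketch below writes both as plain generator functions that take the same arguments as MRJob's `mapper(self, key, line)` and `reducer(self, key, values)` methods. It also adds a hypothetical `run_locally` helper that imitates the sort-and-group shuffle, so the sketch runs without mrjob installed. In job.py, these functions would become methods of an `MRJob` subclass, launched with `.run()` under `if __name__ == '__main__':`.

The sketch makes three assumptions:

* City, PriceEach and Quantity are columns 8, 4 and 5 (zero-based).
* Any line that doesn't parse, such as a header row if the file has one, can be skipped.
* Floats rounded to cents are good enough. This keeps the code short but is less exact than Decimal.

```python
import csv
from itertools import groupby
from operator import itemgetter


def mapper(_, line):
    """Emit (city, order amount) for one CSV line, like MRJob's mapper."""
    row = next(csv.reader([line]))
    try:
        city, amount = row[8], float(row[4]) * int(row[5])  # PriceEach * Quantity
    except (ValueError, IndexError):
        return  # not an order line (e.g. a header), so emit nothing
    yield city, amount


def reducer(city, amounts):
    """Sum every amount emitted for one city, like MRJob's reducer."""
    yield city, round(sum(amounts), 2)


def run_locally(lines):
    """Hypothetical stand-in for MRJob/Hadoop: map, sort by key, reduce."""
    pairs = sorted(
        (pair for line in lines for pair in mapper(None, line)),
        key=itemgetter(0),
    )
    results = []
    for city, group in groupby(pairs, key=itemgetter(0)):
        results.extend(reducer(city, (amount for _, amount in group)))
    return results


# Hypothetical input lines, not taken from orders.csv.
lines = [
    "OrderDate,ISBN,Title,Category,PriceEach,Quantity,FirstName,LastName,City",
    '2021-04-09,0000000000,"Cooking, Simply",COOKING,19.95,2,JANE,DOE,BOISE',
    "2021-04-10,1111111111,Atlas,TRAVEL,12.00,3,JOHN,ROE,MACON",
    "2021-04-11,2222222222,Poems,LITERATURE,5.50,1,ANN,POE,BOISE",
]
for city, total in run_locally(lines):
    print(city, total)
```

Keeping the mapper free of any per-city state means each line can be processed independently. This is what lets the framework split the file across workers.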
Second (4 marks)
Write a Hive script to do this. A file called "script.hql" has been created for you; you just need to fill in the details. You should be able to adapt the Hive scripts you have already seen in this week's content.
You can test your script by running the following command (it tells Hive to execute the commands contained in the file script.hql):
$ hive -f script.hql
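The Hive version usually comes down to two statements: a table declared over the CSV and one aggregate query. Hive can't be run in a snippet, so the sketch below swaps in Python's built-in sqlite3 module, purely to show the shape of that query. The table name `orders`, the lower-case column names and the sample rows are assumptions.

In HiveQL the table setup would look different. It would use, for example, `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','` and a `LOAD DATA LOCAL INPATH` statement. The `SELECT ... GROUP BY city` part carries over largely unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (a swapped-in engine).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
        orderdate TEXT, isbn TEXT, title TEXT, category TEXT,
        priceeach REAL, quantity INTEGER,
        firstname TEXT, lastname TEXT, city TEXT)"""
)
# Hypothetical rows, not taken from orders.csv.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("2021-04-09", "0000000000", "Cooking, Simply", "COOKING",
         19.95, 2, "JANE", "DOE", "BOISE"),
        ("2021-04-10", "1111111111", "Atlas", "TRAVEL",
         12.00, 3, "JOHN", "ROE", "MACON"),
        ("2021-04-11", "2222222222", "Poems", "LITERATURE",
         5.50, 1, "ANN", "POE", "BOISE"),
    ],
)
# The aggregate the script needs: one total dollar amount per city.
query = """
    SELECT city, ROUND(SUM(priceeach * quantity), 2) AS total
    FROM orders
    GROUP BY city
"""
totals = dict(conn.execute(query).fetchall())
for city, total in totals.items():
    print(city, total)
```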
Answered 2 days after Apr 09, 2021 (University of Sydney)

Solution

Kamal answered on Apr 11 2021
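import csv
from collections import defaultdict

# Total dollar amount per city: PriceEach (col 4) * Quantity (col 5), keyed by City (col 8).
totals = defaultdict(float)
with open('orders.csv', newline='') as csvfile:
    for col in csv.reader(csvfile, delimiter=','):
        try:
            totals[col[8]] += float(col[4]) * int(col[5])
        except (ValueError, IndexError):
            continue  # skip a header row or malformed line, if the file has one
for city, total in totals.items():
    print(city, round(total, 2))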