Column                    Value    Description
customer_id               string   ID of the customer - super duper hashed
days_since_first_order    integer  Days since the first order was made
days_since_last_order     integer  Days since the last order was made
is_newsletter_subscriber  string   Flag for a newsletter subscriber
orders                    integer  Number of orders
items                     integer  Number of items
cancels                   integer  Number of cancellations - when the order is cancelled after being placed
returns                   integer  Number of returned orders
different_addresses       integer  Number of times a different billing and shipping address was used
shipping_addresses        integer  Number of different shipping addresses used
devices                   integer  Number of unique devices used
vouchers                  integer  Number of times a voucher was applied
cc_payments               integer  Number of times a credit card was used for payment
paypal_payments           integer  Number of times PayPal was used for payment
afterpay_payments         integer  Number of times AfterPay was used for payment
apple_payments            integer  Number of times Apple Pay was used for payment
female_items              integer  Number of female items purchased
male_items                integer  Number of male items purchased
unisex_items              integer  Number of unisex items purchased
wapp_items                integer  Number of Women Apparel items purchased
wftw_items                integer  Number of Women Footwear items purchased
mapp_items                integer  Number of Men Apparel items purchased
wacc_items                integer  Number of Women Accessories items purchased
macc_items                integer  Number of Men Accessories items purchased
mftw_items                integer  Number of Men Footwear items purchased
wspt_items                integer  Number of Women Sport items purchased
mspt_items                integer  Number of Men Sport items purchased
curvy_items               integer  Number of Curvy items purchased
sacc_items                integer  Number of Sport Accessories items purchased
msite_orders              integer  Number of Mobile Site orders
desktop_orders            integer  Number of Desktop orders
android_orders            integer  Number of Android app orders
ios_orders                integer  Number of iOS app orders
other_device_orders       integer  Number of Other device orders
work_orders               integer  Number of orders shipped to work
home_orders               integer  Number of orders shipped to home
parcelpoint_orders        integer  Number of orders shipped to a parcelpoint
other_collection_orders   integer  Number of orders shipped to other collection points
average_discount_onoffer  float    Average discount rate of items typically purchased
average_discount_used     float    Average discount finally used on top of existing discount
revenue                   float    $ Dollar spent overall per person

What to Submit?
Task2.ipynb: The completed notebook (one per group) with all the runnable code for all requirements. In general, you need to complete the notebook, save the results of running it, then download and submit it from a Python platform such as Google Colab. You need to clearly list the answer to each question, with sufficient code comments; the expected format of your notebook is shown in Figure 1.
Task2Report.pdf: You (the group) are also required to put your answers (code) and running results
from SIT742Task2.ipynb into a PDF as the report for your Task 2 assignment (copy the code and
paste it into the report; the code formatting, such as indentation, should be the same as in the ipynb notebook).
In this report (one per group), you need to include the assignment questions for both
Part 1 and Part 2. You also need to provide a clear explanation of your logic for solving each
question. The explanation needs to cover the following: 1) why you chose your
solution; 2) whether there are any other solutions that could solve the question; 3) whether your solution is
optimal, and why. The explanation for each question is limited to fewer than 100
words.
Link to data-
https:
aw.githubusercontent.com/tulipla
sit742/develop/Assessment/2022/data/assignment2data.json
Question 1
Open the assignment2data.json file and convert it to CSV format as a dataframe in pandas. Remove
the duplicated rows from the dataframe and save the result as a new dataframe. The meaning of each column is
given in assignment2data.pdf.
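One possible way to approach the loading and de-duplication step is sketched below. It is only an illustration: the inline JSON string is a made-up stand-in for assignment2data.json, and the assumption that the file is a list of records should be checked against the real data before reusing any of this.

```python
import io

import pandas as pd

# Hypothetical stand-in for assignment2data.json. The real file's layout
# (list of records, JSON Lines, ...) is an assumption and must be checked.
raw_json = io.StringIO(
    '[{"customer_id": "a1", "orders": 5, "items": 7},'
    ' {"customer_id": "a1", "orders": 5, "items": 7},'
    ' {"customer_id": "b2", "orders": 1, "items": 2}]'
)

# Parse the JSON into a dataframe.
df_raw = pd.read_json(raw_json)

# Drop rows that are exact duplicates and keep the result as a new dataframe.
df = df_raw.drop_duplicates().reset_index(drop=True)

# CSV text of the cleaned data; pass a path to to_csv to write a file instead.
csv_text = df.to_csv(index=False)

print(f"rows before: {len(df_raw)}, rows after: {len(df)}")
```

With the real file, the `io.StringIO` object would be replaced by the path or URL of the JSON file.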
Create some new features for the dataframe using the code below:
df['female_item_rate'] = df['female_items'] / df['items']
df['male_item_rate'] = df['male_items'] / df['items']
df['unisex_items_rate'] = df['unisex_items'] / df['items']
• Write code to find out how many rows (customers) have the value female_item_rate == 1
and the value male_item_rate == 1 and the value orders > 4:11
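The counting step can be sketched with boolean masks, as below. The data is a toy example with made-up values, the threshold is read as "orders > 4" (an assumption, since the source text is unclear at that point), and the two rate conditions are counted separately because the intended combination is ambiguous.

```python
import pandas as pd

# Toy data standing in for the de-duplicated dataframe (values are made up).
df = pd.DataFrame({
    "orders":       [5, 6, 2, 8],
    "items":        [4, 3, 2, 0],
    "female_items": [4, 0, 1, 0],
    "male_items":   [0, 3, 1, 0],
    "unisex_items": [0, 0, 0, 0],
})

# The feature definitions given in the question.
df["female_item_rate"] = df["female_items"] / df["items"]
df["male_item_rate"] = df["male_items"] / df["items"]
df["unisex_items_rate"] = df["unisex_items"] / df["items"]

# Assumption: the order threshold is read as "orders > 4".
min_orders = 4
many_orders = df["orders"] > min_orders

# Summing a boolean mask counts the rows where the condition holds.
# Rows with items == 0 produce NaN rates, which never compare equal to 1.
n_female_only = int(((df["female_item_rate"] == 1) & many_orders).sum())
n_male_only = int(((df["male_item_rate"] == 1) & many_orders).sum())

print(n_female_only, n_male_only)
```

Further conditions can be chained onto a mask with `&` in the same way if a single combined count is wanted.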
Question 2
Open the assignment2data.json file and convert it to CSV format as a dataframe in pandas. Remove
the duplicated rows from the dataframe and save the result as a new dataframe. The meaning of each column is
given in assignment2data.pdf.
In this question, you will use the original format of the data to group the data by the value of the column
is_newsletter_subscriber and show the average order value, the max order value and the median order
value.
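A grouped summary of this kind can be sketched with `groupby` and `agg`. The data below is invented, the "Y"/"N" flag values are a guess, and "order value" is interpreted here as revenue divided by orders; that interpretation is an assumption, not something the question states.

```python
import pandas as pd

# Toy data; flag values and amounts are made up for illustration.
df = pd.DataFrame({
    "is_newsletter_subscriber": ["Y", "N", "Y", "N"],
    "orders":  [2, 1, 4, 5],
    "revenue": [100.0, 30.0, 120.0, 300.0],
})

# Assumption: "order value" means revenue per order for each customer.
df["order_value"] = df["revenue"] / df["orders"]

# One row per subscriber flag, one column per statistic.
summary = (
    df.groupby("is_newsletter_subscriber")["order_value"]
      .agg(["mean", "max", "median"])
)

print(summary)
```

If "order value" is instead meant to be the raw `orders` or `revenue` column, only the column selected after `groupby` changes.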
Question 3
Transaction Data Analysis
In this part, we will do the analysis on the customer transaction data. The data is from customer transaction.(link to data set-
https:
github.com/tulip-la
sit742
lo
develop/Assessment/2022/data/customer_transaction.csv)
Each row of the data represents an item transaction from a customer (one item from a
transaction for that customer). The product is represented by the product_id and the commodity.
There is also a column basket_id to help group the transactions together at basket level (check out
basket).
Question 3.1
You will need to group by customer_id and basket_id to find out the product commodities in each
basket. Then you will need to answer:
• How many transactions are there at basket level? What is the average basket size?
• What is the most popular product commodity (based on the frequency of purchase)?
• What is the average total transaction price (average basket total price) for each customer?
• You will need to transform the data into a format where each row represents a basket, the columns
are all the product commodities, and the value of each column indicates whether the basket
contains that product commodity. Name this new dataframe transaction_product.
• You will need to transform the data into a format where each row represents a unique customer,
the columns are all the product commodities, and the value of each column is the frequency
of purchase of that commodity across the entire data. Name this new dataframe
customer_product_freq.
• Use customer_product_freq to find the top 5 similar customers for each customer.
(Check out KNN.)
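The steps in this list can be sketched on a toy table as follows. Several things here are assumptions: the column name `price`, all the values, and the idea that basket_id is unique across customers. The helper `top_k_similar` is hypothetical and does a brute-force Euclidean nearest-neighbour search with NumPy rather than calling a KNN library; for real data, something like scikit-learn's `NearestNeighbors` would likely scale better.

```python
import numpy as np
import pandas as pd

# Toy stand-in for customer_transaction.csv. The "price" column name and all
# values are assumptions made for illustration.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "basket_id":   [10, 10, 11, 20, 20, 30],
    "commodity":   ["milk", "bread", "milk", "milk", "bread", "eggs"],
    "price":       [2.0, 3.0, 2.0, 2.0, 3.0, 4.0],
})

# Basket-level view: one group per (customer, basket).
baskets = tx.groupby(["customer_id", "basket_id"])
n_baskets = baskets.ngroups
avg_basket_size = baskets.size().mean()

# Most frequently purchased commodity across all item rows.
top_commodity = tx["commodity"].value_counts().idxmax()

# Total price per basket, then averaged per customer.
basket_totals = baskets["price"].sum()
avg_total_per_customer = basket_totals.groupby(level="customer_id").mean()

# Basket x commodity table of booleans (assumes basket_id is globally unique).
transaction_product = pd.crosstab(tx["basket_id"], tx["commodity"]) > 0

# Customer x commodity table of purchase counts.
customer_product_freq = pd.crosstab(tx["customer_id"], tx["commodity"])


def top_k_similar(freq: pd.DataFrame, k: int) -> pd.DataFrame:
    """Hypothetical helper: k nearest customers by Euclidean distance."""
    X = freq.to_numpy(dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a customer is not its own neighbour
    order = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return pd.DataFrame(freq.index.to_numpy()[order], index=freq.index)


# Only three toy customers exist, so k=2 here; the question asks for k=5.
neighbours = top_k_similar(customer_product_freq, k=2)
print(neighbours)
```

Each row of `neighbours` lists the most similar customer IDs in order of increasing distance.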
Question 3.2
Use the dataframe transaction_product to conduct association rule analysis (you are recommended
to use the mlxtend package). You will need to find:
• The itemsets (baskets) with length greater than 1 and minimum support of 5%.
• The association rules with minimum support of 2% and lift greater than 1.
The definitions of support and lift are in M05E, the lecture slides and also Association rule learning.
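To make the two thresholds concrete, the sketch below computes support and lift by hand for item pairs on a made-up boolean basket table. It deliberately does not use mlxtend: it is a brute-force illustration limited to pairs, using the usual definitions (support = share of baskets containing the itemset; lift = pair support divided by the product of the two item supports). The recommended mlxtend route would go through its `apriori` and `association_rules` functions instead.

```python
from itertools import combinations

import pandas as pd

# Made-up basket x commodity table in the shape of transaction_product.
transaction_product = pd.DataFrame(
    {
        "bread": [True, True, False, True],
        "milk":  [True, True, True, False],
        "eggs":  [False, True, False, False],
    },
    index=[10, 11, 20, 30],
)

# Support of each single item: fraction of baskets that contain it.
item_support = transaction_product.mean()

rows = []
for a, b in combinations(transaction_product.columns, 2):
    # Support of the pair: fraction of baskets containing both items.
    pair_support = (transaction_product[a] & transaction_product[b]).mean()
    # Lift above 1 means the items co-occur more often than independence predicts.
    lift = pair_support / (item_support[a] * item_support[b])
    rows.append({"itemset": (a, b), "support": pair_support, "lift": lift})

pairs = pd.DataFrame(rows)

# Itemsets of length 2 with support of at least 5%.
frequent_pairs = pairs[pairs["support"] >= 0.05]

# Pairs with support of at least 2% and lift greater than 1.
rules = pairs[(pairs["support"] >= 0.02) & (pairs["lift"] > 1)]

print(rules)
```

Because lift is symmetric in the two items, each pair here stands for both rule directions; confidence, which is not symmetric, is not computed in this sketch.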
Answered 8 days After Sep 15, 2022

Solution

Raavikant answered on Sep 23 2022