Problem 1: Clustering
A leading bank wants to develop a customer segmentation to give
promotional offers to its customers. They collected a sample that
summarizes the activities of users during the past few months. You are
given the task to identify the segments based on credit card usage.
1.1 Read the data, perform the necessary initial steps, and carry out exploratory data analysis (univariate, bivariate, and multivariate).
1.2 Do you think scaling is necessary for clustering in this case? Justify your answer.
1.3 Apply hierarchical clustering to the scaled data. Identify the optimum number of clusters using a dendrogram and briefly describe them.
1.4 Apply K-Means clustering to the scaled data and determine the optimum number of clusters. Apply the elbow curve and silhouette score. Explain the results properly. Interpret and write inferences on the finalized clusters.
1.5 Describe cluster profiles for the clusters defined. Recommend different promotional strategies for different clusters.
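Tasks 1.3 and 1.4 can be sketched as follows. This is a minimal illustration, not a reference solution: it runs on synthetic data from make_blobs (an assumption, since the real CSV is not reproduced here), and the choices of Ward linkage, a cut at 3 clusters, and the range k = 2..6 are arbitrary placeholders to be revisited against the actual dendrogram and elbow curve.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data (assumption): 200 rows, 7 numeric features.
X, _ = make_blobs(n_samples=200, n_features=7, centers=3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Hierarchical clustering: build the Ward linkage tree, then cut it into
# at most 3 flat clusters. scipy.cluster.hierarchy.dendrogram(Z) would draw
# the tree for visual inspection (it needs matplotlib).
Z = linkage(X_scaled, method="ward")
hier_labels = fcluster(Z, t=3, criterion="maxclust")

# K-Means: record inertia (for the elbow curve) and the silhouette score
# for each candidate number of clusters.
inertias, silhouettes = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X_scaled, km.labels_)

# One simple rule of thumb: take the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print("inertia by k:", {k: round(v, 1) for k, v in inertias.items()})
print("silhouette by k:", {k: round(v, 3) for k, v in silhouettes.items()})
print("k with highest silhouette:", best_k)
```

Plotting inertias against k gives the elbow curve. The silhouette score and the elbow do not always agree, so the final choice of k should also make sense for the business segments.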
Dataset for Problem 1: bank_marketing_part1_Data.csv
Data Dictionary for Market Segmentation:
- spending: Amount spent by the customer per month (in 1000s)
- advance_payments: Amount paid by the customer in advance by cash (in 100s)
- probability_of_full_payment: Probability of payment done in full by the customer to the bank
- current_balance: Balance amount left in the account to make purchases (in 1000s)
- credit_limit: Limit of the amount in credit card (in 10000s)
- min_payment_amt: Minimum amount paid by the customer while making payments for purchases made monthly (in 100s)
- max_spent_in_single_shopping: Maximum amount spent in one purchase (in 1000s)
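Because the variables above are recorded in different units (100s, 1000s, 10000s, and a probability), distance-based clustering can be dominated by whichever columns have the largest spread. The sketch below illustrates the scaling question in task 1.2 with z-score standardization. The column names come from the data dictionary, but every value is randomly generated and the means and spreads are invented purely for illustration; they are not taken from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 100

# Synthetic values only (assumption): distributions are made up so that the
# columns have visibly different spreads, as mixed units tend to produce.
df = pd.DataFrame({
    "spending": rng.normal(15.0, 3.0, n),
    "advance_payments": rng.normal(14.0, 1.5, n),
    "probability_of_full_payment": rng.uniform(0.80, 0.95, n),
    "current_balance": rng.normal(5.5, 0.5, n),
    "credit_limit": rng.normal(3.0, 0.4, n),
    "min_payment_amt": rng.normal(4.0, 1.5, n),
    "max_spent_in_single_shopping": rng.normal(5.0, 0.5, n),
})

# Spread before scaling: columns with a large standard deviation would
# contribute most to Euclidean distances.
raw_std = df.std()
print(raw_std.round(3))

# Z-score standardization: each column ends up with mean 0 and unit variance.
scaled = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)
print(scaled.describe().loc[["mean", "std"]].round(3))
```

On the real data, comparing the per-column standard deviations before scaling is one way to justify the answer to task 1.2.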
Problem 2: CART-RF-ANN
An insurance firm providing tour insurance is facing a higher claim
frequency. The management decides to collect data from the past few
years. You are assigned the task of building a model that predicts the
claim status and providing recommendations to management. Use CART, RF
& ANN and compare the models' performances on the train and test sets.
2.1 Read the data, perform the necessary initial steps, and carry out exploratory data analysis (univariate, bivariate, and multivariate).
2.2 Data Split: Split the data into train and test sets, and build classification models using CART, Random Forest, and an Artificial Neural Network.
2.3 Performance Metrics: Check and comment on the performance of predictions on the train and test sets using accuracy, the confusion matrix, the ROC curve, the ROC_AUC score, and the classification report for each model.
2.4 Final Model: Compare all the models and write an inference on which model is best/optimized.
2.5 Inference: Based on the whole analysis, what are the business insights and recommendations?
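Tasks 2.2 and 2.3 can be sketched as follows. This is a minimal illustration on synthetic data from make_classification (an assumption, since the insurance CSV is not reproduced here). The 70/30 split, the tree depths, and the hidden layer size are arbitrary placeholders, and scikit-learn's MLPClassifier stands in for the ANN.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary problem standing in for the claim data (assumption).
X, y = make_classification(n_samples=600, n_features=9, n_informative=5,
                           weights=[0.7, 0.3], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# Hyperparameters are placeholders; the ANN is scaled inside a pipeline
# because neural networks are sensitive to feature scale.
models = {
    "CART": DecisionTreeClassifier(max_depth=4, random_state=1),
    "RF": RandomForestClassifier(n_estimators=100, max_depth=6, random_state=1),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=1)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    for split, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
        pred = model.predict(Xs)
        proba = model.predict_proba(Xs)[:, 1]  # probability of the positive class
        results[(name, split)] = {
            "accuracy": accuracy_score(ys, pred),
            "roc_auc": roc_auc_score(ys, proba),
            "confusion_matrix": confusion_matrix(ys, pred),
        }

for (name, split), m in results.items():
    print(f"{name:4s} {split:5s} accuracy={m['accuracy']:.3f} roc_auc={m['roc_auc']:.3f}")
```

A large gap between train and test scores for a model suggests overfitting, which is worth noting in the comparison for task 2.4. sklearn.metrics.classification_report and sklearn.metrics.roc_curve supply the remaining outputs named in task 2.3.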
Dataset for Problem 2: insurance_part2_data-1.csv
Attribute Information:
1. Target: Claim Status (Claimed)
2. Code of tour firm (Agency_Code)
3. Type of tour insurance firms (Type)
4. Distribution channel of tour insurance agencies (Channel)
5. Name of the tour insurance products (Product)
6. Duration of the tour (Duration in days)
7. Destination of the tour (Destination)
8. Amount of sales per customer in procuring tour insurance policies, in rupees (in 100s)
9. Commission received by the tour insurance firm (Commission, as a percentage of sales)
10. Age of insured (Age)
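Several of the attributes above are categorical, so they need to be converted to numbers before CART, RF, or ANN models can be fitted with scikit-learn. The sketch below shows one common option, one-hot encoding with pandas. The column names follow the attribute list, but every row is hypothetical, and the "Yes"/"No" coding of Claimed is an assumption to check against the actual file.

```python
import pandas as pd

# Hypothetical rows for illustration only; none of these values come from the dataset.
df = pd.DataFrame({
    "Agency_Code": ["A1", "B2", "A1", "C3"],
    "Type": ["Airlines", "Travel Agency", "Airlines", "Travel Agency"],
    "Channel": ["Online", "Online", "Offline", "Online"],
    "Product": ["Plan A", "Plan B", "Plan A", "Plan C"],
    "Destination": ["Region X", "Region Y", "Region X", "Region Z"],
    "Duration": [7, 30, 3, 14],
    "Age": [34, 51, 28, 45],
    "Claimed": ["No", "Yes", "No", "No"],  # assumed label coding
})

# Binary target: 1 for a claim, 0 otherwise.
y = df["Claimed"].map({"No": 0, "Yes": 1})

# One-hot encode the categorical predictors; drop_first avoids a redundant
# dummy column per attribute.
categorical = ["Agency_Code", "Type", "Channel", "Product", "Destination"]
X = pd.get_dummies(df.drop(columns="Claimed"), columns=categorical, drop_first=True)

print(X.columns.tolist())
print(y.tolist())
```

Integer label encoding is another option that tree-based models often tolerate, but it imposes an arbitrary order on the categories, which can mislead a neural network.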