Background and Context
You are a Data Scientist for a tourism company named "Visit with us". The Policy Maker of the company wants to enable and establish a viable business model to expand the customer base.
A viable business model is a central concept that helps you understand the existing ways of doing business and how to change them to benefit the tourism sector.
One of the ways to expand the customer base is to introduce a new offering of packages.
The company currently offers 5 types of packages: Basic, Standard, Deluxe, Super Deluxe, and King. Last year's data shows that 18% of the customers purchased a package.
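The 18% purchase rate means the two classes (buyers and non-buyers) are imbalanced, which matters later when choosing an evaluation metric. The minimal sketch below uses made-up labels that only mirror the 18% rate, not the real data. It shows why accuracy alone is misleading: a model that predicts "no purchase" for every customer still scores 82% accuracy while finding none of the buyers. The helper names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred):
    """Share of actual buyers (ProdTaken == 1) that the model identifies."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative labels only: 18 buyers out of 100 customers, matching the 18% rate.
y_true = [1] * 18 + [0] * 82
# Naive baseline: predict "no purchase" (0) for everyone.
y_naive = [0] * 100

print(f"accuracy = {accuracy(y_true, y_naive):.2f}")  # accuracy = 0.82
print(f"recall   = {recall(y_true, y_naive):.2f}")    # recall   = 0.00
```

Because the company's goal is to find likely buyers, a metric that focuses on the positive class, such as recall, exposes this failure where accuracy hides it.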
In the last campaign, the company contacted customers at random without using the available information. This time, the company is planning to launch a new product, the Wellness Tourism Package. Wellness Tourism is defined as travel that allows the traveler to maintain, enhance, or kick-start a healthy lifestyle, and support or increase one's sense of well-being. For this launch, the company wants to harness the available data on existing and potential customers to make its marketing expenditure more efficient.
As a Data Scientist at the "Visit with us" travel company, you have to analyze the customers' data and information to provide recommendations to the Policy Maker and the Marketing Team, and also build a model to predict which potential customers are going to purchase the newly introduced travel package.
Objective
To predict which customer is more likely to purchase the newly introduced travel package.
Data Dictionary
Customer details:
- CustomerID: Unique customer ID
- ProdTaken: Whether the customer has purchased a package or not (0: No, 1: Yes)
- Age: Age of customer
- TypeofContact: How customer was contacted (Company Invited or Self Inquiry)
- CityTier: City tier depends on the development of a city, population, facilities, and living standards. The categories are ordered, i.e., Tier 1 > Tier 2 > Tier 3
- Occupation: Occupation of customer
- Gender: Gender of customer
- NumberOfPersonVisiting: Total number of persons planning to take the trip with the customer
- PreferredPropertyStar: Preferred hotel property rating by customer
- MaritalStatus: Marital status of customer
- NumberOfTrips: Average number of trips in a year by customer
- Passport: Whether the customer has a passport or not (0: No, 1: Yes)
- OwnCar: Whether the customer owns a car or not (0: No, 1: Yes)
- NumberOfChildrenVisiting: Total number of children under the age of 5 planning to take the trip with the customer
- Designation: Designation of the customer in the current organization
- MonthlyIncome: Gross monthly income of the customer
Customer interaction data:
- PitchSatisfactionScore: Sales pitch satisfaction score
- ProductPitched: Product pitched by the salesperson
- NumberOfFollowups: Total number of follow-ups done by the salesperson after the sales pitch
- DurationOfPitch: Duration of the pitch by a salesperson to the customer
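The data dictionary already suggests how each column could be treated during pre-processing: CustomerID is an identifier, ProdTaken is the target, CityTier is an ordered scale, and the text-valued columns are categorical. The pandas sketch below works under those assumptions. The column roles are read off the dictionary above. The three rows and category values such as "Salaried" or "Manager" are hypothetical placeholders, not real data. Median imputation for missing numeric values is one simple choice, not a prescribed method.

```python
import pandas as pd

# Column roles inferred from the data dictionary (an assumption, not a given schema).
ID_COL = "CustomerID"
TARGET = "ProdTaken"
CATEGORICAL = ["TypeofContact", "Occupation", "Gender", "MaritalStatus",
               "Designation", "ProductPitched"]


def prepare_features(df: pd.DataFrame):
    """Drop the ID, split off the target, impute numeric gaps, one-hot encode categoricals."""
    X = df.drop(columns=[ID_COL, TARGET]).copy()
    y = df[TARGET]
    numeric = [c for c in X.columns if c not in CATEGORICAL]
    # CityTier stays numeric because Tier 1 > Tier 2 > Tier 3 is an ordered scale.
    # Missing numeric values are filled with each column's median (one simple choice).
    X[numeric] = X[numeric].fillna(X[numeric].median())
    # drop_first=True keeps k-1 indicator columns per categorical feature.
    X = pd.get_dummies(X, columns=CATEGORICAL, drop_first=True)
    return X, y


# Hypothetical rows for illustration only; a real notebook would load the project data.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "ProdTaken": [1, 0, 0],
    "Age": [41.0, None, 37.0],
    "TypeofContact": ["Self Inquiry", "Company Invited", "Self Inquiry"],
    "CityTier": [3, 1, 1],
    "Occupation": ["Salaried", "Small Business", "Salaried"],
    "Gender": ["Female", "Male", "Male"],
    "MonthlyIncome": [20000.0, 18000.0, None],
    "MaritalStatus": ["Single", "Married", "Married"],
    "Designation": ["Manager", "Executive", "Executive"],
    "ProductPitched": ["Deluxe", "Basic", "Basic"],
})

X, y = prepare_features(sample)
print(X.columns.tolist())
```

Keeping CityTier as an integer preserves its order, whereas the unordered text columns become indicator columns that tree-based models can split on directly.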
Note:
Please note that XGBoost can take significantly longer to run, so if you run into run-time issues, you can skip tuning XGBoost. No marks will be deducted if XGBoost tuning is not attempted.
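The run-time concern follows from how exhaustive tuning scales. A grid search fits one model per parameter combination per cross-validation fold, so the cost multiplies with every hyperparameter added. The short calculation below uses a hypothetical grid over real XGBoost parameter names (n_estimators, max_depth, learning_rate, subsample). The specific values and the 5-fold setting are illustrative assumptions. It compares that cost with a randomized search that evaluates only a fixed number of sampled combinations, one common way to cap the cost.

```python
from math import prod

# Hypothetical search space; the values are illustrative, not recommended settings.
param_grid = {
    "n_estimators": [100, 200, 300, 400],
    "max_depth": [3, 4, 5, 6],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "subsample": [0.6, 0.7, 0.8, 1.0],
}
cv_folds = 5


def grid_search_fits(grid, folds):
    """Fits for an exhaustive search: every combination on every fold (final refit not counted)."""
    return prod(len(values) for values in grid.values()) * folds


def random_search_fits(n_iter, folds):
    """Fits when only n_iter sampled combinations are evaluated on every fold."""
    return n_iter * folds


print(grid_search_fits(param_grid, cv_folds))  # 1280
print(random_search_fits(20, cv_folds))        # 100
```

Four values for each of four parameters already give 4^4 = 256 combinations, or 1,280 fits with 5 folds, which is why boosting models with many trees become slow to tune exhaustively.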
Best Practices for Notebook:
- The notebook should be well-documented, with inline comments explaining the functionality of code and markdown cells containing comments on the observations and insights.
- The notebook should be run from start to finish in a sequential manner before submission.
- It is preferable to remove all warnings and errors before submission.
- The notebook should be submitted as an HTML file (.html) and NOT as a notebook file (.ipynb)
Best Practices for Presentation:
As in real-world projects, the ultimate destination of any project is generally an executive or decision-making meeting, where you present your solution to the business problem based on the work you have done. The purpose of this presentation is to simulate that experience and to draw the attention of your audience (a business leader such as the CMO, COO, CFO, or CEO) to the key points of your project, which are:
- Business Overview of the problem and solution approach
- Key findings and insights which can drive business decisions
- Model overview and performance summary
- Business recommendations
Please keep the following points in mind while making the presentation:
- Focus on explaining the takeaways in an easy-to-understand manner.
- Inclusion of the potential benefits of implementing the solution will give you the edge.
- Copying and pasting from the notebook is not a good idea; it is better to avoid showing code unless it is the focal point of your presentation.
- Please submit the presentation in PDF format only.
Submission Guidelines:
- There are two parts to the submission:
  - A well-commented Jupyter notebook [format: .html]
  - A presentation as you would present it to top management/business leaders [format: .pdf] (export/save the .pptx file as .pdf)
- Any assignment found to be copied or plagiarized from other groups will not be graded and will be awarded zero marks.
- Please ensure timely submission, as any submission after the deadline will not be accepted for evaluation.
- Kindly refer to the assessment rubric and make sure you check the details of every section to get a better understanding of the expectations in this project.
- Submission will not be evaluated if:
  - it is submitted post-deadline, or
  - more than 2 files are submitted
Scoring guide (Rubric) - Travel Package Purchase Prediction
Criteria | Points
---|---
Perform an Exploratory Data Analysis on the data: univariate analysis; bivariate analysis; appropriate visualizations to identify patterns and insights; a customer profile (characteristics of a customer) for each of the different packages; any other exploratory deep dive | 7
Illustrate the insights based on EDA: key meaningful observations on individual variables and the relationships between variables | 4
Data Pre-processing: prepare the data for analysis; missing value treatment; outlier detection (treat if needed, and explain why or why not); feature engineering; prepare data for modeling | 7
Model building - Bagging: build Bagging classifier, Random Forest, and Decision Tree; comment on model performance | 4
Model performance improvement - Bagging: comment on which metric is right for model performance evaluation and why; comment on the model performance after tuning the Decision Tree, Bagging, and Random Forest classifiers to improve the model performance | 8
Model building - Boosting: build AdaBoost, Gradient Boosting, XGBoost, and Stacking classifiers; comment on model performance | 6
Model performance improvement - Boosting: comment on which metric is right for model performance evaluation and why; comment on the model performance after tuning the AdaBoost and Gradient Boosting classifiers on the appropriate metric to improve the model performance. Note: XGBoost can take significantly longer to run, so if you have run-time issues, you can skip tuning XGBoost | 6
Actionable Insights & Recommendations: compare model performance on various metrics; conclude with the key takeaways; what would your advice be to grow the business? | 6
Presentation - Overall Quality: structure and flow; crispness; visual appeal; key insights and recommendations | 8
Notebook - Overall: structure and flow; well-commented code | 4
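The modelling rows of the rubric can be sketched with scikit-learn. This sketch rests on stated assumptions and does not use the real customer data. Instead, it builds a synthetic dataset with make_classification, with roughly 18% positives to mirror the purchase rate. It compares the bagging and boosting models the rubric names on test-set recall (the share of actual buyers found), then tunes one decision tree with GridSearchCV using scoring="recall". XGBoost is left out so the sketch runs without the xgboost package. The grid values are hypothetical, and recall is one defensible metric choice, not the only one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the customer data: about 82% non-buyers, 18% buyers.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.82],
                           random_state=1)
# Stratify so the train and test sets keep the same buyer ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Bagging": BaggingClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
    # Stacking: base models' predictions feed a logistic-regression meta-model.
    "Stacking": StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=1)),
                    ("gb", GradientBoostingClassifier(random_state=1))],
        final_estimator=LogisticRegression(max_iter=1000)),
}

# Test-set recall for each untuned model.
recalls = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    recalls[name] = recall_score(y_test, model.predict(X_test))

# Tune the decision tree for recall; class_weight="balanced" up-weights the minority class.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid={"max_depth": [3, 5, 7, None],
                "min_samples_leaf": [1, 5, 10],
                "class_weight": [None, "balanced"]},
    scoring="recall", cv=5)
search.fit(X_train, y_train)
recalls["Decision Tree (tuned)"] = recall_score(y_test, search.predict(X_test))

for name, score in recalls.items():
    print(f"{name:<22} recall = {score:.3f}")
```

The same loop extends to the tuned Bagging, Random Forest, AdaBoost, and Gradient Boosting models by swapping in each estimator and its own parameter grid. Reporting precision and F1 alongside recall supports the metric comparison the rubric asks for.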