SIT720 Machine Learning
Assessment Task 2: Problem solving task.
Background
From the end user's perspective, travel and tourism are mostly exploratory in nature, and repeat trips to the same location are rare. Travellers therefore have to decide on destinations and the associated facilities without adequate prior or personal knowledge. The best available option is to leverage social media and the internet. Tourism recommenders are the best solution in this scenario.
Dataset
Dataset file name: tripadvisor_review.csv
Dataset description: Users' average feedback rating information on 10 categories of attractions in East Asia, captured from tripadvisor.com. The dataset contains 980 user records with 10 feedback attributes inferred from numerous destination reviews.
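As a quick sanity check on the description above, the file can be loaded and its dimensions compared with the stated 980 records and 10 attributes. The following is a minimal sketch only: it assumes the first column holds a user identifier and every remaining column holds a numeric rating, and the names `load_ratings` and `check_shape` are hypothetical helpers, not part of the task.

```python
import csv


def load_ratings(path):
    """Read a ratings CSV into (user_ids, ratings).

    Assumption: the file has one header row, the first column is a
    user identifier, and all remaining columns are numeric ratings.
    """
    user_ids, ratings = [], []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            user_ids.append(row[0])
            ratings.append([float(value) for value in row[1:]])
    return user_ids, ratings


def check_shape(ratings, n_users=980, n_categories=10):
    """Return True if the table has the dimensions stated in the description."""
    return len(ratings) == n_users and all(
        len(row) == n_categories for row in ratings
    )
```

Calling `load_ratings("tripadvisor_review.csv")` and passing the result to `check_shape` should return `True` if the file matches the description; if the layout differs from the assumption above, the parsing step will need adjusting.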
_____________________________________________________________________________________
Questions
_____________________________________________________________________________________
1. In this dataset (tripadvisor_review.csv), we have travellers' average feedback rating information on 10 different categories of attraction. We are interested in finding the optimal number of traveller groups based on their attraction ratings.
a. What method shall we use for solving this problem and why? (1 mark)
b. Does this data suffer from the curse of dimensionality? Explain. (1 mark)
c. Find the optimal number of traveller groups, report the outcome and justify your findings. (2 marks)
2. Implement two alternative solutions to Q1 (c). Compare and report the findings. (2+1=3 marks)
3. Evaluate the quality of the groupings that you reported as solutions to Q1 (c) and Q2. Based on the evaluation outcomes, report the best solution and explain the results. (3+2=5 marks)
4. Quantify and print the relationship among independent variables of this dataset (tripadvisor_review.csv). Calculate two collective variables that represent the same dataset. Create a two-dimensional plot to display the relationship between these new variables and explain the plot XXXXXXXXXX=5 marks)
5. Is there any loss of information due to the transformation performed in Q4? Explain your answer with evidence. (3 marks)
6. Principal component analysis is applied to a given dataset, and the percentage of variance for the first N components is X%. How is this percentage of variance computed? (2 marks)
The following questions are D & HD level tasks. You have to do your own research and explore current literature and solutions to answer these questions.
7. Apply component factor- and projection-based dimensionality reduction approaches to the given dataset (tripadvisor_review.csv) to create three collective variables. Does this new feature space improve the grouping of travellers compared to the original dataset? Present your results with appropriate evidence. (3 marks)
8. Let’s consider the data shown in Figure 1 (see next page).
a. Is it possible to obtain the cluster shown in the figure by k-means clustering (k = 6)? Provide evidence including code and explanation to justify your findings. (2 marks)
b. Explore state-of-the-art clustering methods (explore recent research articles) that can produce better results than k-means for this problem. Describe the selected approach, evaluate its performance and report your findings. (3 marks)
Figure 1: Scatter plot with expected clusters (elliptical shapes)
©Deakin University XXXXXXXXXX1 XXXXXXXXXXSIT720