Question

SIT720 Machine Learning

Assessment Task 2: Problem solving task.

Background
From an end-user perspective, travel and tourism is mostly exploratory in nature, and repeat trips to the same location are rare. Travellers therefore have to decide on their destinations and the associated facilities without adequate prior or personal knowledge. The best available option is to leverage social media and the internet, and tourism recommenders are the best solution in this scenario.
Dataset
Dataset file name: tripadvisor_review.csv
Dataset description: User’s average feedback rating information on 10 categories of attractions in East Asia captured from tripadvisor.com. Dataset contains 980 user records with 10 feedback attributes inferred from numerous destination reviews.
_____________________________________________________________________________________
Questions
_____________________________________________________________________________________
1. In this dataset (tripadvisor_review.csv), we have traveller’s average feedback rating information on 10 different categories of attraction. We are interested in finding optimal number of traveller groups based on their attraction ratings.
a. What method shall we use for solving this problem and why? (1 mark)
b. Does this data suffer from curse of dimensionality? Explain. (1 mark)
c. Find out optimal number of traveller groups, report the outcome and justify your findings. (2 marks)

2. Implement two alternative solutions of Q1 (c). Compare and report the findings. (2+1=3 marks)

3. Evaluate quality of the groupings that you have reported as a solution of Q1 (c) and Q2. Based on the evaluation outcomes, report the best solution and explain the results. (3+2=5 marks)

4. Quantify and print the relationship among independent variables of this dataset (tripadvisor_review.csv). Calculate two collective variables that represent the same dataset. Create a two-dimensional plot to display the relationship between these new variables and explain the plot XXXXXXXXXX=5 marks)

5. Is there any loss of information due to the transformation performed in Q4? Explain your answer with evidence. (3 marks)

6. Principal component analysis is applied to a given dataset, and the percentage of variance for the first N components is X%. How is this percentage of variance computed? (2 marks)

The following questions are D & HD level tasks. You have to do your own research and explore current literature and solutions to answer these questions.

7. Apply component factor- and projection-based dimensionality reduction approaches on the given dataset (tripadvisor_review.csv) for creating three collective variables. Does this new feature space improve the grouping of travellers compared to original dataset? Present your results with appropriate evidences. (3 marks)

8. Let’s consider the data shown in the Figure 1 (see next page).
a. Is it possible to obtain the cluster shown in the figure by k-means clustering (k = 6)? Provide evidence including code and explanation to justify your findings. (2 marks)
b. Explore state-of-the-art clustering methods (explore recent research articles) that can produce better results than k-means for this problem. Describe the selected approach, evaluate performance and report your findings. (3 marks)

Figure 1: Scatter plot with expected clusters (elliptical shapes)

©Deakin University SIT720

assessment-2-t2-2022-yypalhb3.docx tripadvisorreview-nfpedck2.csv

Chirag · Accepted Answer

Question 1.
a) Silhouette analysis can be used to solve this problem and find the optimal number of clusters. It assesses the quality of the clusters produced by a clustering algorithm such as K-Means by measuring how close each sample is to the other samples in its own cluster compared with the samples in the nearest other cluster. Each sample receives a silhouette score between -1 and 1, and a higher average score indicates better-separated clusters.
b) No, this data does not suffer from the curse of dimensionality, because it has only 10 feature variables for 980 user records.
c) The optimal number of groups is 2, as found by silhouette analysis.
In the silhouette graph, the silhouette plots of the clusters showed no wide fluctuations in size when n=2.
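The selection step described in (c) can be sketched as follows. This is a minimal illustration, not the code behind the answer: it uses only the Python standard library, a small hand-rolled k-means with deterministic farthest-point seeding, and synthetic 10-feature data (two made-up blobs) standing in for tripadvisor_review.csv. The names `kmeans`, `mean_silhouette` and `blob` are hypothetical helpers introduced here.

```python
import math
import random


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-point seeding (a simplification)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    return labels


def mean_silhouette(points, labels):
    """Average silhouette coefficient (b - a) / max(a, b) over all samples."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in data (NOT the TripAdvisor ratings): two blobs, 10 features.
rng = random.Random(0)
DIM = 10


def blob(center, n, spread=0.5):
    return [tuple(rng.gauss(center, spread) for _ in range(DIM)) for _ in range(n)]


points = blob(0.0, 40) + blob(3.0, 40)

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: mean silhouette = {s:.3f}")
print("best k:", best_k)
```

With the real data, the rows of the CSV would replace `points`; a library such as scikit-learn provides equivalent functionality through `KMeans` and `silhouette_score`.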
Question 2.
The first method, the elbow approach, suggests the existence of 3 groups.
The second method is the silhouette coefficient:
Based on the silhouette score, the optimal number of clusters is 2.
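For comparison, the elbow approach can be sketched in the same style. This is an illustrative sketch under stated assumptions: the data are three made-up 2-D blobs rather than the TripAdvisor ratings, `kmeans_inertia` is a hypothetical helper, and the "largest second difference" rule used to locate the elbow is one simple heuristic among several; in practice the elbow is often read off a plot by eye.

```python
import math
import random


def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Deterministic farthest-point seeding (a simplification).
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        updated = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(
        math.dist(p, centroids[j]) ** 2 for j, g in enumerate(groups) for p in g
    )


# Synthetic stand-in data (NOT the TripAdvisor ratings): three 2-D blobs.
rng = random.Random(1)
centers = [(0.0, 0.0), (6.0, 0.0), (3.0, 6.0)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(30)
]

inertia = {k: kmeans_inertia(points, k) for k in range(1, 7)}

# Heuristic (an assumption of this sketch): the elbow is the k where the
# drop in inertia slows down the most, i.e. the largest second difference.
elbow_k = max(
    range(2, 6),
    key=lambda k: (inertia[k - 1] - inertia[k]) - (inertia[k] - inertia[k + 1]),
)
for k, value in inertia.items():
    print(f"k={k}: inertia = {value:.1f}")
print("elbow at k =", elbow_k)
```

The elbow and silhouette criteria measure different things (compactness only versus compactness relative to separation), which is one reason they can suggest different numbers of clusters on the same data.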
Question 3.
Grouping has been completed for both solutions. One uses the elbow approach to divide the data into three clusters, and the other uses the silhouette score to divide the data into two clusters.
Output based on the silhouette method:
This shows that the two groups can be distinguished, although there is some overlap between them.
Below is the grouping based on the elbow method:
The data is divided into three clusters but, as can be seen, the clusters cannot be clearly distinguished, and points in the blue and green clusters overlap.
This analysis shows that, for this dataset, the silhouette method is superior to the elbow method for determining the ideal number of clusters.
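The comparison above is visual; a numeric index can back it up. The sketch below is not part of the original answer: it computes the Davies-Bouldin index (lower is better) for a two-group and a three-group labelling of the same synthetic data. The data, the labellings and the helper names `centroid` and `davies_bouldin` are all assumptions made for illustration.

```python
import math
import random


def centroid(members):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(members) for c in zip(*members))


def davies_bouldin(points, labels):
    """Davies-Bouldin index: for each cluster take the worst ratio of
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j), then average."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centers = {lab: centroid(m) for lab, m in clusters.items()}
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = {
        lab: sum(math.dist(p, centers[lab]) for p in m) / len(m)
        for lab, m in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Synthetic stand-in data (NOT the TripAdvisor ratings): two 2-D blobs.
rng = random.Random(2)
points = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(40)]
points += [(rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(40)]

# Candidate A: two groups, one per blob.
labels_two = [0] * 40 + [1] * 40
# Candidate B: three groups, with the second blob split in half by x-coordinate.
labels_three = [0] * 40 + [1 if p[0] > 5.0 else 2 for p in points[40:]]

db_two = davies_bouldin(points, labels_two)
db_three = davies_bouldin(points, labels_three)
print(f"Davies-Bouldin, 2 groups: {db_two:.3f}")
print(f"Davies-Bouldin, 3 groups: {db_three:.3f}")
```

On this toy data the three-group labelling splits one compact blob, so its index is worse. scikit-learn exposes the same measure as `davies_bouldin_score`, which could be applied to the actual cluster labels.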
Question 4.
A correlation matrix is a table that lists the correlation coefficients between variables. Each cell shows the correlation between one pair of variables, so the matrix covers every possible pairing. It is an effective tool for summarising large datasets and for finding and visualising patterns in the data.
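As a small worked illustration of such a matrix, the sketch below computes pairwise Pearson correlations for three columns. The column names (`cat_a`, `cat_b`, `cat_c`) and their values are made up for the example and are not taken from tripadvisor_review.csv; `pearson` is a hypothetical helper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative ratings (NOT values from the dataset).
ratings = {
    "cat_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "cat_b": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly 2 * cat_a
    "cat_c": [5.0, 3.0, 4.0, 1.0, 2.0],
}

# One row and one column per variable; entry [r][c] is corr(r, c).
corr = {
    row: {col: pearson(ratings[row], ratings[col]) for col in ratings}
    for row in ratings
}

print(" " * 6 + "".join(f"{name:>8}" for name in ratings))
for row, values in corr.items():
    print(f"{row:<6}" + "".join(f"{v:>8.2f}" for v in values.values()))
```

The matrix is symmetric with ones on the diagonal. With the data loaded into a pandas DataFrame, `DataFrame.corr()` would produce the same kind of table for all 10 rating columns.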
