Assignment 03_EN Cluster Analysis
T 03
MSc Business Administration
Research Methods II
Quantitative Research Methods
Applied Data Analysis (with SPSS)
Tasks Assignment 03: Cluster Analysis
Prof. Dr. Jürg Schwarz, Carlota de Miquel
Dr. Heidi Bruderer Enzler, Viviane Pfluger
March 2018
Task 01: Conducting a Cluster Analysis with SPSS
  Task
  Syntax
  Step 3: Determining the Number of Clusters
  Step 4: Display and Save Cluster Membership
  Step 5: Interpretation of Clusters
  Conclusion
  Additional Information
Task 02: Clustering Cities around the Globe
Divertimento
Appendix
  Explanation of the Standardization of Variables in Cluster Analysis
  Cluster Analysis with Standardization in SPSS
Assignment 03_EN_2018 Cluster Analysis, p. 1
Task 01: Conducting a Cluster Analysis with SPSS
Task
Data set: Data_03a.sav
Create clusters for 15 EU countries based on energy consumption (energy) and the gross domestic
product (gdp).
Do the analysis twice, once using Between-Groups Linkage and once using Ward's method. Compare the
results.
How many clusters are there? How would you interpret the content of the clusters?
Hints:
• Carry out a cluster analysis with standardization of the variables. An explanation of why and how to
do so can be found in the appendix to this assignment.
• A scatterplot is useful for interpreting the content when it shows the clusters in color.
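The standardization mentioned in the first hint, together with the squared Euclidean distance used in the syntax below (/STANDARDIZE=VARIABLE Z and /MEASURE=SEUCLID), can be written out as follows. The notation is introduced here for illustration only: $x_{ik}$ is the value of country $i$ on variable $k$, and $\bar{x}_k$ and $s_k$ are the mean and standard deviation of variable $k$ across the 15 countries.

```latex
z_{ik} = \frac{x_{ik} - \bar{x}_k}{s_k},
\qquad
d^2(i, j) = \sum_{k \in \{\text{energy},\, \text{gdp}\}} \left( z_{ik} - z_{jk} \right)^2
```

After z-standardization both variables have mean 0 and standard deviation 1, so neither energy nor gdp dominates the distance merely because of its unit of measurement.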
Syntax
In this solution, the values are standardized by SPSS during the clustering process.
****** Method 1: Between-groups Linkage ******.
DATASET DECLARE D XXXXXXXXXX.
PROXIMITIES energy gdp
  /MATRIX OUT(D XXXXXXXXXX)
  /VIEW=CASE
  /MEASURE=SEUCLID
  /PRINT NONE
  /ID=country
  /STANDARDIZE=VARIABLE Z.
CLUSTER
  /MATRIX IN(D XXXXXXXXXX)
  /METHOD BAVERAGE
  /ID=country
  /PRINT SCHEDULE CLUSTER(2,5)
  /PRINT DISTANCE
  /PLOT DENDROGRAM VICICLE
  /SAVE CLUSTER(2,5).
DATASET CLOSE D XXXXXXXXXX.
* Scatter plot.
GRAPH
  /SCATTERPLOT(BIVAR)=energy WITH gdp BY CLU2_1 BY country (NAME)
  /MISSING=LISTWISE.
* Mean values of cluster variables by cluster.
SORT CASES BY CLU2_1.
SPLIT FILE SEPARATE BY CLU2_1.
FREQUENCIES VARIABLES=energy gdp
  /FORMAT=NOTABLE
  /STATISTICS=MEAN
  /ORDER=ANALYSIS.
SPLIT FILE OFF.
****** Method 2: Ward's Method ******.
DATASET DECLARE D XXXXXXXXXX.
PROXIMITIES energy gdp
  /MATRIX OUT(D XXXXXXXXXX)
  /VIEW=CASE
  /MEASURE=SEUCLID
  /PRINT NONE
  /ID=country
  /STANDARDIZE=VARIABLE Z.
CLUSTER
  /MATRIX IN(D XXXXXXXXXX)
  /METHOD WARD
  /ID=country
  /PRINT SCHEDULE CLUSTER(2,5)
  /PRINT DISTANCE
  /PLOT DENDROGRAM VICICLE
  /SAVE CLUSTER(2,5).
DATASET CLOSE D XXXXXXXXXX.
* Scatter plot.
GRAPH
  /SCATTERPLOT(BIVAR)=energy WITH gdp BY CLU2_2 BY country (NAME)
  /MISSING=LISTWISE.
* Mean values of cluster variables by cluster.
SORT CASES BY CLU2_2.
SPLIT FILE SEPARATE BY CLU2_2.
FREQUENCIES VARIABLES=energy gdp
  /FORMAT=NOTABLE
  /STATISTICS=MEAN
  /ORDER=ANALYSIS.
SPLIT FILE OFF.
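As a more compact alternative to the SORT CASES / SPLIT FILE / FREQUENCIES sequence above, the cluster means can also be requested in one table per method with the MEANS command. This is only a sketch; it assumes that both CLUSTER runs above have been executed, so that CLU2_1 and CLU2_2 exist in the active dataset.

```spss
* Mean, case count and standard deviation of the cluster variables by cluster.
* Assumes CLU2_1 (between-groups linkage) and CLU2_2 (Ward) were saved above.
MEANS TABLES=energy gdp BY CLU2_1
  /CELLS=MEAN COUNT STDDEV.
MEANS TABLES=energy gdp BY CLU2_2
  /CELLS=MEAN COUNT STDDEV.
```

Unlike the split-file approach, this needs no sorting and leaves the file unsplit, so no SPLIT FILE OFF is required afterwards.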
Step 3: Determining the Number of Clusters
In both dendrograms below, the largest increase in heterogeneity occurs when stepping from a two-cluster
to a one-cluster solution (red arrows). This suggests a two-cluster solution.
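This criterion can be stated slightly more formally. The notation is introduced here for illustration and is not part of the assignment: let $c_s$ be the coefficient reported in the agglomeration schedule at stage $s$. With $n = 15$ countries there are $n - 1 = 14$ stages, and $n - s$ clusters remain after stage $s$.

```latex
\Delta_s = c_s - c_{s-1}, \quad s = 2, \dots, n - 1,
\qquad
s^{*} = \arg\max_{s} \Delta_s,
\qquad
\text{number of clusters} = n - s^{*} + 1
```

Here the largest jump belongs to the last stage, $s^{*} = 14$, which gives $15 - 14 + 1 = 2$ clusters.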
[Figures: dendrograms for Between-groups Linkage and Ward's Method]
Between-groups Linkage
All countries are in one and the same cluster, except
for "L" (Luxembourg).
Ward's Method
The grouping is different. One difference is that "L"
is grouped with other cases in the second-to-last step.
Step 4: Display and Save Cluster Membership
Because a two-cluster solution was chosen, only the column 2 Clusters is considered. The other solutions
(= the other columns) are irrelevant at this point. (With between-groups linkage, it would also
have been possible to look at the three-cluster solution – column 3 Clusters – instead.)
[Figures: cluster membership and scatter plots for Between-groups Linkage and Ward's Method]
The scatter plots illustrate that Luxembourg may be an outlier.
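To see how far the two methods agree, the saved two-cluster memberships can be cross-tabulated. A minimal sketch, assuming both CLUSTER runs from the Syntax section have been executed, so that CLU2_1 (between-groups linkage) and CLU2_2 (Ward's method) are in the active dataset:

```spss
* Cross-tabulate the two-cluster solutions of both methods.
CROSSTABS
  /TABLES=CLU2_1 BY CLU2_2
  /CELLS=COUNT.
* List each country with its membership under both methods.
LIST VARIABLES=country CLU2_1 CLU2_2.
```

Cluster numbers are arbitrary labels, so agreement shows up as counts concentrated in one cell per row, not necessarily on the diagonal.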
Step 5: Interpretation of Clusters
Between-groups Linkage
Cluster 1: 14 countries
Cluster 2: Luxembourg
Luxembourg is in a separate cluster. Its GDP as well as its energy consumption are higher than
in the other countries.
The question remains whether this solution is useful. See the overall comment at the end of this
task.
Ward's Method
Cluster 1 (9 countries): B, DK, D, FIN, F, L, NL, A, S
→ Northern Europe. A look at the mean values reveals a higher GDP as well as a
higher energy consumption compared to cluster 2.
Cluster 2 (6 countries): EL, UK, IRL, I, P, E