Data
Coding: GENDER 1 = FEMALE, 2 = MALE; CLASS 1 = CLASS 1, 2 = CLASS 2, 3 = CLASS 3; STATUS 1 = DOMESTIC, 2 = INTERNATIONAL
MARK GENDER CLASS STATUS
84.84 1 1 1
63.64 1 2 1
54.43 1 3 1
55.65 1 3 1
66.40 1 2 1
74.51 1 1 1
58.73 1 3 1
43.64 1 3 1
70.20 1 2 1
73.30 1 2 1
98.23 1 1 1
27.90 1 3 1
33.67 1 3 1
58.37 1 2 1
56.72 1 2 1
76.18 1 1 2
76.27 1 1 2
78.13 1 1 2
45.53 1 3 2
37.06 1 3 2
51.96 1 2 2
26.18 1 3 2
56.73 1 1 2
54.24 1 2 2
51.07 1 2 2
39.82 1 3 2
98.66 1 1 2
91.04 1 1 2
90.11 1 1 2
68.00 1 2 2
42.77 2 3 1
43.98 2 3 1
52.37 2 1 1
44.38 2 3 1
48.91 2 2 1
46.57 2 2 1
67.58 2 1 1
55.50 2 2 1
48.15 2 3 1
38.63 2 2 1
64.82 2 1 1
45.24 2 3 1
54.31 2 1 1
53.38 2 1 1
40.48 2 3 1
50.84 2 1 2
52.15 2 3 2
40.60 2 2 2
63.43 2 1 2
45.92 2 3 2
47.89 2 3 2
43.65 2 2 2
47.12 2 2 2
60.18 2 1 2
42.52 2 3 2
46.70 2 2 2
42.44 2 2 2
49.19 2 2 2
59.71 2 1 2
66.11 2 1 2
STATISTICS FOR DECISION MAKING
BUSINESS REPORT
Name: Tejas Kaja
Student no: XXXXXXXXXX
Campus: Wollongong
Executive Summary:
A one-way ANOVA (F = 27.37, p < 0.05), followed by Tukey-Kramer post hoc analysis, shows a significant difference in mean marks between every pair of classes: (class 1, class 2), (class 1, class 3), and (class 2, class 3). An independent-samples t-test (t = -0.25, p = 0.799 > 0.05) finds no significant difference in mean marks between domestic and international students. It is therefore recommended that domestic and international students receive equal priority in faculty allocation and teaching methods. A chi-square test of independence (chi-square(2) = 1.6, p = 0.449 > 0.05) gives no evidence that type of class and type of student status are dependent. A one-sample Z test (z = 0.5346, p = 0.296 > 0.05) gives insufficient evidence that the mean mark exceeds 55. Adopting a practical method of teaching and monitoring students' homework and assignments more closely might help raise the proportion of students whose marks are greater than 55. An F test for variances (F = 5.93, p < 0.05) indicates a significant difference in the variance of marks between male and female students. An independent-samples t-test (t = 3.04, p < 0.05) gives sufficient evidence of a significant difference in mean marks between male and female students. A Z test for the difference in proportions (z = 3.126, p < 0.05) indicates that the proportion of marks greater than 55 is higher for female students than for male students. Checking attendance regularly and introducing a penalty for missed classes could raise the proportion of male students with marks greater than 55.
Table of Contents
Executive Summary
Business Problem
Statistical Problem
Analysis
Hypothesis 1
Hypothesis 2
Hypothesis 3
Hypothesis 4
Hypothesis 5
Hypothesis 6
Hypothesis 7
Conclusions
Implications
Business Problem
The marks of 60 students, together with their gender, class, and status, are to be analyzed. The main concern is to test the proportion of students with marks less than 55. A further objective is to compare the proportions of male and female students with marks greater than 55. Other objectives include testing whether type of class and type of student status are independent. Lastly, testing for differences in mean marks between the two student statuses, and between the two genders, might be beneficial.
Statistical Problem
Several hypothesis tests are used to address my objectives. One-way ANOVA is used to test for differences between the mean marks of the three classes. The t-test for independent samples is used to test the difference in mean marks between the two categories of student status, and between genders. The chi-square test of independence is used to test whether type of class and type of student status are dependent. A one-sample Z test is appropriate for testing whether the mean mark of students is greater than 55. An F test is used to test the equality of the variances of marks between male and female students. A Z test for the difference in proportions is used to test whether the proportion of marks greater than 55 is higher for female students than for male students.
Analysis
Hypothesis 1
The null hypothesis, H0: the mean marks of class 1, class 2 and class 3 do not differ significantly. The alternative hypothesis, H1: at least one of the mean marks of class 1, class 2 and class 3 differs significantly.
For the application of one-way ANOVA, the dependent variable should be measured on an interval or ratio scale, and the independent variable should be categorized into three or more groups. In this case, my dependent variable is the marks of students, and the independent variable is the category of class (1, 2, and 3). The F test statistic for ANOVA is calculated with the help of the following formulas, where k is the number of groups, n is the total number of observations, and n_j is the number of observations in group j (Carlin and Louis, 2010).
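| Source | df | SS | MSS | F |
|---|---|---|---|---|
| Between | $df_b = k - 1$ | $SSB = \sum_j n_j (\bar{x}_j - \bar{x})^2$ | $MSB = SSB / df_b$ | $F = MSB / MSW$ |
| Within | $df_w = n - k$ | $SSW = \sum_j \sum_i (x_{ij} - \bar{x}_j)^2$ | $MSW = SSW / df_w$ | |
| Total | $df_t = n - 1$ | $SST = \sum_j \sum_i (x_{ij} - \bar{x})^2$ | | |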
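The sums of squares above can be checked by hand with a short script. The following is a minimal sketch, not the Excel procedure itself: the three lists are the marks from the Data section regrouped by class, and the function name `one_way_anova` is my own choice for illustration.

```python
# Marks grouped by class, transcribed from the Data section.
class1 = [84.84, 74.51, 98.23, 76.18, 76.27, 78.13, 56.73, 98.66, 91.04, 90.11,
          52.37, 67.58, 64.82, 54.31, 53.38, 50.84, 63.43, 60.18, 59.71, 66.11]
class2 = [63.64, 66.40, 70.20, 73.30, 58.37, 56.72, 51.96, 54.24, 51.07, 68.00,
          48.91, 46.57, 55.50, 38.63, 40.60, 43.65, 47.12, 46.70, 42.44, 49.19]
class3 = [54.43, 55.65, 58.73, 43.64, 27.90, 33.67, 45.53, 37.06, 26.18, 39.82,
          42.77, 43.98, 44.38, 48.15, 45.24, 40.48, 52.15, 45.92, 47.89, 42.52]


def one_way_anova(groups):
    """Return (df_between, df_within, SSB, SSW, F) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation of
    # each group mean from the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n - k
    f_stat = (ssb / df_b) / (ssw / df_w)
    return df_b, df_w, ssb, ssw, f_stat


df_b, df_w, ssb, ssw, f_stat = one_way_anova([class1, class2, class3])
print(df_b, df_w, round(f_stat, 4))
```

The printed F value should agree closely with the figure reported by Excel for this hypothesis; the p-value is left to Excel because it needs the F distribution, which the Python standard library does not provide.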
The output as obtained from Excel for the analysis of one way ANOVA is given below.
ANOVA: Single Factor

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Class1_marks | 20 | 1417.43 | 70.8715 | XXXXXXXXXX |
| Class2_marks | 20 | 1073.21 | 53.6605 | XXXXXXXXXX |
| Class3_marks | 20 | 876.09 | 43.8045 | 70.1333 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | XXXXXXXXXX | 2 | XXXXXXXXXX | 27.3753 | 0.0000 | 3.1588 |
| Within Groups | XXXXXXXXXX | 57 | XXXXXXXXXX | | | |
| Total | XXXXXXXXXX | 59 | | | | |

Level of significance: 0.05
With F = 27.37 and p < 0.05, I reject the null hypothesis at the 5% level of significance. Hence there is enough evidence to support the claim that at least one of the mean marks of the three classes differs significantly from the others.
To test which pairs of classes differ significantly, I use a post hoc analysis, the Tukey-Kramer test. The formula for the Tukey-Kramer critical range is shown below.
$$\text{Critical Range} = Q_U(c,\, n - c)\,\sqrt{\frac{MSW}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}$$
where $n$ = total number of values across all groups,
$n_j$ = number of values in the $j$th group,
$c$ = number of groups, and
$Q_U$ = upper-tail critical value of the Studentized range distribution with $c$ and $n - c$ degrees of freedom.
A specific pair is said to be significantly different if the absolute mean difference is greater than the critical range. The Excel output of the Tukey-Kramer analysis is given below (Wilcox, 1996).
Tukey-Kramer Multiple Comparisons

| Group | Sample Mean | Sample Size |
|---|---|---|
| 1: Class1_marks | 70.8715 | 20 |
| 2: Class2_marks | 53.6605 | 20 |
| 3: Class3_marks | 43.8045 | 20 |

Other Data
Level of significance: 0.05
Numerator d.f.: 3
Denominator d.f.: 57
MSW: XXXXXXXXXX
Q Statistic: 3.403

| Comparison | Absolute Difference | Std. Error of Difference | Critical Range | Results |
|---|---|---|---|---|
| Group 1 to Group 2 | 17.211 | XXXXXXXXXX | 8.9099 | Means are different |
| Group 1 to Group 3 | 27.067 | XXXXXXXXXX | 8.9099 | Means are different |
| Group 2 to Group 3 | 9.856 | XXXXXXXXXX | 8.9099 | Means are different |
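The pairwise decision rule can be reproduced with a few lines of code. This is a rough sketch under stated assumptions: the group means, group sizes, Q statistic and critical range are taken from the Excel output above, while the function names `critical_range` and `compare_pairs` are hypothetical. Because the MSW value is not legible in the output, the script backs it out from the reported critical range purely so that the example runs; in practice MSW would be read from the ANOVA table.

```python
import math
from itertools import combinations


def critical_range(q_upper, msw, n_a, n_b):
    """Tukey-Kramer critical range for one pair of groups."""
    return q_upper * math.sqrt((msw / 2.0) * (1.0 / n_a + 1.0 / n_b))


def compare_pairs(means, sizes, q_upper, msw):
    """Map each pair of groups to (abs difference, critical range, differs?)."""
    results = {}
    for a, b in combinations(sorted(means), 2):
        diff = abs(means[a] - means[b])
        crit = critical_range(q_upper, msw, sizes[a], sizes[b])
        results[(a, b)] = (diff, crit, diff > crit)
    return results


# Group means and sizes as reported in the Excel output.
means = {1: 70.8715, 2: 53.6605, 3: 43.8045}
sizes = {1: 20, 2: 20, 3: 20}
q_upper = 3.403

# ASSUMPTION: MSW is backed out from the reported critical range (8.9099)
# only to make this sketch runnable; it is not read from the output.
msw = 2.0 * (8.9099 / q_upper) ** 2 / (1.0 / 20 + 1.0 / 20)

results = compare_pairs(means, sizes, q_upper, msw)
for pair, (diff, crit, differs) in results.items():
    print(pair, round(diff, 3), round(crit, 4), differs)
```

With equal group sizes the critical range is the same for every pair, which is why the output shows a single value repeated three times.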
Each absolute difference exceeds the critical range of 8.9099, so there is a significant difference in the mean marks between (class 1, class 2), (class 1, class 3), and (class 2, class 3) (Box, Hunter and Hunter, 2005).
Hypothesis 2
The null hypothesis, H0: the mean marks of the two types of student status, domestic and international, do not differ significantly. The alternative hypothesis, H1: the mean marks of the two types of student status, domestic and international, differ significantly (Box, Hunter and Hunter, 2005).
The t-test for independent samples with unequal variances is applied to test this hypothesis. The dependent variable is the marks of students. The independent variable is the type of student status, which is categorized as domestic or international. Since the domestic and international groups are independent of each other, the t-test for independent samples is appropriate. The test statistic for the t-test of independent samples with unequal variances is given below (Wilcox, 1996).
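$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Degrees of freedom:
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$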
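The two formulas can be evaluated directly from summary statistics. The sketch below assumes the figures reported in the Excel output for this hypothesis (difference in sample means -1.0710, sample standard deviations 15.1154 and 17.2991, and 30 students in each group); the function name `welch_t` is my own.

```python
import math


def welch_t(mean_diff, s1, n1, s2, n2):
    """Return (standard error, t statistic, degrees of freedom)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # per-group variance of the mean
    se = math.sqrt(v1 + v2)
    t_stat = mean_diff / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, t_stat, df


se, t_stat, df = welch_t(-1.0710, 15.1154, 30, 17.2991, 30)
print(round(se, 4), round(t_stat, 4), round(df, 3))
```

The degrees of freedom come out fractional; the Excel output rounds them down to a whole number before looking up the critical values.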
The Excel generated output for testing the difference in mean marks between two types of student status is given below.
Separate-Variances t Test for the Difference Between Two Means
(assumes unequal population variances)

Data
Hypothesized Difference: 0
Level of Significance: 0.05

Population 1 Sample
Sample Size: 30
Sample Mean: XXXXXXXXXX
Sample Standard Deviation: 15.1154

Population 2 Sample
Sample Size: 30
Sample Mean: XXXXXXXXXX
Sample Standard Deviation: 17.2991

Intermediate Calculations
Numerator of Degrees of Freedom: XXXXXXXXXX
Denominator of Degrees of Freedom: 5.4313
Total Degrees of Freedom: 56.9750
Degrees of Freedom: 56
Standard Error: 4.1942
Difference in Sample Means: -1.0710
Separate-Variance t Test Statistic: -0.2554

Two-Tail Test
Lower Critical Value: -2.0032
Upper Critical Value: 2.0032
p-Value: 0.7994
Do not reject the null hypothesis
With t = -0.25 and p > 0.05, I fail to reject the null hypothesis at the 5% level of significance. There is insufficient evidence to support the claim that the mean marks of domestic and international students differ significantly. The P value is equal to XXXXXXXXXX. A P value of about 79.9% means that, if the two population means were equal, a difference in sample means at least as large as the one observed would occur about 79.9% of the time; it is not the probability that the means differ (Huck, Cormier and Bounds, 1974).
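To make the meaning of the P value concrete, it can be recomputed as the area in both tails of the t distribution beyond the observed statistic. The following is an illustrative sketch only: it integrates the t density numerically with Simpson's rule instead of calling Excel, uses the t statistic (-0.2554) and degrees of freedom (56) from the output above, and the names `t_pdf` and `two_tail_p` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))


def two_tail_p(t_stat, df, steps=2000):
    """Two-tailed p-value via Simpson's rule over [0, |t|]; steps must be even."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # P(-|t| < T < |t|) by symmetry
    return 1 - central


p = two_tail_p(-0.2554, 56)
print(round(p, 4))
```

A value this far above 0.05 says only that the observed difference is entirely unremarkable under the null hypothesis.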
Hypothesis 3
The null hypothesis, H0: type of class and student status are independent of each other. The alternative hypothesis, H1: type of class and student status are dependent on each other (Lancaster, 1969).
The chi-square test of independence is used to test the above hypothesis. Marks are not used in this test; the two variables of interest are the type of class and the type of student status. For the application of the chi-square test of independence, both variables must be categorical, measured on the nominal scale of measurement. In this case, the type of student status and the type of class are both categorical variables measured on the nominal scale. The test statistic for the chi-square test of independence is given below (Lancaster, 1969).
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency and
$E_i$ is the expected frequency of cell $i$,
$$E_i = \frac{(\text{row total})_i \times (\text{column total})_i}{\text{grand total}}$$
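As a worked illustration of the statistic, the sketch below applies the formula to a 2 x 3 table of counts. The counts are my own tally of status by class from the Data section and should be re-checked against the Excel cross-tabulation; the function name `chi_square_independence` is hypothetical, and the degrees of freedom use the standard (rows - 1) x (columns - 1) rule.

```python
import math

# Observed counts tallied from the Data section (an assumption to re-check).
# Rows: status (1 = domestic, 2 = international); columns: class 1, 2, 3.
observed = [[8, 10, 12],
            [12, 10, 8]]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = chi_square_independence(observed)
# For df = 2 only, the upper-tail chi-square probability is exp(-x / 2).
p_value = math.exp(-stat / 2) if df == 2 else None
print(round(stat, 4), df, p_value)
```

The closed-form tail probability is used only because this table happens to have two degrees of freedom; other table sizes would need a chi-square distribution function from a statistics library.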
Degrees of