
Statistics and Probability Assignment on Hypothesis Testing
January 30, 2023

Instructions

This document contains the questions for your assignment project on Statistical Testing. The questions refer to the data given in the individual worksheets in the Excel document 'Assignment Datasets.xlsx'. Please read the following points.
1. All submissions must be in the form of PDF documents. Spreadsheets exported to PDF will be accepted, but calculations must be annotated or explained.
2. It is up to you how you do the calculations in each question, but you must explain how you arrived at your answer for any given calculation. This can be done with a written explanation and by using the relevant equations, along with showing the results of intermediate stages of the calculations. In other words, you need to show that you know how to do a calculation for a statistic other than using spreadsheet functions.
3. Each one of the questions involves a statistical test. Marks within each question will generally be awarded for:
• deciding which statistical test to use,
• framing your hypotheses and proper conclusions,
• identifying the parameters for the test, and
• showing a reasonable level of clarity, detail and explanation in the calculations needed to carry out the test.
4. The data you have been given is in the worksheets of an Excel spreadsheet. This spreadsheet is locked against editing. Please do not try to circumvent this; if you wish to use a spreadsheet to do your calculations, you should copy and paste your data into your own spreadsheet and work with that.

Question 1

The lifetimes (in units of 10^6 seconds) of certain satellite components are shown in the frequency distribution given in 'Dataset1'.
1. Draw a frequency polygon, histogram and cumulative frequency polygon for the data.
2. Calculate the frequency mean, the frequency standard deviation, the median and the first and third quartiles for this grouped data.
3. Compare the median and the mean and state what this indicates about the distribution. Comment on how the answer to this question relates to your frequency polygon and histogram.
4. Explain the logic behind the equations for the mean and standard deviation for grouped data, starting from the original equations for a simple list of data values. (This does not just mean 'explain how the equations are used'.)
5. Carry out an appropriate statistical test to determine whether the data is normally distributed.

Question 2

A manufacturer of metal plates makes two claims concerning the thickness of the plates they produce. They are stated here:
• Statement A: The mean is 200 mm.
• Statement B: The variance is 1.5 mm^2.
To investigate Statement A, the thickness of a sample of metal plates produced in a given shift was measured. The values found are listed in Part (a) of worksheet 'Dataset2', with millimetres (mm) as unit.
1. Calculate the sample mean and sample standard deviation for the data in Part (a) of 'Dataset2'. Explain why we are using the phrase 'sample' mean or 'sample' standard deviation.
2. Set up the framework of an appropriate statistical test on Statement A. Explain how knowing the sample mean before carrying out the test will influence the structure of your test.
3. Carry out the statistical test and state your conclusions.
To investigate the second claim, the thickness of a second sample of metal sheets was measured. The values found are listed in Part (b) of worksheet 'Dataset2', with millimetres (mm) as unit.
1. Calculate the sample mean and then the sample variance and standard deviation for the data in Part (b).
2. Set up the framework of an appropriate statistical test on Statement B. Explain how knowing the sample variance before carrying out the test would influence the structure of your test.
3. Carry out the statistical test and state your conclusions.
Question 3

A manager of an inter-county hurling team is concerned that his team lose matches because they 'fade away' in the last ten minutes. He has measured GPS data showing how much ground particular players cover within a given time period; this is the data in list (a) in worksheet 'Dataset3'. He has acquired the corresponding data from an opposing, more successful team, which is given in list (b).
1. Calculate the sample mean and sample standard deviation for the two sets of data.
2. Set up the framework of an appropriate statistical test to determine whether there is a difference in the distances covered by the two groups of players.
3. Explain how having the results of the calculations above in advance of doing your statistical test will influence the structure of that test.
4. Carry out the statistical test and state your conclusions.

Question 4

A study was carried out to determine whether the resistance of the control circuits in a machine is lower when the machine motor is running. To investigate this question, a set of the control circuits was tested as follows. Their resistance was measured while the machine motor was not running for a certain period of time and then again while the motor was running. The values found are listed in worksheet 'Dataset4', with kilo-Ohms as the unit of measurement.
1. Set up the structure of an appropriate statistical test to determine whether the resistance of the control circuits in a machine is lower when the machine motor is running.
2. Explain how the order of subtraction chosen to calculate the differences will influence the structure of the test.
3. Give a reason why the data is measured with the engine not running first and then with the engine running.
4. Explain how knowing the mean of the differences in advance will influence the structure of your statistical test.
5. Carry out the statistical test and state your conclusions.
Question 5

A study was carried out to determine the influence of a trace element found in soil on the yield of potato plants grown in that soil, defined as the weight of potatoes produced at the end of the season. A large field was divided up into 14 smaller sections for this experiment. For each section, the experimenter recorded the amount of the trace element found (in milligrams per metre squared) and the corresponding weight of the potatoes produced (in kilograms). This information is presented in the worksheet 'Dataset5' in the Excel document. Define X as the trace element amount and Y as the yield.
1. Draw a scatterplot of your data set.
2. Calculate the coefficients of a linear equation to predict the yield Y as a function of X.
3. Calculate the correlation coefficient for the paired data values.
4. Set up the framework for an appropriate statistical test to establish if there is a correlation between the amount of the trace element and the yield. Explain how having the scatterplot referred to above and having the value of r in advance will influence the structure of your statistical test.
5. Carry out and state the conclusion of your test on the correlation.
6. Comment on how well the regression equation will perform based on the results above.

Question 6

A multinational corporation is conducting a study to see how its employees in five different countries respond to three gifts in an incentive scheme. The numbers of employees who choose each of the three gifts (G1 to G3) in each of the five countries (A to E) are given in the table in 'Dataset6' in the Excel document.
1. Set up the structure of an appropriate statistical test to determine whether the data supports a link between choice of gift and country, including the statistic to be used.
2. Carry out this test, showing clearly in your work how the expected values are calculated for your test statistic.
Assignment: Hypothesis Testing — Datasets
Type the last three digits of your student number in the green cell: 852

Dataset 1

Groups       Frequencies
300 to 307   12
307 to 314   19
314 to 321   42
321 to 328   85
328 to 335   85
335 to 342   42
342 to 349   19
349 to 356   11

Dataset 2

Part (a):
204.27 205.23 200.62 198.29 199.44 195.77
205.40 204.22 200.75 199.43 198.39 194.99
204.36 203.86 201.89 198.36 197.35 195.07
203.26 203.45 200.84 197.64 196.53 194.61
203.40 203.00 199.75 197.39 197.61 195.63
204.48 203.17 199.65 198.08 198.35 196.33
205.06 203.12 199.47 198.29 197.72 196.89

Part (b):
203.96 202.11 199.03 199.39 197.67 198.00
204.10 201.72 199.27 199.31 196.60 197.46
203.11 200.03 200.39 198.67 199.00 205.10

Dataset 3

List (a):
1542.69 1552.26 1506.19 1482.92 1494.44 1457.65 1506.03
1554.05 1542.17 1507.53 1494.29 1483.89 1449.94 1483.91
1543.57 1538.64 1518.88 1483.56 1473.46 1450.71 1501.47
1532.56 1534.47 1508.38 1476.40 1465.26 1446.13

List (b):
1520.51 1516.50 1483.96 1460.37 1462.64 1442.80 1462.44
1531.25 1518.20 1483.00 1467.28 1470.04 1449.79 1486.59
1537.10 1517.66 1481.18 1469.36 1463.69 1455.42 1467.41
1526.10 1507.56 1476.84 1480.44 1463.20 1466.49 1486.77
1527.48 1503.70 1479.23 1479.56 1452.53 1461.06

Dataset 4 (Resistance, kilo-Ohms)

Motor running   Motor not running
15.68           15.47
16.00           15.78
15.70           15.34
15.40           14.98
15.44           14.87
15.74           15.19
15.90           15.51
15.60           15.06
15.64           14.94
15.94           15.23
15.66           14.93
15.56           14.76
15.44           14.68
15.32           14.42
15.38           14.64
15.36           14.47
15.08           14.09
14.96           13.93
15.44           14.51
15.44           14.54

Dataset 5

Additive   Yield
68.72      129.51
68.77      129.46
66.26      129.26
66.58      129.84
66.23      129.05
65.01      128.84
67.98      127.84
71.04      127.18
69.93      127.74
72.57      128.40
75.57      128.82
78.00      128.21
75.01      129.21
72.78      128.78

Dataset 6

     G1   G2   G3
A     9   24   11
B    15   17   19
C    25   23   18
D    16   14   15
E    18    8   12

Solution

Atul answered on May 03 2023
Dataset 1

Groups       Frequencies
300 to 307   12
307 to 314   19
314 to 321   42
321 to 328   85
328 to 335   85
335 to 342   42
342 to 349   19
349 to 356   11
Question 1

The lifetimes (in units of 10^6 seconds) of certain satellite components are shown in the
frequency distribution given in ‘Dataset1’.
1. Draw a frequency polygon, histogram and cumulative frequency polygon for the data.
To draw the frequency polygon, we first need to calculate the midpoints of each group:
Groups Frequencies Midpoints
300-307 12 303.5
307-314 19 310.5
314-321 42 317.5
321-328 85 324.5
328-335 85 331.5
335-342 42 338.5
342-349 19 345.5
349-356 11 352.5
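As a quick sanity check, the midpoints can be generated directly from the class boundaries; a minimal sketch in Python (class limits taken from the table above):

```python
# The eight classes of Dataset 1, as (lower, upper) boundaries
boundaries = [(300, 307), (307, 314), (314, 321), (321, 328),
              (328, 335), (335, 342), (342, 349), (349, 356)]

# Midpoint of a class = (lower bound + upper bound) / 2
midpoints = [(lo + hi) / 2 for lo, hi in boundaries]
print(midpoints)  # [303.5, 310.5, 317.5, 324.5, 331.5, 338.5, 345.5, 352.5]
```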


Histogram


Finally, to draw the cumulative frequency polygon, we need to calculate the cumulative
frequencies:
Groups Frequencies Cumulative Frequencies
300-307 12 12
307-314 19 31
314-321 42 73
321-328 85 158
328-335 85 243
335-342 42 285
342-349 19 304
349-356 11 315
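The running totals in this table can be produced in one line; a small Python sketch using the class frequencies from the table:

```python
from itertools import accumulate

# Class frequencies of Dataset 1, in order
frequencies = [12, 19, 42, 85, 85, 42, 19, 11]

# Cumulative frequencies: running total of the class frequencies
cumulative = list(accumulate(frequencies))
print(cumulative)  # [12, 31, 73, 158, 243, 285, 304, 315]
```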
Cumulative Frequency Polygon


2. Calculate the frequency mean, the frequency standard deviation, the median and the
first and third quartiles for this grouped data.
To calculate the mean, we need to use the midpoint of each group and the frequency of each
group:
mean = (303.5 * 12 + 310.5 * 19 + 317.5 * 42 + 324.5 * 85 + 331.5 * 85 + 338.5 * 42 + 345.5 * 19 + 352.5 * 11) / 315
= 103295.5 / 315
= 327.92
To calculate the standard deviation, we first need to calculate the variance, using every group:
variance = [(303.5 - 327.92)^2 * 12 + (310.5 - 327.92)^2 * 19 + (317.5 - 327.92)^2 * 42 + (324.5 - 327.92)^2 * 85 + (331.5 - 327.92)^2 * 85 + (338.5 - 327.92)^2 * 42 + (345.5 - 327.92)^2 * 19 + (352.5 - 327.92)^2 * 11] / 315
= 116.78
Then, the standard deviation is the square root of the variance:
standard deviation = sqrt(116.78)
= 10.81
To find the median, we find the class that contains the middle of the data set. With n = 315, the median position is n/2 = 157.5. The cumulative frequency is 73 at the end of the 314-321 class and 158 at the end of the 321-328 class, so the median lies in the 321-328 class. Interpolating within that class (lower bound 321, width 7, frequency 85):
median = 321 + 7 * (157.5 - 73) / 85
= 327.96
To find the first and third quartiles, we find the classes containing the positions n/4 = 78.75 and 3n/4 = 236.25. The cumulative frequency is 73 at the end of the 314-321 class, so position 78.75 falls in the 321-328 class. The cumulative frequency reaches 243 at the end of the 328-335 class, so position 236.25 falls in the 328-335 class. Interpolating in the same way as for the median:
Q1 = 321 + 7 * (78.75 - 73) / 85
= 321.47
Q3 = 328 + 7 * (236.25 - 158) / 85
= 334.44
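These grouped-data statistics can be cross-checked with a short script. This is a sketch: it assumes the class list above and uses the standard linear-interpolation formula for grouped quantiles with the position taken as p * n, so conventions that place the median at position (n + 1)/2 will give slightly different values.

```python
import math

# Dataset 1: class lower bounds, midpoints and frequencies (class width 7)
lowers = [300, 307, 314, 321, 328, 335, 342, 349]
mids   = [lo + 3.5 for lo in lowers]
freqs  = [12, 19, 42, 85, 85, 42, 19, 11]
width  = 7
n = sum(freqs)                                    # 315

# Frequency (weighted) mean of the class midpoints
mean = sum(m * f for m, f in zip(mids, freqs)) / n

# Frequency standard deviation (divisor n)
sd = math.sqrt(sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / n)

def grouped_percentile(p):
    """Linear interpolation inside the class containing position p * n."""
    target = p * n
    cum = 0
    for lo, f in zip(lowers, freqs):
        if cum + f >= target:
            return lo + width * (target - cum) / f
        cum += f

median = grouped_percentile(0.50)
q1 = grouped_percentile(0.25)
q3 = grouped_percentile(0.75)
# mean 327.92, sd 10.81, median 327.96, Q1 321.47, Q3 334.44
print(round(mean, 2), round(sd, 2), round(median, 2), round(q1, 2), round(q3, 2))
```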
3. Compare the median and the mean and state what this indicates about the distribution.
Comment on how the answer to this question relates to your frequency polygon and
histogram.
Comparison of Median and Mean
The median of the data set is approximately 327.96 and the mean is approximately 327.92. The two values are very close, with the mean slightly below the median, which indicates that the distribution is close to symmetric, with at most a very slight skew to the left. This is consistent with the frequency polygon and histogram, where the frequencies rise to a peak in the two central classes and fall away almost symmetrically on either side.
4. Explain the logic behind the equations for the mean and standard deviation for grouped
data, starting from the original equations for a simple list of data values. (This does not just
mean ’explain how the equations are used’.)
The equations for the mean and standard deviation for grouped data are modifications of the
equations for the mean and standard deviation for a simple list of data values. The main
difference is that the grouped data is divided into intervals, and the frequency of each interval is
used to determine the weight of each interval in the calculation of the mean and standard
deviation.
For the mean, the equation for grouped data is:
mean = Σ (midpoint * frequency) / Σ frequency
where midpoint is the midpoint of each interval, and frequency is the frequency of each interval.
The numerator represents the sum of the products of the midpoint and frequency of each interval,
while the denominator represents the total frequency of all intervals. This equation is used to
calculate the weighted average of the midpoints of the intervals, where the weight of each
interval is its frequency.
For the standard deviation, the equation for grouped data is:
standard deviation = sqrt(Σ [(x - mean)^2 * frequency] / (Σ frequency - 1))
where x is the midpoint of each interval, mean is the mean of the data set, and frequency is the
frequency of each interval. The numerator represents the sum of the products of the squared
differences between the midpoint and the mean and the frequency of each interval, while the
denominator represents the total frequency of all intervals minus one. This equation is used to
calculate the weighted average of the squared deviations of the midpoints from the mean, where
the weight of each interval is its frequency.
The modification of the equations is necessary because grouped data provides less information
about the individual data points than a simple list of values. The midpoint of each interval is used
to represent all the data points within the interval, and the frequency of each interval is used to
determine the weight of each interval in the calculation of the mean and standard deviation.
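To make this logic concrete, here is a small illustration with made-up numbers (not the assignment data): grouping a list into classes and replacing each value by its class midpoint turns the ordinary mean into exactly the weighted (grouped) form, with the class frequencies as the weights.

```python
from collections import Counter

# Hypothetical raw data values (NOT from the assignment datasets)
raw = [301, 305, 309, 312, 312, 318]

# Ordinary mean: sum of the values divided by how many there are
ordinary_mean = sum(raw) / len(raw)

# Replace each value by the midpoint of the width-7 class containing it
def midpoint(x, start=300, width=7):
    k = (x - start) // width              # index of the class holding x
    return start + k * width + width / 2

approx = [midpoint(x) for x in raw]       # [303.5, 303.5, 310.5, 310.5, 310.5, 317.5]

# Grouped mean: weighted average of midpoints, weights = class frequencies
freq = Counter(approx)
grouped_mean = sum(m * f for m, f in freq.items()) / sum(freq.values())

# The grouped formula is exactly the plain mean of the midpoint-replaced list
assert grouped_mean == sum(approx) / len(approx)
print(ordinary_mean, grouped_mean)
```

The two printed means differ slightly (309.5 versus about 309.33) because the midpoints only approximate the raw values, which is exactly the information lost when data is grouped.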
5. Carry out an appropriate statistical test to determine whether the data is normally distributed.
Because the data is grouped into classes, the appropriate test is a chi-squared goodness-of-fit test: we compare the observed class frequencies with the frequencies expected under a normal distribution having the same mean and standard deviation as the data. (The Anderson-Darling test is not suitable here, because it requires the individual ordered observations, which are not available for grouped data.)
The hypotheses are:
H0: the data comes from a normal distribution.
H1: the data does not come from a normal distribution.
The expected frequency for an interval is
Expected frequency = [Φ((upper bound - mean) / sd) - Φ((lower bound - mean) / sd)] * N
where Φ() is the cumulative distribution function of the standard normal distribution, upper bound and lower bound are the bounds of the interval, and N = 315 is the total frequency.
Using the grouped mean and standard deviation calculated earlier, mean = 327.92 and sd = 10.81, the expected frequencies are approximately (values read from normal tables, so small rounding differences are possible):
Interval     Observed   Expected
300 to 307   12          6.79
307 to 314   19         22.80
314 to 321   42         51.06
321 to 328   85         76.22
328 to 335   85         75.88
335 to 342   42         50.37
342 to 349   19         22.29
349 to 356   11          6.58
(The expected counts sum to slightly less than 315 because a small amount of probability lies below 300 and above 356.)
The test statistic is
chi^2 = Σ (O - E)^2 / E
= (12 - 6.79)^2 / 6.79 + (19 - 22.80)^2 / 22.80 + ... + (11 - 6.58)^2 / 6.58
≈ 13.21
The degrees of freedom are (number of classes) - 1 - (number of estimated parameters) = 8 - 1 - 2 = 5, and the critical value at the 5% significance level is chi^2(0.05, 5) = 11.07. Since 13.21 > 11.07, we reject the null hypothesis and conclude, at the 5% significance level, that the data is not normally distributed.
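The grouped goodness-of-fit calculation can be sketched in Python using only the standard library. This uses the chi-squared statistic, which compares observed and expected class frequencies directly; the 5% critical value of 11.07 for 5 degrees of freedom is taken from chi-squared tables rather than computed.

```python
import math

# Dataset 1: class frequencies and class edges
freqs = [12, 19, 42, 85, 85, 42, 19, 11]
edges = [300, 307, 314, 321, 328, 335, 342, 349, 356]
n = sum(freqs)

# Grouped mean and standard deviation (class midpoints, divisor n)
mids = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
mean = sum(m * f for m, f in zip(mids, freqs)) / n
sd = math.sqrt(sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / n)

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Expected count in each class under N(mean, sd^2)
expected = [n * (phi((b - mean) / sd) - phi((a - mean) / sd))
            for a, b in zip(edges, edges[1:])]

# Chi-squared goodness-of-fit statistic
stat = sum((o - e) ** 2 / e for o, e in zip(freqs, expected))

# df = 8 classes - 1 - 2 estimated parameters; 5% critical value from tables
df = len(freqs) - 1 - 2
critical = 11.07
print(round(stat, 2), stat > critical)
```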
Question 2...