Business Statistics 1
(a) Using the sample data attached, calculate the sample mean and standard deviation for the variables:
Growth (annual population growth rate)
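As a check on hand calculations, the sample mean is x̄ = Σx / n and the sample standard deviation is s = √[Σ(x − x̄)² / (n − 1)]. A minimal Python sketch using only the standard library; `sample_mean_sd` is a hypothetical helper name, and `growth` stands for the Growth column of the attached data, which is not reproduced here:

```python
import math
import statistics

def sample_mean_sd(values):
    """Sample mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return mean, math.sqrt(ss / (n - 1))

# Hypothetical usage with the Growth column of the attached data:
# growth = [...]
# print(sample_mean_sd(growth))
# print(statistics.mean(growth), statistics.stdev(growth))  # stdev also uses n - 1
```

Because `statistics.stdev` also divides by n − 1, the two results should agree.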
(b) Is there any evidence of skewness in these data sets? Which data set(s) display negative skewness? Which data set(s) display positive skewness? Which data set is least skewed? In answering this question, you should use Pearson's coefficient of skewness developed in lectures, and not simply the skewness measure generated in the PhStat2 or Excel output. Nonetheless, you may compare the two measures and comment on any differences between them (if applicable).
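Assuming the lecture version is Pearson's median-based coefficient, Sk = 3(x̄ − median) / s, the sketch below computes it alongside the moment-based sample skewness that Excel's SKEW function reports, n / ((n − 1)(n − 2)) · Σ((x − x̄) / s)³, so the two can be compared. A negative value indicates negative (left) skew, a positive value positive (right) skew, and the value closest to zero marks the least skewed data set. The function names are illustrative:

```python
import statistics

def pearson_skewness(values):
    """Pearson's median-based coefficient of skewness: 3 * (mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return 3 * (mean - median) / s

def excel_skew(values):
    """Moment-based sample skewness, the formula used by Excel's SKEW function."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)
```

The two measures use different scales, so they can differ in size (and occasionally in sign for nearly symmetric data) even when computed on the same data set.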
(c) Check for normality in ALL variables in the data set. Are there any variables that could be considered NOT approximately normally distributed? Use PhStat to determine your answer, and include the PhStat printouts that support your reasoning.
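A normal probability plot is the usual graphical check: if the data are approximately normal, the ordered values plotted against standard normal quantiles lie close to a straight line. The sketch below is a rough numerical analogue, not a replacement for the PhStat printout the question requires. It assumes plotting positions i / (n + 1), which may differ from the rule PhStat uses, and returns the correlation between the ordered data and the normal quantiles; values well below 1 point away from normality. The function name is hypothetical:

```python
from statistics import NormalDist, mean

def normal_plot_correlation(values):
    """Correlation between the ordered data and standard normal quantiles.

    Assumes plotting positions i / (n + 1); a value near 1 means the
    normal probability plot is close to a straight line.
    """
    xs = sorted(values)
    n = len(xs)
    zs = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5
```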
(d) Using the sample data, calculate the sample proportion and the standard deviation of:
i. stores whose customer base has a median age of less than 32 years (3 marks)
ii. stores whose customer base has at least 80% high school qualifications (3 marks)
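Assuming "standard deviation" here means the standard deviation of the sample proportion, the formulas are p = X / n and σ_p = √[p(1 − p) / n], where X is the number of stores meeting the condition. A minimal sketch; the helper name is hypothetical, `age` and `hs` stand for the Age and HS columns of the attached data, and HS is assumed to be recorded in percent:

```python
import math

def proportion_and_sd(values, condition):
    """Sample proportion meeting `condition` and its standard deviation sqrt(p(1 - p) / n)."""
    n = len(values)
    x = sum(1 for v in values if condition(v))  # number of "successes"
    p = x / n
    return p, math.sqrt(p * (1 - p) / n)

# Hypothetical usage with the attached data:
# p_age, sd_age = proportion_and_sd(age, lambda a: a < 32)
# p_hs, sd_hs = proportion_and_sd(hs, lambda h: h >= 80)  # HS assumed in percent
```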
(e) Set up and interpret the following confidence intervals:
i. a 95% confidence interval for the true population mean sales (4 marks)
ii. a 99% confidence interval for the true population mean annual growth rate of the customer base over the past 10 years (4 marks)
iii. a 90% confidence interval for the true population proportion of all stores whose customer base has a median age of less than 32 years (4 marks)
iv. a 98% confidence interval for the true population proportion of all stores whose customer base has at least 80% high school qualifications (4 marks)
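The intervals in (e) follow two standard forms: x̄ ± t(α/2, n − 1) · s / √n for a mean when σ is unknown, and p ± z(α/2) · √[p(1 − p) / n] for a proportion (normal approximation). A sketch under those assumptions with hypothetical function names; Python's standard library has no t-distribution quantile function, so the t critical value is passed in from a t-table or the PhStat output, while the z value comes from `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def mean_ci(mean, s, n, t_crit):
    """t interval for a population mean: mean +/- t_crit * s / sqrt(n).

    t_crit is t(alpha/2, n - 1), read from a t-table or PhStat.
    """
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

def proportion_ci(p, n, confidence):
    """Normal-approximation interval for a proportion: p +/- z * sqrt(p(1 - p) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width
```

For example, iii would use `confidence=0.90` (z ≈ 1.645) and iv `confidence=0.98` (z ≈ 2.326).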
(f) A follow-up study will provide a point estimate of the population proportion of stores whose median age of the customer base is less than 32 years. The study must provide 90% confidence that the point estimate is within 0.10 of the population proportion. If no previous proportion estimate is available (not even that calculated in (d) above), how large a sample would you recommend for this study?
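With no prior estimate available, the conservative choice p = 0.5 maximises p(1 − p), so the required sample size is n = z² · 0.25 / E², always rounded up. A sketch with z taken from `statistics.NormalDist`; the helper name is hypothetical:

```python
import math
from statistics import NormalDist

def sample_size_proportion(confidence, margin, p=0.5):
    """n = z^2 * p(1 - p) / E^2, rounded up; p = 0.5 when no prior estimate exists."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size_proportion(0.90, 0.10))  # 68
```

Rounding z to 1.645, as in printed tables, gives the same answer: 1.645² × 0.25 / 0.10² = 67.65, rounded up to 68.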
(g) Demographic studies claim that on average the growth rate of the customer base of stores is more than 1.3. Do the data provide significant support for this claim? Use a 5% significance level and the critical value approach (classical approach) to test this claim.
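For the critical value approach the hypotheses are H0: μ ≤ 1.3 and H1: μ > 1.3 (upper-tail test), with test statistic t = (x̄ − μ0) / (s / √n) on n − 1 degrees of freedom; H0 is rejected when t exceeds t(α, n − 1). A sketch in which the critical value is supplied from a t-table or PhStat; the function name is hypothetical:

```python
import math

def t_test_upper(mean, s, n, mu0, t_crit):
    """Upper-tail one-sample t test (H0: mu <= mu0, H1: mu > mu0).

    t_crit is t(alpha, n - 1) from a t-table or PhStat; reject H0 when t > t_crit.
    """
    t_stat = (mean - mu0) / (s / math.sqrt(n))
    return t_stat, t_stat > t_crit
```

The same function applies to part (i), using the mean and standard deviation of the reduced data set, n = 15 and t(0.01, 14) as the critical value.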
(h) The Chamber of Commerce has claimed that more than 30% of all stores have a customer base whose median age is less than 32 years. How much evidence do the data provide to support this claim? Use a 0.05 level of significance and the p-value approach to test this claim.
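Here H0: π ≤ 0.30 and H1: π > 0.30, with Z = (p − π0) / √[π0(1 − π0) / n] and p-value = P(Z > z); H0 is rejected when the p-value is below 0.05. A sketch using the normal approximation; the function name is hypothetical, and `x` is the number of sampled stores whose customer base has a median age below 32:

```python
import math
from statistics import NormalDist

def proportion_z_test_upper(x, n, pi0):
    """Upper-tail one-proportion z test (H0: pi <= pi0, H1: pi > pi0).

    Returns (Z statistic, p-value); reject H0 at level alpha when p-value < alpha.
    """
    p = x / n
    z = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    p_value = 1 - NormalDist().cdf(z)  # area in the upper tail
    return z, p_value
```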
(i) Imagine that your sample data set only included the first fifteen stores listed in your original data set. For this new data set, you are now told that the growth rate of the customer base followed the normal distribution in the past. With this reduced data set, test the claim that on average the growth rate of the customer base of all stores is more than 1.3 at the 1% significance level.
Compare this result with your answer to A(g). Can you suggest any reasons for the variation, if one exists?
[Total Marks = 72 marks]
Use the computer software package PhStat (or any Excel-based software) to confirm your answers, with the appropriate printout attached, for:
(a) All confidence intervals in Part A(e) above. (8 marks)
(b) The sample size calculation in Part A(f) above (3 marks)
(c) The hypothesis test of the mean in Part A(g) (3 marks)
(d) The p-value approach to the hypothesis test of the proportion in Part A(h) (3 marks)
(e) The hypothesis test with the reduced data set in Part A(i) (3 marks)
[Total Marks = 20 marks]
· the complete data set AND
· solutions based on the simple ordinary least squares regression formulae AND
· the regression package in the PhStat software (computer printout).
(a) Develop two linear models to explain the distribution costs as a function of their
(i) Median Family Income (Income)
(ii) Percentage of customer base with a Higher School Certificate (HS)
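The simple ordinary least squares formulae are b1 = SSXY / SSX and b0 = ȳ − b1·x̄, with SSX = Σ(x − x̄)² and SSXY = Σ(x − x̄)(y − ȳ); the coefficient of determination r² = 1 − SSE / SST is one basis for comparing the two models in (b). A sketch with a hypothetical function name; `income`, `hs` and `y` stand for the relevant columns of the attached data:

```python
def simple_ols(x, y):
    """Least-squares line y = b0 + b1 * x and its r^2, from the textbook formulae.

    b1 = SSXY / SSX,  b0 = ybar - b1 * xbar,  r^2 = 1 - SSE / SST.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    ssxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = ssxy / ssx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - ybar) ** 2 for yi in y)                         # total variation
    return b0, b1, 1 - sse / sst

# Hypothetical usage: simple_ols(income, y) and simple_ols(hs, y)
```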
(b) Which model best describes the behaviour of sales? Explain your reasons. (5 marks)
(c) Develop a multiple regression analysis to test the distribution costs as a function of both Sales and the Number of orders.
Distribution Cost = f (Income, HS, Age)
· comment on the significance of each of the independent variables in this new model and whether or not the model is improved by including these variables jointly and
· discuss the statistical measures that support your belief. Which variable appears to have the most influence on sales? Explain how you arrived at your decision.
· set up the appropriate tests.
In this section, use computer printout only to support your argument, and refer to tests of hypotheses on each of the coefficients.
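This section asks for computer printout only, so the following sketch is not a substitute for it; it only shows where the printout's coefficients and t statistics come from. Least squares solves the normal equations (XᵀX)b = Xᵀy, and each coefficient's t statistic is b_j / s_bj, where s_bj² = MSE · [(XᵀX)⁻¹]_jj and MSE = SSE / (n − k − 1) for k predictors. Each t would then be compared with t(α/2, n − k − 1) to test H0: β_j = 0. Standard-library Python only; the function names are hypothetical:

```python
import math

def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def multiple_regression(predictors, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk; return (coefficients, t statistics, r^2)."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in x] for x in predictors]  # intercept first
    p = len(cols)  # k predictors + 1
    xtx = [[sum(ci[i] * cj[i] for i in range(n)) for cj in cols] for ci in cols]
    xty = [sum(c[i] * y[i] for i in range(n)) for c in cols]
    b = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(b[j] * cols[j][i] for j in range(p)) for i in range(n)]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    mse = sse / (n - p)  # n - k - 1 degrees of freedom
    t_stats = []
    for j in range(p):
        unit = [1.0 if r == j else 0.0 for r in range(p)]
        cjj = solve(xtx, unit)[j]  # j-th diagonal element of (X'X)^-1
        t_stats.append(b[j] / math.sqrt(mse * cjj))
    return b, t_stats, 1 - sse / sst

# Hypothetical usage: multiple_regression([income, hs, age], y)
```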
[Total Marks = 25 marks]
At the Lifestyle Furniture Manufacturing Company, an application of the test of the difference between small sample means arises. New employees are expected to attend a three-day seminar to learn about the company. At the end of the seminar, they are tested to measure their knowledge about the company. The traditional training method has been a lecture and a question-and-answer session. Management decided to experiment with a different training procedure, which processes new employees in two days by using DVDs and having no question-and-answer session. If this procedure works, it could save the company thousands of dollars over a period of several years. However, there is some concern about the effectiveness of the two-day method, and company managers would like to know whether there is any difference between the effectiveness of the two training methods.
a) At the 0.05 level of significance, is there a difference in the variability of the test scores for training methods A and B?
Use the PhStat software to analyse the problem but give the full hypothesis testing steps in your final presentation.
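Assuming the usual two-tailed F test for the ratio of two variances, H0: σ_A² = σ_B² against H1: σ_A² ≠ σ_B², the test statistic is F = s_A² / s_B² with (n_A − 1, n_B − 1) degrees of freedom, and H0 is rejected at α = 0.05 if F falls below the lower or above the upper 0.025 critical value. Python's standard library has no F distribution, so both critical values are passed in (the lower one can be obtained as 1 / F(0.025, n_B − 1, n_A − 1)); the function name is hypothetical:

```python
import statistics

def f_test_variances(sample_a, sample_b, f_lower, f_upper):
    """Two-tailed F test of H0: sigma_A^2 = sigma_B^2.

    F = s_A^2 / s_B^2 with (n_A - 1, n_B - 1) degrees of freedom; f_lower and
    f_upper are the alpha/2 critical values from an F table or PhStat.
    """
    f_stat = statistics.variance(sample_a) / statistics.variance(sample_b)  # sample variances (n - 1)
    reject = f_stat < f_lower or f_stat > f_upper
    return f_stat, reject
```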
b) To test the difference in the two training methods, the managers randomly select one group of 15 newly hired employees to take the three-day seminar (method A) and a second group of 12 new employees for the two-day DVD method (method B). The test scores of the two groups are shown above.
Using α = 0.05, determine whether there is a significant difference in the mean scores of the two groups. Assume that the scores for this test are normally distributed and that the population variances are approximately equal.
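With normal populations and equal variances, the pooled-variance t test uses S_p² = [(n_A − 1)s_A² + (n_B − 1)s_B²] / (n_A + n_B − 2) and t = (x̄_A − x̄_B) / √[S_p²(1/n_A + 1/n_B)] on n_A + n_B − 2 degrees of freedom (15 + 12 − 2 = 25 here). H0: μ_A = μ_B is rejected when |t| exceeds t(0.025, 25). A sketch with the critical value supplied from a t-table or PhStat; the function name is hypothetical:

```python
import math
import statistics

def pooled_t_test(sample_a, sample_b, t_crit):
    """Two-tailed pooled-variance t test of H0: mu_A = mu_B (equal variances assumed).

    t_crit is t(alpha/2, nA + nB - 2); reject H0 when |t| > t_crit.
    """
    na, nb = len(sample_a), len(sample_b)
    sp2 = ((na - 1) * statistics.variance(sample_a)
           + (nb - 1) * statistics.variance(sample_b)) / (na + nb - 2)  # pooled variance
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t_stat = diff / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t_stat, abs(t_stat) > t_crit
```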
c) One group of researchers set out to determine whether there is a difference between ‘average citizens’ and those who are ‘phone survey respondents’.
(Note: This is part of a much larger study. The results in this portion of the study are about the same as those of the actual study except that the sample sizes were 500 to 600.)
Their study was based on a well-known personality survey that attempted to assess the personality profile of both average citizens and phone survey respondents. Suppose they sampled nine phone survey respondents and 10 average citizens in this survey and obtained results for one personality factor, conscientiousness, which are displayed below:
Develop a 99% confidence interval for the true difference in population mean personality scores for conscientiousness between phone survey respondents and average citizens.
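The question does not state a variance assumption; assuming normal populations with equal variances, the interval is (x̄_1 − x̄_2) ± t(α/2, n_1 + n_2 − 2) · √[S_p²(1/n_1 + 1/n_2)], which for 99% confidence with 9 and 10 observations uses t(0.005, 17) ≈ 2.898. A sketch with a hypothetical function name; if equal variances cannot be assumed, the separate-variance (Welch) interval would be used instead:

```python
import math
import statistics

def diff_means_ci(sample_1, sample_2, t_crit):
    """Pooled-variance interval for mu_1 - mu_2 (normal populations, equal variances assumed).

    t_crit is t(alpha/2, n1 + n2 - 2), read from a t-table or PhStat.
    """
    n1, n2 = len(sample_1), len(sample_2)
    sp2 = ((n1 - 1) * statistics.variance(sample_1)
           + (n2 - 1) * statistics.variance(sample_2)) / (n1 + n2 - 2)  # pooled variance
    diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    half_width = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return diff - half_width, diff + half_width
```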
NOTE: Attach the appropriately labelled PhStat / Excel computer printouts that are required for sections a), b) and c) above.
[Total Marks = 33 marks]