GBS221 Bulriss

This project gives you the chance to use the statistical techniques and concepts covered in Chapters 7-9 and Chapter 14.

Overview:
The U.S. Department of Agriculture maintains a nutrient database covering 100 nutrients for over 7300 foods. For this assignment, an Excel file of cereal data from the 2004 USDA National Nutrient Database for Standard Reference, Release 17,* is provided in Canvas. The dataset contains the fat, calorie, carbohydrate, fiber, and sugar data for all 278 cereal items in the nutrient database (at that time) that contain standard serving size information. From this data, you will ESTIMATE a population mean and determine an appropriate sample size. With the help of Excel, you will then take a random sample, test a hypothesis, calculate a confidence interval, and compare the results with the actual population data. You will also investigate the correlation between fat and calories.

* Reference: U.S. Department of Agriculture, Agricultural Research Service XXXXXXXXXX USDA National Nutrient Database for Standard Reference, Release 17. Nutrient Data Laboratory Home Page, http://www.nal.usda.gov/fnic/foodcomp

Data Collection:
The Excel file for this project is in the files library in Canvas and is also attached to the assignment writeup. It is called “Copy of cereal_usda_data_stat_proj2.xls”. Although the file can be considered a population, you MUST NOT use it in its entirety until part 4! POPULATION DATA IS ONLY TO BE USED FOR PART 4.

Optional Data Analysis:
You may choose to analyze data of your choice under the following conditions:
➢ Your data selection must be pre-approved by the instructor.
➢ The population should contain between XXXXXXXXXX observations.
➢ Data for at least two different variables must be available for correlation analysis.
➢ The data must be accessible to the instructor electronically (from any data source, including data entry by yourself).

Project Requirements

Project Elements:
There are 5 parts to the project, and they must be done in order! Label each part as you would a multiple-part homework problem, for example 1a).

1. Take a guess
   a. Begin with a visual review of the data in the fat, calorie, carbohydrate, fiber, or sugar columns. Estimate the population mean for ONE of these variables. Do this without using a calculator or any Excel formula or function. You may only scan the data file visually in this step. State the variable name and your estimated mean, and describe how you arrived at it.

2. Construct a hypothesis test (and test with an appropriate sample size)
   Base your hypothesis test on your ESTIMATED population mean from Step 1 above.
   a. State the hypotheses in mathematical and written terms.
   b. Specify your chosen level of significance and state why you chose it.
   c. Determine an appropriate sample size.
      ➔ When you determine the sample size, use the method outlined in Section 8.3 of the text. You must decide what Z (confidence level) to use, the allowable error (e), and how to estimate the standard deviation (σ); refer to page 384 for ways of estimating the standard deviation. Show how you calculated the sample size (show the formula), and explain how and why you chose the values you used for Z, e, and σ.
   d. State which form of the hypothesis test you will use and why (Z or t, one- or two-tailed).
   e. Calculate the critical value.
   f. Use Excel to select a random sample. Be sure to document, in detail, your process for selecting the sample. Run Excel’s Descriptive Statistics on the sample. Include your random sample and descriptive statistics as an attachment to the project.
   g. Calculate your test statistic and p-value, and state your conclusion.

3. Construct a confidence interval
   a. Estimate the population mean for your variable using your sample data.
   b. Comment on the estimate for your population mean.
   c. Construct a confidence interval of your choice and interpret the results.

4. Compute the population mean and standard deviation
   a. Compute the population mean and standard deviation using Excel’s Descriptive Statistics and compare them with the mean and standard deviation of the sample you obtained in part 2f. Comment on the comparison.
   b. Compare the population mean with your confidence interval from 3c and comment on the comparison.

5. Use simple linear regression on the random variables of fat and calories
   a. Specify which is the independent variable and which is the dependent variable.
   b. Write down what you think the relationship is between the two variables. What do you believe regarding the strength of the correlation? (Is it strong or weak, positive or negative?)
   c. Prepare a scatter diagram and least squares trend line using your sample data. Compute the coefficient of correlation (r) and the coefficient of determination (r²). Refer to the instructions in the files library in Canvas titled “Using Excel Chart Tools to Create a Scatter Diagram and Determine Regression Line”.
   d. Interpret your results and compare them to what you expected.

Format of the Report
➢ Project elements should be in numerical order and identified by number and letter. [example: 1a) The estimated mean fiber content for cereal is 3g.]
➢ The report should be typed using a word processor (single spaced, 12pt font), and calculations, graphs, random number tables, and charts should be prepared using Excel.
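
As a rough cross-check for the arithmetic in parts 2c and 2g (not a replacement for the Excel work the project requires), the calculations can be sketched in Python. This assumes Section 8.3 uses the standard sample-size formula for estimating a mean, n = (Zσ/e)² rounded up, and the usual one-sample test statistic (x̄ − μ₀)/(s/√n). The function names and all input values below are hypothetical illustrations and are not taken from the cereal data or the text.

```python
import math
from statistics import NormalDist, mean, stdev


def sample_size_for_mean(confidence, sigma, allowable_error):
    """Sample size n = (Z * sigma / e)^2, rounded up to a whole observation."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / allowable_error) ** 2)


def one_sample_test_statistic(sample, hypothesized_mean):
    """Test statistic (x_bar - mu_0) / (s / sqrt(n)) using the sample std. dev."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - hypothesized_mean) / standard_error


# Hypothetical inputs: 95% confidence, sigma guessed at 40, allowable error 10.
n = sample_size_for_mean(0.95, sigma=40, allowable_error=10)
print(n)  # 62

# Hypothetical five-item sample tested against a guessed mean of 100.
t = one_sample_test_statistic([110, 120, 100, 130, 90], hypothesized_mean=100)
print(round(t, 4))  # 1.4142
```

Rounding the sample size up rather than to the nearest integer keeps the margin of error at or below the chosen e. Whether the statistic is compared against a Z or a t critical value is the choice you justify in part 2d.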