This project gives you the chance to use the statistical techniques and concepts covered in Chapters 7-9, and Chapter 14.

Overview: The U.S. Department of Agriculture maintains a nutrient database for 100 nutrients of over 7300 foods. As part of this assignment, there is an Excel file in Canvas of cereal data from the 2004 USDA National Nutrient Database for Standard Reference, Release 17. * The dataset contains the fat, calorie, carbohydrate, fiber, and sugar data for all 278 cereal items in the nutrient database (at that time) that contain standard serving size information. From this data, you will ESTIMATE a population mean and determine an appropriate sample size. With the help of Excel, you can then take a random sample, test a hypothesis, calculate a confidence interval, and compare the results with the actual population data. You will also investigate the correlation between fat and calories.

Data Collection: In the files library in Canvas and attached to the assignment writeup is the Excel file for this project. It is called “Copy of cereal_usda_data_stat_proj2.xls”. Although the file can be considered a population, you MUST NOT use it in its entirety until part 4! POPULATION DATA IS ONLY TO BE USED FOR PART 4.

Optional Data Analysis: You may choose to analyze data of your choice under the following conditions:
• You must pre-approve your data selection with the instructor.
• The population should contain between XXXXXXXXXX observations.
• Data for at least two different variables must be available for correlation analysis.
• The data must be accessible to the instructor electronically (from any data source, including data entry by yourself).

Project Requirements

• Project Elements: There are 5 parts to the project and they must be done in order! Label each part as you would a multiple part homework problem, example 1a)

1. Take a guess
a. Begin with a visual review of data in the fat, calorie, carbohydrate, fiber, or sugar columns. Estimate the population mean for ONE of these variables. Do this without using a calculator or any Excel formula or function. You may only scan the data file visually in this step. State the variable name, your estimated mean, and describe how you arrived at it.

2. Construct a hypothesis test (and test with an appropriate sample size)
Base your hypothesis test on your ESTIMATED population mean in Step 1 above.
a. State the hypotheses in mathematical and written terms.
b. Specify your chosen level of significance and state why you chose it.
c. Determine an appropriate sample size
➔ When you determine sample size, use the method outlined in Section 8.3 of the text. You must decide what Z (confidence level) to use, the allowable error, and how to estimate the standard deviation (refer to page 384 for ways of estimating the standard deviation). Show how you calculated the sample size (show formula), and explain how/why you chose the values you used for Z, e, and σ.
d. State which form of the hypothesis test you will use & why. (Z or t, 1 or 2 tail)
e. Calculate the Critical Value.
f. Use Excel to select a random sample. Be sure to document, in detail, your process for selecting the sample. Run Excel’s Descriptive Statistics on the sample. Include your random sample and descriptive statistics as an attachment to the project.
g. Calculate your test statistic, p-value, and state your conclusion.

3. Construct a confidence interval
a. Estimate the population mean for your variable using your sample data.
b. Comment on the estimate for your population mean.
c. Construct a confidence interval of your choice and interpret the results.

4. Compute the population mean and standard deviation
a. Compute the population mean and standard deviation using Excel’s descriptive statistics and compare them with the mean and standard deviation of your sample that you got in part 2f. Comment on the comparison.
b. Compare the population mean with your confidence interval from 3c and comment on the comparison.

5. Use simple linear regression on the random variables of fat and calories
a. Specify which is the independent variable and which is the dependent variable.
b. Write down what you think the relationship is between the two variables. What do you believe regarding the strength of the correlation? (Is it strong or weak, positive, or negative?)
c. Prepare a scatter diagram and least squares trend line, using your sample data. Compute the coefficient of correlation (r) and the coefficient of determination (r²). Refer to instructions in the files library in Canvas titled “Using Excel Chart Tools to Create a Scatter Diagram and Determine Regression Line”.
d. Interpret your results and compare them to what you expected.

• Format of the Report
➢ Project elements should be in numerical order and identified by number and letter. [example: 1a) The estimated mean fiber content for cereal is 3g.]
➢ The report should be typed using a word processor (single spaced, 12pt font) and calculations, graphs, random number tables, and charts should be prepared using Excel.

* Reference: U.S. Department of Agriculture, Agricultural Research Service XXXXXXXXXXUSDA National Nutrient Database for Standard Reference, Release 17. Nutrient Data Laboratory Home Page, http://www.nal.usda.gov/fnic/foodcomp
GBS221 Bulriss

Solution

Rajeswari answered on Dec 09 2021
1. Take a guess
I selected the carbohydrates column, visually went through all the entries, and my guess for the average carbohydrate content is 30.00.
Variable name – Carbohydrates
Mean I guessed – 30
I guessed this mean because most of the entries appear to be around 30, ranging roughly from the 20s to the 40s. So I guess the mean is approximately 30.
2. Construct a hypothesis test (and test with an appropriate sample size)
Base your hypothesis test on your ESTIMATED population mean in Step 1
a) H_0: \mu = 30
H_a: \mu ≠ 30
H_0: The population mean carbohydrate content is equal to 30, against
H_a: The population mean carbohydrate content is not equal to 30.
b) Significance level I chose = 5%, because I felt 1% is too low and 10% is too high. So 5% is reasonable for the hypothesis test.
c) I want a margin of error of 2. I took the standard deviation from the given data; it was equal to 10.412.
Since the population standard deviation is known, we can use the Z test, and hence the critical value to be used is 1.96.
Using the fact that margin of error = 1.96 × standard error < 2:
1.96 × 10.412/\sqrt{n} < 2
or
n > (1.96 × 10.412/2)^2 ≈ 104.1
So the sample size must be at least 105. I chose a sample size of 139, which keeps the margin of error below 2 at a significance level of 5%, with Z critical value 1.96 and population standard deviation 10.412.
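The sample-size step in (c) can be checked with a short sketch. It assumes the standard formula n ≥ (zσ/e)² for estimating a mean with known σ; the function name `required_sample_size` is hypothetical and not part of the original answer.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n for which z * sigma / sqrt(n) does not exceed `margin`."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Values taken from the answer: sigma = 10.412, allowable error e = 2.
n_min = required_sample_size(sigma=10.412, margin=2.0)
print("minimum sample size:", n_min)
print("chosen n = 139 is large enough:", 139 >= n_min)
```

Any sample size at or above the printed minimum meets the margin-of-error target, so the chosen n = 139 is more than sufficient.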
d) Since the population standard deviation is known, we can use the Z test. The test is two-tailed because the alternative hypothesis is "not equal to".
e) The critical value for a two-tailed Z test at the 95% confidence level is ±1.96, from the standard normal distribution table.
f) From the 278 entries we have to select 139 items. I selected every alternate entry from the given table, which spreads the sample across the data and gives the required sample size.
Descriptive statistics for the sample of 139 entries:

Carbohydrates
Mean                        30.53372
Standard Error              0.864351
Median                      26.244
Mode                        26.1
Standard Deviation          10.19055
Sample Variance             103.8474
Kurtosis                    8.11431
Skewness                    2.165625
Range                       77.7482
Minimum                     12.2878
Maximum                     90.036
Sum                         4244.188
Count                       139
Confidence Level(95.0%)     1.709085
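The selection in (f) takes every alternate row, which is a systematic sample, not a simple random sample. The sketch below contrasts the two approaches. The `population` list is made-up placeholder data standing in for the 278-row carbohydrate column, and both helper names are hypothetical.

```python
import random
import statistics


def systematic_sample(values, step=2, start=0):
    """Take every `step`-th value, beginning at index `start`."""
    return values[start::step]


def simple_random_sample(values, n, seed=None):
    """Draw n values without replacement, each row equally likely."""
    rng = random.Random(seed)
    return rng.sample(values, n)


# Placeholder data only: 278 made-up values, NOT the USDA cereal data.
_rng = random.Random(0)
population = [round(_rng.uniform(12, 90), 1) for _ in range(278)]

every_other = systematic_sample(population)           # rows 1, 3, 5, ...
srs = simple_random_sample(population, n=139, seed=42)

print(len(every_other), len(srs))
print(statistics.mean(srs), statistics.stdev(srs))    # sample mean and sample SD
```

In Excel, a comparable random draw can be made by adding a column of `=RAND()` values, sorting by it, and keeping the first 139 rows.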
g) Sample mean = 30.534 g; hypothesised mean = 30 g
Mean difference = 0.534 g
Standard error = population standard deviation / \sqrt{n} = 10.412/\sqrt{139} ≈ 0.8831
Test statistic Z = mean difference / standard error = 0.6046
P value = 0.545179
Since the p value > 0.05, our significance level, we fail to reject the null hypothesis.
There is not sufficient evidence at the 5% significance level to conclude that the population mean differs from the hypothesised value of 30.
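The arithmetic in (g) can be reproduced with a minimal sketch of a two-tailed one-sample Z test. It assumes a known population standard deviation and uses the figures from the answer; the function name `one_sample_z_test` is hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (standard error, z statistic, two-tailed p-value)."""
    se = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / se
    # Two-tailed p-value: probability of a |Z| at least this large under H_0.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Figures from the answer: sample mean 30.53372, H_0 mean 30, sigma 10.412, n 139.
se, z, p = one_sample_z_test(30.53372, 30.0, 10.412, 139)
reject_h0 = p < 0.05  # 5% significance level

print(round(se, 4), round(z, 4), round(p, 4), reject_h0)
```

Small differences from the figures quoted in the answer can arise from rounding the sample mean before dividing.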
3) Constructing a confidence interval:
a) Population mean estimate = sample mean = 30.534 g
For our significance level of 5%, the confidence level is 95%.
b) Confidence interval = (mean – margin of error, mean + margin of error)
= (mean – 1.96 × standard error, mean + 1.96 × standard error)
= (30.534 – 1.7309, 30.534 + 1.7309)
= (28.8031, 32.2649)
c) We are 95% confident that the population mean carbohydrate content lies within this interval. Equivalently, if many random samples of this size were drawn, about 95% of the intervals constructed this way would contain the population mean.
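The interval in part 3 can be recomputed with a short sketch. It assumes a Z-based interval with known σ = 10.412, as the answer does; the function name `z_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided Z interval for a mean when sigma is treated as known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Figures from the answer: sample mean 30.53372, sigma 10.412, n 139.
low, high = z_confidence_interval(30.53372, 10.412, 139)
print(round(low, 4), round(high, 4))
```

Passing a different `confidence` (for example 0.90 or 0.99) shows how the interval narrows or widens with the chosen level.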
4) Compute the population mean and standard deviation using descriptive statistics
a)
                        Population     Sample
Mean                    30.90118       30.53372
Standard Error          0.624474       0.864351
Median                  26.95115       26.244
Mode                    31.1454        26.1
Standard Deviation      10.41206       10.19055
The population mean is 30.901 against a sample mean of 30.534, so the two means are almost equal and the difference is negligible. The standard deviation is 10.412 for the population against 10.191 for the sample. This close agreement suggests that the sample represents the population well.
b) The confidence interval from part 3 was (28.8031, 32.2649).
This interval contains the population mean of 30.901.
5. Use simple linear regression on the random variables of fat and calories
a. Specify which is the independent variable and which is the dependent variable.
Normally, the food we eat and what it contains determine the fat. Hence we treat calories as the independent variable and fat as the dependent variable.
b. Write down what you think the relationship is between the two variables. What do you believe regarding the strength of the correlation? (Is it strong or weak, positive, or negative?)
By normal convention,...