Summer 2020 Semester Project Detailed Instructions
Due Date: I must receive your project no later than 11:59 PM on the due date in the syllabus.
Given the early advance notice and the amount of in-class time devoted to this project, there
will be no extensions granted. Projects turned in after the deadline will receive a grade of 0.
Instructor/ACE Assistance: This project will be treated like a take-home final exam. This
means that I and the ACE tutors will be happy to answer general questions about these
instructions or about any of the techniques and procedures you need to use when completing
the project. However, we will not help you produce the required content, nor will we review
your report prior to submission to provide "feedback." You will be expected to do the analysis
and write the project report on your own. Please refer to the sample project in the Semester
Project module if you need an example of what a successful (100%) project looks like.
Collaboration between Classmates: This is not only allowed but encouraged. However, you
are not permitted to use the Semester Project discussion board to collaborate, and all
discussion threads of that nature will be deleted. If you collaborate, you may do so offline.
Please remember to submit your own report; duplicate reports will receive grades of 0 and will
be subject to sanction for academic dishonesty.
Document Format: You must submit your project report in a single file through Canvas. The
acceptable formats are Microsoft Word (*.docx) or PDF (*.pdf) – no exceptions. The submission
page is in the Semester Project module. Projects submitted in multiple parts, in a format other than
Word or PDF, or via email/hardcopy will be rejected.
Style Requirements: The Semester Project module contains a sample project that would
receive a 100% grade. Your report should be formatted similarly.
− The first page of your report must be a title page containing your name, the course and
section number, the title "Semester Project," and the submission date.
− Use a font suitable for an official business document. Any standard typeface is
acceptable as long as it is readable and presents a professional appearance (Calibri and
Times New Roman are good examples, but not the only possibilities). The size should
be no smaller than 12 point, and the color should be black.
− Do not include any borders, decorative images/illustrations, or watermarking.
− Embed all graphics directly into your project file. I will not accept separate files
containing graphics.
Data Set: All students will use the same data set: Spring 2020 Semester Project Raw Data. The
data set is located in the StatCrunch MTH 245 Homework Group. The data come from a study
of the behavior of fruit flies. The two variables of interest are Percent Time Asleep and
Longevity (measured in days).
Technology Requirements: Except where required to build graphs or charts, all numerical
calculations must be performed using StatCrunch. Do not use a graphing calculator, Excel,
standard normal tables, or any other method for your numerical calculations.
Graphics Requirements: All graphics must be constructed using StatCrunch, Excel, or another
computer-based graphics program. Hand-drawn plots, cell phone pictures of graphics, etc., are
not acceptable. All graphics must include an informative title and (except for boxplots) correct
labels for both axes. Orient all boxplots horizontally.
Rounding Rules:
− All upper and lower class bounds for the Section 1 histograms should be integers.
− Round the two p-values (one each in Sections 4 and 5) to three decimal places.
− Round all other calculated values to one decimal place.
− Add trailing zeroes to any rounded value as needed.
− Do not simply paste screen shots of StatCrunch output into your report.
− Warning: do not use the sample project as an example of how to round! The data used in that
particular project is different from yours and is therefore rounded differently!
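As an illustration only (the project calculations themselves must still be done in StatCrunch), the rounding rules above can be sketched in Python. The function names are hypothetical, and the sample numbers are made up rather than taken from the data set.

```python
def fmt_value(x: float) -> str:
    """Format a general calculated value to one decimal place, keeping trailing zeroes."""
    return f"{x:.1f}"


def fmt_p_value(p: float) -> str:
    """Format a p-value to three decimal places, keeping trailing zeroes."""
    return f"{p:.3f}"


# Made-up example values, not results from the project data.
print(fmt_value(57.0))    # 57.0
print(fmt_p_value(0.02))  # 0.020
```

The point of the string formatting is the trailing zero: a value such as 57 is reported as 57.0, and a p-value such as 0.02 is reported as 0.020.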
Required Content: Organize your report in five separate sections using the following numbers
and titles. The required elements for each section are as follows:
Section 1 – Visual Data Assessment. Create a histogram for each variable of interest: Percent
Time Asleep and Longevity. For Percent Time Asleep, use a "Start at:" value of 10 and a
"Width:" value of 10; for Longevity, use a "Start at:" value of 10 and a "Width:" value of 15.
It is not necessary to display frequency counts above the bars. For each histogram, include
a paragraph that answers each of the following questions:
a. Is the histogram symmetric, left-skewed, right-skewed, or uniform?
b. How many peaks does the histogram have, and in which class(es) are they located
(must include the correct lower and upper bounds for each class listed)?
c. Does the histogram have any gaps between classes? If so, where are they?
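To see how a "Start at:" value and a "Width:" value determine the class bounds, here is a minimal Python sketch. It assumes each class contains values with lower bound <= value < upper bound, which may not match StatCrunch's convention exactly, and it assumes every value is at least the start value. The function names and the sample values are hypothetical.

```python
def class_bounds(start, width, max_value):
    """Return (lower, upper) bounds for consecutive classes that cover max_value."""
    bounds = []
    lower = start
    while lower <= max_value:
        bounds.append((lower, lower + width))
        lower += width
    return bounds


def class_counts(values, start, width):
    """Count how many values fall in each class (assumes lower <= v < upper)."""
    bounds = class_bounds(start, width, max(values))
    counts = [0] * len(bounds)
    for v in values:
        index = int((v - start) // width)  # position of the class containing v
        counts[index] += 1
    return list(zip(bounds, counts))


# Made-up values for illustration only; not the project data.
sample = [12, 18, 25, 31, 33, 38, 47]
for (lower, upper), count in class_counts(sample, start=10, width=10):
    print(f"{lower} to {upper}: {count}")
```

A class with a count of zero between two non-empty classes corresponds to a gap, and the class or classes with locally highest counts correspond to peaks.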
Section 2 – Descriptive Statistics.
a. For each variable, find the mean, range, variance, standard deviation, and five-
number summary. Display these numbers in a format that is easy to understand.
(Do not simply copy screencaps of the StatCrunch output!)
b. Construct a regular boxplot for each variable. For each boxplot, include a brief
statement containing an assessment of whether the data appear to be symmetric,
left-skewed, right-skewed, or uniform.
c. For each variable, construct a modified boxplot and use it to identify potential
outliers. If any exist, list them by value; if none exist, say so.
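For intuition about what a modified boxplot flags, the sketch below computes a five-number summary and applies the usual 1.5 × IQR fences. Both the fence rule and the quartile method (Python's "inclusive" method) are assumptions here; StatCrunch may compute quartiles differently, so the numbers you report must come from StatCrunch. The function names and sample values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the 'inclusive' quartile method."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def potential_outliers(values):
    """Return values outside the assumed fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return sorted(v for v in values if v < lower_fence or v > upper_fence)


# Made-up values for illustration only; not the project data.
sample = [40, 45, 50, 52, 55, 58, 60, 62, 95]
print(five_number_summary(sample))
print(potential_outliers(sample))
```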
Section 3 – Confidence Intervals. Construct a 95% confidence interval for the mean μ of each
variable (two intervals total). Use algebraic format for each interval (lower bound < μ <
upper bound). State the distribution you used for each interval (t or normal).
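The sketch below illustrates the structure of a confidence interval for a mean and the algebraic format. It covers only the normal case, using Python's standard library; if the t distribution is the appropriate choice, the critical value would come from the t distribution instead. Deciding which distribution applies is part of the project, and the reported interval must be calculated in StatCrunch. The function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_ci_for_mean(values, confidence=0.95):
    """Return (lower, upper) for the mean using a normal critical value (assumed case)."""
    n = len(values)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * stdev(values) / math.sqrt(n)
    center = mean(values)
    return center - margin, center + margin


def algebraic_format(lower, upper):
    """Write the interval in algebraic format, rounded to one decimal place."""
    return f"{lower:.1f} < μ < {upper:.1f}"


# Made-up values for illustration only; not the project data.
lower, upper = normal_ci_for_mean([50, 54, 58, 62])
print(algebraic_format(lower, upper))
```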
Section 4 – Hypothesis Test. Using the p-value method, conduct a formal hypothesis test of
the claim that μ, the mean longevity of fruit flies, is less than 57 days. Use α = 0.01. Include
the following in your written summary of the results:
a. Your null and alternate hypotheses in the proper format using standard notation.
b. The type of distribution you used (t or normal).
c. The p-value and its logical relationship to α (≤ or >).
d. Your decision regarding the null hypothesis: reject or fail to reject.
e. A statement interpreting your decision: reject/fail to reject (or support/fail to
support) the original claim that the mean longevity of fruit flies is less than 57 days.
Note: Section 4 only applies to Longevity. There is no hypothesis test related to Percent
Time Asleep.
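The decision rule for the p-value method can be sketched as follows. This is an illustration of the comparison only: the p-values shown are made up, the function name is hypothetical, and the actual p-value must come from StatCrunch.

```python
def p_value_decision(p_value, alpha):
    """State the relationship between the p-value and alpha, and the resulting decision."""
    if p_value <= alpha:
        relation, decision = "≤", "reject"
    else:
        relation, decision = ">", "fail to reject"
    return f"p-value = {p_value:.3f} {relation} α = {alpha}; {decision} the null hypothesis"


# Made-up p-values for illustration only.
print(p_value_decision(0.2, 0.01))
print(p_value_decision(0.004, 0.01))
```

Rejecting the null hypothesis supports the claim stated in the alternate hypothesis; failing to reject means there is not sufficient evidence to support it.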
Section 5 – Correlation/Regression Analysis.
a. Construct a linear regression model with Percent Time Asleep as the predictor and
Longevity as the response. State the equation in correct algebraic format as shown in
the course notes.
b. Create a scatter plot of the data with a plot of the least squares line included.
(StatCrunch should generate this when you calculate the model in 5a.) The plot
must include an informative title and correct labels for both axes.
c. Use the coefficient of determination to identify the percentage of the variation in
Longevity explained by the variation in Percent Time Asleep.
d. Identify the following points (list them as ordered pairs in the form (Percent Time
Asleep, Longevity)). If none exist in a particular category, say so.
1) Outliers (all points with studentized residuals greater than 3.0 or less
than −3.0).
2) High-leverage points (all points with leverage greater than 0.048).
3) Likely influential points (all points with Cook's Distance > 1.0).
e. Conduct a formal hypothesis test at α = 0.05 to determine if there is sufficient
evidence of correlation between Percent Time Asleep and Longevity. Include the
following:
1) The p-value and its logical relationship to α (≤ or >).
2) Your decision regarding the null hypothesis: reject or fail to reject.
3) A statement regarding the sufficiency of the evidence for a linear relationship
between Percent Time Asleep and Longevity.
f. State whether the equation in 5a satisfies the following LINE criteria (assume the
residuals are independent):
Linear Relationship: Using the scatterplot with fitted line, determine if a linear
model is appropriate based on the model's visual fit to the data.
Normally-Distributed Residuals: Determine if the residuals fit a normal distribution
using a residual histogram and a Q-Q plot. (Do not use a boxplot.)
Equal Variances of the Residuals: Assess the residuals for constant variance using a
plot of the residuals versus Percent Time Asleep.
g. Use the results from 5e and 5f to determine if the model you built in 5a provides
valid estimates of Longevity as a function of Percent Time Asleep. Justify your
decision.
h. Provide a valid estimate of y_new, a new observation of Longevity for an individual
fruit fly whose Percent Time Asleep = 20. Either use the regression model you
constructed in 5a or calculate the value using the Longevity column by itself,
whichever is appropriate.
i. If you use the regression model from 5a to calculate the estimate in 5h, calculate a
95% prediction interval estimate of y_new. If the model in 5a is invalid, include a
statement that a prediction interval estimate is not applicable.
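For intuition about Section 5, the sketch below shows how a least squares line and the coefficient of determination are computed, and how the cutoffs in 5d are applied to diagnostic values. It assumes the studentized residuals, leverages, and Cook's Distances have already been obtained (in the project, from StatCrunch). The function names and all numbers are hypothetical and are not from the project data.

```python
def least_squares(x, y):
    """Return (intercept, slope, r_squared) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)  # proportion of variation explained
    return intercept, slope, r_squared


def flag_points(points, studentized, leverage, cooks_d, leverage_cutoff=0.048):
    """Apply the 5d cutoffs to diagnostic values supplied for each (x, y) point."""
    return {
        "outliers": [p for p, r in zip(points, studentized) if r > 3.0 or r < -3.0],
        "high_leverage": [p for p, h in zip(points, leverage) if h > leverage_cutoff],
        "influential": [p for p, d in zip(points, cooks_d) if d > 1.0],
    }


# Made-up values for illustration only.
intercept, slope, r_squared = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y-hat = {intercept:.1f} + {slope:.1f}x, r-squared = {r_squared:.3f}")

flags = flag_points(
    points=[(20, 60), (35, 50), (80, 30)],
    studentized=[0.5, -3.4, 1.0],
    leverage=[0.01, 0.02, 0.06],
    cooks_d=[0.1, 0.3, 1.2],
)
print(flags)
```

An empty list in any category corresponds to the "if none exist, say so" case in 5d.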