Project 1 Example 1
Introduction
The client, Pima County, requested that we analyze a set of water quality data to determine
if water quality samples from 2 different water quality meters were significantly different
from each other. One meter recently purchased by the county was cheaper than the other
and appeared to be giving readings that were different. The county requested we determine
if it was worth it to buy only the older more expensive water quality meters or if the cheap
ones would suffice.
Thirty pH samples for each water meter were provided by Pima County and then analyzed in R
using a two-sample Student's t-test. All data and analyses are provided in Appendix 1 at the
end of this document.
Results and Discussion
We found that the readings from the two water quality meters were significantly
different (p = 0.031). The old water quality meter gave readings that were higher
than those of the new meter (0.28 pH units on average) (see Figure 1).
Recommendations
We recommend testing both water
quality meters against a known standard. The two meters are different in their
measurements, but it is unclear which one is correct at this time. Given that the average
difference was small between measurements it is possible that the change in accuracy
between the two meters is unimportant for the use of the county and the significance of
this difference may be irrelevant for their purposes.
Without information on accuracy, or an idea of permissible level of inaccuracy from the
county, we cannot determine whether one water meter is better than the other.
Figure 1: Box plot of the data provided by the County
Appendix 1: R Script and Simulated Data
#Data Simulation
# 30 simulated pH readings per meter, labelled by meter in column y
oldmeter <- data.frame(x = rnorm(n = 30, mean = 6.5, sd = 1), y = "Old Meter")
newmeter <- data.frame(x = rnorm(n = 30, mean = 6.2, sd = 1), y = "New Meter")
# Stack both meters into one data frame for plotting
meter <- rbind(oldmeter, newmeter)

#P value
round(t.test(oldmeter$x, newmeter$x, alternative = "two.sided")$p.value, 3)

#Effect Size
# Difference in mean pH (old meter minus new meter)
round(mean(oldmeter$x) - mean(newmeter$x), 2)

#Figure
boxplot(x ~ y, data = meter, col = "blue", main = "Boxplot of pH Data", xlab = "",
        ylab = "Measured pH")
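The report above describes a two-sample Student's t-test, while the appendix calls t.test() with its defaults. In R, t.test() defaults to var.equal = FALSE, which is Welch's test rather than the pooled-variance Student's test. The sketch below shows both side by side on reproducible data. The seed value and the variable names old_ph and new_ph are assumptions added for illustration; they are not part of the original script.

```r
# Assumption: the seed is arbitrary; it only makes reruns give identical data.
set.seed(42)
old_ph <- rnorm(n = 30, mean = 6.5, sd = 1)  # simulated old-meter readings
new_ph <- rnorm(n = 30, mean = 6.2, sd = 1)  # simulated new-meter readings

# Student's t-test: pooled variance, degrees of freedom = n1 + n2 - 2
student <- t.test(old_ph, new_ph, alternative = "two.sided", var.equal = TRUE)

# Welch's t-test: R's default when var.equal is not supplied
welch <- t.test(old_ph, new_ph, alternative = "two.sided")

# P values rounded to three decimal places
round(student$p.value, 3)
round(welch$p.value, 3)

# Degrees of freedom differ between the two tests
student$parameter
welch$parameter
```

With equal sample sizes the two tests give the same t statistic and differ only in degrees of freedom, so the p values are usually close. Whichever version is used, the write-up should name the test that the code actually ran.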

Learning Outcomes
This project is meant to demonstrate that you are capable of doing the following:
· Create a fake dataset (or find a real one) in R that follows the format required for a one or two-sample t-test.
· Conduct a one or two-sample t-test in R on a dataset
· Interpret p-values
· Create graphics in R of univariate data
· Describe results and implications of a two-sample t-test
· Identify scenarios where a two-sample t-test would be pertinent
Additionally, you will be reviewing the work of other students. In your reviews you are expected to demonstrate mastery over the following learning outcomes:
· Identify mistakes in R that could lead to code not running
· Identify misinterpretations of p values or effect sizes
· Identify misapplications of statistical tests
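For reference, a one-sample t-test of the kind named in the learning outcomes might look like the sketch below. The scenario (a meter checked against a buffer of known pH 7.00), the seed, and every number in it are invented for illustration only.

```r
# Assumption: hypothetical data, a meter reading a buffer of known pH 7.00.
set.seed(1)
buffer_ph <- 7.00                                  # expected population mean
readings <- rnorm(n = 30, mean = 6.9, sd = 0.2)    # simulated meter readings

# One-sample t-test: is the mean reading different from the known value?
res <- t.test(readings, mu = buffer_ph, alternative = "two.sided")

# P value and effect size, each rounded to three decimal places
p_value <- round(res$p.value, 3)
effect <- round(mean(readings) - buffer_ph, 3)

p_value
effect
```

Here the effect size is the difference between the sample average and the expected population mean, in pH units.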
Project Requirements
The project itself is a mock data analysis project for a client of your choosing. You will make up the data (or find real data on the Internet to use), make up the clients and their question, and then write a short paper as if it were a real analysis for the client. There are several examples available on D2L for you to model your analyses after. Your paper should be written in third person past tense, as this is typical for analytics projects.
For full credit you must have all of the following pieces:
· data (either real or created by you) that we have not used in class
· if the data is real, you need to provide a link to it as your appendix
· if the data is fake, you need to provide your code for its creation
· if the data is fake, but you made it by hand, you need to provide it as a nicely formatted table
· a figure showing that data in a way that is meaningful for the test you are running with:
· legible and meaningful titles and axis labels
· a caption describing the figure
· no issues with X axes not lining up etc.
· histograms, density diagrams, box plots, dot plots – all are fine just make sure you use them correctly
· an R script showing all of your steps for analyses and figure creation
· which runs with only a file path change
· which accurately replicates your results
· one of the following analyses used and described correctly, with the correct alternative specified in your code:
· a one sample student t-test
· a two sample student t-test
· a p value interpreted correctly, and rounded to no more than three decimal places
· some measure of effect size rounded to no more than three decimal places, like the difference between the averages of two samples or the difference between the average of a sample and the expected population mean
· a one page written report in size 12 font that contains the following sections:
· Introduction: a section introducing the clients requesting the project and a little bit of information about the project itself
· Results and Discussion: discussing the method of analyses used, the results, and a graphic of the data
· Recommendations: a section dictating what the client should do next, depending on what their hypothetical question was
· the one page paper does not include the data or the R script in the page length requirements. The R script and the data should be pasted on as consecutive pages in one document.
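One way to meet the figure requirements above with histograms is to draw the two samples in separate rows that share the same breaks and x-axis limits, so the axes line up. This is only a sketch on simulated data; the seed, the variable names, the 0.5 bin width, and the colors are assumptions, not requirements.

```r
# Assumption: simulated data standing in for two samples to compare.
set.seed(7)
old_ph <- rnorm(n = 30, mean = 6.5, sd = 1)
new_ph <- rnorm(n = 30, mean = 6.2, sd = 1)

# Build one set of breaks that covers both samples
rng <- range(c(old_ph, new_ph))
breaks <- seq(floor(rng[1]), ceiling(rng[2]), by = 0.5)

# Two rows, one column; save the old layout so it can be restored
old_par <- par(mfrow = c(2, 1))
hist(old_ph, breaks = breaks, xlim = range(breaks), col = "gray",
     main = "Old Meter", xlab = "Measured pH")
hist(new_ph, breaks = breaks, xlim = range(breaks), col = "gray",
     main = "New Meter", xlab = "Measured pH")
par(old_par)
```

Stacking the panels avoids one histogram hiding the other, and the shared breaks make the bar positions directly comparable between rows.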
Peer Review Requirements
In addition to creating a one page write up, you will also be required to review the project of two other students. The peer reviews of your classmates will not be used in the calculation of your project grade. Instead your grade will come from your own project, and the two peer reviews that you provide.
In your review you will evaluate each other’s projects for accuracy and clarity. You will be asked to evaluate each other’s projects on:
· whether or not they did the analysis correctly
· whether or not they made a recommendation that was in line with their data
· whether or not their figure was appropriate for their data set
All comments should be constructive and polite and areas of confusion should be pointed out kindly.
The goal of the peer review process is to help you learn how to critique other people’s work and identify weaknesses or failures in their analyses. As such, if the section leaders or I determine that there are flaws in the data analysis of a project and your peer review does not pick them up, your peer review will be graded down.
Rubrics
The following rubrics are meant as guidelines but may change once papers have been submitted, if there is a common problem that I did not anticipate but that I feel is fair to grade students down on.
Project Rubric
Those items indicated in the peer review column are ones that peer reviewers will be expected to look at. If not indicated, the peer reviewer is not expected to evaluate that item. If you have helpful feedback on that item that is acceptable, but you do not have to provide it.
Rubric Item | Peer review? | High Score
--- | --- | ---
R script is provided as an appendix and runs without major modifications other than changing the file path name | Yes | 5
Code for the analyses and figures are provided in the R script | Yes | 5
Data, or a link to the data are provided and can be replicated easily | Yes | 3
A figure is provided with an informative caption | | 2
The figure is legible and professional, and has meaningful labels | | 2
There is not a figure that would have been a significantly better choice (i.e. you put two histograms on top of each other and now we cannot see the data in the back, when you should’ve used two separate rows for each histogram) | Yes | 2
The figure axes are adjusted for maximum legibility i.e. if you have two panels they should both have the same limits and breaks in a histogram | Yes | 2
The figure is called out inline in the text (i.e. see distributions in figure 1) | | 1
The paper is only one page long, and has not obviously changed the margins or the font or the font size in order to meet requirements | | 2
The paper has the three sections required and the information in each section is reflective of the section heading | | 3
The introduction describes the client and the data and the question | Yes | 3
The introduction specifically states what statistical test is being used | Yes | 1
The statistical test is correct for the data used (i.e. when you are comparing a sample to a population mean, you use a one sample t-test) | Yes | 3
The statistical test is run correctly in the R script, and is the statistical test listed in the introduction | Yes | 2
The statistical test code specifies what alternative the test is, and this is accurate according to the question in the paper (i.e. if the question is whether sample two is bigger than sample one, the alternative should be greater and the samples should be in the right order) | Yes | 2
The results and discussion provide a P value with no more than three decimal places (i.e. 0.001, not XXXXXXXXXX) | | 2
The results and discussions provide a measure of effect size with no more than three decimal places (i.e. sample one is .567 cm longer than sample two on average) | | 2
The results and discussion section interprets the P value correctly for the statistical test run | Yes | 3
The recommendation section correctly advocates for action or inaction depending on the question asked and the P value received | Yes | 3
The overall report is written in a professional tone in third person past tense | | 2
If there are grammar or spelling mistakes they do not interfere with the legibility of the paper | | 3
Total: | | 50
Note: because the peer review process is important, papers that are turned in late will be docked up to 15 points, five points for every day they are delayed.
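The rubric item on the alternative and sample order can be illustrated with a short sketch. In R, alternative = "greater" tests whether the mean of the first argument exceeds the mean of the second, so a question like "is sample two bigger than sample one?" needs sample two listed first. The data, seed, and variable names below are made up for illustration.

```r
# Assumption: simulated samples where sample two truly has the larger mean.
set.seed(3)
sample_one <- rnorm(n = 30, mean = 10, sd = 2)
sample_two <- rnorm(n = 30, mean = 12, sd = 2)

# Question: is sample two bigger than sample one?
# "greater" means mean(x) - mean(y) > 0, so sample_two must be x.
right_order <- t.test(sample_two, sample_one,
                      alternative = "greater", var.equal = TRUE)

# Same alternative with the samples swapped answers the opposite question.
wrong_order <- t.test(sample_one, sample_two,
                      alternative = "greater", var.equal = TRUE)

round(right_order$p.value, 3)
round(wrong_order$p.value, 3)
```

The two one-sided p values add up to 1, so swapping the order turns a small p value into a large one. A reviewer checking this item should confirm that the argument order and the alternative together match the question stated in the paper.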
Peer Review Rubric
Rubric Item | Points
--- | ---
Mistakes in the analyses are identified and described if present (i.e. if the paper ran a one sample t-test but should have done a two sample t-test) | 3
Mistakes in the interpretation of the P value or the effect size are identified and described if present | 2
Errors in the recommendations are identified and described if present | 1
Issues in the figure like legible axes or inappropriate uses of histograms are identified and described if present | 2
Issues in the R script running correctly or the data set being easy to use are identified and described if present |
Answered Same Day Oct 01, 2021

Solution

Sudharsan.J answered on Oct 02 2021
Project-1
Introduction:
The dataset describes factors associated with forest fires in Northeast Portugal. It includes weather factors and the categorical variables month, half year, and day of the week, with 14 variables and 517 observations in total. Here we determine whether the swing of wind was higher in the 1st half year than in the 2nd half year. The 1st half year covers January-June, whereas the 2nd half year covers July-December.
The data and code used for analysis are provided in appendix-1 at the end of the document.
Test of Hypothesis:
Ho: ρ=0
HA: ρ>0, level of significance=0.05
Null hypothesis: there was no significant...