Find two bivariate data sets (at least 30 data points each; the more, the better): one that you think will have a positive correlation and one that you think will have a negative correlation.
I like to use this website (https://data.world/) to find data sets by searching topics I'm interested in. You can "sign in with google" using your go.shoreline email account. Sometimes finding a usable data set is the hardest part of doing this kind of analysis. Please let me know if you need help turning something you find on this website into a usable spreadsheet. I'm happy to help you with this.
Please note: do not use time as one of your variables. Find data sets in which neither x nor y is measured in time.
Report your data sets in the WH08 Survey in Canvas. This will help me check that your data sets will work and that you're using data sets unique to you. Save your data sets in Google Sheets, and send me a link by using the "share" button in the upper right corner (or send me a direct link to the data set you'd like to use if you need help converting it to a spreadsheet).
Along with sending me the links, answer these questions for each of your data sets in the survey. Be detailed.
- What is your x-variable and what is your y-variable?
- Why do you predict that the correlation will be positive or negative?
- How strong do you think the correlation will be: weak, medium, or strong? Why do you think so?
Task #2 (Due by Monday, June 8)
Do the following for each of your data sets, once I've approved the sets you'd like to use:
- Generate a scatter plot of the entire data set and estimate the line of best fit by eye. Do not have the program generate the line for you.
- Describe the slope and intercept of your estimate for the line of best fit, and construct the equation of this line.
- Identify any outliers. Decide whether you will remove them or not, and justify your choice.
- Once your outliers are removed, generate a scatter plot of your reduced data set and estimate a new line of best fit.
- Compute the slope and intercept of the regression line for your adjusted data set, and construct the regression line equation. Describe what each of these numbers tells you about your data set. Be specific and use units.
- Compute the correlation coefficient (r) of your adjusted data set and describe what this number means.
- Choose one data point in your data set and compute its residual (observed y minus predicted y). Describe what this number means.
- Make a conjecture about the possible relationship between the variables. Is there cause and effect? Reverse cause and effect? A lurking variable? Coincidence? Justify your conjecture.
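For the hand-estimated lines of best fit, one way to turn the line you drew into an equation is to read two points off the line and compute its slope and intercept from them. Here is a minimal Python sketch of that arithmetic. The two points are hypothetical readings from a drawn line, and the function name `line_through_points` is only illustrative, not a required tool.

```python
def line_through_points(p1, p2):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x-values")
    slope = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - slope * x1     # solve y1 = slope * x1 + b for b
    return slope, intercept

# Hypothetical points read off a hand-drawn line on the scatter plot
slope, intercept = line_through_points((2.0, 5.0), (10.0, 21.0))
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 2.00x + 1.00
```

With the readings (2, 5) and (10, 21), the slope is (21 − 5)/(10 − 2) = 2 and the intercept is 5 − 2·2 = 1, which gives y = 2x + 1. Points that are far apart on the drawn line tend to give a more reliable slope than points that are close together.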
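To check a computed regression line and correlation, you can write out the least-squares formulas directly: slope = Sxy/Sxx, intercept = ȳ − slope·x̄, and r = Sxy/√(Sxx·Syy). Here Sxx and Syy are the sums of squared deviations of x and y from their means, and Sxy is the sum of the products of those deviations. The sketch below uses made-up data and a hypothetical function name. In Google Sheets, the built-in SLOPE, INTERCEPT, and CORREL functions should give the same values for the same columns.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)                     # spread of x
    syy = sum((y - mean_y) ** 2 for y in ys)                     # spread of y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-variation
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each take more than one value")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    r = sxy / math.sqrt(sxx * syy)        # correlation coefficient, between -1 and 1
    return slope, intercept, r

# Hypothetical data set
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r = least_squares(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")  # y = 0.60x + 2.20, r = 0.775
```

The sign of r matches the sign of the slope, and values of r closer to 1 or −1 indicate that the points lie closer to a straight line.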
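A residual is the observed y-value minus the y-value the line predicts at the same x. A positive residual means the point lies above the line, and a negative residual means it lies below. Residuals also give one rough way to look for outliers. The sketch below flags points whose residual is more than k standard deviations of the residuals away from zero. The threshold k = 2 is an assumed rule of thumb, not a course requirement, and the data, line, and function names are hypothetical.

```python
import statistics

def residual(x, y, slope, intercept):
    """Observed y minus the y predicted by the line y = slope*x + intercept."""
    return y - (slope * x + intercept)

def flag_outliers(xs, ys, slope, intercept, k=2.0):
    """Return (x, y, residual) for points whose |residual| exceeds k * stdev of residuals.

    k = 2.0 is an assumed rule of thumb, not a fixed standard.
    """
    residuals = [residual(x, y, slope, intercept) for x, y in zip(xs, ys)]
    sd = statistics.stdev(residuals)  # sample standard deviation; needs at least 2 points
    return [(x, y, r) for x, y, r in zip(xs, ys, residuals) if abs(r) > k * sd]

# Hypothetical data that follow y = 2x + 1 exactly, except for the last point
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3, 5, 7, 9, 11, 13, 15, 17, 19, 40]

print(residual(10, 40, slope=2, intercept=1))       # 19: the point is 19 units above the line
print(flag_outliers(xs, ys, slope=2, intercept=1))  # [(10, 40, 19)]
```

A flagged point is only a candidate. Whether to remove it is still a decision you need to justify, for example by checking whether it is a data-entry error or a genuinely unusual case.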