LAB 1
Due date: Mon 02/26/24 by end of day.
Instructions: You are required to work individually. Use R Markdown and Knit to render a PDF or Word document. Upload the document to Brightspace.
Reference: For this lab, we will be using "OpenIntro Statistics: Labs for R" (by Andrew Bray et al.), available under a Creative Commons Attribution-ShareAlike license, at the link below:
https://nulib.github.io/kuyper-stat202
Structure: The lab contains two mandatory sections (Section 1 and Section 2), outlined below.
Section 1: This is a practice (warm-up) section; you do not need to submit it.
Practice with RStudio by reproducing all the code given in Chapter 8, Introduction to Linear Regression (Sections 8.1, 8.2, 8.3, and 8.4).
Section 2: On Your Own (Submission required).
This section builds on Section 1 and requires you to answer the following questions posed in Section 8.7 of Chapter 8 (Introduction to Linear Regression) in "OpenIntro Statistics: Labs for R" (by Andrew Bray et al.). Mark each question clearly using # and interpret the results.
1. Choose another traditional variable from mlb11 that you think might be a good predictor of runs. Produce a scatterplot of the two variables and fit a linear model. At a glance, does there seem to be a linear relationship?
2. How does this relationship compare to the relationship between runs and at_bats? Use the R² values from the two model summaries to compare. Does your variable seem to predict runs better than at_bats? How can you tell?
3. Now that you can summarize the linear relationship between two variables, investigate the relationships between runs and each of the other five traditional variables. Which variable best predicts runs? Support your conclusion using the graphical and numerical methods we've discussed (for the sake of conciseness, only include output for the best variable, not all five).
4. Now examine the three newer variables. These are the statistics used by the author of Moneyball to predict a team's success. In general, are they more or less effective at predicting runs than the old variables? Explain using appropriate graphical and numerical evidence. Of all ten variables we've analyzed, which seems to be the best predictor of runs? Using the limited (or not so limited) information you know about these baseball statistics, does your result make sense?
5. Check the model assumptions for the regression model with the variable you decided was the best predictor for runs.
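The questions above all follow the same general workflow: plot two variables, fit a linear model, read R² from the model summary, and inspect the residuals. The sketch below illustrates that workflow on a small simulated data set rather than on mlb11. The names toy, x, y, and m are hypothetical and are not part of the lab, so treat this as one possible outline in base R and not as the lab's required code.

```r
# Hypothetical simulated data; NOT the lab's mlb11 data set
set.seed(1)
toy <- data.frame(x = runif(30, min = 0, max = 10))
toy$y <- 2 + 3 * toy$x + rnorm(30, sd = 2)

# Scatterplot of the response against the predictor
plot(toy$y ~ toy$x)

# Fit a least-squares line and add it to the scatterplot
m <- lm(y ~ x, data = toy)
abline(m)

# R-squared from the model summary (compare this value across models)
r2 <- summary(m)$r.squared
print(r2)

# Linearity and constant variability: residuals vs. the predictor
plot(m$residuals ~ toy$x)
abline(h = 0, lty = 3)

# Nearly normal residuals: histogram and normal probability plot
hist(m$residuals)
qqnorm(m$residuals)
qqline(m$residuals)
```

To adapt the outline, replace the simulated data frame and the formula y ~ x with the data set and variables you are analyzing, then repeat the same steps for each candidate predictor.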