Solution
Sudharsan.J answered on
Jun 11 2021
On Your Own:
Question1:
fit<-lm(runs~homeruns,data = mlb11)
summary(fit)
output:
Call:
lm(formula = runs ~ homeruns, data = mlb11)
Residuals:
Min 1Q Median 3Q Max
-91.615 -33.410 3.231 24.292 104.631
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 415.2389 41.6779 9.963 1.04e-10 ***
homeruns 1.8345 0.2677 6.854 1.90e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 51.29 on 28 degrees of freedom
Multiple R-squared: 0.6266, Adjusted R-squared: 0.6132
F-statistic: 46.98 on 1 and 28 DF, p-value: 1.9e-07
Observation:
The table above shows that homeruns is a statistically significant predictor of runs and that the relationship is positive: each additional home run is associated with about 1.83 more runs.
The model has a multiple R-squared of 0.6266 and an adjusted R-squared of 0.6132.
From the output above, the F-statistic is 46.98 on 1 and 28 degrees of freedom, with a p-value below 0.0001 (1.9e-07).
Since the p-value is less than 0.05, homeruns is statistically significant. R-squared tells us how well the fitted model represents the data: here the multiple R-squared is 0.6266, so the model explains about 63% of the variability in runs, which indicates a reasonably good fit.
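As a check on the summary output, the adjusted R-squared and the F-statistic can both be recovered from the multiple R-squared. This is a worked sketch, assuming n = 30 observations (implied by 28 residual degrees of freedom with one predictor) and p = 1 predictor:

```latex
\begin{aligned}
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
  = 1 - (1 - 0.6266)\,\frac{29}{28} \approx 0.613 \\
F &= \frac{R^2 / p}{(1 - R^2)/(n - p - 1)}
  = \frac{0.6266 / 1}{0.3734 / 28} \approx 47.0
\end{aligned}
```

Both values agree with the reported 0.6132 and 46.98 up to rounding of R-squared.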
Scatterplot of actual vs. predicted runs:
library(ggplot2)
mlb11$pred<-predict(fit)  # predicted runs from the fitted model
ggplot(mlb11,aes(x = runs,y = pred))+geom_point(size=1.2,col="blue",shape="circle")+
geom_smooth(method = "lm",se = TRUE,col="red")
From the scatterplot above we can conclude that actual and predicted runs have a good positive linear relationship.
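The strength of the actual vs. predicted relationship can also be put into a number: for a least-squares fit with an intercept, the squared correlation between actual and fitted values equals the multiple R-squared. The sketch below is illustrative only. The helper name `actual_vs_predicted` is hypothetical, and the built-in `mtcars` data stands in for `mlb11` so the code runs without the lab data.

```r
# Hypothetical helper (not part of the original answer): fits a linear model
# and returns actual values, fitted values, and two ways of measuring fit.
actual_vs_predicted <- function(formula, data) {
  model <- lm(formula, data = data)
  actual <- model.response(model.frame(model))  # observed response values
  predicted <- fitted(model)                    # model predictions
  list(
    actual = actual,
    predicted = predicted,
    squared_cor = cor(actual, predicted)^2,     # squared correlation
    r_squared = summary(model)$r.squared        # multiple R-squared
  )
}

# Assumption: mtcars is used only as stand-in data.
check <- actual_vs_predicted(mpg ~ wt, data = mtcars)
print(c(check$squared_cor, check$r_squared))    # the two numbers coincide

# With the lab data loaded, the equivalent call would be:
# actual_vs_predicted(runs ~ homeruns, data = mlb11)
```

A tight cloud around the fitted line in the scatterplot therefore corresponds directly to a high R-squared.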
Question2:
fit<-lm(runs~homeruns+hits,data = mlb11)
summary(fit)
Call:
lm(formula = runs ~ homeruns + hits, data = mlb11)
Residuals:
Min 1Q Median 3Q Max
-47.134 -24.852 0.975 19.706 64.234
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -228.29781 97.99600 -2.330 0.0275 *
homeruns 1.23374 0.18748 6.581 4.65e-07 ***
hits 0.52147 0.07662 6.806 2.61e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 31.7 on 27 degrees of freedom
Multiple R-squared: 0.8625, Adjusted R-squared: 0.8523
F-statistic: 84.68 on 2 and 27 DF, p-value: 2.33e-12
Runs vs At_bats output:
fit1<-lm(runs~at_bats,data=mlb11)
summary(fit1)
Call:
lm(formula = runs ~ at_bats, data = mlb11)
Residuals:
Min 1Q Median 3Q Max
-125.58 -47.05 -16.59 54.40 176.87
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2789.2429 853.6957 -3.267 0.002871 **
at_bats 0.6305 0.1545 4.080 0.000339 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 66.47 on 28 degrees of freedom
Multiple R-squared: 0.3729, Adjusted R-squared: 0.3505
F-statistic: 16.65 on 1 and 28 DF, p-value: 0.0003388
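The two summaries above can be lined up in a single table rather than compared by eye. The following is a minimal sketch: the helper name `compare_models` is hypothetical, and `mtcars` is used as stand-in data so that the block runs on its own.

```r
# Hypothetical helper (not part of the original answer): collects the key
# fit statistics from a named list of lm objects, best adjusted R-squared first.
compare_models <- function(models) {
  rows <- lapply(models, function(m) {
    s <- summary(m)
    f <- s$fstatistic  # named vector: value, numdf, dendf
    data.frame(
      r_squared = s$r.squared,
      adj_r_squared = s$adj.r.squared,
      f_statistic = unname(f["value"]),
      # overall p-value of the F-test
      p_value = unname(pf(f["value"], f["numdf"], f["dendf"],
                          lower.tail = FALSE))
    )
  })
  out <- do.call(rbind, rows)
  out$model <- names(models)
  out[order(-out$adj_r_squared), ]
}

# Assumption: mtcars models are used only as stand-ins.
demo <- compare_models(list(
  two_predictors = lm(mpg ~ wt + hp, data = mtcars),
  one_predictor = lm(mpg ~ qsec, data = mtcars)
))
print(demo)

# With the lab models fitted as above, the equivalent call would be:
# compare_models(list(homeruns_hits = fit, at_bats = fit1))
```

Sorting by adjusted R-squared rather than multiple R-squared is a deliberate choice here, since the models being compared have different numbers of predictors.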
From the two outputs above, the first model (runs ~ homeruns + hits) is the better model.
Based on the tests of significance and the adjusted R-squared values, the homeruns + hits model is found to be the “best model”: its adjusted R-squared (0.8523) is well above that of the at_bats model (0.3505).
From the output above, the F-statistic for the best model is 84.68 on 2 and 27 degrees of freedom, with a p-value below 0.0001 (2.33e-12).
Since the p-value is less than 0.05, the model is statistically significant, and both homeruns and hits are individually significant. R-squared tells us how well the fitted model represents the data: here the adjusted R-squared is 0.8523 (multiple R-squared 0.8625), which indicates that the fitted model is the better fit...