Question

MATH211 Spring 2020
NAME:
Favorite Celebrity?
Least Favorite Celebrity?

Complete this exam on your own paper, and submit your work as a PDF into the Exam01 assignment in Canvas. In case this doesn’t work, email me a copy at XXXXXXXXXX. In order to compute integrals, use only techniques that we have addressed in this class.

Include this Honor Code statement in your submission:
The work attached represents my own efforts to respond to the writing prompts. I did not use any resources other than worksheets, lecture notes, the textbook, or tools posted in Canvas. I did not look up answers on the internet, nor did I get the answer from another person.
Name:            Signature:            Date:

PART 1  Do questions 1 - 3.

1A Use dataset #1 posted in the Exam 03 Send-Out Survey in Canvas. Be sure to read the description of the dataset and what we are trying to predict.
   (a) Choose the variable that you think is the most significant and use R to generate a linear regression model for that variable.
   (b) Construct a confidence interval for the slope and explain what this means.

1B Still using dataset #1 from the Exam 03 Send-Out Survey, and still predicting the same variable:
   (a) Using R, create a regression model using the Backwards Elimination strategy. Eliminate variables until you are left with three significant variables. Show your results as you eliminate variables one by one.
   (b) Write the final equation for your regression model and explain what each slope means.
   (c) Check the conditions necessary for this regression model to be valid. Are you convinced that a linear model is appropriate?
   (d) Find the residual for the Data Point shown in the Canvas survey. Explain what this residual means.

2  (a) Using R, continue to create a multiple regression model using the Forward Selection strategy. Your final model should involve three predictor variables. Choose the variables one at a time. At each step, show the summary results from R.
   If you find that a variable is not significant, explain why, then do not use that variable, but choose a different one instead.
   (b) Write the final equation for your regression model and explain whether each variable increases or decreases the probability of the variable we are trying to predict.
   (c) Find the residual for the Data Point shown in the Canvas survey. Explain what this residual means in this context.

PART 2  You must do one question from each section that will count towards Exam 3. You may choose to do more than one. Any credit you earn on these questions will count as extra credit towards your previous exam scores.

Exam01  Do at least one of these questions.
Tree diagrams? Independence vs Mutually exclusive (record shop with information and computation?) Sampling techniques?

Q1 You are in charge of keeping the catalogue for a large Jazz Record store, and you collect the following information about your inventory:
• 10% of the albums have only one solo musician.
• 42% of the albums have a saxophone.
• 82% of albums have a piano.
• 34% of albums have both saxophone and piano.
• Of the albums with only one solo musician, 77% of them have a piano.
You choose a record at random from the store.
1. What is the probability that you choose an album that has either saxophone or piano?
2. What is the probability that you choose an album with only one solo musician playing piano?
3. Given that you chose an album with piano, what is the probability that it also has saxophone?
4. Are the events Saxophone and Piano disjoint? Justify your answer.
5. Are the events Saxophone and Piano independent? Justify your answer.

Q2 For each of the following pairs of hypothetical data sets, decide which mean is larger and which standard deviation is larger.
   Explain your reasoning.
   { price of menu items at a fancy restaurant in Seattle
     price of menu items at a fast food restaurant in Seattle
   { Weight of pet cats
     Weight of pet dogs
   { Salary of teachers in Seattle
     Salary of tech workers in Seattle

Exam02  Do at least one of these questions. For both, refer to the data sets in the Exam 03 Send-Out Survey in Canvas. Be sure to

Q3 Invent a possible research question that can be answered using a hypothesis test on a difference of proportions using Data Set #3. Clearly state that question, then use your sample to carry out a formal hypothesis test. Analyze the types of errors and their consequences.

Q4 Invent a possible research question that can be answered using a hypothesis test on a difference of means using Data Set #4. Clearly state that question, then use your sample to carry out a formal hypothesis test. Analyze the types of errors and their consequences.

Data Set #1
Here is a dataset about the performance of Professional Baseball Teams from XXXXXXXXXX.
aseballaw.githubusercontent.com/trevorpelletie2020Spring/masteaseball_lin.csv")
Build a model that predicts the number of wins the team gets.
Use your model to predict the number of wins for a team with the following stats:
League = AL
runs = 513
on_base = 0.298
batting_average = 0.339
opp_runs = 698
opp_on_base = 0.313
opp_slugging = 0.401
%======================%
Data Set #2
Here is a data set showing information about Seattle Officer Involved Shootings.
spdoisaw.githubusercontent.com/trevorpelletie2020Spring/mastespd_ois.csv")
Build a model that predicts the probability that the subject was killed.
Use your model to predict the probability of the subject being killed in the following situation:
Officer: White Male, 8 years of experience, not injured.
Subject: NonWhite Male, 25 years old, no weapon.
%======================%
Data Set #3
A "Terry Stop" is a rule in the US that allows police officers to briefly detain a person based on "reasonable suspicion" of involvement in criminal
activity. This is commonly known as "stop and frisk." Here is information about Terry Stops in Seattle:
teyaw.githubusercontent.com/trevorpelletie2020Spring/mastetey_stops.csv")
This is a very large data set, so before you use it, generate a sample with the code terry_sample <- terry[sample(nrow(terry), N), ]. Choose a suitable sample size N.
Use your sample to build a model that predicts the probability that the subject was arrested.
%======================%
Data Set #4
Here is a data set showing education, crime, and political information about each US state.
stateaw.githubusercontent.com/trevorpelletie2020Spring/mastestate_info.csv")

Data Documentation

Churches
  id variables
    church_id                 church identification
  model variables
    volume                    volume in cubic meters
    length                    length in meters
    width                     width in meters
    avg_height                average height in meters
    surface_area              inside total surface area in square meters
    ground_surface_area       ground surface area in square meters
    reverb_time               reverberation (echo) time in seconds

Cereal
  model variables
    Shelf                     recommended grocery store display shelf
    Calories                  calories per serving
    Protien                   grams of protein per serving
    Fat                       grams of fat per serving
    Sodium                    milligrams of sodium per serving
    Fiber                     grams of fiber per serving
    Carbohydrates             grams of carbohydrates per serving
    Sugars                    grams of sugar per serving
    Potassium                 milligrams of potassium per serving
    Serving_size              number of cups per serving

Baseball
  id variables
    Team                      which team
    Year                      which year
  model variables
    League                    either National League or American League
    Playoffs                  indicates if the team made the playoffs
    wins                      games won
    runs                      total runs (points)
    on_base                   how often players get on base
    slugging                  how often players get a good hit
    batting_avg               how often players get any hit
    opp_runs                  runs (points) scored by opponents
    opp_on_base               how often opponent gets on base
    opp_slugging              how often opponent gets a good hit

State Info (data from 2014~2016)
  id variables
    State                     which state
  model variables
    median_household_income   median household income
    avg_teacher_salary        average teacher salary
    pct_hs_deg                percent with High School Degree
    pct_unemployed            percent unemployed
    pct_cities                percent living in cities
    pct_nonwhite              percent non-white
    pct_trump                 percent voted for Trump in 2016 election
    crime_rate                crimes per XXXXXXXXXX people
    vcrime_rate               violent crimes per XXXXXXXXXX people
    hcrime_rate               hate crimes per XXXXXXXXXX people

Terry Stops
  model variables
    officer_gender            gender of officer
    officer_race              race of officer (reported as white or non-white)
    subject_gender            subject perceived gender
    subject_race              subject perceived race
    subject_age               subject age range
    weapon                    subject weapon
    frisk_flag                was the subject frisked?
    arrest_flag               was the subject arrested?

Seattle PD Officer Involved Shootings
  model variables
    officer_gender            gender of officer
    officer_race              officer race
    spd_years                 officer years of experience
    officer_injured           was the officer injured?
    subject_gender            subject gender
    subject_race              subject race
    subject_age               subject age
    subject_weapon            did the subject have a weapon?
    subject_fatal             was the subject killed?

Useful R Commands
Here is a list of commands for R that will be useful for you on this test. Text in all capital letters is text that you will edit for your specific problem.

Generate and Count Subsets and Samples
  subset(DATA, CONDITION)            # creates a subset of a data set based on a condition
                                     # For example: april
  nrow(DATA)                         # counts the total number of rows in a data set
  DATA[sample(nrow(DATA),SIZE),]     # creates a random sample of a data set of a determined size
                                     # For example: rsample
  sum(CONDITION)                     # counts the number of data points that meet a given condition
                                     # For example: sum(april$TMAX > 70) counts the number of April days whose high temperature was higher than 70.

Measure Statistics
  mean(DATA)                         # computes the mean of a data set
  sd(DATA)                           # computes the standard deviation of a data set
  table(DATA$VARIABLE)               # creates a table showing the levels in a column along with counts for those levels
  prop.table(table(DATA$VARIABLE))   # creates a table showing the levels in a column along with proportions for those levels
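As a sketch of how the subset, sample, and counting commands above fit together, the block below uses R's built-in `airquality` data set (daily New York readings, May to September 1973) in place of the course data. The names `may` and `rsample`, the seed, and the sample size of 20 are illustrative choices, not part of the exam.

```r
# Built-in data set used only as a stand-in for the course data.
data(airquality)

# subset(DATA, CONDITION): keep only the May rows (Month == 5).
may <- subset(airquality, Month == 5)
nrow(may)                       # number of rows in the subset

# sum(CONDITION): count May days whose temperature exceeded 70 (degrees F).
sum(may$Temp > 70)

# DATA[sample(nrow(DATA), SIZE), ]: draw a random sample of 20 rows.
set.seed(211)                   # makes the random draw reproducible
rsample <- airquality[sample(nrow(airquality), 20), ]
nrow(rsample)

# Summary statistics for a numeric column (Temp has no missing values).
mean(may$Temp)
sd(may$Temp)

# Counts and proportions for each level of a column.
counts <- table(rsample$Month)
props  <- prop.table(counts)
props
```

The same `DATA[sample(nrow(DATA), SIZE), ]` pattern is what the Data Set #3 instructions ask for when drawing a smaller working sample.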
Compute with Distributions
  pnorm(ZSCORE, 0, 1)      # computes the area to the left of a given ZSCORE under the standard normal distribution
  qnorm(AREA, 0, 1)        # gives the z-score that has a given AREA to its left
  pt(TSCORE, DF)           # computes the area to the left of a given TSCORE under the t-distribution with DF degrees of freedom
  qt(AREA, DF)             # gives the t-score that has a given AREA to its left under the t-distribution with DF degrees of freedom
  pchisq(CHISQUARE, DF)    # computes the area to the left of a chi-square test statistic with DF degrees of freedom

Generate Regression Models
  lm(DATA$y ~ DATA$x1 + DATA$x2 + ...)                       # generates a linear regression model
  glm(DATA$y ~ DATA$x1 + DATA$x2 + ..., family = binomial)   # generates a generalized linear (logistic) regression model
  summary(MODEL)                                             # prints coefficient information and model measurements

Plot Scatterplots
  plot(DATA$y ~ DATA$x)    # generates a scatter plot of y against x. For residual plots, use MODEL$residuals as the first term.
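To connect these commands to the slope interval in 1A(b), the residual questions, the probability models, and the two-proportion test in Q3, here is a sketch on R's built-in `mtcars` data (not the course data). It fits `lm()` with a `data =` argument rather than the `DATA$y ~ DATA$x` form above, because with the `DATA$x` form `predict()` looks up the full columns again and generally cannot use `newdata`. The chosen point (the "Mazda RX4" row), `wt = 3`, and the `vs`/`am` comparison are illustrative assumptions.

```r
# Built-in mtcars data, used only as a stand-in for the course data sets.
data(mtcars)

# Simple linear regression; data = lets predict() read newdata by column name.
model <- lm(mpg ~ wt, data = mtcars)
print(summary(model))

# 95% confidence interval for the slope by hand: estimate +/- t* x SE,
# with t* from qt() using the residual degrees of freedom (n - 2).
est    <- coef(summary(model))["wt", "Estimate"]
se     <- coef(summary(model))["wt", "Std. Error"]
t_star <- qt(0.975, df = df.residual(model))
ci_by_hand <- c(lower = est - t_star * se, upper = est + t_star * se)

# confint() should return the same interval.
ci_builtin <- confint(model, "wt", level = 0.95)

# Residual for one observed point: observed y minus predicted y.
point     <- mtcars["Mazda RX4", ]
predicted <- predict(model, newdata = point)
resid_pt  <- point$mpg - predicted

# Logistic regression: family = binomial models the probability that am == 1.
logit_model <- glm(am ~ wt, data = mtcars, family = binomial)
p_hat <- predict(logit_model, newdata = data.frame(wt = 3), type = "response")

# Two-proportion z-test with pnorm(): share of manual cars (am == 1)
# among straight-engine (vs == 1) versus V-engine (vs == 0) cars.
x <- c(sum(mtcars$am[mtcars$vs == 1]), sum(mtcars$am[mtcars$vs == 0]))
n <- c(sum(mtcars$vs == 1), sum(mtcars$vs == 0))
p_pool <- sum(x) / sum(n)
z <- (x[1] / n[1] - x[2] / n[2]) /
  sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))
p_value_by_hand <- 2 * pnorm(-abs(z))

# prop.test() without continuity correction gives the same two-sided p-value.
p_value_builtin <- prop.test(x, n, correct = FALSE)$p.value
```

Computing the interval and the p-value by hand and then checking them against `confint()` and `prop.test()` is a quick way to confirm that the degrees of freedom and pooled standard error were set up correctly.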

Biswajit · Accepted Answer

Part 1:
1A. (a)
summary(Model2)
Call:
lm(formula = wins ~ runs, data = baseball)
Residuals:
     Min       1Q   Median       3Q      Max 
-24.9326  -7.4241   0.6818   7.1930  24.6393 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 22.260427   4.335213   5.135 4.34e-07 ***
runs         0.077279   0.005674  13.621
<none>                       6306.1 1153.8
- opp_on_base   1     199.2  6505.2 1164.8
- opp_runs      1    1963.4  8269.5 1265.6
- runs          1    4125.2 10431.3 1363.2
Step:  AIC=1151.79
wins ~ League + runs + on_base + batting_avg + opp_runs + opp_on_base
              Df Sum of Sq     RSS    AIC
- batting_avg  1       0.7  6306.8 1149.8
- League       1       2.7  6308.8 1150.0
- on_base      1      30.0  6336.1 1151.8
<none>                      6306.1 1151.8
- opp_on_base  1     199.1  6505.2 1162.8
- opp_runs     1    3444.2  9750.3 1332.8
- runs         1    4129.5 10435.6 1361.3
Step:  AIC=1149.83
wins ~ League + runs + on_base + opp_runs + opp_on_base
              Df Sum of Sq     RSS    AIC
- League       1       3.5  6310.3 1148.1
<none>                      6306.8 1149.8
- on_base      1      32.7  6339.5 1150.0
- opp_on_base  1     201.3  6508.1 1161.0
- opp_runs     1    3444.1  9750.9 1330.8
- runs         1    4238.7 10545.5 1363.7
Step:  AIC=1148.07
wins ~ runs + on_base + opp_runs + opp_on_base
              Df Sum of Sq     RSS    AIC
<none>                      6310.3 1148.1
- on_base      1      39.1  6349.4 1148.7
- opp_on_base  1     200.4  6510.7 1159.2
- opp_runs     1    3783.4 10093.7 1343.4
- runs         1    4512.8 10823.1 1372.7
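The tables above have the layout that R's `step()` function prints during a backward search: each row shows the RSS and AIC after dropping one term, the `<none>` row is the current model, and the search moves to whichever model has the lowest AIC. Note that `step()` ranks terms by AIC (penalty `k = 2` by default), not by the p-value rule described in 1B(a). The call that produced this trace is not shown, so the following is only a sketch under assumptions: a backward search on R's built-in `mtcars` data with an arbitrary starting set of predictors.

```r
data(mtcars)

# Start from a full model with several candidate predictors (chosen arbitrarily).
full_model <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)

# direction = "backward": at each step, drop the term whose removal gives the
# lowest AIC; stop when no removal lowers AIC any further.
# trace = 1 prints a Df / Sum of Sq / RSS / AIC table at every step.
reduced_model <- step(full_model, direction = "backward", trace = 1)

# Terms kept in the final model.
kept_terms <- attr(terms(reduced_model), "term.labels")
print(summary(reduced_model))
```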
> Model4  Model4
Call:
lm(formula = wins ~ runs + on_base + opp_runs + opp_on_base, 
    data = baseball)
Coefficients:
(Intercept)         runs      on_base     opp_runs  opp_on_base  
   97.31711      0.09031     49.46939     -0.08505   -110.75203  
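Reading off the coefficients printed for Model4, the fitted regression equation is (the hat marks a predicted value):

```latex
\widehat{\text{wins}} = 97.31711 + 0.09031\,\text{runs} + 49.46939\,\text{on\_base}
  - 0.08505\,\text{opp\_runs} - 110.75203\,\text{opp\_on\_base}
```

Each slope is read with the other predictors held fixed: one more run scored goes with about 0.09 more predicted wins, and one more run allowed with about 0.085 fewer. Because on_base and opp_on_base are rates between 0 and 1, their slopes are easier to read per 0.01 change: about 0.49 more predicted wins per 0.01 increase in on_base, and about 1.11 fewer per 0.01 increase in opp_on_base.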
> summary(Model4)
Call:
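Question 1B(a) asks for elimination by significance until three predictors remain, which `step()` does not do on its own. A minimal sketch of that p-value rule follows, assuming a hypothetical helper `backward_by_p()` (not an R or course function) and again using `mtcars` as a stand-in: refit, drop the predictor with the largest p-value, and repeat until the requested number is left.

```r
data(mtcars)

# Hypothetical helper (not from the exam): backward elimination by p-value.
# Refits the model, removes the predictor with the largest p-value, and
# repeats until only `keep` predictors remain. Assumes numeric predictors,
# so each coefficient name matches a column name in `data`.
backward_by_p <- function(response, predictors, data, keep = 3) {
  stopifnot(keep >= 1, length(predictors) >= keep)
  repeat {
    f   <- reformulate(predictors, response = response)
    fit <- lm(f, data = data)
    print(summary(fit))                        # show every step, as 1B(a) asks
    if (length(predictors) == keep) return(fit)
    p_values <- coef(summary(fit))[, "Pr(>|t|)"]
    p_values <- p_values[names(p_values) != "(Intercept)"]
    worst <- names(which.max(p_values))        # least significant predictor
    message("Removing ", worst, " (p = ", signif(max(p_values), 3), ")")
    predictors <- setdiff(predictors, worst)
  }
}

final_fit <- backward_by_p("mpg", c("wt", "hp", "qsec", "drat", "disp"),
                           data = mtcars, keep = 3)
```

With a factor such as League, the coefficient is named after the level (for example a name ending in NL), so this helper would need to map coefficient names back to column names before it could handle categorical predictors.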
