MATH211 Spring 2020
NAME:
Favorite Celebrity? Least Favorite Celebrity?:
Complete this exam on your own paper, and submit your work as a PDF into the Exam01 assignment
in Canvas. In case this doesn’t work, email me a copy at XXXXXXXXXX. In order to compute
integrals, use only techniques that we have addressed in this class.
Include this Honor Code statement in your submission:
The work attached represents my own efforts to respond to the writing prompts. I did not use any
resources other than worksheets, lecture notes, the textbook, or tools posted in Canvas. I did not look
up answers on the internet nor did I get the answer from another person.
Name: Signature: Date:
PART 1 Do questions 1 - 3.
1A Use dataset #1 posted in Exam 03 Send-Out Survey in Canvas. Be sure to read the description of the
dataset and what we are trying to predict.
(a) Choose the variable that you think is the most significant and use R to generate a linear regression
model for that variable.
(b) Construct a confidence interval for the slope and explain what this means.
1B Still using dataset #1 from Exam 03 Send-Out Survey, and still predicting the same variable:
(a) Using R, create a regression model using the Backwards Elimination strategy. Eliminate variables
until you are left with three significant variables. Show your results as you eliminate variables
one-by-one.
(b) Write the final equation for your regression model and explain what each slope means.
(c) Check the conditions necessary for this regression model to be valid. Are you convinced that a
linear model is appropriate?
(d) Find the residual for the Data Point shown in the Canvas survey. Explain what this residual
means.
2 (a) Using R, continue to create a multiple regression model using the Forward Selection Strategy.
Your final model should involve three predictor variables. Choose the variables one at a time. At
each step, show the summary results from R. If you find that a variable is not significant, explain
why, then do not use that variable, but choose a different one instead.
(b) Write the final equation for your regression model and explain whether each variable increases or
decreases the probability of the variable we are trying to predict.
(c) Find the residual for the Data Point shown in the Canvas survey. Explain what this residual
means in this context.
PART 2 You must do one question from each section that will count towards Exam 3. You may
choose to do more than one. Any credit you earn on these questions will count as extra credit towards
your previous exam scores.
Exam01 Do at least one of these questions.
Tree diagrams? Independence vs Mutually exclusive (record shop with information and computation?)
Sampling techniques?
Q1 You are in charge of keeping the catalogue for a large Jazz Record store, and you collect the following
information about your inventory:
• 10% of the albums have only one solo musician.
• 42% of the albums have a saxophone.
• 82% of albums have a piano.
• 34% of albums have both saxophone and piano.
• Of the albums with only one solo musician, 77% of them have a piano.
You choose a record at random from the store.
1. What is the probability that you choose an album that has either saxophone or piano?
2. What is the probability that you choose an album with only one solo musician playing piano?
3. Given that you chose an album with piano, what is the probability that it also has saxophone?
4. Are the events Saxophone and Piano disjoint? Justify your answer.
5. Are the events Saxophone and Piano independent? Justify your answer.
Q2 For each of the following pairs of hypothetical data sets, decide which mean is larger and which standard
deviation is larger. Explain your reasoning.
(a) Price of menu items at a fancy restaurant in Seattle
    Price of menu items at a fast food restaurant in Seattle
(b) Weight of pet cats
    Weight of pet dogs
(c) Salary of teachers in Seattle
    Salary of tech workers in Seattle
Exam02 Do at least one of these questions. For both, refer to the data sets in Exam 03 Send-Out Survey in
Canvas. Be sure to
Q3 Invent a possible research question that can be answered using a hypothesis test on difference of
proportions using Data Set #3. Clearly state that question, then use your sample to carry out a
formal hypothesis test. Analyze the types of errors and their consequences.
Q4 Invent a possible research question that can be answered using a hypothesis test on difference of means
using Data Set #4. Clearly state that question, then use your sample to carry out a formal hypothesis
test. Analyze the types of errors and their consequences.
Data Set #1
Here is a dataset about the performance of Professional Baseball Teams from XXXXXXXXXX.
baseball<-read.csv("https:
aw.githubusercontent.com/trevorpelletie
2020Spring/maste
aseball_lin.csv")
Build a model that predicts the number of wins the team gets.
Use your model to predict the number of wins for a team with the following stats:
League = AL
runs = 513
on_base = 0.298
batting_average = 0.339
opp_runs = 698
opp_on_base = 0.313
opp_slugging = 0.401
%======================%
Data Set #2
Here is a data set showing Seattle information about Officer Involved Shootings.
spdois<-read.csv("https:
aw.githubusercontent.com/trevorpelletie
2020Spring/maste
spd_ois.csv")
Build a model that predicts the probability that the subject was killed.
Use your model to predict the probability of the subject being killed in the following situation:
Officer: White Male, 8 years of experience, not injured.
Subject: NonWhite Male, 25 years old, no weapon.
%======================%
Data Set #3
A "Terry Stop" is a rule in the US that allows police officers to briefly detain a person based on "reasonable suspicion" of involvement in criminal activity. This is commonly known as "stop and frisk." Here is information about Terry Stops in Seattle.
terry<-read.csv("https:
aw.githubusercontent.com/trevorpelletie
2020Spring/maste
te
y_stops.csv")
This is a very large data set, so before you use it, generate a sample with the code
terry_sample <- terry[sample(nrow(terry),N),]
Choose a suitable sample size N.
Use your sample to build a model that predicts the probability that the subject was arrested.
%======================%
Data Set #4
Here is a data set showing education, crime, and political information about each US state.
state<-read.csv("https:
aw.githubusercontent.com/trevorpelletie
2020Spring/maste
state_info.csv")
Data Documentation
Churches
id variables
  church_id            church identification
model variables
  volume               volume in cubic meters
  length               length in meters
  width                width in meters
  avg_height           average height in meters
  surface_area         inside total surface area in square meters
  ground_surface_area  ground surface area in square meters
  reverb_time          Reverberation (Echo) time in seconds
Cereal
model variables
  Shelf                recommended grocery store display shelf
  Calories             calories per serving
  Protien              grams of protein per serving
  Fat                  grams of fat per serving
  Sodium               milligrams of sodium per serving
  Fiber                grams of fiber per serving
  Carbohydrates        grams of carbohydrates per serving
  Sugars               grams of sugar per serving
  Potassium            milligrams of potassium per serving
  Serving_size         number of cups per serving
Baseball
id variables
  Team                 which team
  Year                 which year
model variables
  League               either National League or American League
  Playoffs             indicates if the team made the playoffs
  wins                 games won
  runs                 total runs (points)
  on_base              how often players get on base
  slugging             how often players get a good hit
  batting_avg          how often players get any hit
  opp_runs             runs (points) scored by opponents
  opp_on_base          how often opponent gets on base
  opp_slugging         how often opponent gets a good hit
State Info (data from 2014~2016)
id variables
  State                    which state
model variables
  median_household_income  median household income
  avg_teacher_salary       average teacher salary
  pct_hs_deg               percent with High School Degree
  pct_unemployed           percent unemployed
  pct_cities               percent living in cities
  pct_nonwhite             percent non-white
  pct_trump                percent voted for Trump in 2016 election
  crime_rate               crimes per XXXXXXXXXX people
  vcrime_rate              violent crimes per XXXXXXXXXX people
  hcrime_rate              hate crimes per XXXXXXXXXX people
Terry Stops
model variables
  officer_gender       Gender of Officer
  officer_race         Race of Officer (reported as white or non-white)
  subject_gender       Subject Perceived Gender
  subject_race         Subject Perceived Race
  subject_age          Subject Age Range
  weapon               Subject Weapon
  frisk_flag           Was the subject frisked?
  arrest_flag          Was the subject arrested?
Seattle PD Officer Involved in Shooting
model variables
  officer_gender       Gender of Officer
  officer_race         Officer Race
  spd_years            Officer Years Experience
  officer_injured      Was the officer injured?
  subject_gender       Subject Gender
  subject_race         Subject Race
  subject_age          Subject Age
  subject_weapon       Did the subject have a weapon?
  subject_fatal        Was the subject killed?
Useful R Commands
Here is a list of commands for R that will be useful for you on this test. Text in all capital letters is text that you
will edit for your specific problem.
Generate and Count Subsets and Samples
subset(DATA, CONDITION) #creates a subset of a data set based on a condition
#For example: april <- subset(sea, month == "4")
nrow(DATA) #counts the total number of rows in a data set
DATA[sample(nrow(DATA),SIZE),] #creates a random sample of a dataset of a determined size
#For example: rsample <- april[sample(nrow(april),50),]
sum(CONDITION) #counts the number of data points that meet a given condition
#For example: sum(april$TMAX > 70) counts the number of April days whose high temperature was higher than 70.
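A minimal sketch of these commands working together. It assumes R's built-in mtcars data set as a stand-in for the exam data, and the names heavy and heavy_sample are illustrative only.

```r
# mtcars ships with R; it stands in for the exam data here (an assumption).
set.seed(211)                                      # makes the random sample reproducible

heavy <- subset(mtcars, wt > 3)                    # keep cars whose wt value is above 3
nrow(heavy)                                        # number of rows in the subset

heavy_sample <- heavy[sample(nrow(heavy), 10), ]   # random sample of 10 rows
sum(heavy_sample$mpg > 20)                         # how many sampled cars exceed 20 mpg
```

Calling set.seed() first is optional, but it lets you regenerate the same sample if you need to rerun your work.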
Measure Statistics
mean(DATA$VARIABLE) #computes the mean of a column in a data set
sd(DATA$VARIABLE) #computes the standard deviation of a column in a data set
table(DATA$VARIABLE) #creates a table showing levels in a column along with counts for those levels
prop.table(table(DATA$VARIABLE)) #creates a table showing levels in a column along with proportions for those levels
Compute with Distributions
pnorm(ZSCORE, 0, 1) #computes the area to the left of a given ZSCORE under the standard normal distribution
qnorm(AREA, 0, 1) #gives the zscore that contains a given AREA to the left
pt(TSCORE, DF) #computes the area to the left of a given TSCORE under the t-distribution with given Degrees of Freedom
qt(AREA, DF) #computes the tscore that contains a given AREA to the left under the t-distribution with given Degrees of Freedom
pchisq(CHISQUARE, DF) #computes the area to the left of a chi-square test statistic with a given number of Degrees of Freedom
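Since each of these functions returns the area to the LEFT, right-tail and two-sided areas come from subtracting from 1. The sketch below uses made-up test statistics and degrees of freedom purely for illustration.

```r
pnorm(1.96, 0, 1)          # area to the left of z = 1.96, about 0.975
1 - pnorm(1.96, 0, 1)      # area to the right of z = 1.96, about 0.025
qnorm(0.975, 0, 1)         # z-score with 97.5% of the area to its left, about 1.96

df <- 24                   # hypothetical degrees of freedom
t_star <- qt(0.975, df)    # critical t-score for a 95% confidence interval
2 * (1 - pt(2.1, df))      # two-sided p-value for a hypothetical t-score of 2.1

1 - pchisq(7.8, 3)         # right-tail p-value for a hypothetical chi-square of 7.8 with 3 df
```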
Generate Regression Models
lm(DATA$y ~ DATA$x1 + DATA$x2 + ...) #generates linear regression model
glm(DATA$y ~ DATA$x1 + DATA$x2 + ..., family = binomial) #generates generalized linear model (logistic regression when family = binomial)
summary(MODEL) #prints coefficient information and model measurements
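A short sketch of how these model commands might be used, again assuming the built-in mtcars data rather than the exam data. It is written with the data = argument instead of the DATA$ form shown above, a substitution made here because predict() can then be given new values directly. The names fit, point, and logit_fit are illustrative.

```r
# Linear model: predict fuel economy (mpg) from weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)                        # coefficients, p-values, R-squared
confint(fit, "wt", level = 0.95)    # 95% confidence interval for the slope

# Residual for one observation = observed value - predicted value.
point <- mtcars[1, ]
predicted <- predict(fit, newdata = point)
residual <- point$mpg - predicted

# Logistic model: predict a 0/1 outcome (am = 1 means manual transmission).
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- predict(logit_fit, newdata = data.frame(wt = 3), type = "response")
prob                                # predicted probability, between 0 and 1
```

With type = "response", predict() reports a probability for a glm; without it, the result is on the log-odds scale.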
Plot Scatterplots
plot(DATA$y ~ DATA$x) #generates a scatter plot of y against x. For residual plots, use MODEL$residuals as the first term.
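As an illustration of both kinds of plot, here is a sketch that again assumes the built-in mtcars data. The abline() calls are an optional addition not listed above.

```r
fit <- lm(mpg ~ wt, data = mtcars)

plot(mpg ~ wt, data = mtcars)              # scatter plot of the response against the predictor
abline(fit)                                # overlay the fitted regression line

plot(fit$residuals ~ fit$fitted.values)    # residual plot: residuals against fitted values
abline(h = 0, lty = 2)                     # dashed reference line at zero
```

A residual plot with no clear pattern around the zero line is one piece of evidence that a linear model is reasonable.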