Case 2
Predicting the Salary of Defensive Linemen in the NFL
Spring 2019
In this case, I would like you to build a model to predict the salary of a rookie defensive lineman in the NFL using all of the players in the sample data set that I have provided. This data set is made up of a sample of 49 players who are still in their first contract out of college.
Case Deliverables USE MINITAB 18
I. (10 Pts) Please submit an Executive Summary of the findings from your data analysis. Also, in the executive summary, provide your model and the significant variables. Discuss the adequacy of your model, and whether there are any weaknesses in the model (refer to your model assumptions). How much of the variation in Y (Salary) does your model explain? How much variation does it not explain? What are some other variables that you would like to have that could explain the variation that is unaccounted for? If a person could do more to increase his salary, what would that be (use your model to make this suggestion)? This section should be very concise and to the point.
II. (80 Pts) Create a predictive model using the 5-step methodology used in class and in the text. Step through the process and cut and paste your results from Minitab into your report. Throughout your methodology, explain your thought process and why you make the decisions you make. Give me a narrative of why you are doing what you are doing. I am more interested in your process for model-building than the final model that you arrive at.
Although we are interested in using the model for the prediction of “Salary”, we also want to be able to interpret the contribution of each independent variable in the model. Therefore, make sure you provide an interpretation of each coefficient that is in your final model. If you find any weaknesses in your final model after you do your residual analysis in Step 4, be sure to discuss these, and perhaps, suggest ways to overcome the weakness and make the model stronger.
Create this model using the variables provided (in Red below). Do not do Step 5. Step 5 is “Validation” and to do this you would have to go out and collect more data.
There is not a single correct answer or model for this assignment, so I do not expect any two teams to arrive at the same model. If you happen to arrive at the same model, then I wouldn’t expect you to take the same path to that specific model.
I retrieved most of this data from the website: www.pro-football-reference.com
Restrictions:
Use only the variables from their college tackle performance (Solo, Ast, Total, Loss, and Sack), League, Ht, Wt, and their combine results (40YD, Vertical, BenchReps, Broad Jump, 3Cone, and Shuttle). The objective is to use these variables to predict their annual salary for their first contract, which is in the column, Salary.
As you will recall, a significant number of players did not participate in all of the combine activities/evaluations. Rather than throw out those players, I replaced each missing result with the mean value of that particular variable. Here are the results for the players who did participate in each combine event:
Variable      N    Mean      StDev    Minimum   Maximum
40YD          95   5.0975    0.1670   4.6800    5.6400
Vertical      78   29.103    2.779    20.500    35.000
Bench Reps    76   27.697    5.309    18.000    49.000
Broad Jump    75   105.63    5.15     92.00     118.00
3Cone         70   7.6681    0.2405   7.0700    8.2800
Shuttle       71   4.6300    0.1340   4.3700    4.9100
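For intuition only (the case itself is to be done in Minitab 18), the mean substitution described above can be sketched in a few lines of Python. The 40-yard times below are made up, and `impute_mean` is a hypothetical helper name, not something from the data set or from Minitab.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical 40-yard dash times; None marks a player who skipped the event.
forty_yd = [5.0, None, 5.2, 4.9, None]
imputed = impute_mean(forty_yd)
print(imputed)
```

Mean substitution leaves each variable's mean unchanged but shrinks its spread, which is worth keeping in mind when you interpret the coefficients on the combine variables.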
I’m not so concerned about the R-squared value of your final model, but I do think it will be fairly easy for you to get an R-squared value in the 70% plus range.
III. (5 Pts) Discuss any variable you might like to add to the above model that is not included in the dataset. Think about what other variable could help you explain more of the variability in Y (Salary).
IV. (5 Pts) From the model that you create, determine the most underpaid player and the most overpaid player. (Hint: use your residuals to make this determination.) In other words, if you were the general manager for a team, how would you know if you got a good deal?
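The hint about residuals can be illustrated with a small Python sketch. It is not a substitute for your Minitab output: the player labels, sack counts, and salaries are invented, and a one-predictor least-squares line stands in for whatever model your team builds. A residual is the actual salary minus the predicted salary, so the most negative residual marks the player paid the least relative to the model, and the most positive residual marks the player paid the most relative to it.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data: college sacks and annual salary in $ millions.
players = ["A", "B", "C", "D", "E"]
sacks = [2, 4, 6, 8, 10]
salary = [0.6, 0.9, 0.8, 1.5, 1.4]

b0, b1 = fit_line(sacks, salary)

# Residual = actual salary - predicted salary.
residuals = {p: s - (b0 + b1 * x) for p, x, s in zip(players, sacks, salary)}

underpaid = min(residuals, key=residuals.get)  # paid least relative to prediction
overpaid = max(residuals, key=residuals.get)   # paid most relative to prediction
print(underpaid, overpaid)
```

In Minitab, the same idea amounts to storing the residuals from your final model and sorting on that column.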
Use any of the descriptive statistics tools that you wish from the first portion of the class, such as dot plots and interval plots for categorical variables, and histograms and scatter plots for continuous variables, to make your case. Provide accurate interpretations of your findings and try to explain them in simple terms.
Use the same teams for this case as you did for Case 1. I think each person should try to build a model independently and then compare notes and present one model for Part II. Each team should submit one pdf document to Canvas. Please put your name on the front page. Again, I expect that each team member will contribute equally to the case analysis and write-up, but in the event that one team member does more than 50% of the work, please specify that in your write-up. I will make grade assignments accordingly.
And finally, the purpose of this case is not to make everyone a sports analyst. However, you should be able to visualize how you can do something similar in most any situation where you are trying to predict something, whether it is the demand of a commodity or how long a person is expected to stay in the hospital for a specific surgery. Regression is one of the most widely used tools in predictive analytics.
Here is a list of the data that is in the spreadsheet. Since we want to predict the player's starting salary out of college, we will only look at the data that would be available up to that point. Obviously, the player would have built a record of performance in college by how many tackles he has made. We will also use his combine results. For those of you who might not be familiar with the NFL Combine, college players get invited to participate in a series of events so they can show off their physical skills. These skills are demonstrated in 6 events that are meant to measure the players’ speed and agility. I have put those in Red below. If you want to know more about the combine, google “NFL Scouting Combine”, and you will find a plethora of information. Hopefully, I have provided enough information to describe what these variables are.
1. Player Identification Number
2. Year drafted
3. Player name
4. Position
5. Contract Length
6. Contract Total Dollars
7. Annual Salary
8. Years in the NFL
9. Approximate Value (I think this is the evaluation from a scouting organization)
10. College
11. College Solo Tackles
12. College Tackle Assists
13. Total College tackles
14. College Tackles for Loss
15. College Sacks
16. Current Team
17. Current League
18. Pick in the draft
19. Height
20. Weight
21. Combine 40 Yd time (How fast they run the 40 Yd Dash)
22. Combine vertical jump (How high they can jump)
23. Combine number of bench presses (the number of times they can bench press 225 pounds repetitively)
24. Combine Broad Jump (How far they can jump)
25. Combine 3 Cone (How fast they can run through a series of strategically placed cones)
26. Combine Shuttle (How fast they can move from side to side)
27. Draft Info
28. Games Played in 2017
29. Total tackles in 2017
30. Solos in 2017
31. Assists in 2017
32. Sacks in 2017
33. Career Games Played
34. Career Total Tackles
35. Career Solo Tackles
36. Career Assists
37. Career Sacks