

NFL Project Analysis

The goal is to select appropriate models and model specifications, and apply the respective methods to enhance data-driven decision making related to the business problem.

Format: RMarkdown (Word output), using RStudio

Datasets:

The project starts with an initial data collection consisting of the following four datasets:

NFL Arrests XXXXXXXXXX:

https://www.kaggle.com/patrickmurphy/nfl-arrests

NFL Trends Over Time:

https://www.kaggle.com/dasbootstrapping/nfl-trends-over-time

NFL Passing Statistics:

2009-2018 https://www.kaggle.com/omzqwonxei/nfl-passing-statistics XXXXXXXXXX

Detailed NFL Play-by-Play Data:

XXXXXXXXXX https://www.kaggle.com/maxhorowitz/nflplaybyplay2009to2016

Requirements:

o The paper should be 8-10 pages in length, not including figures and tables.

o Start with a one-paragraph abstract, followed by an intro/background of the problem, methods, results, discussion/conclusion, acknowledgments, and references, in that order.

o List the resources and reference all sources you used to complete the project.

o The analysis requires predictive modeling and graphing (display historical facts and predict the next couple of years):

A. High School vs. College players: who performs better in scoring and yards

B. Who is more likely to be arrested, and during what year: High School vs. College players

C. Percentage of completions per attempt

D. Average yards gained per attempt

E. Percentage of touchdown passes per attempt

F. Percentage of interceptions per attempt

G. Average Production vs. Experience for NFL Players (an example is on Kaggle)

H. Does throwing more passes improve a quarterback's game-winning stats
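Items C through F are simple per-attempt ratios. The following is a minimal sketch of how they could be computed, assuming a data frame with one row per quarterback; the column names and the numbers are hypothetical toy values, not taken from the Kaggle passing dataset.

```r
# Toy data only: player labels, column names, and values are made up
toy <- data.frame(
  player        = c("QB A", "QB B"),
  attempts      = c(500, 400),
  completions   = c(325, 240),
  yards         = c(3750, 2800),
  touchdowns    = c(25, 16),
  interceptions = c(10, 12)
)

# C. Percentage of completions per attempt
toy$completion_pct <- 100 * toy$completions / toy$attempts
# D. Average yards gained per attempt
toy$yards_per_attempt <- toy$yards / toy$attempts
# E. Percentage of touchdown passes per attempt
toy$td_pct <- 100 * toy$touchdowns / toy$attempts
# F. Percentage of interceptions per attempt
toy$int_pct <- 100 * toy$interceptions / toy$attempts

print(toy[, c("player", "completion_pct", "yards_per_attempt", "td_pct", "int_pct")])
```

With real data, the same four lines would apply once the column names are mapped to those in the actual file.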


Answered Same Day Sep 22, 2021

Solution

Pritam answered on Sep 26 2021
Untitled
26 September 2019
Required libraries:
library(rpart.plot)
library(caTools)
library(ggpubr)
library(scatterplot3d)
library(reshape2)
library(lubridate)
library(scales)
library(plotly)
library(shiny)
library(tidyverse)
library(feather)
library(readr)
library(dplyr)
library(ggplot2)
library(ggthemes)
library(gganimate)
library(gifski)
library(png)
library(transformr)
library(treemapify)
Abstract:
This research uses several datasets on NFL players covering 2000 to 2017. The aim of the project is to give useful information to team management or any franchise that wants to evaluate a player on the basis of fitness or criminal record. The cleaner a player's record, the more likely management is to consider him a potential signing and, specifically, a valuable and dedicated member of the team. Team management is not the only audience: a large amount of betting and many pools accompany each season, so clear and informative research on players can be useful in many areas. The data cover passing statistics, criminal records, and playing strategies, and considerable effort was needed to arrive at models that could explain the scoring potential of a team or a particular player. Visualization and predictive modeling are introduced in sequence to give a better grip on the data and to make the results easier to interpret.
Introduction:
It's football season: people are gearing up for weekly games, and some are participating in football pools. Data can help with football pools; in fact, it can show fans statistically how well or poorly their teams perform due to injuries or environmental elements. Do fans base their judgments purely on the previous season's stats, including core efficiency rates, turnover rates, penalty rates, and who was selected in the current NFL draft? NFL and MLB teams employ data scientists to track statistical information daily, convert it to insights, and send it upstream to stakeholders. The goal of our project is to understand what they analyze, and not just to replicate the storytelling but to improve on elements they may not have included or thought of analyzing.
Methods:
The methods applied here are mainly visualization and regression for predictive modeling. Regression analysis is a statistical method that determines the impact of other variables on a particular variable, called the response variable. The predictor variables that impact or explain the variance of the response variable are chosen through a selection procedure. Several measures are used to evaluate a model, such as R-squared and other goodness-of-fit statistics. Before getting into the analysis, however, one should be careful about the assumptions of the regression model. The assumptions are vital because violating them can have serious implications for inference, so the results cannot be relied upon as they might be erroneous.
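To make the regression workflow concrete, here is a minimal sketch using simulated data: fit a linear model, read off R-squared, and look at the residuals, which is where assumption violations usually show up. The variable names (attempts, yards) and every value are assumptions chosen for illustration; nothing here comes from the NFL datasets or from the models fitted in this report.

```r
# Toy data only: simulated predictor and response, not NFL data
set.seed(42)
n <- 100
attempts <- runif(n, min = 20, max = 50)            # hypothetical predictor
yards <- 50 + 6 * attempts + rnorm(n, sd = 20)      # hypothetical response
toy <- data.frame(attempts, yards)

# Fit a simple linear regression of the response on the predictor
fit <- lm(yards ~ attempts, data = toy)

# R-squared: share of the response variance explained by the model
r2 <- summary(fit)$r.squared
print(r2)

# Residual checks for the usual assumptions
res <- resid(fit)
print(shapiro.test(res))   # normality of the residuals
# plot(fit, which = 1)     # residuals vs fitted: look for curvature or funnel shapes
```

If the residual plot shows a clear pattern, or the normality test rejects strongly, the standard errors and p-values from summary(fit) should be treated with caution.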
NFL Arrest data analysis:
First, we analyze the arrest data, and the best way to start is to inspect it visually. The top 25 teams have been arranged by their number of arrest incidents.
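# Read the arrest incidents and parse the date column
d1 = read.csv("ArrestIncidents.csv", header = TRUE)
d1$DATE = as.Date(d1$DATE, "%m/%d/%Y")
# Count arrests per team and keep the 25 teams with the most arrests
f1 = as.data.frame(table(d1$TEAM))
f2 = head(arrange(f1, desc(Freq)), n = 25)
head(f2)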
## Var1 Freq
## 1 MIN 49
## 2 DEN 47
## 3 CIN 44
## 4 TB 36
## 5 TEN 36
## 6 IND 35
f3 = rename(f2, Team = Var1, Arrest_count = Freq)
p1 = ggplot(f3, aes(x = reorder(factor(Team), -Arrest_count), y = Arrest_count)) +
  geom_bar(stat = "identity", fill = "#FF6678") +
  ggtitle("Frequency of Arrest by Team") +
  xlab("Teams")
p1
From the graph one can see that Minnesota ranks first among NFL teams for arrest incidents. New York, on the other hand, appears near the end of the top 25. The top team has 49 arrests, and the counts for the next two teams (47 and 44) are quite close to that.
Arrests based on position:
f4 = group_by(d1, POSITION)
f5 = summarise(f4, count = n())
pos_plot = ggplot(f5, aes(x = POSITION, y = count)) +
  geom_bar(stat = "identity", fill = "#CC99CC") +
  ylab("Count of Arrest") +
  ggtitle("Frequency of Arrest by Position")
pos_plot
From the graph it is clear that wide receivers are arrested the most. This result is likely biased, however, because wide receivers make up a large share of the players on NFL teams. One could therefore look at percentage frequencies for the position-based analysis instead of raw counts.
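A minimal sketch of the percentage view follows, using a small made-up POSITION column in place of d1. The roster vector is a purely hypothetical stand-in for the number of players at each position; the arrest data alone does not contain it, so a real analysis would need a separate roster source.

```r
# Toy data only: a made-up POSITION column standing in for d1$POSITION
toy <- data.frame(
  POSITION = c("WR", "WR", "WR", "LB", "LB", "CB", "QB", "CB", "WR", "RB")
)

# Share of all arrests accounted for by each position, in percent
counts <- table(toy$POSITION)
share <- round(100 * prop.table(counts), 1)
share_df <- as.data.frame(share, responseName = "percent")
names(share_df)[1] <- "POSITION"
share_df <- share_df[order(-share_df$percent), ]
print(share_df)

# Hypothetical roster sizes per position (assumed numbers, for illustration only)
roster <- c(WR = 6, LB = 7, CB = 6, QB = 3, RB = 4)

# Arrests per rostered player: this adjusts for how common each position is
rate <- as.numeric(counts) / roster[names(counts)]
print(round(rate, 2))
```

The first table still depends on how many players hold each position; only the per-player rate addresses the bias mentioned above.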
The year when the most arrests took place:
Since the data contains the date as an attribute, a natural question is how arrests change over time. The most important thing to look for in such time-series questions is the trend present in the data. Because the arrest data stores the date as a single field, we need to extract the year and month for the analysis.
d1$YEAR = format(as.Date(d1$DATE, format="%m/%d/%Y"),"%Y")
f8 = group_by(d1, YEAR)
f9 = summarise(f8, count = n())
year_plot = ggplot(f9, mapping = aes(x = YEAR, y = count, group = 1)) +
  geom_line(color = "#993399", size = 2) +
  geom_point(color = "#FFCC00", size = 3) +
  ggtitle("Arrest by Year")
year_plot
One can see that there are almost 70 arrests in 2006, which is also the highest count within the range of the analysis...
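The text above mentions extracting both year and month, while the code shown handles only the year. A minimal sketch of extracting both with base R is given here, on a few made-up dates in the same "%m/%d/%Y" layout as the DATE column; the dates themselves are illustrative, not rows from the arrest file.

```r
# Toy data only: made-up dates in the same month/day/year layout as DATE
toy <- data.frame(DATE = c("03/27/2017", "01/05/2006", "07/19/2006", "07/02/2013"))
toy$DATE <- as.Date(toy$DATE, "%m/%d/%Y")

# Extract year and month as integers so they sort and plot numerically
toy$YEAR <- as.integer(format(toy$DATE, "%Y"))
toy$MONTH <- as.integer(format(toy$DATE, "%m"))

# Arrest counts per year and per month
by_year <- as.data.frame(table(YEAR = toy$YEAR), responseName = "count")
by_month <- as.data.frame(table(MONTH = toy$MONTH), responseName = "count")
print(by_year)
print(by_month)
```

Keeping YEAR numeric rather than character is a design choice: a line plot can then treat the x axis as continuous, and gaps for years with no arrests remain visible.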