
Read in a data set and clean it up, describing your process as you go. Produce a minimum of four visual summaries. Apply an appropriate modelling strategy to help answer your proposed question....

  1. Read in a data set and clean it up, describing your process as you go.
  2. Produce a minimum of four visual summaries.
  3. Apply an appropriate modelling strategy to help answer your proposed question.
  4. Describe and diagnose your models.
  5. Explain how you’ve answered your questions with the data from your models and plots.
Answered Same Day Sep 24, 2021 Monash University

Solution

Subhanbasha answered on Sep 25 2021
R Notebook
# installing packages
# install.packages("devtools")
# install.packages("dplyr")
# install.packages("caret")
# install.packages("Metrics")
# devtools::install_github("Saraswathi-Analytics/R/SA")
# calling packages
library(devtools)
## Loading required package: usethis
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(SA)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(Metrics)
##
## Attaching package: 'Metrics'
## The following objects are masked from 'package:caret':
##
## precision, recall
Reading data, cleaning and procedure: We read the data into R from a CSV file and then check its dimensions. The data set has 45629 records and 17 features. The columns have several data types, so we convert them where needed into types that are suitable for modelling.
The data contains NA values. Replacing them with zeros would cost model accuracy, so we replace each NA with the corresponding column mean instead. We then aggregate the data by country and by date to get one row per group. Next, we check the assumptions of regression by plotting the features.
If the assumptions are satisfied, we check the correlation between the features. We predict the positive cases (total_cases) using the other variables as independent variables, and use a correlation matrix to see which variables are most strongly related to the dependent variable. We then select the most useful features for modelling.
We build two or three models with different parameters, check the accuracy of each, and finally select the model that gives the best accuracy.
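The NA-replacement step described above can be sketched in base R. This is only an illustration: `fill_na_mean` and the toy data frame are hypothetical names made up here, and the function is not the `Fill_NA` function from the SA package, whose exact behaviour is not shown in this answer.

```r
# Hypothetical helper (NOT the SA package's Fill_NA): replace NA in each
# numeric column with that column's mean; other columns are left untouched.
fill_na_mean <- function(data) {
  for (col in names(data)) {
    if (is.numeric(data[[col]])) {
      col_mean <- mean(data[[col]], na.rm = TRUE)
      data[[col]][is.na(data[[col]])] <- col_mean
    }
  }
  data
}

# Toy data, made up for illustration
toy <- data.frame(
  location = c("A", "A", "B", "B"),
  total_cases = c(10, NA, 30, 20),
  median_age = c(18.6, 18.6, NA, 40.0)
)
toy_filled <- fill_na_mean(toy)
print(toy_filled)
```

Note that a global column mean fills a missing country-level value (such as median_age) with an average over all rows, so a per-location mean may be worth considering as an alternative.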
# reading data
df <- read.csv('covid-wongwqap-adbfl2oa.csv', header = TRUE, sep = ",")
# changing the date format
df$date <- as.Date(df$date, format = "%d-%m-%Y")
# first six records of the data
head(df)
## location date total_cases new_cases total_deaths new_deaths
## 1 Afghanistan 2019-12-31 0 0 0 0
## 2 Afghanistan 2020-01-01 0 0 0 0
## 3 Afghanistan 2020-01-02 0 0 0 0
## 4 Afghanistan 2020-01-03 0 0 0 0
## 5 Afghanistan 2020-01-04 0 0 0 0
## 6 Afghanistan 2020-01-05 0 0 0 0
## total_cases_per_million new_cases_per_million total_deaths_per_million
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## new_deaths_per_million population population_density median_age aged_65_older
## 1 0 38928341 54.422 18.6 2.581
## 2 0 38928341 54.422 18.6 2.581
## 3 0 38928341 54.422 18.6 2.581
## 4 0 38928341 54.422 18.6 2.581
## 5 0 38928341 54.422 18.6 2.581
## 6 0 38928341 54.422 18.6 2.581
## aged_70_older handwashing_facilities hospital_beds_per_thousand
## 1 1.337 37.746 0.5
## 2 1.337 37.746 0.5
## 3 1.337 37.746 0.5
## 4 1.337 37.746 0.5
## 5 1.337 37.746 0.5
## 6 1.337 37.746 0.5
# dimension of the data
dim(df)
## [1] 45629 17
# replacing NA values
df <- Fill_NA(df, replace = "MEAN")
# aggregating data frame
all_countries = aggregate( .~ date, data = df[,c(2,3,5),drop=FALSE], FUN = mean)
# plotting total cases
par(mfrow=c(1,2))
plot(all_countries$total_cases,
col = "blue", type = 'l', xlab = "Date",
xaxt = "n", ylab = "Total Cases",
main = "Avg No.of COVID cases")
lines(all_countries$total_deaths,col="red")
legend('topleft',
legend = c('Positive Cases',"Death Cases"),
fill = c('blue','red'),
col = c('blue','red'),
title = "COVID Cases")
In the plot above, the average number of deaths stays roughly constant over time, while the average number of positive cases starts low and then rises rapidly. So we can say that total positive cases are increasing over time.
# plotting total cases across the countries
tot_cases = aggregate( .~ location, data = df[,c(1,3,5),drop=FALSE], FUN = mean)
tot_cases <- tot_cases %>% arrange(total_cases) %>% as.data.frame() %>% tail(11)
barplot(total_cases ~ location, data = tot_cases[-11,],
col = rainbow(10),
main = "Top 10 Max COVID Cases")
The plot above shows the countries with the highest average number of positive cases. The United States has the highest number, followed by Brazil and then India. All the other countries have far fewer positive cases.
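The ranking behind a "top N" bar plot can also be written with base R only. The sketch below uses a made-up data frame (`toy`, `by_loc` and `top3` are illustrative names, not objects from the notebook) and takes the largest group means directly with `order()` and `head()`.

```r
# Toy data, made up for illustration
toy <- data.frame(
  location = c("A", "A", "B", "B", "C", "C", "D", "D"),
  total_cases = c(5, 15, 100, 300, 40, 60, 1, 3)
)

# Mean total_cases per location
by_loc <- aggregate(total_cases ~ location, data = toy, FUN = mean)

# Sort in descending order and keep the three largest means
top3 <- head(by_loc[order(by_loc$total_cases, decreasing = TRUE), ], 3)
print(top3)
```

Sorting in descending order keeps the largest value in the first row, which makes it easy to check which rows end up in the plot.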
# aggregating by mean
df1 = aggregate( .~ location, data = df[,c(1,12:ncol(df)),drop=FALSE], FUN = mean)
two_Countries<-df[grepl("India|Australia",df$location),]
two_Countries<-two_Countries[,c(11,14,17)]
# correlation plot
corrplot::corrplot(cor(two_Countries))
This is a correlation plot of the selected variables. It shows a negative correlation between population and both aged_65_older and hospital_beds_per_thousand, which means that where population is higher, aged_65_older and hospital_beds_per_thousand tend to be lower.
There is a positive correlation between aged_65_older and hospital_beds_per_thousand, which means that where aged_65_older is higher, hospital_beds_per_thousand tends to be higher as well.
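To show how the signs in a correlation matrix are read, here is a small sketch with made-up values for two countries (the numbers are not taken from the data set). It assumes that these three columns are constant within each country, as the head(df) output suggests for Afghanistan.

```r
# Made-up values: three rows per country, constant within each country
toy <- data.frame(
  population = c(rep(1380, 3), rep(25, 3)),
  aged_65_older = c(rep(6, 3), rep(16, 3)),
  hospital_beds_per_thousand = c(rep(0.5, 3), rep(3.8, 3))
)

cm <- cor(toy)
print(round(cm, 2))
```

Under that assumption each variable takes only two distinct values, so every pairwise correlation comes out as exactly +1 or -1. The sign is then informative, but the strength of the relationship is not.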
# histogram of the median age
hist(df1$median_age,xlab = "Median Age",
main = "Histogram of Median Age",
col = rainbow(7))
The plot above shows the distribution of median age across locations rather than positive cases by age group, because df1 has one row per location. Most locations have a median age of 30-35, and the fewest locations fall in the 45-50 range.
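Since hist() counts the values it is given, the bar heights for df1$median_age count locations per median-age bin. A minimal sketch with made-up ages (the vector and the bin edges are assumptions for illustration), using plot = FALSE to read the counts directly:

```r
# Made-up median ages, one value per location
median_age <- c(18.6, 22.1, 31.0, 33.4, 34.9, 38.2, 41.5, 47.0)

# Compute the histogram without drawing it
h <- hist(median_age, breaks = seq(15, 50, by = 5), plot = FALSE)

# One row per bin: lower edge, upper edge, number of locations
bins <- data.frame(
  from = head(h$breaks, -1),
  to = tail(h$breaks, -1),
  count = h$counts
)
print(bins)
```

The counts always add up to the number of values passed in, which is a quick check that the histogram is counting locations and not cases.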
# Splitting the data into train and test
set.seed(1234)
Sample <- sample(1:nrow(df), round(nrow(df)*.7))
Train_df <- df[Sample,,drop=FALSE]
Test_df <- df[-Sample,,drop=FALSE]
# fitting a linear regression model with all variables on the training data
model1<-lm(total_cases~.,data=Train_df)
#summary of the model
summary(model1)
##
## Call:
## lm(formula = total_cases ~ ., data = Train_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3809908 -4875 93 4957 5430740
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.224e+13 2.298e+14 0.140 0.88841
## locationAlbania 2.989e+13 2.130e+14 0.140 0.88841
## locationAlgeria -2.196e+13 1.565e+14 -0.140 0.88841
## locationAndorra 6.477e+13 4.616e+14 0.140 0.88841
## locationAngola -1.809e+13 1.289e+14 -0.140 0.88841
## locationAnguilla 1.814e+14 1.293e+15 0.140 0.88841
## locationAntigua and Barbuda 1.051e+14 7.491e+14 0.140 0.88841
## locationArgentina -2.266e+13 1.615e+14 -0.140 0.88841
## locationArmenia 2.874e+13 2.048e+14 0.140 0.88841
## locationAruba 3.142e+14 2.239e+15 0.140 0.88841
## locationAustralia -3.034e+13 2.163e+14 -0.140 0.88841
## locationAustria 3.100e+13 2.209e+14 0.140 0.88841
## locationAzerbaijan 3.844e+13 2.740e+14 0.140 0.88841
## locationBahamas -8.842e+12 6.302e+13 -0.140 0.88841
## locationBahrain 1.115e+15 7.944e+15 0.140 0.88841
## locationBangladesh 7.172e+14 5.111e+15 0.140 0.88841
## locationBarbados 3.614e+14 2.576e+15 0.140 0.88841
## locationBelarus -4.481e+12 3.194e+13 -0.140 0.88841
## locationBelgium 1.903e+14 1.356e+15 0.140 0.88841
## locationBelize -2.251e+13 1.604e+14 -0.140 0.88841
## locationBenin 2.648e+13 1.887e+14 0.140 0.88841
## locationBermuda 7.432e+14 5.296e+15 0.140 0.88841
## locationBhutan -1.969e+13 1.403e+14 -0.140 0.88841
## locationBolivia -2.620e+13 1.867e+14 -0.140 0.88841
## locationBonaire Sint Eustatius and Saba 1.814e+14 1.293e+15 0.140 0.88841
## locationBosnia and Herzegovina 8.338e+12 5.942e+13 0.140 0.88841
## locationBotswana -2.985e+13 2.127e+14 -0.140 0.88841
## locationBrazil -1.741e+13 1.241e+14 -0.140 0.88841
## locationBritish Virgin Islands 9.097e+13 6.483e+14 0.140 0.88841
## locationBrunei 1.595e+13 1.137e+14 0.140 0.88841
## locationBulgaria 6.374e+12 4.542e+13 0.140 0.88841
## locationBurkina Faso 9.319e+12 6.641e+13 0.140 0.88841
## locationBurundi 2.184e+14 1.556e+15 0.140 0.88841
## locationCambodia 2.148e+13 1.531e+14 0.140 0.88841
## locationCameroon -2.095e+12 1.493e+13 -0.140 0.88841
## locationCanada -2.985e+13 2.127e+14 -0.140 0.88841
## locationCape Verde 4.808e+13 3.427e+14 0.140 0.88841
## locationCayman Islands 1.197e+14 8.532e+14 0.140 0.88841
## locationCentral African Republic -2.781e+13 1.982e+14 -0.140 0.88841
## locationChad -2.523e+13 1.798e+14 -0.140 0.88841
## locationChile -1.786e+13 1.273e+14 -0.140 0.88841
## locationChina 5.525e+13 3.937e+14 0.140 0.88841
## locationColombia -6.042e+12 4.306e+13 -0.140 0.88841
## locationComoros 2.269e+14 1.617e+15 0.140 0.88841
## locationCongo -2.312e+13 1.647e+14 -0.140 0.88841
## locationCosta Rica 2.468e+13 1.759e+14 0.140 0.88841
## locationCote d'Ivoire 1.302e+13 9.279e+13 0.140 0.88841
## locationCroatia 1.144e+13 8.151e+13 0.140 0.88841
## locationCuba 3.317e+13 2.364e+14 0.140 0.88841
## locationCuracao 1.826e+14 1.301e+15 0.140 0.88841
## locationCyprus 4.339e+13 3.092e+14 0.140 0.88841
## locationCzech Republic 4.903e+13 3.494e+14 0.140 0.88841
## locationDemocratic Republic of Congo -1.099e+13 7.829e+13 -0.140 0.88841
## locationDenmark 4.864e+13 3.466e+14 0.140 0.88841
## locationDjibouti -7.783e+12 5.547e+13 -0.140 0.88841
## locationDominica 2.615e+13 1.864e+14 0.140 0.88841
## locationDominican Republic 9.980e+13 7.112e+14 0.140 0.88841
## locationEcuador 7.416e+12 5.285e+13 0.140 0.88841
## locationEgypt 2.582e+13 1.840e+14 0.140 0.88841
## locationEl Salvador 1.501e+14 1.070e+15 0.140 0.88841
## locationEquatorial Guinea -5.467e+12 3.896e+13 -0.140 0.88841
## locationEritrea -5.994e+12 4.272e+13 -0.140 0.88841
## locationEstonia -1.386e+13 9.875e+13 -0.140 0.88841
## locationEthiopia 2.994e+13 2.134e+14 0.140 0.88841
## locationFaeroe Islands -1.132e+13 8.070e+13 -0.140 0.88841
## locationFalkland Islands 1.814e+14 1.293e+15 0.140 0.88841
## locationFiji -2.879e+12 2.052e+13 -0.140 0.88841
## locationFinland -2.150e+13 1.532e+14 -0.140 0.88841
## locationFrance 4.038e+13 2.878e+14 0.140 0.88841
## locationFrench Polynesia 1.357e+13 9.670e+13 0.140 0.88841
## locationGabon -2.759e+13 1.966e+14 -0.140 0.88841
## locationGambia 9.073e+13 6.466e+14 0.140 0.88841
## locationGeorgia 6.286e+12 4.480e+13 0.140 0.88841
## locationGermany 1.082e+14 7.710e+14 0.140 0.88841
## locationGhana 4.283e+13 3.053e+14 0.140 0.88841
## locationGibraltar 2.016e+15 1.437e+16 0.140 0.88841
## locationGreece 1.721e+13 1.227e+14 0.140 0.88841
## locationGreenland -3.216e+13 2.292e+14 -0.140 0.88841
## locationGrenada 1.556e+14 1.109e+15 0.140 0.88841
## locationGuam 1.479e+14 1.054e+15 0.140 0.88841
## locationGuatemala 6.127e+13 4.366e+14 0.140 0.88841
## locationGuernsey 1.814e+14 1.293e+15 0.140 0.88841
## locationGuinea -1.580e+12 1.126e+13 -0.140 0.88841
## locationGuinea-Bissau 6.972e+12 4.969e+13 0.140 0.88841
## locationGuyana -2.990e+13 2.131e+14 -0.140 0.88841
## locationHaiti 2.038e+14 1.453e+15 0.140 0.88841
## locationHonduras 1.682e+13 1.198e+14 0.140 0.88841
## locationHong Kong 4.138e+15 2.949e+16 0.140 0.88841
## locationHungary 3.177e+13 2.264e+14 0.140 0.88841
## locationIceland -3.023e+13 2.154e+14 -0.140 0.88841
## locationIndia 2.346e+14 1.672e+15 0.140 0.88841
## locationIndonesia 5.409e+13 3.855e+14 0.140 0.88841
## locationInternational 1.814e+14 1.293e+15 0.140 0.88841
## locationIran -2.720e+12 1.938e+13 -0.140 0.88841
## locationIraq 1.997e+13 1.423e+14 0.140 0.88841
## locationIreland 9.154e+12 6.524e+13 0.140 0.88841
## locationIsle of Man 5.536e+13 3.946e+14 0.140 0.88841
## locationIsrael 2.063e+14 1.470e+15 0.140 0.88841
## locationItaly 8.972e+13 6.394e+14 0.140 0.88841
## locationJamaica 1.259e+14 8.970e+14 0.140 0.88841
## locationJapan 1.738e+14 1.239e+15 0.140 0.88841
## locationJersey 1.814e+14 1.293e+15 0.140 0.88841
## locationJordan 3.250e+13 2.316e+14 0.140 0.88841
## locationKazakhstan -2.828e+13 2.016e+14 -0.140 0.88841
## locationKenya 1.949e+13 1.389e+14 0.140 0.88841
## locationKosovo 6.738e+13 4.802e+14 0.140 0.88841
## locationKuwait 1.053e+14 7.503e+14 0.140 0.88841
## locationKyrgyzstan -1.309e+13 9.326e+13 -0.140 0.88841
## locationLaos -1.464e+13 1.043e+14 -0.140 0.88841
## locationLatvia -1.375e+13 9.800e+13 -0.140 0.88841
## locationLebanon 3.200e+14 2.281e+15 0.140 0.88841
## locationLesotho 1.134e+13 8.081e+13 0.140 0.88841
## locationLiberia -3.137e+12 2.236e+13 -0.140 0.88841
## locationLibya -3.010e+13 2.145e+14 -0.140 0.88841
## locationLiechtenstein 1.082e+14 7.709e+14 0.140 0.88841
## locationLithuania -5.502e+12 3.921e+13 -0.140 0.88841
## locationLuxembourg 1.049e+14 7.474e+14 0.140 0.88841
## locationMacedonia 1.669e+13 1.190e+14 0.140 0.88841
## locationMadagascar -6.203e+12 4.421e+13 -0.140 0.88841
## locationMalawi 8.478e+13 6.042e+14 0.140 0.88841
## locationMalaysia 2.478e+13 1.766e+14 0.140 0.88841
## locationMaldives 8.294e+14 5.911e+15 0.140 0.88841
## locationMali -2.324e+13 1.656e+14 -0.140 0.88841
## locationMalta 8.292e+14 5.909e+15 0.140 0.88841
## locationMauritania -2.970e+13 2.117e+14 -0.140 0.88841
## locationMauritius 3.368e+14 2.400e+15 0.140 0.88841
## locationMexico 7.122e+12 5.076e+13 0.140 0.88841
## locationMoldova 4.102e+13 2.923e+14 0.140 0.88841
## locationMonaco 1.143e+16 8.146e+16 0.140 0.88841
## locationMongolia -3.107e+13 2.214e+14 -0.140 0.88841
## locationMontenegro -4.824e+12 3.438e+13 -0.140 0.88841
## locationMontserrat 1.814e+14 1.293e+15 0.140 0.88841
## locationMorocco 1.520e+13 1.083e+14 0.140 0.88841
## locationMozambique -9.890e+12 7.049e+13 -0.140 0.88841
## locationMyanmar 1.617e+13 1.153e+14 0.140 0.88841
## locationNamibia -3.042e+13 2.168e+14 -0.140 0.88841
## locationNepal 8.887e+13 6.334e+14 0.140 0.88841
## locationNetherlands 2.690e+14 1.917e+15 0.140 0.88841
## locationNew Caledonia -2.315e+13 1.650e+14 -0.140 0.88841
## locationNew Zealand -2.146e+13 1.529e+14 -0.140 0.88841
## locationNicaragua -1.632e+12 1.163e+13 -0.140 0.88841
## locationNiger -2.220e+13 1.582e+14 -0.140 0.88841
## locationNigeria 9.193e+13 6.551e+14 0.140 0.88841
## locationNorthern Mariana Islands 3.878e+13 2.764e+14 0.140 0.88841
## locationNorway -2.367e+13 1.687e+14 -0.140 0.88841
## locationOman -2.337e+13 1.665e+14 -0.140 0.88841
## locationPakistan 1.192e+14 8.493e+14 0.140 0.88841
## locationPalestine 4.288e+14 3.056e+15 0.140 0.88841
## locationPanama 4.212e+11 3.002e+12 0.140 0.88841
## locationPapua New Guinea -2.145e+13 1.529e+14 -0.140 0.88841
## locationParaguay -2.209e+13 1.574e+14 -0.140 0.88841
## locationPeru -1.735e+13 1.237e+14 -0.140 0.88841
## locationPhilippines 1.762e+14 1.256e+15 0.140 0.88841
## locationPoland 4.124e+13 2.939e+14 0.140 0.88841
## locationPortugal 3.433e+13 2.447e+14 0.140 0.88841
## locationPuerto Rico 1.907e+14 1.359e+15 0.140 0.88841
## locationQatar 1.024e+14 7.300e+14 0.140 0.88841
## locationRomania 1.819e+13 1.297e+14 0.140 0.88841
## locationRussia -2.701e+13 1.925e+14 -0.140 0.88841
## locationRwanda 2.609e+14 1.860e+15 0.140 0.88841
## locationSaint Kitts and Nevis 9.387e+13 6.690e+14 0.140 0.88841
## locationSaint Lucia 1.415e+14 1.008e+15 0.140 0.88841
## locationSaint Vincent and the Grenadines 1.347e+14 9.600e+14 0.140 0.88841
## locationSan Marino 2.976e+14 2.121e+15 0.140 0.88841
## locationSao Tome and Principe 9.385e+13 6.689e+14 0.140 0.88841
## locationSaudi Arabia -2.316e+13 1.651e+14 -0.140 0.88841
## locationSenegal 1.653e+13 1.178e+14 0.140 0.88841
## locationSerbia 1.533e+13 1.092e+14 0.140 0.88841
## locationSeychelles 9.120e+13 6.499e+14 0.140 0.88841
## locationSierra Leone 2.979e+13 2.123e+14 0.140 0.88841
## locationSingapore 4.657e+15 3.319e+16 0.140 0.88841
## locationSint Maarten (Dutch part) 6.841e+14 4.875e+15 0.140 0.88841
## locationSlovakia 3.478e+13 2.479e+14 0.140 0.88841
## locationSlovenia 2.855e+13 2.035e+14 0.140 0.88841
## locationSomalia -1.832e+13 1.306e+14 -0.140 0.88841
## locationSouth Africa -4.543e+12 3.238e+13 -0.140 0.88841
## locationSouth Korea 2.805e+14 1.999e+15 0.140 0.88841
## locationSouth Sudan 1.814e+14 1.293e+15 0.140 0.88841
## locationSpain 2.292e+13 1.633e+14 0.140 0.88841
## locationSri Lanka 1.703e+14 1.214e+15 0.140 0.88841
## locationSudan -1.846e+13 1.316e+14 -0.140 0.88841
## locationSuriname -3.010e+13 2.145e+14 -0.140 0.88841
## locationSwaziland 1.485e+13 1.059e+14 0.140 0.88841
## locationSweden -1.760e+13 1.254e+14 -0.140 0.88841
## locationSwitzerland 9.468e+13 6.748e+14 0.140 0.88841
## locationSyria 1.814e+14 1.293e+15 0.140 ...
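The summary above reports "6 not defined because of singularities". One plausible cause, sketched below with made-up data, is that country-level columns such as population are constant within each location and so are perfectly collinear with the location dummy variables. The toy data frame, the reduced model and the `rmse` helper are assumptions for illustration and are not part of the original notebook.

```r
set.seed(1234)

# Toy data, made up: 'population' is constant within each location
toy <- data.frame(
  location = rep(c("A", "B", "C"), each = 20),
  day = rep(1:20, times = 3),
  population = rep(c(100, 250, 900), each = 20)
)
toy$total_cases <- 2 * toy$day + 0.01 * toy$population + rnorm(nrow(toy))

# 70/30 train/test split, in the same style as the notebook
idx <- sample(seq_len(nrow(toy)), round(nrow(toy) * 0.7))
train <- toy[idx, ]
test <- toy[-idx, ]

# With location in the model, population cannot be estimated separately
fit_all <- lm(total_cases ~ ., data = train)
print(coef(fit_all))  # the population coefficient is NA

# Dropping location lets the population effect be estimated
fit_small <- lm(total_cases ~ day + population, data = train)
print(coef(fit_small))

# Hypothetical helper: root mean squared error on held-out data
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse_small <- rmse(test$total_cases, predict(fit_small, newdata = test))
print(rmse_small)
```

Computing the error on Test_df in the same way for each candidate model gives a like-for-like comparison. The Metrics package loaded at the top of the notebook also provides rmse(actual, predicted) for this purpose.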