1. Clean the Excel data set using RStudio.
- Use the forcats package to reduce categories.
- Use the mice package to impute missing values.
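
As a rough illustration of the imputation step, here is a minimal sketch using the `nhanes` example data that ships with mice, since the assignment's Excel file is not available here. The choice of predictive mean matching (`method = "pmm"`), five imputations, and the seed are assumptions, not requirements stated in the question.

```r
library(mice)

# nhanes is a small example data set bundled with mice (25 rows, with NAs).
# It stands in for the assignment data, which is not reproduced here.
missing_before <- sum(is.na(nhanes))

# Assumed settings: predictive mean matching, 5 imputed data sets.
imp <- mice(nhanes, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Extract the first completed data set (mice:: avoids a clash with tidyr::complete).
nhanes_complete <- mice::complete(imp, 1)
missing_after <- sum(is.na(nhanes_complete))

cat("Missing values before:", missing_before, " after:", missing_after, "\n")
```

On the real data, it may be worth restricting `mice()` to the columns the model actually needs, because imputing all 34 columns over roughly 176,000 rows can be slow.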

The HOR_state and UnitST state columns need to be reduced to four regions: West, Midwest, South, and Northeast. The Branch categories need to be reduced to Unrestricted and Restricted:
Unrestricted - Armor, Air Defense Artillery, Ammunition, Aviation, Field Artillery, Infantry, Logistics, Mechanical Maintenance, Military Police, Special Forces.
Restricted - Adjacent General, Army Medical Specialist Corps, Army Nurse Corps, Behavioral Sciences, CBRN, Chaplain, Civil Affairs, CMF Immaterial, Corps of Engineers, Cyber, Dental Corps, Electronic Maintenance, Financial Management, Force Management, Health Services, Information Operations, Information Systems Engineer, Judge Advocate Generals Corps, Laboratory Sciences, Medical Corps, Military Intelligence, Nuclear & Counterproliferation, Operations Research/Systems Analysis, Personnel Special Reporting Codes, Preventative Medical Sciences, Psychological Operations, Public Affairs, Quartermaster Corps, Recruitment & Reenlistment, Research/Development/Acquisition, Signal Corps, Simulations Operations, Space Operations, Strategist Intelligence, Strategist, Systems Automation Officer, Telecommunications Systems Engineers, Transportation Corps, Veterinary Corps.
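
The category reduction described above could be sketched with `fct_collapse()` from forcats. The sample values below are hypothetical, the state-to-region mapping is deliberately partial (it assumes two-letter state codes and the U.S. Census Bureau regions, neither of which is confirmed by the question), and anything not in the Unrestricted list is assumed to be Restricted.

```r
library(forcats)

# Hypothetical sample values; in the real data these would be the
# HOR_state / UnitST and Branch columns.
state  <- factor(c("CA", "TX", "NY", "IL", "WA", "FL"))
branch <- factor(c("Infantry", "Cyber", "Armor", "Chaplain",
                   "Aviation", "Signal Corps"))

# Partial mapping for illustration only: extend each vector to cover
# every state that appears in the data.
region <- fct_collapse(state,
  West      = c("CA", "WA"),
  Midwest   = c("IL"),
  South     = c("TX", "FL"),
  Northeast = c("NY")
)

# Unrestricted branches as listed in the question.
unrestricted <- c("Armor", "Air Defense Artillery", "Ammunition", "Aviation",
                  "Field Artillery", "Infantry", "Logistics",
                  "Mechanical Maintenance", "Military Police", "Special Forces")

# Collapse the listed branches into "Unrestricted"; every other level becomes
# "Restricted". intersect() avoids warnings about levels absent from the data.
branch_group <- fct_collapse(branch,
  Unrestricted = intersect(unrestricted, levels(branch)),
  other_level  = "Restricted"
)

print(table(region))
print(table(branch_group))
```

Using `other_level` keeps the code short, but it also means a misspelled Unrestricted branch would silently land in Restricted, so checking `table(branch, branch_group)` on the real data is a sensible safeguard.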

2. After the data has been cleaned, fit a random forest model with the unvac_pop column as the response variable and the Branch and UnitST columns as predictors, using the supporting material Word document as guidance.
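
A minimal sketch of the model fit follows, using the randomForest package on simulated data. The supporting Word document is not reproduced here, so the package choice, `ntree = 200`, and the treatment of unvac_pop as a binary factor are all assumptions; only the column names come from the question.

```r
library(randomForest)

set.seed(42)
n <- 500

# Simulated stand-in for the cleaned data; column names follow the question.
sim <- data.frame(
  Branch = factor(sample(c("Unrestricted", "Restricted"), n, replace = TRUE)),
  UnitST = factor(sample(c("West", "Midwest", "South", "Northeast"), n,
                         replace = TRUE))
)

# Assumed binary outcome whose probability depends mildly on the predictors.
p <- plogis(-1 + 0.8 * (sim$Branch == "Unrestricted") +
              0.5 * (sim$UnitST == "South"))
sim$unvac_pop <- factor(rbinom(n, 1, p), levels = c(0, 1))

# A factor response makes randomForest fit a classification forest.
rf_fit <- randomForest(unvac_pop ~ Branch + UnitST, data = sim, ntree = 200)
print(rf_fit)
```

If unvac_pop is stored as text or numeric after import, converting it to a factor first matters: a numeric response would make `randomForest()` fit a regression forest instead.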

3. Then estimate the AUC value of the random forest model, using the supporting material Word document as guidance.
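
One common way to estimate the AUC is to score a held-out test set and pass the predicted probabilities to `roc()` from the pROC package. The sketch below does this on simulated data; the 70/30 split, the use of pROC, and the binary coding of unvac_pop are assumptions, since the supporting document's own method is not shown here.

```r
library(randomForest)
library(pROC)

set.seed(42)
n <- 1000

# Simulated stand-in for the cleaned data; column names follow the question.
sim <- data.frame(
  Branch = factor(sample(c("Unrestricted", "Restricted"), n, replace = TRUE)),
  UnitST = factor(sample(c("West", "Midwest", "South", "Northeast"), n,
                         replace = TRUE))
)
p <- plogis(-1 + 0.8 * (sim$Branch == "Unrestricted") +
              0.5 * (sim$UnitST == "South"))
sim$unvac_pop <- factor(rbinom(n, 1, p), levels = c(0, 1))

# Assumed 70/30 train/test split.
train_idx <- sample(n, size = 0.7 * n)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

rf_fit <- randomForest(unvac_pop ~ Branch + UnitST, data = train, ntree = 200)

# Predicted probability of the positive class ("1") on the held-out rows.
prob_pos <- predict(rf_fit, newdata = test, type = "prob")[, "1"]

# levels = c(control, case); direction "<" means cases score higher.
roc_obj <- roc(response = test$unvac_pop, predictor = prob_pos,
               levels = c("0", "1"), direction = "<", quiet = TRUE)
auc_val <- as.numeric(auc(roc_obj))
cat("Test-set AUC:", round(auc_val, 3), "\n")
```

Fixing `levels` and `direction` explicitly prevents pROC from choosing the direction automatically, which could otherwise report an optimistically flipped AUC for a weak model.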
Answered 2 days After Nov 08, 2022

Solution

Mohd answered on Nov 11 2022
2022-11-11
Importing the dataset
library(readxl)
data <- read_excel("data.xlsx", col_types = c("text",
"text", "text", "text", "date", "date",
"numeric", "numeric", "text", "text",
"numeric", "text", "text", "text", "text",
"text", "text", "numeric", "numeric",
"text", "text", "text", "text", "date",
"text", "text", "text", "text", "text",
"numeric", "text", "text", "text", "text"))
First look at the data
skimr::skim(data)
Data summary
    Name                    data
    Number of rows          176485
    Number of columns       34
    _______________________
    Column type frequency:
      character             25
      numeric               6
      POSIXct               3
    ________________________
    Group variables         None
Variable type:...