
---
title: "Assignment Six"
author: "Professor Lepore"
date: "11/06/2022"
output: html_document
---
# *Task(s):*
- ***[1]*** Clean the data frames [2 Points]
- ***[2]*** Merge the data frames [3 Points]
- ***[3]*** Run univariate and bivariate statistics [5 Points]
- ***[4]*** When turning in your assignment, please attach both this RMD file
and the knitted HTML file. I've already set the specific chunk settings for
this assignment. Also be sure to rename both as assignment_six_first_lastname.
### Extra points (3)
- (2): Run bivariate statistics on both residential_commercial_ind and
eviction_possession using the merged data set
- (1): Describe the logic behind your mutates and merging
Partial credit is possible on tasks two and three.
# R [PART ONE]
```{r setup, include = TRUE}
# Chunk Options ----------------------------------------------------------------
knitr::opts_chunk$set(
    echo = TRUE
)
# R PACKAGES -------------------------------------------------------------------
if (!require('tidyverse'))
    install.packages('tidyverse',
                     repos = "http://cran.us.r-project.org")
library('tidyverse')
if (!require('reticulate'))
    install.packages('reticulate',
                     repos = "http://cran.us.r-project.org")
library('reticulate')
```
```{r data_import}
homebase_data <- jsonlite::fromJSON("https://data.cityofnewyork.us/resource/ntcm-2w4k.json?$limit=200")
eviction_data <- jsonlite::fromJSON("https://data.cityofnewyork.us/resource/6z8x-wfk4.json?$limit=100000")
```
```{r data_inspection}
head(homebase_data)
tail(homebase_data)
summary(homebase_data)
colnames(homebase_data)
head(eviction_data)
tail(eviction_data)
summary(eviction_data)
colnames(eviction_data)
```
```{r data_cleaning}
# Create the object homebase_data_clean and make sure to:
# 1. Select the following columns/variables:
#    (homebase_office, service_area_zip_code, postcode, borough)
# 2. Rename: service_area_zip_code to servicing_zipcodes and
#    postcode to homebase_location_zipcode
# 3. Capitalize homebase_office
# 4. Remove the I, II, and III from the homebase office names
# 5. Replace the servicing_zipcodes with any corrections
# 6. Find the number of servicing_zipcodes
# Create the object eviction_data_clean and make sure to:
# 1. Select the following columns/variables:
#    (residential_commercial_ind, eviction_possession, borough, eviction_zip)
# 2. Rename: residential_commercial_ind to building_type,
#    eviction_possession to warrant_execution_type, and
#    eviction_zip to eviction_zipcode
# 3. Replace Unspecified with NA in warrant_execution_type
# 4. Remove all NA from the data set
homebase_data_clean <- homebase_data
eviction_data_clean <- eviction_data
```
```{r merging}
# Merge the two data sets based on the same logic of zipcode matching that we did in class
combo_homebase_eviction_dataset <-

```
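Part two has no merging chunk, so here is a hedged pandas sketch of the same zipcode-matching idea, written in part two's language. The frames below are toy stand-ins for the cleaned data, and it assumes `servicing_zipcodes` holds a list of zipcodes per office (in the real data it may be a delimited string that needs splitting first); the exact logic used in class may differ.

```python
import pandas as pd

# Hypothetical cleaned frames; the real ones come from the cleaning chunks.
homebase = pd.DataFrame({
    'homebase_office': ['Office A'],
    'servicing_zipcodes': [['10457', '10458']],  # one office serves many zipcodes
})
evictions = pd.DataFrame({
    'eviction_zipcode': ['10457', '10458', '11368'],
    'warrant_execution_type': ['Possession', 'Eviction', 'Possession'],
})

# Explode the zipcode lists so each (office, zipcode) pair is one row,
# then inner-join each eviction to the office servicing its zipcode.
exploded = homebase.explode('servicing_zipcodes').rename(
    columns={'servicing_zipcodes': 'zipcode'})
combo = evictions.merge(exploded, left_on='eviction_zipcode',
                        right_on='zipcode', how='inner')
print(combo)
```

An inner join keeps only evictions in a serviced zipcode; a left join would instead keep unmatched evictions with NA office columns.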
```{r stats}
# Run statistics to get the mean, sd, median, IQR, min, and max for evictions at each homebase location. Interpret these numbers (checking for skew as well).
# Run statistics to get the mean, sd, median, IQR, min, and max for evictions in each borough, using the homebase and borough variables. Interpret these numbers (checking for skew as well).
```
# Analysis of evictions for each homebase:
# Analysis of evictions for each borough:
# Python [PART TWO]
```{python setup_python}
# Python PACKAGES --------------------------------------------------------------
import json
import requests
import pandas as pd
pd.options.mode.chained_assignment = None # default='warn'
```
```{python data_import_python}
json_link = requests.get('https:
data.cityofnewyork.us
esource/ntcm-2w4k.json?$limit=200')
json_loaded = json.loads(json_link.text)
homebase_data_python = pd.DataFrame(json_loaded)
del(json_link)
del(json_loaded)
json_link = requests.get('https:
data.cityofnewyork.us
esource/6z8x-wfk4.json?$limit=100000')
json_loaded = json.loads(json_link.text)
eviction_data_python = pd.DataFrame(json_loaded)
del(json_link)
del(json_loaded)
```
```{python data_inspection_python}
homebase_data_python.head(5)
homebase_data_python.tail(5)
homebase_data_python.dtypes
list(homebase_data_python.columns)
eviction_data_python.head(5)
eviction_data_python.tail(5)
eviction_data_python.dtypes
list(eviction_data_python.columns)
```
```{python data_cleaning_python}
# Create the object homebase_data_clean and make sure to:
# 1. Select the following columns/variables:
#    (homebase_office, service_area_zip_code, postcode, borough)
# 2. Rename: service_area_zip_code to servicing_zipcodes and
#    postcode to homebase_location_zipcode
# 3. Capitalize homebase_office
# 4. Remove the I, II, and III from the homebase office names
# 5. Replace the servicing_zipcodes with any corrections
# 6. Find the number of servicing_zipcodes
# Create the object eviction_data_clean and make sure to:
# 1. Select the following columns/variables:
#    (residential_commercial_ind, eviction_possession, borough, eviction_zip)
# 2. Rename: residential_commercial_ind to building_type,
#    eviction_possession to warrant_execution_type, and
#    eviction_zip to eviction_zipcode
# 3. Replace Unspecified with NA in warrant_execution_type
# 4. Remove all NA from data set

```
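The eviction-side cleaning steps above (select, rename, replace `Unspecified` with NA, drop NA rows) could be sketched in pandas roughly as follows. The function and demo frame are illustrative only; column names follow the renames listed in the comments, and the real input is `eviction_data_python` from the import chunk.

```python
import numpy as np
import pandas as pd

def clean_eviction_data(df):
    """Select, rename, replace 'Unspecified' with NA, and drop NA rows."""
    out = df[['residential_commercial_ind', 'eviction_possession',
              'borough', 'eviction_zip']].rename(columns={
        'residential_commercial_ind': 'building_type',
        'eviction_possession': 'warrant_execution_type',
        'eviction_zip': 'eviction_zipcode',
    })
    out['warrant_execution_type'] = (
        out['warrant_execution_type'].replace('Unspecified', np.nan))
    return out.dropna()

# Toy example: the second row is dropped because 'Unspecified' becomes NA.
demo = pd.DataFrame({
    'residential_commercial_ind': ['Residential', 'Commercial'],
    'eviction_possession': ['Possession', 'Unspecified'],
    'borough': ['BRONX', 'QUEENS'],
    'eviction_zip': ['10457', '11368'],
    'extra_column': [1, 2],          # discarded by the select step
})
print(clean_eviction_data(demo))
```

Selecting before renaming keeps the rename map small; doing it in the opposite order works too.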
```{python stats_python}
# Run statistics to get the mean, sd, median, IQR, min, and max for evictions at each homebase location.
# Run statistics to get the mean, sd, median, IQR, min, and max for evictions in each borough, using the homebase and borough variables.
```
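One way to produce the statistics requested above is to count evictions per zipcode and then summarize those counts within each group. The helper and demo frame below are a hedged sketch under that interpretation (the names `eviction_stats` and the toy columns are mine, not the assignment's); in the real data the grouping column would be the homebase office or borough from the merged set.

```python
import pandas as pd

def eviction_stats(df, group_col):
    """Count evictions per zipcode, then summarize counts per group."""
    counts = (df.groupby([group_col, 'eviction_zipcode'])
                .size().rename('evictions').reset_index())
    return counts.groupby(group_col)['evictions'].agg(
        mean='mean', sd='std', median='median',
        iqr=lambda s: s.quantile(0.75) - s.quantile(0.25),
        min='min', max='max')

# Toy example with two boroughs.
demo = pd.DataFrame({
    'borough': ['BRONX', 'BRONX', 'BRONX', 'QUEENS'],
    'eviction_zipcode': ['10457', '10457', '10458', '11368'],
})
print(eviction_stats(demo, 'borough'))
```

Comparing the mean against the median in the output is a quick skew check: a mean well above the median suggests right skew in the eviction counts.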