


Brooklyn Housing Analysis

Dataset: CSV here

Provide a short narrative describing the Brooklyn Housing Analysis problem. You can use any methods or tools you think are most appropriate. Write step-by-step instructions for completing the Dimensionality and Feature Reduction and the Model Evaluation and Selection parts of your case study.

Add the last remaining steps XXXXXXXXXX to the current Jupyter Notebook file.

Provide a short narrative describing the Brooklyn Housing Analysis problem.

1. I want to see whether I can create a map that displays geographic areas or regions colored, shaded, or patterned according to a data variable (a choropleth map).
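The first step toward a choropleth map is reducing the sales to one value per region. A minimal pandas sketch, assuming the CSV has `zip_code` and `sale_price` columns (check `housing_data.columns`) and treating ZIP codes as the regions; the tiny DataFrame is made up, not real Brooklyn data:

```python
import pandas as pd


def median_price_by_zip(df: pd.DataFrame) -> pd.Series:
    """Aggregate sales into one value per region (here: ZIP code).

    The resulting Series is the data variable a choropleth map shades by.
    """
    # Ignore transfers recorded with a zero sale price.
    sold = df[df["sale_price"] > 0]
    return sold.groupby("zip_code")["sale_price"].median().sort_index()


# Made-up rows for illustration only.
sales = pd.DataFrame({
    "zip_code": [11201, 11201, 11211, 11211, 11215],
    "sale_price": [900_000, 0, 650_000, 750_000, 1_200_000],
})
print(median_price_by_zip(sales))
```

To draw the map, this Series could be joined to a ZIP-code boundary layer (for example a geopandas `GeoDataFrame`, which the original notebook does not use) and plotted with `GeoDataFrame.plot(column=..., legend=True)`; the source of the boundary file is not specified here.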

Dimensionality and Feature Reduction

2. Some of my questions have been answered by the charts, but in some ways looking at this much data has raised even more questions.

a. Now it’s time to reduce some of the features so we can concentrate on the things that matter! The features we will get rid of are: "Unnamed", "apartment_number", "Ext ", "Landmark", etc.

b. Fill in missing values. (apartment_number has some missing values, but we are dropping that feature.) If a value is missing in a column that records the year in which alterations were carried out on a property, it may make more sense to assume that no alteration was carried out.
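Steps a and b can be sketched in pandas as below. Assumptions: "Unnamed" refers to an exported index column such as `Unnamed: 0`, so it is matched by prefix; names are compared after stripping whitespace so both `"Ext "` and `"Ext"` match; the alteration-year columns are called `YearAlter1`/`YearAlter2` (hypothetical, check your columns); and "no alteration" is encoded as 0, which is one choice among several.

```python
import numpy as np
import pandas as pd

# Columns named in step a, compared after stripping whitespace ("Ext " -> "Ext").
DROP_NAMES = {"apartment_number", "Ext", "Landmark"}

# Hypothetical alteration-year column names; check housing_data.columns.
ALTER_YEAR_COLS = ["YearAlter1", "YearAlter2"]


def reduce_features(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the unwanted columns, including exported index columns like 'Unnamed: 0'."""
    to_drop = [
        c for c in df.columns
        if str(c).startswith("Unnamed") or str(c).strip() in DROP_NAMES
    ]
    return df.drop(columns=to_drop)


def fill_alteration_years(df: pd.DataFrame) -> pd.DataFrame:
    """Treat a missing alteration year as 'never altered', encoded as 0."""
    out = df.copy()
    for col in ALTER_YEAR_COLS:
        if col in out.columns:  # skip columns absent from this copy of the data
            out[col] = out[col].fillna(0)
    return out


# Made-up rows for illustration only.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "apartment_number": ["4A", None, None],
    "Ext ": ["E", None, "G"],
    "Landmark": [None, None, "X"],
    "YearAlter1": [1998, np.nan, np.nan],
    "sale_price": [500_000, 0, 750_000],
})
clean = fill_alteration_years(reduce_features(df))
print(clean)
```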

3. If you go back and look at the histograms of sale prices, you’ll see that they are heavily right-skewed: many low-priced sales and very few high-priced ones. A log transformation is a good method to use on highly skewed data.
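A minimal sketch of the log transformation using `np.log1p` (log(1 + x)), chosen here because sales data may contain zero-dollar sales, where a plain `np.log` would return `-inf`. The prices are synthetic and only illustrate that skewness drops after the transform:

```python
import numpy as np
import pandas as pd


def log_price(prices: pd.Series) -> pd.Series:
    # log1p(x) = log(1 + x), so a zero-dollar sale maps to 0 instead of -inf.
    return np.log1p(prices)


# Synthetic, geometrically spaced prices: strongly right-skewed in dollars.
prices = pd.Series([100_000, 200_000, 400_000, 800_000, 1_600_000, 3_200_000])
print(f"skew before: {prices.skew():.2f}")
print(f"skew after:  {log_price(prices).skew():.2f}")
```

Predictions made on the log scale are converted back to dollars with `np.expm1`, the inverse of `np.log1p`.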

4. Convert your categorical data into numbers. For the other categorical columns, I filled missing values with the modal value of each column, and for the remaining numerical variables I used a mixture of SoftImpute imputation and median filling.
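A sketch of step 4 in pandas, assuming the categorical columns are simply the non-numeric ones. It fills categorical gaps with the column mode and numeric gaps with the median; the SoftImpute part is not reproduced (it would need an extra package such as fancyimpute), so median filling stands in for the whole numeric step. Categories are then one-hot encoded with `pd.get_dummies`. The column names and values in the example are made up:

```python
import pandas as pd


def impute_and_encode(df: pd.DataFrame) -> pd.DataFrame:
    """Mode-fill categorical columns, median-fill numeric ones, then one-hot encode."""
    out = df.copy()
    num_cols = list(out.select_dtypes(include="number").columns)
    cat_cols = [c for c in out.columns if c not in num_cols]

    for col in cat_cols:
        modes = out[col].mode()  # may hold several tied values; take the first
        if not modes.empty:
            out[col] = out[col].fillna(modes.iloc[0])
    for col in num_cols:
        # Median filling only; SoftImpute is not reproduced in this sketch.
        out[col] = out[col].fillna(out[col].median())

    # Turn each category into its own 0/1 indicator column.
    return pd.get_dummies(out, columns=cat_cols, dtype=int)


# Made-up rows for illustration only.
raw = pd.DataFrame({
    "neighborhood": ["Park Slope", None, "Park Slope", "Bushwick"],
    "building_class": ["A1", "C0", None, "A1"],
    "gross_sqft": [1200.0, None, 2000.0, 1600.0],
})
encoded = impute_and_encode(raw)
print(encoded)
```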

Model Evaluation and Selection

5. Training: split the data into two sets, a training set and a testing set.
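A minimal sketch of a shuffled 80/20 split with NumPy that returns row positions (the 20% test fraction and the fixed seed are assumptions). scikit-learn's `train_test_split(X, y, test_size=0.2, random_state=42)` does the same job if that library is available.

```python
import numpy as np


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Return shuffled (train_idx, test_idx) row positions."""
    rng = np.random.default_rng(seed)  # fixed seed makes the split reproducible
    order = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_fraction))
    return order[n_test:], order[:n_test]


train_idx, test_idx = train_test_split_indices(10)
print(len(train_idx), len(test_idx))  # 8 2
```

With a DataFrame, `df.iloc[train_idx]` and `df.iloc[test_idx]` give the two sets.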

6. Evaluation: remember that we are trying to predict the selling prices of houses.
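One way to evaluate, sketched with NumPy only: fit ordinary least squares on the log-transformed price, then score the held-out rows with R² on the log scale and RMSE in dollars after undoing `log1p` with `expm1`. The data is synthetic and linear regression is just one candidate model; for model selection, the same test metrics would be computed for each candidate and the best one kept. scikit-learn's `LinearRegression`, `mean_squared_error`, and `r2_score` are equivalents.

```python
import numpy as np


def fit_ols(X, y):
    # Prepend a column of ones so the model learns an intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic data: log1p(price) is linear in two made-up features plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
log_y = 13.0 + 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=500)

train, test = np.arange(400), np.arange(400, 500)
coef = fit_ols(X[train], log_y[train])
pred_log = predict(coef, X[test])

print("R^2 (log scale):", round(r2(log_y[test], pred_log), 3))
# Report dollar error after undoing log1p with expm1.
print("RMSE ($):", round(rmse(np.expm1(log_y[test]), np.expm1(pred_log))))
```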

Format: The completed task must be a Jupyter Notebook with all cells run and the results displayed.

Resources:

https://www.kaggle.com/tianhwu/brooklynhomes2003to2017

https://hackernoon.com/predicting-the-price-of-houses-in-brooklyn-using-python-1abd7997083b

https://towardsdatascience.com/closing-the-sale-predicting-home-prices-via-linear-regression-2eac62c72818

https://medium.com/geoai/house-hunting-the-data-scientist-way-b32d93f5a42f

Answered Same Day Oct 08, 2021

Solution

Ximi answered on Oct 12 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import string\n",
"import re\n",
"import matplotlib.pyplot as plt\n",
"from collections import Counter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 1: Load data into a dataframe"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3020: DtypeWarning: Columns (40,41,43,45,46,47,86) have mixed types. Specify dtype option on import or set low_memory=False.\n",
" interactivity=interactivity, compiler=compiler, result=result)\n"
]
}
],
"source": [
"housing_data = pd.read_csv('brooklynhomes2003to2017/brooklyn_sales_map.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 2: Check the dimension of the table and view the data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The dimension of the table is: (390883, 111)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"