Brooklyn Housing Analysis Dataset: CSV here Provide a short narrative describing on the Brooklyn...

Question

Brooklyn Housing Analysis
Dataset: CSV here

Provide a short narrative describing the Brooklyn Housing Analysis problem. You can use any methods or tools you think are most appropriate. Write the step-by-step instructions for completing the Dimensionality, Feature Reduction, Model Evaluation and Selection part of your case study. Add the last remaining steps XXXXXXXXXX to the current file Jupyter Notebook.

Provide a short narrative describing the Brooklyn Housing Analysis problem.

1. I want to see if I can create a map that displays divided geographical areas or regions colored, shaded or patterned in relation to a data variable.

Dimensionality and Feature Reduction

2. Some of my questions have been answered by the charts, but in some ways, looking at this much data has created even more questions.
a. Now it's time to reduce some of the features so we can concentrate on the things that matter! The features we will get rid of are: "Unnamed", "apartment_number", "Ext ", "Landmark", etc.
b. Fill in missing values. (apartment_number has some missing values, but we are dropping that feature.) If there is a missing value in a column representing the year in which alterations were carried out on a property, it may make more sense to assume that no alteration was carried out.

3. If you go back and look at the histograms of sales, you'll see that the distribution is very skewed: many low real estate sales, not very many high ones. Log transformation is a good method to use on highly skewed data.

4. Convert your categorical data into numbers. For other categorical columns, I filled the missing data with the modal value of their respective columns, and for the rest of the numerical variables I used a mixture of soft impute imputation and filling missing data with the median value.

Model Evaluation and Selection

5. Training: split the data into two sets, Training and Testing.

6. Evaluation: remember we are trying to predict selling prices of houses.

Format: The completed task must be in a Jupyter Notebook with run & displayed results.

Resources:
https://www.kaggle.com/tianhwu/brooklynhomes2003to2017
https://hackernoon.com/predicting-the-price-of-houses-in-brooklyn-using-python-1abd7997083b
https://towardsdatascience.com/closing-the-sale-predicting-home-prices-via-linear-regression-2eac62c72818
https://medium.com/geoai/house-hunting-the-data-scientist-way-b32d93f5a42f
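For point 1, a map whose regions are shaded according to a data variable is a choropleth. Below is a minimal sketch in plain matplotlib, assuming you already have an outline (a list of vertices) for each neighborhood. The square outlines, the toy prices, and the use of median sale price as the shading variable are illustrative assumptions and are not taken from the dataset. With a real boundary file loaded into geopandas, you would merge the per-neighborhood medians onto the polygons and call `GeoDataFrame.plot(column=...)` to get the same kind of map.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

# Toy sales records; in the notebook these would come from housing_data.
sales = pd.DataFrame({
    "neighborhood": ["PARK SLOPE", "PARK SLOPE", "BUSHWICK", "BUSHWICK", "MILL BASIN"],
    "sale_price": [1_200_000, 1_000_000, 600_000, 700_000, 800_000],
})

# Hypothetical neighborhood outlines as (x, y) vertices. Real outlines would be
# read from a boundary file; these squares only stand in for them.
boundaries = {
    "PARK SLOPE": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "BUSHWICK": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "MILL BASIN": [(0, 1), (2, 1), (2, 2), (0, 2)],
}

# The data variable that drives the shading: median sale price per region.
median_price = sales.groupby("neighborhood")["sale_price"].median()

# One filled polygon per region, colored by that region's median price.
names = list(boundaries)
patches = [Polygon(boundaries[name], closed=True) for name in names]
regions = PatchCollection(patches, cmap="OrRd", edgecolor="black")
regions.set_array(median_price.reindex(names).to_numpy())

fig, ax = plt.subplots(figsize=(5, 5))
ax.add_collection(regions)
ax.autoscale_view()
ax.set_aspect("equal")
fig.colorbar(regions, ax=ax, label="Median sale price ($)")
ax.set_title("Median sale price by neighborhood (toy data)")
plt.show()
```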
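Points 2a and 2b can be sketched in pandas as follows. This is a minimal example on a toy frame. It assumes the extension column is named `Ext` (the brief writes `"Ext "` with a trailing space, so check `housing_data.columns`) and that the alteration years sit in PLUTO-style columns whose names start with `YearAlter`. Both names are assumptions to verify against the real CSV.

```python
import numpy as np
import pandas as pd

# Toy frame reusing a few column names from brooklyn_sales_map.csv; in the
# notebook, run the same steps on housing_data instead.
df = pd.DataFrame({
    "Unnamed: 0": [1, 2, 3],
    "apartment_number": [None, "4B", None],
    "Ext": [None, "E", None],
    "Landmark": [None, None, "YES"],
    "sale_price": [500_000, 0, 750_000],
    "YearAlter1": [1995.0, np.nan, np.nan],
})

# Features we no longer need; errors="ignore" skips any name that is absent.
to_drop = ["Unnamed: 0", "apartment_number", "Ext", "Landmark"]
df = df.drop(columns=to_drop, errors="ignore")

# A missing alteration year is read as "no alteration" and stored as 0.
alter_cols = [col for col in df.columns if col.startswith("YearAlter")]
df[alter_cols] = df[alter_cols].fillna(0)

# Anything still missing is left for the later imputation step.
print(df.isna().sum())
```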
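The effect described in point 3 can be checked numerically. The sketch below uses made-up prices with one very large sale and compares the skewness before and after `np.log1p`. Dropping $0 records first is an assumption of this sketch rather than a step in the brief. It is a common choice because such rows usually record ownership transfers, not market sales.

```python
import numpy as np
import pandas as pd

# Toy prices with a long right tail, like the sales histogram.
prices = pd.Series([0, 250_000, 500_000, 1_000_000, 25_000_000], name="sale_price")

# Assumption of this sketch: $0 records are not market sales, so drop them.
prices = prices[prices > 0]

# log1p compresses the tail; np.expm1 maps predictions back to dollars.
log_prices = np.log1p(prices)

print(f"skew before: {prices.skew():.2f}, after: {log_prices.skew():.2f}")
```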
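Point 4 and the imputation rules it describes might look like this on a toy frame. Categorical gaps get the mode, numeric gaps get the median, and the categories are then one-hot encoded with `pd.get_dummies`. Soft impute is not shown because it needs a separate package. One-hot encoding is one reasonable way to "convert categorical data into numbers", but it is not necessarily the method the brief had in mind.

```python
import pandas as pd

# Toy frame with gaps in both categorical and numeric columns.
df = pd.DataFrame({
    "neighborhood": ["BUSHWICK", None, "BUSHWICK", "PARK SLOPE"],
    "building_class": ["A1", "B2", None, "A1"],
    "gross_sqft": [1800.0, None, 2400.0, 3000.0],
    "year_built": [1920.0, 1931.0, None, 1899.0],
})

num_cols = df.select_dtypes(include="number").columns
cat_cols = [col for col in df.columns if col not in num_cols]

# Categorical gaps -> the modal (most frequent) value of each column.
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# Numeric gaps -> the column median (soft impute is left out of this sketch).
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# One-hot encode the categories so every feature the model sees is numeric.
encoded = pd.get_dummies(df, columns=cat_cols, dtype=int)
print(encoded)
```

With many neighborhoods and building classes, one-hot encoding adds a lot of columns. For tree-based models, `df[col].astype("category").cat.codes` is a more compact alternative.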
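Points 5 and 6 can be sketched with scikit-learn. The feature matrix and log-price target below are synthetic stand-ins for the cleaned, encoded Brooklyn data. The two candidate models (linear regression and a random forest) are illustrative choices. The sketch holds out 20% of the rows for testing and compares RMSE and R² on that held-out set.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded features and the log1p(sale_price) target.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "gross_sqft": rng.uniform(800, 4000, n),
    "year_built": rng.integers(1890, 2015, n),
    "is_park_slope": rng.integers(0, 2, n),
})
y = 11 + 0.0004 * X["gross_sqft"] + 0.3 * X["is_park_slope"] + rng.normal(0, 0.1, n)

# Training/testing split: 80% to fit, 20% held out for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each candidate and score it on the held-out test set only.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "rmse_log": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": r2_score(y_test, pred),
    }

print(pd.DataFrame(results).T)
```

Because the target is `log1p(sale_price)`, RMSE here is in log units. Apply `np.expm1` to the predictions before reporting errors in dollars.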

Ximi · Accepted Answer

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import string\n",
    "import re\n",
    "import matplotlib.pyplot as plt\n",
    "from collections import Counter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Step 1: Load data into a dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3020: DtypeWarning: Columns (40,41,43,45,46,47,86) have mixed types. Specify dtype option on import or set low_memory=False.\n",
      "  interactivity=interactivity, compiler=compiler, result=result)\n"
     ]
    }
   ],
   "source": [
    "housing_data = pd.read_csv('brooklynhomes2003to2017/brooklyn_sales_map.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Step 2: Check the dimension of the table and view the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The dimension of the table is:  (390883, 111)\n"
     ]
    },
    {
     "data": {
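      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>borough</th>\n",
       "      <th>neighborhood</th>\n",
       "      <th>building_class_category</th>\n",
       "      <th>tax_class</th>\n",
       "      <th>block</th>\n",
       "      <th>lot</th>\n",
       "      <th>easement</th>\n",
       "      <th>building_class</th>\n",
       "      <th>address</th>\n",
       "      <th>...</th>\n",
       "      <th>EDesigNum</th>\n",
       "      <th>APPBBL</th>\n",
       "      <th>APPDate</th>\n",
       "      <th>PLUTOMapID</th>\n",
       "      <th>FIRM07_FLA</th>\n",
       "      <th>PFIRM15_FL</th>\n",
       "      <th>Version</th>\n",
       "      <th>MAPPLUTO_F</th>\n",
       "      <th>SHAPE_Leng</th>\n",
       "      <th>SHAPE_Area</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>DOWNTOWN-METROTECH</td>\n",
       "      <td>28  COMMERCIAL CONDOS</td>\n",
       "      <td>4</td>\n",
       "      <td>140</td>\n",
       "      <td>1001</td>\n",
       "      <td>NaN</td>\n",
       "      <td>R5</td>\n",
       "      <td>330 JAY STREET</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>DOWNTOWN-FULTON FERRY</td>\n",
       "      <td>29  COMMERCIAL GARAGES</td>\n",
       "      <td>4</td>\n",
       "      <td>54</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>G7</td>\n",
       "      <td>85 JAY STREET</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3.000540e+09</td>\n",
       "      <td>12/06/2002</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>17V1.1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1559.889144</td>\n",
       "      <td>140131.577176</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>BROOKLYN HEIGHTS</td>\n",
       "      <td>21  OFFICE BUILDINGS</td>\n",
       "      <td>4</td>\n",
       "      <td>204</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>O6</td>\n",
       "      <td>29 COLUMBIA HEIGHTS</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>17V1.1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>890.718521</td>\n",
       "      <td>34656.447240</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>MILL BASIN</td>\n",
       "      <td>22  STORE BUILDINGS</td>\n",
       "      <td>4</td>\n",
       "      <td>8470</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>K6</td>\n",
       "      <td>5120 AVENUE U</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>17V1.1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3729.786857</td>\n",
       "      <td>797554.847834</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>BROOKLYN HEIGHTS</td>\n",
       "      <td>26 OTHER HOTELS</td>\n",
       "      <td>4</td>\n",
       "      <td>230</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>H8</td>\n",
       "      <td>21 CLARK STREET</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>17V1.1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>620.761169</td>\n",
       "      <td>21360.147631</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 111 columns</p>\n",
       "</div>"
      ],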
      "text/plain": [
       "   Unnamed: 0  borough           neighborhood building_class_category  \\\n",
       "0           1        3     DOWNTOWN-METROTECH   28  COMMERCIAL CONDOS   \n",
       "1           2        3  DOWNTOWN-FULTON FERRY  29  COMMERCIAL GARAGES   \n",
       "2           3        3       BROOKLYN HEIGHTS    21  OFFICE BUILDINGS   \n",
       "3           4        3             MILL BASIN     22  STORE BUILDINGS   \n",
       "4           5        3       BROOKLYN HEIGHTS         26 OTHER HOTELS   \n",
       "\n",
       "  tax_class  block   lot  easement building_class              address  \\\n",
       "0         4    140  1001       NaN             R5       330 JAY STREET   \n",
       "1         4     54     1       NaN             G7        85 JAY STREET   \n",
       "2         4    204     1       NaN             O6  29 COLUMBIA HEIGHTS   \n",


Solution