

INFO5502 Assignment 8: Develop a Data
Analysis Project
November 27, 2019
In this assignment, you pick your own questions and datasets to build
a data analysis project following the data science workflow. Specifically, you
need to complete the following tasks:
1. Develop a question of your choice that can be addressed by identifying,
collecting, and analyzing relevant data. You need to find relevant data
yourself and describe it: the source, attributes, size, how the data were
collected, whether the dataset is sample data or population data, etc.
The dataset should have at least six distinct variables (i.e., columns)
and a sample size (i.e., rows) of 500 or more. (3 points)
2. Perform exploratory data analysis (EDA). Describe the EDA process and
results with at least four data visualizations. Explain whether the data is
sufficient to answer the question you developed, based on the EDA results. If
it is not sufficient, how did you address the issue? (3 points)
3. Describe any data cleaning or transformations that you perform and why
they are motivated by your EDA. (2 points)
4. Apply relevant inference or prediction methods, such as linear regression
or k-nearest neighbors (KNN), to analyze your processed data, and
validate the analysis results using cross-validation. Explain the
training process and the loss functions used in the analysis. Use
examples (i.e., the values of the loss functions) to explain how the
minimal value(s) of the loss function is/are found. (7 points)
5. Summarize and interpret your results, including at least four data
visualizations. Provide an evaluation of your approach and discuss any
limitations of the methods you used. (2 points)
6. Write a project report to describe all tasks. (1 point)
7. Submit the original datasets (or public links to the datasets, make sure
they are accessible), the processed datasets (or public links to the datasets,
make sure they are accessible), Python code, and the project report. (2
points)
This assignment is developed based on content from www.ds100.org/fa19/gradproject
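
Task 4 asks for cross-validation and for concrete loss values that show where
the minimum lies. The sketch below is one possible illustration, not part of
the assignment or of the posted solution. It assumes synthetic one-feature
data (y = 3x + 2 plus noise), mean squared error as the loss, and hand-written
helpers (mse, fit_line, linear_model, knn_model, k_fold_mse) that are
hypothetical names introduced here. It uses only the standard library so it
can run anywhere.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error, the loss that least-squares regression minimizes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / float(len(y_true))


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a * x + b for a single feature."""
    n = float(len(xs))
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def linear_model(xs, ys):
    """Train a linear model and return its prediction function."""
    a, b = fit_line(xs, ys)
    return lambda x: a * x + b


def knn_model(xs, ys, k=5):
    """KNN regression: predict the mean target of the k closest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / float(len(nearest))

    return predict


def k_fold_mse(make_model, xs, ys, folds=5):
    """Average held-out MSE: each fold is held out once, the rest is used to train."""
    n = len(xs)
    scores = []
    for f in range(folds):
        held_out = set(range(f, n, folds))  # every folds-th row, offset by f
        train = [i for i in range(n) if i not in held_out]
        test = sorted(held_out)
        predict = make_model([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse([ys[i] for i in test], [predict(xs[i]) for i in test]))
    return sum(scores) / folds


# Synthetic data (an assumption for illustration): y = 3x + 2 plus unit noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]

a_hat, b_hat = fit_line(xs, ys)
cv_linear = k_fold_mse(linear_model, xs, ys)
cv_knn = k_fold_mse(knn_model, xs, ys)

# Loss values on a grid of candidate slopes, with the intercept fixed at b_hat.
slopes = [2.0, 2.5, 3.0, 3.5, 4.0]
losses = {a: mse(ys, [a * x + b_hat for x in xs]) for a in slopes}
best_slope = min(losses, key=losses.get)

for a in slopes:
    print("slope=%.1f  MSE=%.3f" % (a, losses[a]))
print("fitted slope=%.3f  intercept=%.3f" % (a_hat, b_hat))
print("5-fold CV MSE: linear=%.3f  knn=%.3f" % (cv_linear, cv_knn))
```

Printing the loss at each candidate slope gives the kind of table the task
describes: the MSE falls as the slope approaches the fitted value and rises
again past it. On a real dataset, the synthetic xs and ys would be replaced by
a cleaned feature and target, and a library implementation would normally
replace these hand-written helpers.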
Answered Same Day Dec 06, 2021

Solution

Ximi answered on Dec 08 2021
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "eda_nyc.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.16"
}
},
"cells": [
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "zcAROhjiOnas",
"colab": {}
},
"source": [
"import pandas as pd\n",
"df = pd.read_csv('AB_NYC_2019.csv')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "6m-89_0nsPle"
},
"source": [
"We will perform an EDA on NYC Airbnb data to learn about the city and the regional prices of its listings. \n",
"\n",
"By the end of this exploration, we will be able to identify the best and most economical places to stay and make an informed decision about our stay. \n",
"\n",
"Finally, we will build a price prediction model that uses our calculated features to predict the price for a region given its features."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "RP80L81rS9Va",
"colab": {}
},
"source": [
""
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "gnI0mZjSOna0",
"outputId": "8f69236c-509c-4fa4-9bdb-07607ddb7141",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
}
},
"source": [
"df.shape"
],
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(48895, 16)"
]
},
"metadata": {
"tags": []
},
"execution_count": 2
}
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "DPLCTYzyOna6",
"outputId": "712e674d-21be-4713-a969-b2e52574a4d6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 121
}
},
"source": [
"df.columns"
],
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index([u'id', u'name', u'host_id', u'host_name', u'neighbourhood_group',\n",
" u'neighbourhood', u'latitude', u'longitude', u'room_type', u'price',\n",
" u'minimum_nights', u'number_of_reviews', u'last_review',\n",
" u'reviews_per_month', u'calculated_host_listings_count',\n",
" u'availability_365'],\n",
" dtype='object')"
]
},
"metadata": {
"tags": []
},
"execution_count": 3
}
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "PZNFOc9POna_",
"outputId": "5f8143c1-02fa-43a0-b35a-4614cfa0d1fc",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 408
}
},
"source": [
"df.head()"
],
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"