DeVry University
Data Mining and Analytics
Course Project
Introduction
For your course project, you will complete a data analysis project using the Jupyter Notebook used in the course lessons as a model.
You will select your own data for your analysis project. It may come from one or more sources. See the Resources area for ideas and places to find data for analysis.
The project will be in the form of a Jupyter Notebook file. It will also include support files such as the .csv original data, database files for data storage, .html report files, your environment file, and any other files needed to duplicate your project.
You may use your own computer for your project, or you can use the Azure Labs virtual machine provided for you.
Resources
This section includes resources for getting ideas for your project. It also includes resources that can help with using Python and the various libraries and tools related to data science.
Project Ideas and Finding Data
One of your first steps will be to decide on a dataset or datasets for your project. There are many places to look for data for your course project. A few are listed below but feel free to search the web for other ideas. Think about data that you might be interested in, or that might be helpful for your organization.
When considering your data, think about the capabilities of the computer you will be using for your data analysis (whether your own computer or the Azure Labs virtual machine). Be sure to do a few initial tests of loading and doing some simple processing of data to make sure your environment is suitable for the size of your dataset. You can always reduce the size of your dataset if you need to.
USAFacts – Large source of US government data.
https://usafacts.org
Kaggle Datasets – Over 59,000 public datasets for use.
https://www.kaggle.com/datasets
OpenML – Open Machine Learning. Includes over 21,000 data sets you can use.
https://www.openml.org/home
Microsoft Research Open Data – Free datasets from Microsoft in the areas of biology, computer science, earth science, education, healthcare, information science, mathematics, physics, social science, and other.
https://msropendata.com/categories
Find Free Public Data Sets for Your Data Science Project – Article with over 30 sites with public data for your project.
https://www.springboard.com/blog/free-public-data-sets-data-science-project
21 Places to Find Free Datasets for Data Science Projects – Article on sources of free data for data science projects.
https://www.dataquest.io/blog/free-datasets-for-projects
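The initial load test suggested above (checking that your environment can handle the size of your dataset) could look something like the following. This is only a minimal sketch, assuming pandas is installed in your environment. The file name sample_data.csv and its columns are hypothetical stand-ins that the script generates itself so it can run anywhere; in practice you would point csv_path at your own downloaded file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for a downloaded dataset; replace csv_path with your own file.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample_data.csv")
pd.DataFrame(
    {"id": range(1000), "value": [i * 0.5 for i in range(1000)]}
).to_csv(csv_path, index=False)

# Load only the first rows to confirm the file parses before reading all of it.
sample = pd.read_csv(csv_path, nrows=100)

# Rough estimate of full in-memory size from the sample's average bytes per row.
bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)
with open(csv_path) as f:
    total_rows = sum(1 for _ in f) - 1  # subtract the header line
estimated_mb = bytes_per_row * total_rows / 1e6

print(f"Rows: {total_rows}, estimated memory: {estimated_mb:.2f} MB")
```

If the estimate is close to the memory available on your machine or virtual machine, consider working with a subset of the rows or columns.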
Documentation, Tutorials, Guides
You will find many great references and tutorials for virtually any part of your project. A few are listed below but be sure to do a web search and explore YouTube.com and other resources if you need ideas, examples, walkthroughs, tutorials, documentation, etc.
Python and Basic Libraries
Python for Beginners – This course from Microsoft consists of 44 short videos on various Python concepts. Use this to refresh your Python skills or to review specific topics.
https://www.youtube.com/watch?v=jFCNu1-Xdsw&list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6
Python Documentation – Official Python guides. Includes beginner’s guides, complete documentation, and tutorials.
https://www.python.org/doc
w3schools.com Python – This site from the popular w3schools.com has tutorials, examples, and references for Python, NumPy, Matplotlib, SciPy, Machine Learning, and more.
https://www.w3schools.com/python
NumPy Documentation – Official NumPy documentation including quickstart tutorials, references, examples, and more.
https://numpy.org/doc/stable
Pandas Documentation – Official Pandas documentation including getting started, user guides, developer documentation, and more.
https://pandas.pydata.org/docs
Pandas Cheat Sheet – Quick reference for Pandas.
https://www.dataquest.io/blog/pandas-cheat-sheet
Matplotlib – Official Site for Matplotlib. Includes examples and documentation for your data visualization.
https://matplotlib.org
SciPy – Official Site for SciPy. Includes getting started, documentation, examples, and more.
https://www.scipy.org
SQLite – Official site for SQLite.
https://www.sqlite.org/index.html
Data Preprocessing using Pandas – Simple tutorial for preprocessing data using Pandas.
https://www.analyticsvidhya.com/blog/2020/09/pandas-speed-up-preprocessing
Visual Studio Code
Visual Studio Code Documentation – Official Visual Studio Code documentation. Includes setup guides, getting started, user guides, languages, and more.
https://code.visualstudio.com/docs
Anaconda and Jupyter
Anaconda Documentation – Official Anaconda documentation. Includes installation, user guides, references, and more.
https://docs.anaconda.com
Managing Packages (Libraries) in Anaconda
https://docs.anaconda.com/anaconda/user-guide/tasks/install-packages
Jupyter Documentation – Official Jupyter documentation.
https://jupyter.org/documentation
Additional Tools and Libraries
Pandas Profiling – Create reports based on your data.
https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd
Folium – Create maps based on your data.
https://python-visualization.github.io/folium
SandDance for Visual Studio Code – Allows you to easily visualize your raw or processed .csv files.
https://marketplace.visualstudio.com/items?itemName=msrvida.vscode-sanddance
Video Tutorial on SandDance – Video from Microsoft on using SandDance for data visualization.
https://www.youtube.com/watch?v=ID5JOc73h4M
Python Seaborn Tutorial for Beginners – Simple tutorial for using Seaborn for Data Visualization.
https://www.datacamp.com/community/tutorials/seaborn-python-tutorial
Data Science
Complete Jupyter Notebook for Data Science – Complete video tutorial on doing a data science project with Jupyter Notebooks.
https://www.youtube.com/watch?v=8O_COC9xtJw
Data Science for Beginners – Series of short video tutorials on Python, Anaconda, and Data Science.
https://www.youtube.com/watch?v=JL_grPUnXzY&list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV
Data Science in Visual Studio Code – Simple tutorial for doing a data science project in Visual Studio Code.
https://code.visualstudio.com/docs/python/data-science-tutorial
Setting up a Data Science workspace with Visual Studio Code and Anaconda – Simple tutorial for setting up your workspace. (Requires login with e-mail or Google account)
https://towardsdatascience.com/setting-up-your-own-data-science-workspace-with-visual-studio-code-and-anaconda-python-22237590b4ed
Python – Data Science Tutorial
https://www.tutorialspoint.com/python_data_science/index.htm
Introduction to Time Series Analysis in Python
https://www.kdnuggets.com/2020/09/introduction-time-series-analysis-python.html
Exploratory Data Analysis with Pandas Profiling – Tutorial for working with Pandas Profiling to generate a report using your data. (Requires login with e-mail or Google account)
https://towardsdatascience.com/exploratory-data-analysis-with-pandas-profiling-de3aae2ddff3
Making 3 Easy Maps With Python – Simple tutorial for using Folium to create maps. (Requires login with e-mail or Google account)
https://towardsdatascience.com/making-3-easy-maps-with-python-fb7dfb1036
Visualizing Data at the Zip Code Level with Folium – Tutorial for using Folium to create zip code level maps. (Requires login with e-mail or Google account)
https://towardsdatascience.com/visualizing-data-at-the-zip-code-level-with-folium-d07ac983db20
Mapping Data with Folium – Another tutorial for using Folium to create maps.
https://medium.com/@sosterburg/mapping-data-with-folium-356f0d6f88a9
Part 1 – Environment Setup and Selection of Project Data
Summary
In this part of the project, you will set up your data analysis environment and select your project data.
Points: 60
Due: Module 1
Deliverables: PDF of Jupyter Notebook project. Zipped project folder.
Steps
1. Select one or more data files for your project. See the section Project Ideas and Finding Data above. Download the file(s) and be sure to remember where you downloaded them to. You may need to convert your data from another format, such as JSON or XML, to a format suitable for your project (such as .csv).
2. Decide where to host your project. You can host your project on your own computer or in the MS Azure Labs virtual environment.
a. You have access to a Microsoft Azure Labs virtual machine. This VM has MS Office, Anaconda, and Visual Studio. It also has an Anaconda environment available that is used to run the Jupyter Notebook used in the lessons. You can use this environment and add additional libraries if necessary or create your own.
b. You can use your own computer. Anaconda and Visual Studio are available for Windows, Mac, and Linux operating systems. The Anaconda environment is available in the course Files area and you can import this to get started or you can create your own environment.
3. Set up your environment.
a. Install any software such as Anaconda, Visual Studio Code, etc. (if necessary – these are pre-installed in the Azure VM).
b. Install any extensions if you are using Visual Studio Code (if necessary – these are pre-installed in the Azure VM). Recommended: Python, Anaconda Extension Pack (may be installed automatically with Anaconda), SQLite, and SandDance for VSCode.
c. Create an Anaconda environment (if necessary). You can do this by importing the CEIS480 environment available in the Files area of the course (this environment is pre-installed in the Azure VM). Note that the CEIS480 environment in the Files area works on Windows. If you are using a different platform, you can create your own environment and add your libraries to it. You may also create your own environment from scratch.
4. Create your project folder.
a. Create a folder for your project in a suitable location on your computer or in the Azure VM. Be sure to remember where your project folder is so that you can find it again.
b. Have a plan to back up your project folder on a regular basis so that you will not lose work if something happens to your project files.
c. On Windows, Jupyter Labs can only access files on your C: drive. Visual Studio can access Jupyter Notebook folders on any drive.
5. Create your Jupyter Notebook file in your project folder.
a. Make sure you select the correct environment for your project to run in.
6. Add your project data file to your project folder.
7. Create a markdown cell in your project with your project heading. Include: Course Number, Course Name, Course Session (Month and Year), Student Name, Project Name.
8. Create a markdown cell in your project with a brief description of your project including what type of data is being analyzed as well as the source of the data.
9. Add cells (markdown cell for step/explanation and code cell) for your import statements.
10. Add cells (markdown cell for step/explanation and code cell) to load your project data file.
11. Add cells (markdown cell for step/explanation and code cell) to preview the data using the head() method.
12. Run all cells.
13. Export your environment so that it can be duplicated on another computer.
a. Go to the Anaconda prompt.
b. Activate your environment.
Example: conda activate CEIS480
c. Export your environment.
Example: conda env export > myenvironment.yaml (you can use any filename with a .yaml extension for your environment file).
d. Copy the file to your project folder (if you cannot find the file, search for it in Windows Explorer).
14. Submit Deliverables (pdf and zipped project folder).
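As an illustration of the steps above, the code cells for converting a source file (step 1), importing libraries (step 9), loading the data (step 10), and previewing it (step 11) might look roughly like this. It is a sketch under stated assumptions rather than the required solution: the file names raw_data.json and project_data.csv, the sample records, and the temporary folder standing in for your project folder are all hypothetical, and pandas is assumed to be installed in your environment.

```python
# Step 9: import statements
import json
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for your project folder.
project_dir = tempfile.mkdtemp()

# Step 1 (only if needed): convert a JSON source file to .csv.
# The records below are made-up sample data so the sketch runs on its own.
json_path = os.path.join(project_dir, "raw_data.json")
records = [{"city": "Chicago", "sales": 120}, {"city": "Phoenix", "sales": 95}]
with open(json_path, "w") as f:
    json.dump(records, f)

csv_path = os.path.join(project_dir, "project_data.csv")
with open(json_path) as f:
    pd.DataFrame(json.load(f)).to_csv(csv_path, index=False)

# Step 10: load the project data file
df = pd.read_csv(csv_path)

# Step 11: preview the data (head() returns the first five rows by default)
preview = df.head()
print(preview)
```

In the notebook itself, each commented step would normally go in its own code cell, preceded by a markdown cell explaining it.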
Deliverables
You should submit both a .pdf file of your project as well as your zipped complete project folder.
1. Export your Jupyter Notebook file to .pdf format (see resources below if you do not know how to do this).
2. Zip your complete project folder (see resources below if you do not know how to do this). Include all files needed to run the project. This should include your environment (.yaml) file, your notebook (.ipynb) file, and any data files needed for the project to run.
3. Submit the following to the dropbox for this module:
a. Project .pdf file
b. Project folder .zip file
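If you prefer to zip the project folder from Python instead of using your operating system's tools, one possible approach uses the standard library's shutil.make_archive. The folder name my_project and the placeholder files below are hypothetical and exist only so the sketch runs on its own; substitute the path to your real project folder.

```python
import os
import shutil
import tempfile
import zipfile

# Hypothetical project folder with placeholder files (replace with your own folder).
work_dir = tempfile.mkdtemp()
project_dir = os.path.join(work_dir, "my_project")
os.makedirs(project_dir)
for name in ("project.ipynb", "project_data.csv", "myenvironment.yaml"):
    with open(os.path.join(project_dir, name), "w") as f:
        f.write("placeholder\n")

# make_archive appends ".zip" to base_name and returns the path of the archive.
# The archive is written next to the project folder, not inside it.
archive_path = shutil.make_archive(
    base_name=os.path.join(work_dir, "my_project"),
    format="zip",
    root_dir=work_dir,
    base_dir="my_project",
)

# List the archive contents to confirm every required file was included.
with zipfile.ZipFile(archive_path) as zf:
    archived = sorted(zf.namelist())
print(archived)
```

Checking the listing before you submit is a quick way to confirm that the notebook, the data files, and the environment file are all in the archive.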
Resources