CP5805 Assignment 2
Main task
DataFrame manipulation and visualisation
Task
Design and implement a data analysis program in Python using pandas as detailed in the
instructions below.
85% of your mark will be based on the correctness and quality of the basic program, and
15% is based on the functionality in the challenge section.
You will need to use the skills covered across weeks one to five for this main task. Some
portions may require some further investigation of the pandas docs.
Important note about libraries
For this assessment, you are free to use any standard Python libraries, as well as the libraries we
have covered in subject contents. In fact, you must use pandas appropriately to fulfill the
requirements of this assessment. You may, if it allows you to write more efficient or effective
code, use additional libraries, provided these libraries are included in the standard Anaconda
installation. You may not use any libraries that need to be installed separately (e.g., via
conda or pip).
Detailed instructions
Your program will allow users to load a DataFrame from a CSV file, clean the data in various
ways, display statistics, and create visualisations.
When the program runs, the user will see an introductory message (you are welcome to
determine this as you see fit, but make sure to include your name). For example:
Welcome to The DataFrame Statistician!
Programmed by Ada Lovelace
After the welcome message, the user will be presented with the following menu:
Please choose from the following options:
1 – Load data from a file
2 – View data
3 – Clean data
4 – Analyse data
5 – Visualise data
6 - Save data to a file
7 - Quit
Option 7 will exit the program; every other option will do some task and then display the menu
again until the user chooses 7 from this menu.
If the user enters anything other than a value between 1 and 7, display an appropriate error
message (e.g., Invalid selection!), then get the user to enter another choice.
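The validation loop described above could be sketched as follows. This is only an illustration, not the required design: the function name `get_menu_choice` and the `input_fn`/`output_fn` parameters are hypothetical, introduced here so the loop can be exercised with scripted answers instead of a live keyboard.

```python
MENU = """Please choose from the following options:
1 - Load data from a file
2 - View data
3 - Clean data
4 - Analyse data
5 - Visualise data
6 - Save data to a file
7 - Quit"""


def get_menu_choice(input_fn=input, output_fn=print, low=1, high=7):
    """Show the menu repeatedly until the user enters a whole number in range."""
    while True:
        output_fn(MENU)
        raw = input_fn(">>> ").strip()
        try:
            choice = int(raw)
        except ValueError:
            choice = None  # not a whole number at all
        if choice is not None and low <= choice <= high:
            return choice
        output_fn("Invalid selection!")
```

Calling `get_menu_choice()` with no arguments reads from the keyboard; the main program would repeat this call and dispatch on the result until it returns 7.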
Menu option 1 - load data from a file
When the user chooses option 1, they will be asked for a filename to load, which is expected to
be in the same directory as the program (no need for path information). Your program should use
the exact filename as stated. Do not append .csv or any other extension – although the
contents of the file will be expected to be CSV, a CSV file could be stored under any
extension, or no extension.
Your program should be able to handle any file in a format like the following:
day,min_temp,max_temp,rainfall,humidity
1,11,23,3,55
1,11,23,3,55
2,13,25,0,60
3,9,19,17,80
4,9,18,36,85
5,,,,50
6,12,22,,60
7,13,23,0,65
So, the first row should be the names of the columns, and the following rows should consist of
the data. Your program should not be hard coded to deal with the example weather data
above; it should work with any CSV file where all the column values are numeric and it
can be loaded as a DataFrame. Your program should work for any number of rows or
columns.
There are two problems your program may encounter here.
• the file does not exist or cannot be opened
• pandas cannot interpret the data as a DataFrame
In both of these cases your program should display an appropriate error message (e.g., "File not
found", "Unable to load data") then return control to the main menu.
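One possible way to separate the two failure cases is sketched below. The helper name `load_dataframe` and its `(DataFrame, message)` return convention are assumptions made for this example; the mapping of exception types to the two messages is likewise one reasonable reading of the task rather than a prescribed one.

```python
import pandas as pd


def load_dataframe(filename):
    """Return (DataFrame, None) on success or (None, error message) on failure."""
    try:
        # The filename is used exactly as given; no extension is appended.
        df = pd.read_csv(filename)
    except OSError:
        # Covers a missing file, a directory, or a file that cannot be opened.
        return None, "File not found"
    except (pd.errors.ParserError, pd.errors.EmptyDataError, UnicodeDecodeError):
        # The file opened, but pandas could not interpret it as a table.
        return None, "Unable to load data"
    return df, None
```

The caller would print the message and return to the main menu whenever the second element is not `None`, and replace any previously loaded DataFrame otherwise.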
Your program only needs to handle one DataFrame in the system at a time. If a DataFrame was
previously loaded, it should be replaced.
After the file loads successfully, the program should display the names of the columns, and ask
the user if they want to set any of the columns as an index. Valid input in this case will consist of
either one of the column names, or the blank string (user just presses `Enter`). If the input is not
valid, loop until the user enters a valid column name or blank.
The program should then set the DataFrame's index to the selected column or skip this if the user
entered the blank string.
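The index prompt above might look something like this sketch. `choose_index` and its injectable `input_fn`/`output_fn` parameters are hypothetical names used for illustration, and the prompt wording is invented.

```python
import pandas as pd


def choose_index(df, input_fn=input, output_fn=print):
    """Ask for an index column until a valid name or a blank line is entered."""
    output_fn("Columns: " + ", ".join(str(c) for c in df.columns))
    while True:
        name = input_fn("Index column (blank for none): ").strip()
        if name == "":
            return df  # user pressed Enter: keep the default index
        if name in df.columns:
            return df.set_index(name)
        output_fn("Invalid column name!")
```

Because `set_index` returns a new DataFrame, the caller needs to keep the returned value, for example `df = choose_index(df)`.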
Menu option 2 - View data
This option simply prints the DataFrame to the screen. In the following example, day was set as
the index when the DataFrame was loaded.
min_temp max_temp rainfall humidity
day
XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
XXXXXXXXXXNaN NaN NaN XXXXXXXXXX
XXXXXXXXXX XXXXXXXXXXNaN XXXXXXXXXX
XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
Menu option 3 - Clean data
This option will enter a submenu offering various cleaning operations.
Cleaning data:
1 - Drop rows with missing values
2 - Fill missing values
3 - Drop duplicate rows
4 - Drop column
5 - Rename column
6 - Finish cleaning
Cleaning option 1 - Drop rows with missing values
This option will ask the user for a threshold value. This must be a non-negative integer. A row
should be dropped if it has fewer non-null values than the threshold. For example, if there are 7
columns and the threshold is 4, then a row must have at least 4 non-null values (or, equivalently,
no more than 3 null values) to be kept.
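The threshold rule above appears to match the `thresh` parameter of `DataFrame.dropna`, which keeps rows with at least that many non-null values. A minimal sketch, using the example weather data from the load section with `day` as the index (the function name `drop_sparse_rows` is made up for this example):

```python
import io

import pandas as pd

SAMPLE_CSV = """day,min_temp,max_temp,rainfall,humidity
1,11,23,3,55
1,11,23,3,55
2,13,25,0,60
3,9,19,17,80
4,9,18,36,85
5,,,,50
6,12,22,,60
7,13,23,0,65
"""


def drop_sparse_rows(df, threshold):
    """Keep only rows that have at least `threshold` non-null values."""
    if isinstance(threshold, bool) or not isinstance(threshold, int) or threshold < 0:
        raise ValueError("threshold must be a non-negative integer")
    return df.dropna(thresh=threshold)


weather = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col="day")
# Day 5 has one non-null value and day 6 has three, so a threshold of 3
# removes only day 5.
cleaned = drop_sparse_rows(weather, 3)
```

Note that the index is not counted: with `day` as the index, each row here has four data columns.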
Cleaning option 2 - Fill missing values
This option will ask the user to enter a value to fill in all the missing cells of the DataFrame.
Accept any number for this value, and display an error message if the user enters a non-number.
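A sketch of the fill step, assuming "any number" means anything `float()` can parse. Rejecting `nan` and `inf` is an extra assumption of this example, not something the task states, and the names `parse_fill_value` and `fill_missing` are hypothetical.

```python
import math

import pandas as pd


def parse_fill_value(text):
    """Return the number typed by the user, or None if it is not a usable number."""
    try:
        value = float(text)
    except ValueError:
        return None
    # float() also accepts "nan" and "inf"; this sketch treats them as invalid.
    return value if math.isfinite(value) else None


def fill_missing(df, text):
    """Return (filled DataFrame, None) or (original DataFrame, error message)."""
    value = parse_fill_value(text)
    if value is None:
        return df, "Invalid number!"
    return df.fillna(value), None
```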
Cleaning option 3 - Drop duplicate rows
This option will remove any (fully) duplicate rows from the DataFrame.
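One subtlety worth checking: `DataFrame.drop_duplicates` compares column values only and ignores the index. If a column such as `day` has been set as the index, two different days with identical readings would be treated as duplicates. The sketch below shows one possible workaround; whether the index should count towards a "fully" duplicate row is an interpretation, and `drop_duplicate_rows` is a name invented for this example.

```python
import pandas as pd


def drop_duplicate_rows(df):
    """Drop rows that are identical in every column, including a named index."""
    if df.index.name is None:
        # Default index: only the columns carry data.
        return df.drop_duplicates()
    name = df.index.name
    # Temporarily turn the index back into a column so it takes part
    # in the comparison, then restore it.
    return df.reset_index().drop_duplicates().set_index(name)
```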
Cleaning option 4 - Drop column
Present the user with the list of columns in the data and ask them to enter a name. If the entered
column name exists in the DataFrame, drop this column from the DataFrame. If the entered
column name does not exist, ask again.
Cleaning option 5 - Rename column
The user will choose a column to rename, then enter a new name. Make sure the new name is not
the name of an existing column, and that it is not blank.
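The checks for the drop and rename options could be sketched as below. Both helper names and the error wording are hypothetical; each returns the (possibly unchanged) DataFrame together with an error message, so the calling loop can decide whether to ask again.

```python
import pandas as pd


def drop_column(df, name):
    """Return (DataFrame without `name`, None) or (df, error message)."""
    if name not in df.columns:
        return df, f"No column named {name!r}"
    return df.drop(columns=[name]), None


def rename_column(df, old_name, new_name):
    """Rename a column after checking the new name is non-blank and unused."""
    new_name = new_name.strip()
    if old_name not in df.columns:
        return df, f"No column named {old_name!r}"
    if new_name == "":
        return df, "New name must not be blank"
    if new_name in df.columns:
        return df, f"A column named {new_name!r} already exists"
    return df.rename(columns={old_name: new_name}), None
```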
Cleaning option 6 - Finish cleaning
Return to the main menu.
Menu option 4 - Analyse data
For each of the columns in the DataFrame, produce a report like the one below. Make sure to use
pandas functions as appropriate.
humidity
--------
number of values (n): 7
XXXXXXXXXXminimum: 50.00
XXXXXXXXXXmaximum: 85.00
XXXXXXXXXXmean: 65.00
XXXXXXXXXXmedian: 60.00
standard deviation: 12.91
std. err. of mean: 4.88
Display each statistic to two decimal places (except for number of values, which is always a
whole number). After displaying the statistics reports, finish by displaying a table of correlations
like the one below (hint: you don't have to write your own code to compute correlations, search
the pandas docs).
XXXXXXXXXXmin_temp max_temp rainfall humidity
min_temp XXXXXXXXXX XXXXXXXXXX
max_temp XXXXXXXXXX XXXXXXXXXX
rainfall XXXXXXXXXX XXXXXXXXXX
humidity XXXXXXXXXX XXXXXXXXXX
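The per-column report and the correlation table can both be built from existing pandas methods. The following is a sketch only: `column_report` and `correlation_table` are names made up for this example, the right-aligned label layout is a guess at the intended format, and `DataFrame.corr()` is used with its default (Pearson) method, which the task does not specify.

```python
import pandas as pd


def column_report(series):
    """Build the statistics report for one column as a multi-line string."""
    stats = [
        ("number of values (n)", f"{series.count()}"),  # non-null values only
        ("minimum", f"{series.min():.2f}"),
        ("maximum", f"{series.max():.2f}"),
        ("mean", f"{series.mean():.2f}"),
        ("median", f"{series.median():.2f}"),
        ("standard deviation", f"{series.std():.2f}"),  # sample std (ddof=1)
        ("std. err. of mean", f"{series.sem():.2f}"),
    ]
    width = max(len(label) for label, _ in stats)
    title = str(series.name)
    lines = [title, "-" * len(title)]
    lines += [f"{label:>{width}}: {value}" for label, value in stats]
    return "\n".join(lines)


def correlation_table(df):
    """Pairwise correlations between all columns (pandas default: Pearson)."""
    return df.corr()
```

A caller would loop over `df.columns`, print `column_report(df[name])` for each, and finish with `print(correlation_table(df))`. With the seven humidity values left after the duplicate row is dropped (55, 60, 80, 85, 50, 60, 65), this reproduces the figures in the sample report.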
Menu option 5 - Visualise data
In this case, ask the user:
• If they want a bar graph, line graph, or boxplot (repeat until they give a valid selection)
• Whether they want to use subplots
• For a title (skip if they leave it blank)
• For an x-axis label (skip if they leave it blank)
• For a y-axis label (skip if they leave it blank)
Then display the plot.
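Once the five answers have been collected, the plot itself might be produced along these lines. This is a sketch under several assumptions: `build_plot` is a hypothetical helper, the user's choices are assumed to arrive as the strings "bar", "line" or "boxplot", the title is applied to the whole figure, and the axis labels are applied to every subplot.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Map the user's wording to the `kind` values accepted by DataFrame.plot.
PLOT_KINDS = {"bar": "bar", "line": "line", "boxplot": "box"}


def build_plot(df, kind, subplots=False, title="", xlabel="", ylabel=""):
    """Draw the requested plot and return the matplotlib figure."""
    if kind not in PLOT_KINDS:
        raise ValueError(f"Unknown plot type: {kind!r}")
    df.plot(kind=PLOT_KINDS[kind], subplots=subplots)
    fig = plt.gcf()  # DataFrame.plot creates a new current figure here
    if title:
        fig.suptitle(title)
    for ax in fig.axes:
        # Blank answers are skipped, as the task requires.
        if xlabel:
            ax.set_xlabel(xlabel)
        if ylabel:
            ax.set_ylabel(ylabel)
    return fig
```

The function only builds the figure; the menu code would call `plt.show()` afterwards to display it.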
Menu option 6 - Save data to a file
Ask the user for a filename, including file extension (e.g., data.csv). Use the exact filename
given including the extension – if the user