Assignment 5
Assignment: Survivor
Name:
Overview
In this assignment, you will test all the skills that you have learned during this course to manipulate
the provided data to find the answers to questions about the TV show Survivor. If you are not familiar
with this show, start by watching this short clip that briefly explains it: Survivor Explained
Please note that your notebook should be named survivor when submitting to CodeGrade for the
automatic grading to work properly.
Data
The Survivor data comes from an R package by Daniel Oehm. Daniel has made the data for this package
available as an Excel file as explained in his article on gradientdescending.com. Please make sure
that you use the file from our Brightspace page though to make sure that your data will match what
CodeGrade is expecting. We have also corrected some errors in the file, which is another reason that
you must use the data given to you.
You need to first read the article on the website linked above. This will give you additional details
about the data that will be important as you answer the questions below.
Please note that there is a data dictionary in the file that explains the columns in the data. You will
also want to become familiar with the various spreadsheets and column names.
Finally, here are a couple of things to know for those of you that have not seen the show:
● Survivor is a reality TV show that first aired May 31, 2000 and is currently still on TV.
● Contestants are broken up into two teams (usually) where they live in separate camps.
● The teams compete in various challenges for rewards (food, supplies, brief experience trips,
etc.) and tribal immunity.
● The team that loses a challenge, and therefore doesn't get the tribal immunity, goes to tribal
council where they have to vote one of their members out (this data is represented in the
"Vote History" spreadsheet).
● After there are a small number of contestants left, the tribes are merged into one tribe where
each contestant competes for individual immunity. The winner of the individual immunity
cannot get voted out and is safe at the next tribal council.
● There are also hidden immunity idols that are hidden around the campground. If a contestant
finds and plays their hidden immunity idol at the tribal council, then all votes against them do not
count, and the player with the next highest number of votes goes home.
● When the contestants get down to 2 or 3 people, a number of the last contestants, known as
the jury, come back to vote for the person who they think should win the game. The winner is
the one who gets the most jury votes (this data is represented in the "Jury Votes"
spreadsheet). This person is known as the Sole Survivor.
● Voting recap:
○ Tribal Council votes (Vote History spreadsheet) are bad; contestants with the most
votes get sent home
○ Jury Votes (Jury Votes spreadsheet) are good; the contestant with the most votes wins
the game and is the Sole Survivor
Links:
Survivor Explained: https://www.youtube.com/watch?v=l1-hTpG_krk
Survivor data article: http://gradientdescending.com/survivor-data-from-the-tv-series-in-
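The tribal council rule described in the list above can be sketched on made-up votes. The names, the `votes` list, and the `eliminated` helper are hypothetical and are not taken from the Survivor data; ties and the case where every vote is cancelled are not handled.

```python
from collections import Counter


def eliminated(votes, idol_played_by=None):
    """Return the name with the most counted votes.

    votes: list of names, one entry per vote cast (hypothetical toy input).
    idol_played_by: name of a player who played a hidden immunity idol, if any;
    votes against that player are not counted.
    """
    counted = [name for name in votes if name != idol_played_by]
    tally = Counter(counted)
    # most_common(1) returns [(name, count)] for the highest count
    return tally.most_common(1)[0][0]


votes = ["Ana", "Ana", "Ana", "Ben", "Ben", "Cy"]
no_idol = eliminated(votes)                          # plain majority
with_idol = eliminated(votes, idol_played_by="Ana")  # Ana's votes are void
print(no_idol, with_idol)
```

Without an idol the player with the most votes goes home; when that player plays an idol, the player with the next highest count goes home instead.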
Note
Show Work
Remember that you must show your work. Student submissions are spot-checked manually to
verify that they are not hard-coding the answer from looking only in the file or in CodeGrade's
expected output. If this is seen, the student's answer will be manually marked wrong and their grade
will be changed to reflect this.
For example, suppose the question is: which contestant has received the most tribal council votes to be
voted out? Select their record from the castaway_details DataFrame.
You would show your work with code similar to this:
### incorrect way ###
Q1 = castaway_details[castaway_details['castaway_id'] == 333]
### correct way - showing your work ###
# get index
idx = vote_history.groupby('vote_id').size().sort_values(ascending=False).index[0]
# select row based on index
Q1 = castaway_details[castaway_details['castaway_id'] == idx]
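The same pattern can be tried end to end on toy data. This is only a sketch: the two small DataFrames below are made-up stand-ins for the real sheets, and `idxmax()` is used as an alternative to sorting and taking the first index.

```python
import pandas as pd

# Toy stand-ins for the real sheets (all values are made up)
castaway_details = pd.DataFrame({
    "castaway_id": [1, 2, 3],
    "full_name": ["Ana Ruiz", "Ben Cole", "Cy Park"],
})
vote_history = pd.DataFrame({
    "castaway_id": [1, 1, 2, 3, 3],  # who cast the vote
    "vote_id": [2, 3, 3, 2, 2],      # who the vote was cast against
})

# Count votes received per castaway and take the id with the largest count
votes_received = vote_history.groupby("vote_id").size()
idx = votes_received.idxmax()

# Boolean indexing keeps the result a DataFrame and keeps the original index
Q1_demo = castaway_details[castaway_details["castaway_id"] == idx]
print(Q1_demo)
```

The id is computed from the data rather than typed in, which is what the spot check looks for.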
Use Copy
Don't change any of the original DataFrames unless specifically asked or CodeGrade will not work
correctly for this assignment. Make sure you use copy() if needed.
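A minimal sketch of why copy() matters, using a made-up two-row DataFrame (the names `original` and `working` are hypothetical):

```python
import pandas as pd

original = pd.DataFrame({"castaway_id": [1, 2], "age": [30, 45]})

# Work on an explicit copy so the original DataFrame is left untouched
working = original.copy()
working["age_next_year"] = working["age"] + 1

print(list(original.columns))  # the new column is not added here
print(list(working.columns))
```

Adding the column to `working` leaves `original` with its two columns, so anything that later inspects `original` still sees the data as loaded.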
In [ ]:
# standard imports
import pandas as pd
import numpy as np

# Do not change this option; this allows the CodeGrade auto grading to function
# correctly
pd.set_option('display.max_columns', None)
First, import the data from the survivor.xlsx file, calling the respective DataFrames the same as the
sheet name but with lowercase and snake case (https://en.wikipedia.org/wiki/Snake_case). For example, the sheet called Castaway Details
should be saved as a DataFrame called castaway_details. Make sure that the data files are in the
same folder as your notebook.
Note: You may or may not need to install openpyxl for the code below to work. You can use:
$ pip install openpyxl
In [ ]:
# import data from Excel

# setup Filename and Object
fileName = "survivor.xlsx"
xls = pd.ExcelFile(fileName)

# import individual sheets
castaway_details = pd.read_excel(xls, 'Castaway Details')
castaways = pd.read_excel(xls, 'Castaways')
challenge_description = pd.read_excel(xls, 'Challenge Description')
challenge_results = pd.read_excel(xls, 'Challenge Results')
confessionals = pd.read_excel(xls, 'Confessionals')
hidden_idols = pd.read_excel(xls, 'Hidden Idols')
jury_votes = pd.read_excel(xls, 'Jury Votes')
tribe_mapping = pd.read_excel(xls, 'Tribe Mapping')
viewers = pd.read_excel(xls, 'Viewers')
vote_history = pd.read_excel(xls, 'Vote History')
season_summary = pd.read_excel(xls, 'Season Summary')
season_palettes = pd.read_excel(xls, 'Season Palettes')
tribe_colours = pd.read_excel(xls, 'Tribe Colours')
Exercise 1: Change every column name of every DataFrame to lowercase and snake case. This is a
standard first step for some programmers as lowercase makes it easier to write and snake case
makes it easier to copy multiple-word column names.
For example, Castaway Id should end up being castaway_id. You should try doing this using a for
loop instead of manually changing the names for each column. It should take you no more than a
few lines of code. Use Stack Overflow if you need help.
In [ ]:
### ENTER CODE HERE ###
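One possible shape for the renaming loop in Exercise 1, sketched on two toy DataFrames. Apart from Castaway Id, the column names here are invented for illustration; in the notebook the list would hold every DataFrame loaded from the Excel file.

```python
import pandas as pd

# Toy DataFrames standing in for the sheets loaded from survivor.xlsx
castaway_details = pd.DataFrame({"Castaway Id": [1], "Full Name": ["Ana Ruiz"]})
jury_votes = pd.DataFrame({"Season Name": ["Demo"], "Finalist Id": [1]})

# Assigning to df.columns changes only the labels, not the data
for df in [castaway_details, jury_votes]:
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
    )

print(list(castaway_details.columns))
print(list(jury_votes.columns))
```

Because the loop assigns to each DataFrame's `columns` attribute, the existing objects are relabelled in place and no new variables are needed.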
Q2: What contestant was the oldest at the time of their season? We want to look at their age at the
time of the season and NOT their current age. Select their row from the castaway_details
DataFrame and save this as Q2. This should return a DataFrame and the index and missing values
should be left as is.
In [ ]:
### ENTER CODE HERE ###
Q3: What contestant played in the most number of seasons? Select their row from the
castaway_details DataFrame and save this as Q3. This should return a DataFrame and the index
and missing values should be left as is.
In [ ]:
### ENTER CODE HERE ###
openpyxl documentation: https://openpyxl.readthedocs.io/en/stable
Q4: Create a DataFrame of all the contestants that won their season (aka their final result in the
castaways DataFrame was the 'Sole Survivor'). Call this DataFrame sole_survivor. Note that
contestants may appear more than one time in this DataFrame if they won more than one season.
Make sure that the index goes from 0 to n-1 and that the DataFrame is sorted ascending by season
number.
The DataFrame should have the same columns, and the columns should be in the same order, as
the castaways DataFrame.
In [ ]:
### ENTER CODE HERE ###
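The filter, sort, and re-index steps that Q4 asks for can be sketched on toy data. The column names `season`, `full_name`, and `result` are assumptions for this example and should be checked against the data dictionary.

```python
import pandas as pd

# Toy data; column names and values are assumptions for illustration
castaways = pd.DataFrame({
    "season": [2, 1, 1, 2],
    "full_name": ["Dee Moss", "Ana Ruiz", "Ben Cole", "Cy Park"],
    "result": ["Sole Survivor", "Sole Survivor", "Runner-up", "5th voted out"],
})

winners = (
    castaways[castaways["result"] == "Sole Survivor"]  # filtering returns a new object
    .sort_values("season")                             # ascending by default
    .reset_index(drop=True)                            # index becomes 0..n-1
)
print(winners)
```

Using `drop=True` avoids adding the old index as an extra column, so the column order stays the same as in the source DataFrame.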
Q5: Have any contestants won more than one time? If so, select their records from the sole_survivor
DataFrame, sorting the rows by season. Save this as Q5. If no contestant has won twice, save Q5
as the string None.
In [ ]:
### ENTER CODE HERE ###
Q6: Using value_counts(), what is the normalized relative frequency (percentage) breakdown of
gender for all the contestants? Count someone who played in multiple seasons only once. Round
the results to 3 decimal places. Save this as Q6.
In [ ]:
### ENTER CODE HERE ###
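A small illustration of normalized value_counts() with rounding. The toy table has one row per castaway, and the `gender` column name is an assumption.

```python
import pandas as pd

# Toy data; one row per castaway, so returning players appear only once
castaway_details = pd.DataFrame({
    "castaway_id": [1, 2, 3],
    "gender": ["Female", "Male", "Female"],
})

# normalize=True returns proportions that sum to 1 instead of raw counts
gender_share = castaway_details["gender"].value_counts(normalize=True).round(3)
print(gender_share)
```

The result is a Series indexed by category, ordered from most to least frequent.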
Q7:
● What percentage of the time has a male won his season? Save this percentage as Q7A.
● What percentage of the time has a female won her season? Save this percentage as Q7B.
● Note: Round all percentages to two decimal places and write as a float (example: 55.57).
● Note 2: If a contestant has won twice, count each win separately.
In [ ]:
### ENTER CODE HERE ###
In [ ]:
### ENTER CODE HERE ###
Q8: What is the average age of contestants when they appeared on the show? Save this as Q8.
Round to nearest integer.
In [ ]:
### ENTER CODE HERE ###
Q9: Who played the most total number of days of Survivor? If a contestant appeared on more than
one season, you would add their total days for each season together. Save the top five contestants
in terms of total days played as a DataFrame and call it Q9, sorted in descending order by total days
played.
The following columns should be included: castaway_id, full_name, and total_days_played where
total_days_played is the sum of all days a contestant played. The index should go from 0 to n-1.
Note: Be careful because on some seasons, the contestant was allowed to come back into the game
after being voted off. Take a look at Season 23's contestant Oscar Lusth in the castaways
DataFrame as an example (https://en.wikipedia.org/wiki/Ozzy_Lusth#South_Pacific). He was voted out 7th and then returned to the game. He was then voted
out 9th and returned to the game a second time. He was then voted out 17th the final time. Be aware
of this in your calculations and make sure you are counting the days according to the last time they
were voted off or won.
value_counts() reference: https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html
In [ ]:
### ENTER CODE HERE ###
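The returning-player caveat in Q9 can be handled by keeping only the last exit per castaway per season before summing across seasons. The sketch below uses toy data and assumes a `day` column that holds the day a castaway left the game; check the data dictionary for the real column name.

```python
import pandas as pd

# Toy data. Castaway 1 was voted out on day 10, returned, and left on day 30.
castaways = pd.DataFrame({
    "season": [1, 1, 1, 2, 1],
    "castaway_id": [1, 1, 2, 2, 3],
    "full_name": ["Ana Ruiz", "Ana Ruiz", "Ben Cole", "Ben Cole", "Cy Park"],
    "day": [10, 30, 39, 20, 3],
})

# Step 1: keep only the last exit per castaway per season
last_exit = castaways.groupby(
    ["castaway_id", "full_name", "season"], as_index=False
)["day"].max()

# Step 2: add the seasons together, then sort and re-index
totals = (
    last_exit.groupby(["castaway_id", "full_name"], as_index=False)["day"].sum()
    .rename(columns={"day": "total_days_played"})
    .sort_values("total_days_played", ascending=False)
    .reset_index(drop=True)
)
print(totals)
```

Summing `day` directly would count castaway 1 as 40 days instead of 30. Calling `head(5)` on the sorted result would then keep the top five rows.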
Q10A & Q10B: Using the castaway_details data, what is the percentage of total extroverts and
introverts that have played the game (count players only once even if they have played in more than
one season)? Do not count contestants without a personality type listed in your calculations. Save
these percentages as Q10A and Q10B respectively. Note: Round all percentages to two decimal
places and write as a float (example: 55.57).
For more information on personality types check this Wikipedia article.
In [ ]:
### ENTER CODE HERE ###
In [ ]:
### ENTER CODE HERE ###
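A sketch of the percentage calculation in Q10 with missing values excluded. It assumes a `personality_type` column holding four-letter codes whose first letter is E or I; both the column name and the coding are assumptions to verify against the data dictionary.

```python
import pandas as pd

# Toy data; the column name and the four-letter codes are assumptions
castaway_details = pd.DataFrame({
    "castaway_id": [1, 2, 3, 4],
    "personality_type": ["ENTJ", "ISFP", None, "ESTP"],
})

# Drop missing types so they are excluded from the denominator
types = castaway_details["personality_type"].dropna()
first_letter = types.str[0]

# The mean of a boolean Series is the share of True values
pct_extrovert = round(float((first_letter == "E").mean() * 100), 2)
pct_introvert = round(float((first_letter == "I").mean() * 100), 2)
print(pct_extrovert, pct_introvert)
```

Wrapping the result in `float()` gives a plain Python float rather than a NumPy scalar.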
Q11A & Q11B: Now that we know the percentages of total players that are extroverted and
introverted, let's see if that made a difference in terms of who actually won their season.
What is the percentage of total extroverts and introverts that have won the game (count players only
once even if they have won more than one season)? Save these percentages as Q11A and Q11B
respectively. Note: Round all percentages to two decimal places and write as a float (example:
55.57).
In [ ]:
### ENTER CODE HERE ###
In [ ]:
### ENTER CODE HERE ###
Q12: Which contestants have never received a tribal council vote (i.e. a vote to be voted out of the
game as shown in the vote_id column in the vote_history DataFrame)? Note that there are various
reasons for a contestant to not receive a tribal vote: they quit, made it to the end, medical
emergency, etc. Select their rows from
Answered 32 days After Nov 03, 2022

Solution

Mohd answered on Dec 06 2022