PYTHON ASSIGNMENT: Send code as well as paste plots generated in a word, also explain what it...

Question

PYTHON ASSIGNMENT
Send the code, paste the generated plots into a Word document, and explain what each plot concludes.

Let us use data analytical skills to determine which factors contribute to higher medical costs. The insurance.csv dataset is related to individual medical costs billed by health insurance companies. It also includes some personal information. Use these libraries: matplotlib.pyplot, numpy, pandas.

Data Description
· age: age of primary beneficiary
· sex: insurance contractor gender, 1 (female), 0 (male)
· bmi: body mass index, providing an understanding of body weights that are relatively high or low relative to height; an objective index of body weight (kg / m ^ 2) using the ratio of weight to height squared, ideally 18.5 to 24.9
· children: number of children covered by health insurance / number of dependents
· smoker: 1 (smoking), 0 (non-smoking)
· region: the beneficiary's residential area in the US, 0 (southwest), 1 (southeast), 2 (northwest), 3 (northeast)
· charges: individual medical costs billed by health insurance

Questions
1. We will examine if bmi has an impact on the medical costs. Put the bmi on the x-axis. The color of each point will be set according to whether the patient is a smoker. Set the transparency to be 0.7. Be sure to include the color bar, and set appropriate labels for x-axis, y-axis and the color bar. What business insights can you get?
2. We further compare the distribution of the medical costs of smokers and that of non-smokers. Plot the distribution of medical costs of smokers first. Then on the same figure, plot the distribution of medical costs of non-smokers and set the transparency to 0.6. The number of bins is 12 for both plots. Set appropriate labels and legends.
3. We study whether age is an important factor by comparing the distribution of medical costs of young people and that of elder people. On the same plot, generate a histogram of medical costs of patients younger than 40 years old, and then another histogram representing the rest of the patients. Set the transparency of the second histogram to 0.7. The number of bins is 15 for both histograms. Set appropriate labels and legends. What can you conclude from this figure?
4. Open-ended question. Now it is your turn to discover something interesting and valuable! What else can you conclude from this dataset using the data visualization skills we learnt? Generate two more figures and explain your findings.

PART 2 of Assignment. Visualization Practice: Bike Sharing Systems

Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike from one position and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic, environmental and health issues.

Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data generated by these systems make them attractive for research. As opposed to other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most of the important events in the city could be detected by monitoring these data.

Data Description: We will be using the daily version of the Capital Bikeshare System dataset from the UCI Machine Learning Repository. This data set contains information about the daily count of bike rental checkouts in Washington, D.C.’s bikeshare program between 2011 and 2012. It also includes information about the weather and seasonal/temporal features for that day (like whether it was a weekday).
• day: Day of the record (relative to day 1: XXXXXXXXXX)
• season: Season (1:winter, 2:spring, 3:summer, 4:fall)
• weekday: Day of the week (0=Sunday, 6=Saturday)
• workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
• weathersit:
  – 1: Clear, Few clouds, Partly cloudy
  – 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  – 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
• temp: Normalized temperature in Celsius
• windspeed: Normalized wind speed
• casual: Count of checkouts by casual/non-registered users
• registered: Count of checkouts by registered users
• cnt: Total checkouts

    import pandas as pd
    daily = pd.read_csv('day.csv')
    daily.head()

Questions:
1. Understand Trends. Generate a line chart to show the checkouts over time by using day column as the x-axis and cnt column as the y-axis. Label the x-axis as ‘Day’, and y-axis as ‘Check Outs’. What can you conclude?
2. Explore Relationships. We will plot the daily count of bikes that were checked out by casual/non-registered users against the temperature. Color the points to be ‘#539cab’. Set the transparency to be 0.7. Be sure to include appropriate labels for x-axis and y-axis. What insight can you get?
3. Explore Relationships with Multidimensional Information. We will plot the daily count of bikes that were checked out by casual/non-registered users against the temperature. The color of each point will be set according to whether it is a working day. Set the transparency to be 0.7. Be sure to include appropriate labels for x-axis and y-axis. Change the legend of the color bar to whether it is a working day. What additional insights can you get?
4. Examine Distributions. Let’s first build a histogram of the registered bike checkouts with the number of bins as 10. Set appropriate labels. Also set the title to be “Distribution of Registered Check Outs”.
5. Compare Distributions. We now compare the distributions of registered and casual checkouts. To make the figure easy to understand, in addition to the histogram we made for the previous question, we will set the transparency of the casual one to 0.8 and the number of bins to 5. Set appropriate labels.
6. How do the temperatures change across the seasons? You need to choose the type of visualization that best serves this purpose. What are the mean and median temperatures?
7. What else can you conclude from this dataset by using various data exploration?

Sample rows from insurance.csv:
age,sex,bmi,children,smoker,region,charges
19,1,27.9,0,1,0,XXXXXXXXXX
18,0,33.77,1,0,1,XXXXXXXXXX
28,0,33,3,0,1,XXXXXXXXXX
33,0,22.705,0,0,2,XXXXXXXXXX
32,0,28.88,0,0,2,XXXXXXXXXX
31,1,25.74,0,0,1,XXXXXXXXXX
46,1,33.44,1,0,1,XXXXXXXXXX
37,1,27.74,3,0,2,XXXXXXXXXX
37,0,29.83,2,0,3,XXXXXXXXXX
60,1,25.84,0,0,2,XXXXXXXXXX
25,0,26.22,0,0,3,XXXXXXXXXX
62,1,26.29,0,1,1,XXXXXXXXXX
23,0,34.4,0,0,0,XXXXXXXXXX
56,1,39.82,0,0,1,XXXXXXXXXX
27,0,42.13,0,1,1,XXXXXXXXXX
19,0,24.6,1,0,0,XXXXXXXXXX
52,1,30.78,1,0,3,XXXXXXXXXX
23,0,23.845,0,0,3,XXXXXXXXXX
56,0,40.3,0,0,0,XXXXXXXXXX
30,0,35.3,0,1,0,XXXXXXXXXX
60,1,36.005,0,0,3,XXXXXXXXXX
30,1,32.4,1,0,0,XXXXXXXXXX
18,0,34.1,0,0,1,XXXXXXXXXX
34,1,31.92,1,1,3,XXXXXXXXXX
37,0,28.025,2,0,2,XXXXXXXXXX
59,1,27.72,3,0,1,XXXXXXXXXX
63,1,23.085,0,0,3,XXXXXXXXXX
55,1,32.775,2,0,2,XXXXXXXXXX
23,0,17.385,1,0,2,XXXXXXXXXX
31,0,36.3,2,1,0,38711
22,0,35.6,0,1,0,XXXXXXXXXX
18,1,26.315,0,0,3,XXXXXXXXXX
19,1,28.6,5,0,0,XXXXXXXXXX
63,0,28.31,0,0,2,XXXXXXXXXX
28,0,36.4,1,1,0,XXXXXXXXXX
19,0,20.425,0,0,2,XXXXXXXXXX
62,1,32.965,3,0,2,XXXXXXXXXX
26,0,20.8,0,0,0,2302.3
35,0,36.67,1,1,3,XXXXXXXXXX
60,0,39.9,0,1,0,XXXXXXXXXX
24,1,26.6,0,0,3,XXXXXXXXXX
31,1,36.63,2,0,1,XXXXXXXXXX
41,0,21.78,1,0,1,XXXXXXXXXX
37,1,30.8,2,0,1,XXXXXXXXXX
38,0,37.05,1,0,3,XXXXXXXXXX
55,0,37.3,0,0,0,XXXXXXXXXX
18,1,38.665,2,0,3,XXXXXXXXXX
28,1,34.77,0,0,2,XXXXXXXXXX
60,1,24.53,0,0,1,XXXXXXXXXX
36,0,35.2,1,1,1,XXXXXXXXXX
18,1,35.625,0,0,3,XXXXXXXXXX
21,1,33.63,2,0,2,XXXXXXXXXX
48,0,28,1,1,0,XXXXXXXXXX
36,0,34.43,0,1,1,XXXXXXXXXX
40,1,28.69,3,0,2,XXXXXXXXXX
58,0,36.955,2,1,2,XXXXXXXXXX
58,1,31.825,2,0,3,XXXXXXXXXX
18,0,31.68,2,1,1,XXXXXXXXXX
53,1,22.88,1,1,1,XXXXXXXXXX
34,1,37.335,2,0,2,XXXXXXXXXX
43,0,27.36,3,0,3,XXXXXXXXXX
25,0,33.66,4,0,1,XXXXXXXXXX
64,0,24.7,1,0,2,XXXXXXXXXX
28,1,25.935,1,0,2,XXXXXXXXXX
20,1,22.42,0,1,2,XXXXXXXXXX
19,1,28.9,0,0,0,XXXXXXXXXX
61,1,39.1,2,0,0,XXXXXXXXXX
40,0,26.315,1,0,2,XXXXXXXXXX
40,1,36.19,0,0,1,XXXXXXXXXX
28,0,23.98,3,1,1,XXXXXXXXXX
27,1,24.75,0,1,1,XXXXXXXXXX
31,0,28.5,5,0,3,XXXXXXXXXX
53,1,28.1,3,0,0,XXXXXXXXXX
58,0,32.01,1,0,1,XXXXXXXXXX
44,0,27.4,2,0,0,XXXXXXXXXX
57,0,34.01,0,0,2,XXXXXXXXXX
29,1,29.59,1,0,1,XXXXXXXXXX
21,0,35.53,0,0,1,XXXXXXXXXX
22,1,39.805,0,0,3,XXXXXXXXXX
41,1,32.965,0,0,2,XXXXXXXXXX
31,0,26.885,1,0,3,XXXXXXXXXX
45,1,38.285,0,0,3,XXXXXXXXXX
22,0,37.62,1,1,1,XXXXXXXXXX
48,1,41.23,4,0,2,XXXXXXXXXX
37,1,34.8,2,1,0,XXXXXXXXXX
45,0,22.895,2,1,2,XXXXXXXXXX
57,1,31.16,0,1,2,XXXXXXXXXX
56,1,27.2,0,0,0,XXXXXXXXXX
46,1,27.74,0,0,2,XXXXXXXXXX
55,1,26.98,0,0,2,XXXXXXXXXX
21,1,39.49,0,0,1,XXXXXXXXXX
53,1,24.795,1,0,2,XXXXXXXXXX
59,0,29.83,3,1,3,XXXXXXXXXX
35,0,34.77,2,0,2,XXXXXXXXXX
64,1,31.3,2,1,0,XXXXXXXXXX
28,1,37.62,1,0,1,XXXXXXXXXX
54,1,30.8,3

Rajashekar · Accepted Answer

Name:		Date:
PYTHON ASSIGNMENT
1. Insurance Dataset
Use data analytical skills to determine which factors contribute to higher medical costs. The insurance.csv dataset is related to individual medical costs billed by health insurance companies. It also includes some personal information.
1.1. Questions 
1. We will examine if bmi has an impact on the medical costs. Put the bmi on the x-axis. The color of each point will be set according to whether the patient is a smoker. Set the transparency to be 0.7. Be sure to include the color bar, and set appropriate labels for x-axis, y-axis and the color bar. What business insights can you get? 
Insights
The most obvious trend is that non-smokers incur lower average charges than smokers.
Other insights that can be derived from the plot are as follows:
1. Most non-smokers incur charges of no more than 15000, with a few outliers that do not exceed 40000.
2. The BMI of non-smokers is spread fairly evenly between 20 and 40.
3. Smokers with a BMI between 15 and 30 incur higher charges than non-smokers, ranging from 15000 to 30000.
4. A significant number of smokers with a BMI between 30 and 40 incur the highest charges, ranging from 30000 to 50000.
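The plot behind these insights is not reproduced here, so the following is only a minimal sketch of how such a scatter plot could be produced. It assumes the column names and 0/1 smoker coding from the data description; the sample rows, the colormap and the label texts are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for insurance.csv (same column names, 0/1 smoker flag).
# With the real file: insurance = pd.read_csv("insurance.csv")
insurance = pd.DataFrame({
    "bmi":     [27.9, 33.8, 22.7, 36.3, 30.8, 25.7],
    "smoker":  [1, 0, 0, 1, 0, 1],
    "charges": [16900.0, 1700.0, 22000.0, 38700.0, 6400.0, 19000.0],
})

fig, ax = plt.subplots()
# c= maps the 0/1 smoker flag to a color; alpha=0.7 sets the transparency.
points = ax.scatter(
    insurance["bmi"], insurance["charges"],
    c=insurance["smoker"], cmap="coolwarm", alpha=0.7,
)
ax.set_xlabel("BMI")
ax.set_ylabel("Medical charges")

# The color bar explains the color coding; only the two valid codes get ticks.
cbar = fig.colorbar(points, ax=ax, ticks=[0, 1])
cbar.set_label("Smoker (0 = no, 1 = yes)")
```

In a notebook the figure is shown automatically; in a script, call fig.savefig with a file name of your choice to keep the image.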
2. We further compare the distribution of the medical costs of smokers and that of non-smokers. Plot the distribution of medical costs of smokers first. Then on the same figure, plot the distribution of medical costs of non-smokers and set the transparency to 0.6. The number of bins is 12 for both plots. Set appropriate labels and legends. 
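No plot or insight is recorded for this question, so here is a hedged sketch of the overlaid histograms it asks for. The charges below are randomly generated placeholders (the value ranges are invented), and the label texts are our own choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic placeholder data, NOT the real charges.
# With the real file: insurance = pd.read_csv("insurance.csv")
rng = np.random.default_rng(0)
insurance = pd.DataFrame({
    "smoker": [1] * 40 + [0] * 160,
    "charges": np.concatenate([
        rng.uniform(15000, 50000, 40),   # invented range for smokers
        rng.uniform(1000, 15000, 160),   # invented range for non-smokers
    ]),
})

smokers = insurance.loc[insurance["smoker"] == 1, "charges"]
non_smokers = insurance.loc[insurance["smoker"] == 0, "charges"]

fig, ax = plt.subplots()
# Smokers first, then non-smokers on the same axes with transparency 0.6.
smoker_counts, _, _ = ax.hist(smokers, bins=12, label="Smokers")
non_smoker_counts, _, _ = ax.hist(non_smokers, bins=12, alpha=0.6,
                                  label="Non-smokers")
ax.set_xlabel("Medical charges")
ax.set_ylabel("Number of patients")
ax.legend()
```

Passing an integer to bins lets each histogram choose its own bin edges over its own value range, so the two sets of bars generally have different widths.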
3. We study whether age is an important factor by comparing the distribution of medical costs of young people and that of elder people. On the same plot, generate a histogram of medical costs of patients younger than 40 years old, and then another histogram representing the rest of the patients. Set the transparency of the second histogram to 0.7. The number of bins is 15 for both histograms. Set appropriate labels and legends. What can you conclude from this figure? 
Insights
1. Most young patients incur low charges, which suggests better health at a younger age. The majority incur charges below 10000, with a small share incurring around 20000 and 35000.
2. Far fewer of the other patients have charges around 10000 (200-250, compared with 350 young patients). On average, these patients incur higher costs than younger patients, with the highest costs exceeding 60000.
3. The costs incurred increase with age.
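As a rough sketch of the age comparison, the patients can be split with a boolean mask on the age column and drawn as two histograms on the same axes. The data below is synthetic, and the link between age and charges in it is invented purely so that the plot has some shape; it says nothing about the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic placeholder data with an invented age/charges relationship.
# With the real file: insurance = pd.read_csv("insurance.csv")
rng = np.random.default_rng(1)
ages = rng.integers(18, 65, size=300)  # ages 18..64
insurance = pd.DataFrame({
    "age": ages,
    "charges": 250.0 * ages + rng.uniform(0, 8000, size=300),
})

is_young = insurance["age"] < 40
young = insurance.loc[is_young, "charges"]
older = insurance.loc[~is_young, "charges"]

fig, ax = plt.subplots()
ax.hist(young, bins=15, label="Younger than 40")
ax.hist(older, bins=15, alpha=0.7, label="40 and older")
ax.set_xlabel("Medical charges")
ax.set_ylabel("Number of patients")
ax.legend()

# A numeric summary to read alongside the figure.
median_by_group = insurance.groupby(is_young)["charges"].median()
```

Here median_by_group is indexed by True (younger than 40) and False (everyone else), which gives a quick check of what the histograms suggest.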
4. Open-ended question. Now it is your turn to discover something interesting and valuable! What else can you conclude from this dataset using the data visualization skills we learnt? Generate two more figures and explain your findings. 
Insights
Comparing the charges of male and female patients, the distributions are mostly similar, although more male patients incur large charges between 30000 and 50000. This indicates that the charges incurred by patients are determined mostly by other factors, such as age and smoking, as explored earlier.
Insights
1. Women aged between 30 and 53 incur the highest charges.
2. Men aged between 43 and 52 incur the highest charges.
3. This indicates that women are admitted to hospital over a wider range of ages than men.
Insights
The southeast region has the highest number of smokers and consequently incurs the highest charges. Non-smokers are evenly distributed across all regions, with the southwest region having more non-smokers than smokers.
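One way to check a regional claim like this is to count smokers and non-smokers per region and draw the counts as grouped bars. The sketch below uses a handful of made-up rows; the region names follow the coding in the data description, while the column region_name and the bar labels are hypothetical helpers introduced here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows, not the real data.
# With the real file: insurance = pd.read_csv("insurance.csv")
insurance = pd.DataFrame({
    "region": [0, 1, 1, 2, 3, 1, 0, 3],
    "smoker": [0, 1, 1, 0, 0, 0, 1, 0],
})

# Region codes as given in the data description.
region_names = {0: "southwest", 1: "southeast", 2: "northwest", 3: "northeast"}
insurance["region_name"] = insurance["region"].map(region_names)

# Rows: region, columns: smoker flag, values: number of patients.
counts = (
    insurance.groupby(["region_name", "smoker"])
    .size()
    .unstack(fill_value=0)
    .rename(columns={0: "Non-smoker", 1: "Smoker"})
)

fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)  # one group of bars per region
ax.set_xlabel("Region")
ax.set_ylabel("Number of patients")
```

Replacing .size() with a mean over the charges column would give the average charge per region and smoker status instead of the head count.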
2. Bike Rental Dataset
We use the daily version of the Capital Bikeshare System dataset from the UCI Machine Learning Repository. This data set contains information about the daily count of bike rental checkouts in Washington, D.C.’s bikeshare program between 2011 and 2012. It also includes information about the weather and seasonal/temporal features for that day (like whether it was a weekday). 
2.1. Questions 
1. Understand Trends. Generate a line chart to show the checkouts over time by using day column as the x-axis and cnt column as the y-axis. Label the x-axis as ‘Day’, and y-axis as ‘Check Outs’. What can you conclude? 
Insights
1. In both years, the number of checkouts rises steadily until it peaks mid-year.
2. Checkouts then decline steadily until the end of the year and pick up again as the next year progresses, following the same trend as the previous year.
3. The overall number of checkouts increases significantly in the second year.
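A minimal sketch of the line chart follows. The counts are generated from an invented seasonal curve with noise, only so that the example runs without day.csv; the axis labels are the ones the question specifies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic placeholder data. With the real file: daily = pd.read_csv("day.csv")
rng = np.random.default_rng(2)
day = np.arange(1, 731)
daily = pd.DataFrame({
    "day": day,
    # Invented seasonal curve plus noise, only to give the line a shape.
    "cnt": (3000 + 2000 * np.sin(2 * np.pi * (day - 100) / 365)
            + rng.normal(0, 300, size=day.size)).round(),
})

fig, ax = plt.subplots()
ax.plot(daily["day"], daily["cnt"])
ax.set_xlabel("Day")
ax.set_ylabel("Check Outs")
```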
2. Explore Relationships. We will plot the daily count of bikes that were checked out by casual/non-registered users against the temperature. Color the points to be ‘#539cab’. Set the transparency to be 0.7. Be sure to include appropriate labels for x-axis and y-axis. What insight can you get? 
Insights
1. People rent fewer bikes when it is colder, as the graph shows: checkouts range from 1500 to 4000 up to a temp of 0.4. This suggests that other factors, such as road conditions and body temperature, also play a role.
2. The highest numbers of check outs occur at mild temperatures, with the majority ranging between 4000 and 8000.
3. As the temperature increases further, check outs decrease but remain significantly higher than at lower temperatures.
3. Explore Relationships with Multidimensional Information. We will plot the daily count of bikes that were checked out by casual/non-registered users against the temperature. The color of each point will be set according to whether it is a working day. Set the transparency to be 0.7. Be sure to include appropriate labels for x-axis and y-axis. Change the legend of the color bar to whether it is a working day. What additional insights can you get? 
Insights
People rent bikes much more on working days than on non-working days. This indicates that people use bikes to travel to work more than for leisure on non-working days.
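The sketch below shows one way to color the casual-versus-temperature scatter by the workingday flag and to back the visual impression with group means. All values are randomly generated placeholders, so the numbers it produces carry no meaning for the real dataset; the colormap and label texts are our own choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic placeholder data. With the real file: daily = pd.read_csv("day.csv")
rng = np.random.default_rng(3)
n = 210
daily = pd.DataFrame({
    "temp": rng.uniform(0.1, 0.9, n),
    "workingday": np.tile([1, 1, 1, 1, 1, 0, 0], n // 7),  # 5 working, 2 off
    "casual": rng.integers(0, 3000, n),
})

fig, ax = plt.subplots()
points = ax.scatter(
    daily["temp"], daily["casual"],
    c=daily["workingday"], cmap="viridis", alpha=0.7,
)
ax.set_xlabel("Normalized temperature")
ax.set_ylabel("Casual check outs")
cbar = fig.colorbar(points, ax=ax, ticks=[0, 1])
cbar.set_label("Working day (0 = no, 1 = yes)")

# Average casual check outs on non-working (0) and working (1) days.
mean_casual = daily.groupby("workingday")["casual"].mean()
```

Comparing the two entries of mean_casual on the real data is a direct check of whether casual users ride more on working or on non-working days.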
4. Examine Distributions. Let’s first build a histogram of the registered bike checkouts with the number of bins as 10. Set appropriate labels. Also set the title to be “Distribution of Registered Check Outs”. 
Insights
Daily check outs mostly fall in the 3000-4000 range.
5. Compare Distributions. We now compare the distributions of registered and casual checkouts. To make the figure easy to understand, in addition to the histogram we made for the previous question, we will set the transparency of the casual one to 0.8 and the number of bins to 5. Set appropriate labels.
Insights
1. Casual renters generally check out bikes less often than registered renters, suggesting that most casual renters are not returning customers.
2. Casual check outs taper off steadily toward a maximum of 3000.
3. From this we can conclude that casual renters check out only once, or that their check outs happen only on non-working days or holidays.
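The two distributions can be overlaid as sketched below, with the registered histogram using 10 bins and the casual one 5 bins at transparency 0.8. The counts are random placeholders, and the title and label texts for the combined figure are assumptions rather than part of the assignment.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic placeholder data. With the real file: daily = pd.read_csv("day.csv")
rng = np.random.default_rng(4)
daily = pd.DataFrame({
    "registered": rng.integers(500, 7000, 365),
    "casual": rng.integers(0, 3400, 365),
})

fig, ax = plt.subplots()
ax.hist(daily["registered"], bins=10, label="Registered")
ax.hist(daily["casual"], bins=5, alpha=0.8, label="Casual")
ax.set_xlabel("Check outs per day")
ax.set_ylabel("Number of days")
ax.set_title("Distribution of Registered and Casual Check Outs")
ax.legend()
```

Each bar counts days, not customers, so the figure describes how daily totals are distributed for the two user types.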
6. How do the temperatures change across the seasons? You need to choose the type of visualization that best serves this purpose. What are the mean and median temperatures? 
Insights
1. The temperature distribution differs from season to season, as shown in the graphs.
2. The mean temperature varies by season: 0.3 in winter, 0.55 in spring, 0.72 in summer and 0.41 in fall.
3. We observe highest temperatures reaching in summer to 0.
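A box plot per season is one reasonable choice for this question, and the mean and median can be computed with a groupby. The sketch below uses a few made-up temperatures per season; the season names follow the coding in the data description, and the variable names are our own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows, not the real data. With the real file: daily = pd.read_csv("day.csv")
daily = pd.DataFrame({
    "season": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "temp":   [0.25, 0.30, 0.35, 0.50, 0.55, 0.66,
               0.70, 0.72, 0.80, 0.40, 0.41, 0.48],
})

# Season codes as given in the data description.
season_names = {1: "winter", 2: "spring", 3: "summer", 4: "fall"}

# Mean and median temperature per season, plus the overall values.
season_stats = daily.groupby("season")["temp"].agg(["mean", "median"])
overall_mean = daily["temp"].mean()
overall_median = daily["temp"].median()

# One box per season, in code order 1..4.
groups = [daily.loc[daily["season"] == s, "temp"].to_numpy()
          for s in season_names]
fig, ax = plt.subplots()
ax.boxplot(groups)
ax.set_xticks(list(range(1, len(season_names) + 1)))
ax.set_xticklabels(list(season_names.values()))
ax.set_xlabel("Season")
ax.set_ylabel("Normalized temperature")
```

The season_stats table has one row per season code, so season_stats.loc[3, "median"] is the summer median, for example.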
