Instruction: This homework needs to be solved using both Excel and Python. Please submit one individual file for each question, using the naming convention ques*_studentname. Attached is the dataset.
For this exercise, we use the Hepatitis Disease dataset from the UCI data repository. This data consists of 156 instances with 20 attributes.
Attribute information:
1. Class: DIE, LIVE
2. AGE: 10, 20, 30, 40, 50, 60, 70, 80
3. SEX: male, female
4. STEROID: no, yes
5. ANTIVIRALS: no, yes
6. FATIGUE: no, yes
7. MALAISE: no, yes
8. ANOREXIA: no, yes
9. LIVER BIG: no, yes
10. LIVER FIRM: no, yes
11. SPLEEN PALPABLE: no, yes
12. SPIDERS: no, yes
13. ASCITES: no, yes
14. VARICES: no, yes
15. BILIRUBIN: Continuous
16. ALK PHOSPHATE: Continuous
17. SGOT: Continuous
18. ALBUMIN: Continuous
19. PROTIME: Continuous
20. HISTOLOGY: no, yes
Question no 1: Use Excel and the hepatitis dataset. Answer the following questions:
(XXXXXXXXXX=7)
a. What is the probability of a male patient being dead?
b. There is one patient whose ANOREXIA attribute has the value "?". What is the most likely value of this attribute for this patient?
c. What is the probability that a patient in the age range [10, 50] uses steroids? (Replace “?” with “Yes”.)
d. Which is more likely: a person with no ANTIVIRALS being alive, or a person with MALAISE being dead?
e. Which age group is more likely to be dead? What are the probabilities? (Group the ages into 3 groups: 20-40, 40-60, 60-80.)
f. Is the age attribute normally distributed? Explain why or why not.
[For Question no 1: you are allowed to use built-in Excel functions. As an example, for the probability of a male being dead, I would like to see something as follows:
"This question could be answered by finding XX and doing XXXX."
Show how you found XX.
Show how you did XXXX.
Therefore, the answer is: (your result).]
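As a cross-check in Python, the "finding XX and doing XXXX" pattern for part (a) can be sketched as a ratio of two counts: count the male patients, count the male patients whose Class is DIE, and divide. This is only a minimal sketch. It assumes the question means the conditional probability P(DIE | male), the records below are a made-up toy list (not rows from the hepatitis dataset), and `conditional_probability` is a hypothetical helper name.

```python
def conditional_probability(records, given, event):
    """Estimate P(event | given) as count(given and event) / count(given)."""
    matching_given = [r for r in records if given(r)]
    if not matching_given:
        raise ValueError("no records satisfy the condition")
    matching_both = [r for r in matching_given if event(r)]
    return len(matching_both) / len(matching_given)


# Toy records for illustration only (NOT taken from the hepatitis dataset).
toy_patients = [
    {"Class": "DIE", "SEX": "male"},
    {"Class": "LIVE", "SEX": "male"},
    {"Class": "LIVE", "SEX": "male"},
    {"Class": "LIVE", "SEX": "male"},
    {"Class": "DIE", "SEX": "female"},
    {"Class": "LIVE", "SEX": "female"},
]

p_die_given_male = conditional_probability(
    toy_patients,
    given=lambda r: r["SEX"] == "male",
    event=lambda r: r["Class"] == "DIE",
)
print(p_die_given_male)  # 1 of the 4 toy males is DIE -> 0.25
```

In Excel, the same ratio would typically be a COUNTIFS over the SEX and Class columns divided by a COUNTIF over the SEX column.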
Question no 2: Use Excel/Python and the Hepatitis dataset: (3+2=5)
- Create 3 different visualizations showing the mean and standard deviation (or standard error, as it is referred to in this context) of the sampling distributions of sample age for sample sizes 2, 5, and 10.
- What happens to the mean of the sample means of age as the sample size is increased? What happens to the standard error?
[Description: In addition to doing the work in Python or Excel, you need to write a descriptive answer that summarizes your findings.]
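One possible way to approach Question no 2 in Python is sketched below, using only the standard library. It draws many random samples of each size, records each sample's mean, and then reports the mean and standard deviation (standard error) of those sample means. The ages in `toy_ages` are placeholders for illustration, not values from the dataset; sampling with replacement, 1000 repetitions, and the helper name `sampling_distribution` are all assumptions.

```python
import random
import statistics


def sampling_distribution(population, sample_size, n_samples=1000, seed=0):
    """Return the means of n_samples random samples drawn with replacement."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


# Placeholder ages for illustration only; replace with the dataset's AGE column.
toy_ages = [23, 30, 31, 34, 38, 39, 42, 45, 47, 50, 51, 54, 58, 61, 66, 72]

summary = {}
for n in (2, 5, 10):
    means = sampling_distribution(toy_ages, n)
    # Mean of the sample means, and their standard deviation (standard error).
    summary[n] = (statistics.mean(means), statistics.stdev(means))
    print(f"n={n:2d}  mean of sample means={summary[n][0]:.2f}  "
          f"standard error={summary[n][1]:.2f}")

# Theory for comparison: with replacement, standard error = sigma / sqrt(n).
sigma = statistics.pstdev(toy_ages)
print(f"population mean={statistics.mean(toy_ages):.2f}  sigma={sigma:.2f}")
```

With this setup the mean of the sample means should stay close to the population mean for every sample size, while the standard error should shrink roughly as sigma / sqrt(n). Each list of sample means can then be passed to a histogram (for example matplotlib's `plt.hist`) to produce the three visualizations.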
Question no 3: USE PYTHON (1+2+2)
a. Generate a discrete uniform distribution of population size 100 on the interval (1,10).
b. Consider the sample size of N=10. Simulate the sampling distribution of the sample mean (repeat 100 times). Draw the visualization.
c. Consider the sample size of N=30. What is the sample mean and sample standard deviation (repeat 100 times)? Draw the visualization.
[Code + graphs]
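A minimal sketch of the three parts of Question no 3 might look like the following. It makes several assumptions that the question does not state: the interval (1,10) is read as the integers 1 through 10 inclusive, samples are drawn from the population without replacement, and the seeds are arbitrary. The names `simulate_sample_means` and `plot_histograms` are hypothetical helpers, and the plotting function needs matplotlib, which is imported only when it is called.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, repeats=100, seed=1):
    """Draw `repeats` samples without replacement and return their means."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]


def plot_histograms(results):
    """Draw one histogram per sample size; requires matplotlib."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it

    fig, axes = plt.subplots(1, len(results), figsize=(10, 4), squeeze=False)
    for ax, (n, means) in zip(axes[0], results.items()):
        ax.hist(means, bins=15)
        ax.set_title(f"Sample means, N={n}")
        ax.set_xlabel("sample mean")
    plt.show()


# (a) Population of 100 values from the discrete uniform distribution on 1..10.
# Assumption: both endpoints are included.
rng = random.Random(42)
population = [rng.randint(1, 10) for _ in range(100)]

# (b) and (c): 100 repeated samples of size N=10 and N=30.
results = {}
for n in (10, 30):
    means = simulate_sample_means(population, n)
    results[n] = means
    print(f"N={n}: mean of sample means={statistics.mean(means):.3f}, "
          f"std of sample means={statistics.stdev(means):.3f}")

# plot_histograms(results)  # uncomment to draw the graphs
```

The spread of the sample means for N=30 should come out noticeably smaller than for N=10, and both histograms should be centered near the population mean.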
Here is the file; if you need anything else, let me know.
Also, if you are doing the work in Excel, please do not send screenshots; just do in Excel whatever each question asks for.