BSTAT 3321 Projects

The purpose of this project is to infer from a random sample of 30 countries some characteristics of all of the countries in the world. The following variables will be considered.

**Economy**

- GDP per capita in dollars (PPP)
- Unemployment as a percent
- Inflation rate as a percent

**People and Society**

- Female literacy rate as a percent
- GDP spent on education as a percent
- Infant mortality rate as total deaths per 1000 live births
- Predominant religion as a name, e.g., Protestant Christianity for the United States

- GDP per capita – histogram (4 points)
- Unemployment – histogram (4 points)
- Predominant religion – pie chart (max of XXXXXXXXXXslices) (4 points)
- GDP spent on education as a percent vs. GDP per capita in dollars – scatterplot (4 points)
- Infant mortality vs. Female literacy – scatterplot (4 points)
- Construct a 95% confidence interval for the world’s mean GDP per capita and mean GDP spent on education as a percent XXXXXXXXXXpoints each)
- Do your data support the hypothesis that more than half of the countries in the world are predominantly Christian at the XXXXXXXXXXsignificance level? At the .10 level? (20 points)
- Regression and correlation analysis
  - Determine the sample regression line relating infant mortality to female literacy and plot it on a new scattergram XXXXXXXXXXpoints)
  - Test the hypothesis that b₁ = 0. (4 points)
  - Compute a 95% confidence interval estimate for b₁. (4 points)
  - With 95% confidence, estimate an average for all countries and predict for an individual country the infant mortality rate when female literacy rate is XXXXXXXXXXpoints)
  - Compute the correlation coefficient and coefficient of determination. (4 points)
- Interpret the results of the above analysis (parts 1, 2, 3, and 4) in a managerial summary of no more than one page using non-technical language, meaningful to a person who has never had a statistics course. (20 points)

**Bonus:** Appearance of the project: cover, table of contents, color, organization, clarity (10 points)

1. a. Following is the histogram for GDP per capita

b. Following is the histogram for unemployment

c. Following is the pie-chart for predominant religions

d. Following is the scatterplot between GDP spent on education as a percentage and GDP per capita

The scatterplot shows a negative relationship between GDP spent on education as a percentage and GDP per capita.

e. Following is the scatterplot between Infant mortality and Female literacy

The scatterplot shows a negative relationship between female literacy rate and infant mortality rate: countries with higher female literacy tend to have lower infant mortality.

2. Confidence interval for the world’s mean GDP per capita:

| World’s mean GDP per capita | Value |
|---|---|
| Average | 28455.17 |
| Standard deviation | 32086.04 |
| Sample size | 30 |
| Confidence coefficient (at the 5% level of significance) | 1.96 |
| Margin of error | 11481.84 |
| Upper bound | 39937.02 |
| Lower bound | 16973.33 |

Thus, the 95% confidence interval for the world’s mean GDP per capita runs from $16,973.33 to $39,937.02.

Confidence interval for mean GDP spent on education as a percent:

| Mean GDP spent on education as a percent | Value |
|---|---|
| Average | 4.32 |
| Standard deviation | 0.019972 |
| Sample size | 30 |
| Confidence coefficient | 1.96 |
| Margin of error | 0.007147 |
| Upper bound | 4.327147 |
| Lower bound | 4.312853 |

Thus, the 95% confidence interval for mean GDP spent on education as a percent runs from 4.312853 to 4.327147.
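The margin-of-error arithmetic in part 2 can be checked with a short sketch. It assumes the interval is computed as mean ± z · s / √n with z = 1.96, which is what the reported figures imply; the function name `z_interval` is hypothetical, and the summary statistics are copied exactly as reported above.

```python
import math

def z_interval(mean, sd, n, z=1.96):
    """Return (lower, upper, margin) for the interval mean +/- z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin, margin

# Summary statistics copied from the write-up (not recomputed from raw data).
gdp_lower, gdp_upper, gdp_margin = z_interval(28455.17, 32086.04, 30)
edu_lower, edu_upper, edu_margin = z_interval(4.32, 0.019972, 30)

print(f"GDP per capita:     {gdp_lower:.2f} to {gdp_upper:.2f} (margin {gdp_margin:.2f})")
print(f"Education spending: {edu_lower:.6f} to {edu_upper:.6f} (margin {edu_margin:.6f})")
```

The results should agree with the reported bounds to within rounding. Because the standard deviation is estimated from a sample of 30, a t critical value (about 2.045 for 29 degrees of freedom) could be passed as `z` instead, which would give slightly wider intervals.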

3. We test whether more than half of the countries in the world are predominantly Christian.

This calls for a one-sided test of a proportion, with hypothesized proportion p = 0.5, i.e. half of all countries.

The null and alternative hypotheses are as follows:

H0 : p = 0.5

H1 : p > 0.5

| Quantity | Value |
|---|---|
| Number of occurrences (Christian in this case) | 13 |
| Sample size | 30 |
| Sample proportion | 0.433333 |
| Standard deviation of the sample proportion under H0 | 0.091287 |
| p-value | 0.767395591 |

The p-value (about 0.767) is greater than 0.05, so we fail to reject the null hypothesis; it is also greater than 0.10, so the conclusion is the same at the .10 level. Thus, the data do not support the hypothesis that more than half of the countries in the world are predominantly Christian.
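The test statistic and p-value in part 3 can be reproduced with the standard library alone. This is a minimal sketch, assuming the usual large-sample z-test for a proportion with an upper-tail alternative; the helper names `normal_cdf` and `one_sided_proportion_test` are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def one_sided_proportion_test(successes, n, p0=0.5):
    """Upper-tail z-test of H0: p = p0 against H1: p > p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1.0 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se
    p_value = 1.0 - normal_cdf(z)         # area to the right of z
    return p_hat, se, z, p_value

# 13 of the 30 sampled countries are predominantly Christian.
p_hat, se, z, p_value = one_sided_proportion_test(13, 30)
print(f"p_hat={p_hat:.6f}  se={se:.6f}  z={z:.4f}  p-value={p_value:.6f}")
```

Because the sample proportion (about 0.433) is below 0.5, the z statistic is negative and the upper-tail p-value is well above one half, so the null hypothesis is not rejected at either the .05 or the .10 level.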

4. a. The regression line is given by the equation:

Y = α + β₁X + ε

where Y is the dependent variable (infant mortality rate), X is the independent variable (female literacy rate), α is the intercept, β₁ is the slope, and ε is the random error term.
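As an illustration of how the sample regression line is estimated, the sketch below computes the least-squares intercept, slope, and correlation coefficient directly from their textbook formulas. The data values are hypothetical, chosen only to show the mechanics; they are not the 30-country sample used in this project, and `fit_line` is an assumed helper name.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                   # sample slope
    b0 = y_bar - b1 * x_bar          # sample intercept
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return b0, b1, r

# HYPOTHETICAL data for illustration only (not the project's sample).
literacy = [50.0, 60.0, 70.0, 80.0, 90.0]    # female literacy rate, percent
mortality = [60.0, 50.0, 35.0, 25.0, 10.0]   # infant deaths per 1000 live births

b0, b1, r = fit_line(literacy, mortality)
print(f"fitted line: mortality = {b0:.2f} + ({b1:.2f}) * literacy")
print(f"r = {r:.4f}, r squared = {r * r:.4f}")
```

A negative slope, as in this made-up example, corresponds to the downward pattern described for the infant mortality versus female literacy scatterplot; the squared correlation is the coefficient of determination.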

The regression output is given as below:

SUMMARY OUTPUT

Regression Statistics

Multiple...

