
# Motivation
The first step in the statistical process involves asking a research question. Some questions can be answered with a statistical method from Math 361; others require a more advanced statistical technique. Many questions are not suitable for statistical analysis at all, being better suited to philosophy, mathematics, direct experimentation, etc.

Instructions
Can a method from our "methods" table be used to answer the research questions? Think about the number and type of variables that would need to be collected to answer each question. Then see if these match a column of the "methods" table. If yes, decide whether a statistic or graph would suffice or whether you should use a confidence interval or test of significance. If you choose a test, write the null and alternative hypotheses for the question. Your answers for each of the three questions are worth 8 points, for a total of 24 points.
Relevant Class Material
"methods" table, variables, observational units, types of variable

SECTION 1 (24 points)
Fill in the table.  Some cells may be left empty depending on your answer to the first question.
| Question | Do a higher proportion of people live in rural areas now than before Covid? | What proportion of people in Klamath Falls are morbidly obese? | On average, how much do students spend on textbooks a term? |
| --- | --- | --- | --- |
| Can a method from this class be used? If yes, answer the following questions. If not, briefly explain how the question should be answered without statistics. (Yes or no) | | | |
| What variable(s) need to be collected to answer this question? | | | |
| What is the type of each variable? (Numerical or binary categorical) | | | |
| What observational unit(s) could the variable(s) be collected on? | | | |
| Assuming a sample is used, state the population. | | | |
| Would a test or a confidence interval be more appropriate for this question? (Confidence interval or test?) | | | |
| State the name of the most appropriate approximate method from the "methods" table. (t or z? One sample or two sample? Paired or independent, if two sample?) | | | |
| State the name of the type of graph you recommend for the variable(s). (Boxplot(s), barplot, histogram(s), scatterplot, or stacked barplot?) | | | |

SECTION 2:
The second step in the statistical process is to make a plan for collecting data, analyzing data and making a conclusion from the data. Careful planning can reduce biases in estimation due to data collection or study design.

In this section you will create a plan to answer the research question, "Does serving in student government during high school lead to higher earnings at age 30 for people in the US?"

There are 11 questions here, each worth 2 points with the exception of question 3.

What is the population in the research question?

What variable(s) must be collected to answer the research question?

(4 points) Frame the research question as null and alternative hypotheses with an appropriate parameter.  Write both hypotheses using appropriate symbols. (In OneNote, you can "insert" a "symbol" to obtain π or µ)
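
As an illustration of the notation only, and assuming the parameter of interest is the difference in mean earnings at age 30, the hypotheses could be written as below. The subscripts "gov" and "none" are labels introduced here for the two groups; they are not from the assignment.

$$
H_0:\ \mu_{\text{gov}} - \mu_{\text{none}} = 0
\qquad\text{versus}\qquad
H_a:\ \mu_{\text{gov}} - \mu_{\text{none}} > 0
$$

Here $\mu_{\text{gov}}$ stands for the mean earnings at age 30 of people in the US who served in student government during high school, and $\mu_{\text{none}}$ for the mean earnings of those who did not. The one-sided alternative reflects the word "higher" in the research question.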

Briefly explain why it is not feasible to collect data via a simple random sample in order to answer the research question.

Is it feasible to perform a randomized controlled experiment (RCT) to answer this research question?  Briefly explain your reasoning.

Which of these three possible explanations is most directly addressed by the computation of a p-value for the null and alternative hypotheses of question 3?

Choose one: causal effect, chance, or confounding variable

Identify a possible confounding variable in this study and briefly explain how it relates to both student government participation and earnings at age 30.

What is a Type I error in the context of this research question?
Once the data is collected on your response and treatment variables, you will be ready to do the data analysis.
Sketch or insert a table of how you plan to summarize the dataset you collect (means or medians, standard deviation or MAD…). Include pretend numbers and be sure to label the columns and rows.
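
A summary table of this kind can be sketched in a few lines of Python (the assignment itself suggests R, Jamovi, or Excel). The group names and earnings below are pretend numbers, and MAD is assumed here to mean the median absolute deviation from the median; check the class definition before relying on it.

```python
import statistics

# Pretend earnings at age 30, in thousands of dollars (not real data).
earnings = {
    "served": [52, 61, 48, 75, 58],
    "did_not_serve": [45, 50, 47, 66, 42],
}

def median_abs_deviation(values):
    """Median of the absolute deviations from the median (assumed MAD definition)."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])

def summarize(values):
    """Collect the statistics named in the question for one group."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "mad": median_abs_deviation(values),
    }

summary = {group: summarize(values) for group, values in earnings.items()}

# One row per group, one column per statistic.
print(f"{'group':<15}{'n':>4}{'mean':>8}{'sd':>8}{'median':>8}{'MAD':>6}")
for group, s in summary.items():
    print(f"{group:<15}{s['n']:>4}{s['mean']:>8.1f}{s['sd']:>8.1f}"
          f"{s['median']:>8.1f}{s['mad']:>6.1f}")
```

Reporting the mean with the standard deviation and the median with the MAD keeps each measure of center next to its matching measure of spread.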

Which inferential method do you plan to use?

The last step in the plan is to decide how you will form a conclusion based on the data analysis. The p-value from a test tells you the probability of seeing a difference at least as extreme as the difference in your dataset, assuming the null hypothesis is true. Choose one option below and briefly outline your conclusions under the following scenarios:

I choose option ___

Option 1: Ignore the potential for confounding bias and choose between "by chance" and "causal effect" via a p-value as follows:

If the p-value is less than ______, I will conclude that _____________
If the p-value is above _________, I will conclude that ____________
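
To make the p-value decision rule in Option 1 concrete, here is a minimal Python sketch. It uses a permutation (randomization) test, chosen only because it mirrors the definition of a p-value directly; it is not necessarily the method from the class "methods" table. The data are pretend numbers, and the 0.05 cutoff is an assumed conventional value, since the assignment leaves the threshold blank.

```python
import random

def permutation_p_value(treated, control, n_shuffles=10_000, seed=0):
    """Estimate a one-sided p-value for H_a: mean(treated) > mean(control)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_treated = len(treated)
    observed = sum(treated) / n_treated - sum(control) / len(control)
    pooled = list(treated) + list(control)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        shuffled_treated = pooled[:n_treated]
        shuffled_control = pooled[n_treated:]
        diff = (sum(shuffled_treated) / n_treated
                - sum(shuffled_control) / len(shuffled_control))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Pretend earnings at age 30, in thousands of dollars (not real data).
served = [52, 61, 48, 75, 58]
did_not_serve = [45, 50, 47, 66, 42]

alpha = 0.05  # assumed significance level; the assignment leaves this blank
p_value = permutation_p_value(served, did_not_serve)
if p_value < alpha:
    print(f"p = {p_value:.3f}: the difference is unlikely to be chance alone")
else:
    print(f"p = {p_value:.3f}: chance remains a plausible explanation")
```

A small p-value rules out chance as a likely explanation, but in an observational study it does not by itself rule out confounding.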

Option 2: Decide the potential for confounding bias is so extreme that it is not worthwhile to do a test using only the response and treatment variables. In this case, briefly explain how the method of subclassification could be used to adjust this plan to account for the confounding variable you are worried about.
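
The idea behind subclassification can be sketched as follows: split the sample into subclasses by the confounding variable, compare the treatment groups within each subclass, then combine the within-subclass differences using weights proportional to subclass size. The confounder levels ("lower_income", "higher_income"), the records, and the helper names below are all hypothetical and only illustrate the arithmetic.

```python
# Pretend records: (confounder level, served in student government?,
# earnings at age 30 in thousands of dollars). Not real data.
records = [
    ("lower_income", True, 40), ("lower_income", True, 44),
    ("lower_income", False, 38), ("lower_income", False, 42),
    ("lower_income", False, 40),
    ("higher_income", True, 70), ("higher_income", True, 74),
    ("higher_income", True, 72),
    ("higher_income", False, 68), ("higher_income", False, 70),
]

def mean(values):
    return sum(values) / len(values)

def subclassified_difference(records):
    """Size-weighted average of the within-subclass differences in means.

    Assumes every subclass contains at least one person from each group.
    """
    strata = {}
    for level, served, earnings in records:
        strata.setdefault(level, {True: [], False: []})[served].append(earnings)
    total = len(records)
    adjusted = 0.0
    for groups in strata.values():
        diff = mean(groups[True]) - mean(groups[False])
        weight = (len(groups[True]) + len(groups[False])) / total
        adjusted += weight * diff
    return adjusted

# Naive comparison that ignores the confounder.
naive = (mean([e for _, served, e in records if served])
         - mean([e for _, served, e in records if not served]))
adjusted = subclassified_difference(records)
print(f"naive difference: {naive:.1f}, subclassified difference: {adjusted:.1f}")
```

In this toy data the naive difference is much larger than the adjusted one because people in the higher-income subclass are both more likely to have served and earn more, which is exactly the pattern a confounder produces.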
SECTION 3:
The "parks.csv" dataset in Canvas contains information on all recorded visits to National Parks. This (and much more) is available for public download here: STATS - National Reports (nps.gov). The parks.csv dataset has the annual visits in 2018 and 2019 by type of visit. The types are
Recreational visits (RV)
Non-recreational visits (NRV)
Concessioner Lodging (CL)
Concessioner Camping (CC)
Tent overnights (TO)
RV overnights (RVO)
Backcountry overnights (BO)
Non-recreational overnights (N)
Misc. overnights (MO)
EXCEL FILE ATTACHED FOR THIS SECTION
Here's a blank code notebook if you're using R:

https:
You will likely find the R code helpful

Use the dataset "parks.csv" to answer the following questions.

1. (2 points) How many parks are included in the dataset?

2. (4 points) In 2019, what was the largest number of Backcountry overnight visits to a single park? Which park was it?
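
Counting the parks and finding the largest value both reduce to reading the file once, as in this Python sketch (the assignment itself suggests R, Jamovi, or Excel). The inline text is a toy stand-in for parks.csv: the column names `park`, `BO_2018`, and `BO_2019` and every value are made up, and the sketch assumes one row per park.

```python
import csv
import io

# Toy stand-in for parks.csv; column names and values are hypothetical.
toy_csv = """park,BO_2018,BO_2019
Park A,120,150
Park B,0,0
Park C,3000,2800
Park D,45,60
"""

# With the real file, replace io.StringIO(toy_csv) by open("parks.csv", newline="").
rows = list(csv.DictReader(io.StringIO(toy_csv)))

n_parks = len(rows)  # one row per park is assumed
top = max(rows, key=lambda row: int(row["BO_2019"]))

print(f"{n_parks} parks; most backcountry overnights in 2019: "
      f"{top['park']} ({top['BO_2019']})")
```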

3. (6 points) Create a subset of the dataset for parks with a non-zero number of Backcountry overnights in 2019. Briefly describe the distribution of Backcountry overnights in 2019 by computing a measure of spread, a measure of center and the number of parks with non-zero Backcountry Overnight visits.

Number:
Center:
Spread:
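
A sketch of the subset-then-summarize step, using pretend 2019 Backcountry overnight counts rather than the real file. The median and the median absolute deviation (MAD) are assumed choices of center and spread here; they are less sensitive to very large parks than the mean and standard deviation.

```python
import statistics

# Pretend 2019 backcountry overnight counts, one per park (not real data).
bo_2019 = [150, 0, 2800, 60, 0, 900]

# Keep only parks with a non-zero count.
nonzero = [v for v in bo_2019 if v > 0]

number = len(nonzero)
center = statistics.median(nonzero)
spread = statistics.median([abs(v - center) for v in nonzero])  # MAD

print(f"Number: {number}, Center (median): {center}, Spread (MAD): {spread}")
```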
4. (6 points) Using the full dataset, compute the difference between the number of Backcountry overnights in 2018 and 2019 for each park. Create an appropriate graph of this variable and write a sentence or two describing what you learned about its distribution. Include a screenshot of your graph and comment on center, shape, spread and any outliers, as appropriate.
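
The per-park difference and a rough picture of its distribution can be sketched as below. The park names and counts are pretend values, the difference is taken as 2019 minus 2018 (an assumption, since the question does not fix the direction), and a text histogram stands in for the graph you would make in R, Jamovi, or Excel.

```python
from collections import Counter

# Pretend (2018, 2019) backcountry overnight counts per park (not real data).
bo = {
    "Park A": (120, 150),
    "Park B": (0, 0),
    "Park C": (3000, 2800),
    "Park D": (45, 60),
    "Park E": (10, 5),
}

# Difference defined here as 2019 minus 2018, so negative means a decrease.
diff = {park: y2019 - y2018 for park, (y2018, y2019) in bo.items()}

# Group the differences into bins of width 100 for a text histogram.
bin_width = 100
bins = Counter((d // bin_width) * bin_width for d in diff.values())

for start in sorted(bins):
    print(f"[{start:>5}, {start + bin_width:>5}) {'*' * bins[start]}")
```

Floor division places each negative difference in the bin whose left edge is at or below it, so a difference of -5 falls in [-100, 0).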

5. (6 points) Review the list of available park names, e.g. by opening "parks.csv" in Excel.  Choose a park and a variable you're interested in and compare the variable's value for your park with the distribution of that variable for all parks.  For example, compute the median and MAD for the variable for all parks and say how your park compares to these values.  Write a sentence describing what you learned.
Include a printout of your R code or Jamovi screenshot or Excel spreadsheet below:
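
For the comparison in question 5, one option is to express the chosen park's value as a number of MADs away from the median of all parks. The sketch below uses pretend recreational visit counts and a hypothetical choice of park, and it assumes MAD means the median absolute deviation.

```python
import statistics

# Pretend 2019 recreational visits per park (not real data).
rv_2019 = {
    "Park A": 1200,
    "Park B": 300,
    "Park C": 45000,
    "Park D": 800,
    "Park E": 950,
}
my_park = "Park C"  # hypothetical choice of park

values = list(rv_2019.values())
center = statistics.median(values)
mad = statistics.median([abs(v - center) for v in values])

# Distance of the chosen park from the median, in units of MAD.
# (If the MAD were 0, this ratio would be undefined.)
mads_from_median = (rv_2019[my_park] - center) / mad

print(f"median = {center}, MAD = {mad}, "
      f"{my_park} is {mads_from_median:.1f} MADs above the median")
```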
Answered 2 days after Mar 14, 2022

## Solution

Suraj answered on Mar 16 2022
Section 1:
| Question | Do a higher proportion of people live in rural areas now than before Covid? | What proportion of people in Klamath Falls are morbidly obese? | On average, how much do students spend on textbooks a term? |
| --- | --- | --- | --- |
| Can a method from this class be used? (Yes or no) | Yes | Yes | Yes |
| What variable(s) need to be collected to answer this question? | Whether each person lives in a rural area, recorded before Covid and now | Whether each person in Klamath Falls is morbidly obese | Amount each student spends on textbooks in a term |
| What is the type of each variable? (Numerical or binary categorical) | Binary categorical | Binary categorical | Numerical |
| What observational unit(s) could the variable(s) be collected on? | A person | A person in Klamath Falls | A student |
| Assuming a sample is used, state the population. | All people, rural and non-rural, in the region being studied | People of Klamath Falls | Students at a school |
| Would a test or a confidence interval be more appropriate for this question? | Test | Confidence interval | Confidence interval |
| State the name of the most appropriate approximate method from the "methods" table. | Two-sample z-test for proportions (independent samples) | One-sample z interval for a proportion | One-sample t interval for a mean |
| State the name of the type of graph you recommend for the variable(s). | Stacked barplot | Barplot | Boxplot |
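
For the first question, a two-sample z-test for proportions can be sketched in Python as below. The counts are pretend numbers, the function name is introduced here for illustration, and the one-sided alternative assumes the question asks whether the proportion now is higher than before Covid.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided test of H_a: p1 > p2 using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H_0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail area
    return z, p_value

# Pretend counts: rural residents among people sampled now (group 1)
# and among people sampled before Covid (group 2). Not real data.
z, p_value = two_proportion_z_test(x1=230, n1=1000, x2=200, n2=1000)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.3f}")
```

The normal approximation behind this test is usually considered reasonable only when each sample has enough people in both categories.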
Section 2:
The second step in the statistical process is to make a plan for collecting data, analyzing data and making a conclusion from the data. Careful planning can reduce biases in estimation due to data collection or study design.

In this section you will create a plan to answer the research question, "Does serving in student government during high school lead to higher earnings at age 30 for people in the US?"

There are 11 questions here, each worth 2 points with the exception of question 3.

1. What is the population in the research question?
Solution:
The population for this research question is all people in the US who attended high school, considered at age 30. It includes both those who served in student government and those who did not.

2. What variable(s) must be collected to answer the research question?
Solution:
The variables to collect are whether each person served in student government during high school (binary categorical, the explanatory variable) and each person's earnings at age 30 (numerical, the response variable).

3. (4 points) Frame the research question as null and alternative hypotheses with an appropriate parameter. Write both hypotheses using appropriate symbols. (In OneNote, you can...