
Question

Problem Set 1
Statistics 100
Due June 29, 2020 at 11:59 pm

Problem set policies. Please provide concise, clear answers for each question. Note that only writing the result of a calculation (e.g., "SD = 3.3") without explanation is not sufficient. For problems involving R, be sure to include the code in your solution.

Please submit your problem set via Canvas as a PDF, along with the R Markdown source file.

We encourage you to discuss problems with other students (and, of course, with the teaching team), but you must write your final answer in your own words. Solutions prepared "in committee" are not acceptable. If you do collaborate with classmates on a problem, please list your collaborators on your solution.

Problem 1.

For each of the following scenarios, discuss (in at most five sentences) the main issue(s) with respect to sampling or reporting bias.

a) A particular city has 14 architects who own their own firm. To select a survey sample, each architect was contacted via telephone by order of appearance in the telephone directory, then the first 8 that agreed to be interviewed formed the sample.

b) The September 1992 issue of Prevention magazine included a women’s health survey; approximately 16,500 women responded to the survey. The May 1993 issue reported on the survey results, claiming that “92% of our readers rated their health as excellent, very good, or good”.

c) Many scholars and policymakers are interested in estimating the prevalence of mental illness among the homeless population.
In one study, the authors sampled homeless persons who received medical attention from a clinic that was part of the Health Care for the Homeless project, resulting in an estimated prevalence of 33%.[1] The authors maintain that selection bias is not a serious problem because the clinics are easily accessible to homeless people.

[1] This project is a federally funded program that brings general health and mental health services to homeless people.

Problem 2.

A recently published analysis examined 10 studies that measured optimism and pessimism by asking participants about their level of agreement with statements like “In uncertain times, I usually expect the best,” or “I rarely expect good things to happen to me”. Optimistic people tend to expect that they will encounter favorable outcomes, whereas less optimistic people tend to expect that they will encounter unfavorable outcomes.[2]

These studies also measured other variables on participants, including factors related to heart disease. The analysis found that compared with pessimists, people with the most optimistic outlook had a 35% lower risk for cardiovascular events (e.g., heart attacks). The studies, on average, observed people over a 14-year period and compared the rate of cardiovascular events between those classified as optimists versus pessimists.

[2] Alan Rozanski, MD, et al. Association of optimism with cardiovascular events and all-cause mortality. JAMA Network Open 2019; 2(9):e1912200.

a) A popular newspaper reports on the analysis with the headline “Thinking Positively Improves Cardiovascular Health”. Write a short response to the editor explaining clearly why the headline is potentially misleading. Be sure to use language accessible to a general audience without a statistics background. Limit your answer to at most five sentences.

b) Briefly describe a plausible study design that has the potential to demonstrate the effect of thinking positively on cardiovascular health.

c) Suppose someone who is very optimistic reads about the analysis and concludes that the findings suggest he has a 35% lower risk for cardiovascular events than his friend who is extremely pessimistic. Explain why this is not necessarily the case.

Problem 3.

The following graphs are based on data from the National Center for Health Certificates.

a) Describe what you see in the two graphs, with particular focus on the differences between the two distributions.

b) Economists are interested in the possible causes driving the shape of the age distribution in 2016.
   i. Discuss a possible reason behind the discrepancy between the 1980 distribution and the 2016 distribution; i.e., what is a potential factor driving the difference in the distributions?
   ii. Discuss a possible reason behind the shape of the age distribution in 2016.

Problem 4.

The Stanford Open Policing Project is a team of researchers and journalists at Stanford University working to collect and standardize data on vehicle and pedestrian stops from law enforcement departments across the country, with the goal of investigating and improving interactions between the police and the public. In a recently published analysis based on these data, the authors found that police stops and search decisions suffer from persistent racial bias.[3]

[3] E. Pierson, C. Simoiu, J. Overgoor, S. Corbett-Davies, D. Jenson, A. Shoemaker, V. Ramachandran, P. Barghouty, C. Phillips, R. Shroff, and S. Goel. A large-scale analysis of racial disparities in police stops across the United States. Nature Human Behaviour, Vol. 4, 2020.

In this problem, you will work with data from the Stanford Open Policing Project and conduct an exploratory analysis based on approaches used by the study team. The dataset stops.Rdata contains standardized data on police stops in Philadelphia, Pennsylvania between 2013 and 2017. Each case represents a single police stop.

The variables are defined as follows:
- date: date of the stop, in YYYY-MM-DD format
- year: year of the stop
- time: 24-hour time for the stop, in HH:MM format
- location: freeform text of the location, e.g. street number and street name
- lat: latitude of the stop
- lng: longitude of the stop
- district: police district
- service_area: police service area
- subject_age: age of the stopped subject
- subject_race: race of the stopped subject, recorded as either white, black, hispanic, asian/pacific islander, or other/unknown
- subject_sex: the recorded sex of the stopped subject
- type: type of stop, either vehicular or pedestrian
- arrest_made: recorded as TRUE if an arrest was made, and FALSE if otherwise
- outcome: strictest police action taken, either arrest, citation, warning, summons
- contraband_found: recorded as TRUE if contraband was found from a search, and FALSE if otherwise
- frisk_performed: recorded as TRUE if a frisk was performed, and FALSE if otherwise
- search_conducted: recorded as TRUE if a search was conducted, and FALSE if otherwise
- search_person: recorded as TRUE if search of a person has occurred, and FALSE if otherwise
- search_vehicle: recorded as TRUE if search of a vehicle has occurred, and FALSE if otherwise

Use these data to answer the following questions.

a) Take an initial look at the stops dataset.
   i. How many police stops are represented in the data?
   ii. What date range does the data cover?
   iii. Of the police stops recorded, what proportion of stops occurred in 2017?

b) Describe the distribution of age of stopped subjects, referencing numerical and graphical summaries as needed.

c) To narrow the scope of the analysis, we will focus on vehicular police stops that occurred in 2017. Subset the data appropriately and name the subset stops.subset.
   i. Using numerical and graphical summaries, describe the distribution of race of stopped subjects, among vehicular stops in 2017. Does any race appear to be overrepresented?
   ii. In a few sentences, briefly explain why it would be helpful to account for racial demographics in Philadelphia when interpreting the values in part i.
   iii. The dataset population_2017.Rdata contains information about racial demographics in Philadelphia for 2017. Use this information to compute the “stop rate” for each group, where stop rate is defined as number of police stops per member of the population. For example, if 10 police stops occur in which the stopped subject is Asian, and there are 100 Asian members of the population, the stop rate for Asians is 10/100 = 0.10. Report the stop rate for each race group.
   iv. Based on the calculations in part iii., relative to white drivers, how much more often are black drivers stopped by the police? Relative to white drivers, how much more often are Hispanic drivers stopped by the police?

d) After a driver is stopped, officers may carry out a search of the driver or vehicle if they suspect more serious criminal activity. One strategy for understanding whether data suggest biased decision-making is the outcome test, which is based on assessing the proportion of searches that successfully identify contraband. If searches of minorities are successful less often than searches of whites, this suggests that officers are searching minorities on the basis of less evidence.
   i. Calculate hit rates by race in Philadelphia in 2017 for vehicular stops, where hit rate is defined as the proportion of searches in which contraband was found. Describe your findings.

   It may be the case that the bar for stopping people is lower in certain police districts, and that minorities are more likely to live in neighborhoods in those districts. The dataframe hit_rates.Rdata contains the hit rate for whites, blacks, and Hispanics in each police district in Philadelphia (for vehicular stops and searches in XXXXXXXXXX). Information about each district is contained in two rows: one row contains the hit rates of black drivers and one row contains the hit rates of Hispanic drivers.

   ii. Create a plot that summarizes the relationship between the hit rates of black drivers and the hit rates of white drivers for police districts in Philadelphia.
   iii. Add a y = x line to the plot from part i. Describe what a point on the y = x line would represent in context of the data.
   iv. With reference to the y = x line, describe what you see in the plotted data. Are the results suggestive of bias against black drivers? Explain your answer.

Problem 5.

Vitamin D is essential for growth and bone health in children. It can be either obtained from dietary sources or produced by the body upon exposure of skin to ultraviolet waves (typically via sun exposure). Natural food sources rich in Vitamin D are scarce. Even in many low latitude countries where sunshine is plentiful, Vitamin D deficiency is a public health concern.

A study was conducted to evaluate Vitamin D status among schoolchildren in Thailand. The study drew data from a randomized trial conducted in rural subdistricts of a specific subregion of the country that assessed the efficacy of a seasoning powder fortified with iron, zinc, iodine, and Vitamin A for reducing anemia.

Exposure to sunlight allows the body to produce serum 25(OH)D, which is a marker of Vitamin D status. Serum 25(OH)D is then converted into a biologically active form, serum 1,25(OH)2D. Data on both serum levels were used to determine the prevalence of Vitamin D deficiency in the subpopulation under study. Vitamin D deficiency is defined as having a serum 25(OH)D level below 50 nmol/L.

The file vitamin_d

Bezawada Arun · Accepted Answer

---
title: "Problem Set 1"
author: "Ioannis Lamprou"
date: "26 June 2020 - 08:20"
output:
  pdf_document:
    fig_height: 3.5
    fig_width: 5
  word_document: default
geometry: margin=1in
fontsize: 11pt
---
## Problem 1.
a)
This is sampling bias. Because the architects were contacted in the order they appear in the telephone directory, those whose last names come earlier in the alphabet had a higher chance of being selected than those whose last names come later.
In an unbiased sample, each person in the population has an equal chance of being selected.
b)
The fact that 92% of the women who responded to the survey rated their health as excellent, very good, or good does not mean that 92% of the women who read this magazine would give these ratings. It also does not mean that 92% of all the magazine's readers would give these ratings.
In my opinion, the respondents are not a random sample of the women who read the magazine. For instance, women in excellent health may be more likely to respond to this type of survey.
In my opinion, the magazine reported the survey results inaccurately to its readers.
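
As a rough illustration of this point, the sketch below simulates a readership in which healthier readers are more likely to return the survey. All numbers (the number of readers, the 70% share in good health, and the two response probabilities) are assumptions chosen for illustration and are not taken from the Prevention survey.

```r
# Hypothetical simulation: every number here is an assumption, not survey data.
set.seed(100)

n_readers <- 100000

# Assume 70% of all readers would rate their health as good or better.
good_health <- rbinom(n_readers, size = 1, prob = 0.70) == 1

# Assume readers in good health are more likely to send the survey back.
p_respond <- ifelse(good_health, 0.25, 0.05)
responded <- rbinom(n_readers, size = 1, prob = p_respond) == 1

true_prop   <- mean(good_health)             # proportion among all readers
survey_prop <- mean(good_health[responded])  # proportion among respondents only

round(c(all_readers = true_prop, respondents = survey_prop), 3)
```

Under these assumed response rates, the expected proportion among respondents is 0.175 / 0.190, or about 0.92, even though only 70% of all simulated readers are in good health. The respondents alone therefore overstate how healthy the readership is.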
c)
In my opinion, not every homeless person is equally likely to seek medical attention from a clinic. For instance, individuals who need general health services are more likely to seek help from clinics than people with mental illness. So even if the clinics are easily accessible, this sample probably suffers from selection bias.
## Problem 2.
a)
Dear (Editor's name),
The newspaper headline "Thinking Positively Improves Cardiovascular Health" is misleading because the studies do not show that thinking positively improves people's cardiovascular health.
The studies found that optimistic individuals had a 35% lower risk of cardiovascular events, such as heart attacks, than pessimistic individuals. They did not show that becoming more optimistic would improve your cardiovascular health.
b)
A plausible study design that has the potential to demonstrate the effect of thinking positively on cardiovascular health is the following:
  i.    The study must have two parts: a systematic review of the latest research and a meta-analysis.
  ii.    The study must follow a large sample of people of different ages, countries, and climates over a long period of time.
  iii.    The study must take into consideration other factors that affect cardiovascular health.
c)
I would explain to him that the research does not show that thinking positively improves people's cardiovascular health. There is a strong correlation between the two, but other factors, such as age and physical activity, also need to be considered in the studies.
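
To illustrate how such a factor could produce this kind of result, the sketch below simulates a population in which optimism has no effect at all on cardiovascular events, but physically active people are both more often optimistic and less often affected. The population size and all probabilities are assumptions made up for illustration; they do not come from the published analysis.

```r
# Hypothetical simulation: every number here is an assumption for illustration.
set.seed(2020)

n <- 200000
active <- rbinom(n, size = 1, prob = 0.5) == 1   # physically active or not

# Assume active people are more often optimistic.
optimist <- rbinom(n, size = 1, prob = ifelse(active, 0.7, 0.3)) == 1

# Event risk depends only on activity, not on optimism.
event <- rbinom(n, size = 1, prob = ifelse(active, 0.05, 0.15)) == 1

# Overall comparison of optimists versus pessimists.
risk_optimist  <- mean(event[optimist])
risk_pessimist <- mean(event[!optimist])
relative_risk  <- risk_optimist / risk_pessimist

# Comparison within each activity group.
risk_by_group <- tapply(event, list(active = active, optimist = optimist), mean)

round(relative_risk, 2)
round(risk_by_group, 3)
```

With these assumed probabilities, the expected risks are 0.08 for optimists and 0.12 for pessimists, so optimists appear to have about a one-third lower risk overall. Within each activity group, however, the risk is about the same for optimists and pessimists, so the overall gap comes entirely from physical activity.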
## Problem 3.
a)
The first graph on the left (the 1980 graph) shows a unimodal distribution with a slight right skew and a peak around age 20. The second graph (the 2016 graph), on the other hand, shows a bimodal distribution with one peak at almost age 20 and a second at approximately age 29.
b)
  i.
  
  Comparing these two graphs suggests that women now have more opportunities than in the past, for instance to find jobs or pursue higher education. So,
