STAT 4410/8416 Homework 4 (due on Nov 8, 2019)

Question

1. Exploring XML data; In this problem we will read XML data. For this we will obtain an XML dataset called olive oils from the link http://www.ggobi.org/book/data/olive.xml. Please follow the directions in each step and provide your code and output.
a. Parse the XML data from the above link and store it in an object called olive. Obtain the root of the XML file and display its name.
b. Examine the actual file by going to the link above and identify the path of the categorical variables in the XML tree. Use that path to obtain the categorical variable names. Please keep the names, not the nicknames, and store them in cvNames. Display cvNames.
c. Now examine the file by going to the link and identify the path of the real variables in the XML tree. Use that path to obtain the real variable names. Please keep the names, not the nicknames, and store them in rvNames. Display rvNames.
d. Notice the path for the data in the XML file. Use that path to obtain the data and store it in a data frame called oliveDat. Change the column names to the variable names you obtained. Display some data.
e. Generate a plot of your choice to display any feature of the oliveDat data. Notice that the column names are different fatty acids. The values are percentages of fatty acids found in Italian olive oils coming from different regions and areas.
f. Explain what these two lines of code are doing.
xmlSApply(r[[1]][[2]], xmlGetAttr, "name")
2. Working with date-time data; The object myDate contains the date and time when this question was provided to you. Based on this object, answer the following questions.
myDate
a. Convert myDate into a date-time object with the Chicago time zone. Display the result.
b. Write your code so that it displays the weekday of myDate.
c. What weekday is it exactly 100 years after myDate? Show your code and the answer.
d. Add one month to myDate and display the resulting date-time.
Explain why the time zone has changed even though you did not ask for a time zone change.
e. Suppose this homework is due on November 8, 2019 by 11:59 PM. Compute and display how many minutes you had to complete this homework.
3. Data Wrangling and Dates; In this problem, we will be using the mdsr and Lahman packages.
a. Using the presidential dataset, show a simple table that displays the number of leap years that occurred during each president’s time in office. Please label the second “Bush” as “Bush2”.
b. Consider the Teams dataset from the Lahman package, which provides a series of baseball statistics over a number of years. Note that the “H” column refers to the number of home runs. The following outlines a procedure to determine the number of home runs that occurred during each president’s (adjusted) time in office.
i. First, filter the Teams dataset to only include years between 1953 and 2016.
ii. Next, we will partition the rows of the presidential dataset by only considering the year of each president’s start and end dates, with the conditions that 1) if a president’s term did NOT start in January, then we will not include that year in their time in office, and 2) if a president’s term ended in January, then that ending year will also not be included. For example, Johnson will be considered as having a starting year of 1964 and an ending year of 1968.
iii. Answer the question: Which president had the most home runs occur during their term? Report this number.
4. Creating HTML Page; In this problem we would like to create a basic HTML page. Please follow each of the steps below and finally submit your HTML file on Canvas. Please note that you don’t need to answer these questions here in the .Rmd file.
a. Open Notepad or any plain text editor. Write down some basic HTML code as shown in online (year 2014) Lecture 15, slide 6 and modify it according to the following questions. Save the file as hw4.html and upload it on Canvas as a separate file.
b. Write “What is data science?” in the first heading tag, <h1>.
c. The hw1 solution contains the answer to what data science is. The answer has three paragraphs. Write the three paragraphs of text about data science in three different paragraph tags, <p>. You can copy the text from the hw1 solution.
d. Write “What we learnt from hw1” in the second heading, under tag <h2>.
e. Copy all the points we learnt from the hw1 solution. List all the points under the ordered list tag <ol>. Notice that each item of the list should be inside the list item tag <li>.
f. Now we want to make the text beautiful. For this we will write some CSS code between the <head></head> tags, under <style>. For this please refer to online (year 2014) Lecture 15, slide 8. First change the font of the body tag to Helvetica Neue.
g. For the paragraph that contains the definition of data science, give an attribute id='dfn', and in CSS change the color of ‘dfn’ to white, the background-color to olive, and the font to bold.
h. For other paragraphs, give an attribute …
i. Write CSS so that the color of h1 and h2 becomes orange.
j. Write JavaScript code so that clicking on the h1 header (onClick) shows the message ‘Its about data science’.
5. Boston hubway data; This question will explore Boston hubway data. Please carefully answer each question below, including your code and results.
a. Obtain the compressed data, bicycle-rents.csv.zip, from Canvas and display a few data rows.
b. For each day, count the number of bikes rented on that date and show the data in a time series plot.
c. Based on the rent date column, create two new columns, weekDay and hourDay, which represent the weekday name and the hour of the day respectively. Store the data in myDat and display a few records of the data. Hint: For weekday use the function wday().
d. Summarize myDat by weekDay based on the number of rents for each weekDay and store the data in weekDat. Display some data.
e. Create a suitable plot of the data you stored in weekDat so that it displays the number of bike rents for each weekday.
f. Now we want to investigate what happens in each day. Summarize myDat again, but this time by weekDay and hourDay, and obtain the number of rents. Store the data in hourDat and display some data.
g. The data frame hourDat is now ready for plotting. Generate line plots showing the number of bike rents vs. the hour of the day, colored by weekDay.
6. Bonus for undergraduates (3 points), mandatory for graduate students: The following link contains the complete text of Romeo and Juliet by Shakespeare. Read the complete text and generate a plot similar to the Romeo and Juliet case study in online (year 2014) Lecture 13 (last plot).
http://shakespeare.mit.edu/romeo_juliet/full.html
7. Bonus (2 points) question for all: In the United States, a Consumer Expenditure Survey (CE) is conducted each year to collect data on expenditures, income, and demographics. These data are available as public-use microdata (PUMD) files at the following link. Download the data for the year 2016 and explore it. Provide some plots and numerical summaries that create some interest in this data.
https://www.bls.gov/cex/pumd.htm
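For part 2 above, a minimal base R sketch. The value of myDate did not survive in the question text, so this assumes a hypothetical timestamp of 2019-10-25 10:00:00; substitute the real value. It also assumes "Chicago time zone" means the tz database name America/Chicago and that the 11:59 PM deadline is Chicago time. seq() with a by string such as "month" or "100 years" steps the calendar fields (same clock time, later date) rather than adding a fixed number of seconds, which is why the zone abbreviation can change: US daylight saving time ended on November 3, 2019, so one month after a late-October CDT time falls in CST.

```r
# Hypothetical stand-in: the actual value of myDate is not shown above
myDate <- "2019-10-25 10:00:00"

# a. Interpret the string as Chicago local time
x <- as.POSIXct(myDate, tz = "America/Chicago")
print(x)

# b. Weekday name (spelled in the current locale)
weekdays(x)

# c. Same clock time exactly 100 calendar years later, and its weekday
later <- seq(x, by = "100 years", length.out = 2)[2]
weekdays(later)

# d. One calendar month later; %Z shows the zone abbreviation, which
#    switches from CDT to CST because daylight saving time ended in between
plusMonth <- seq(x, by = "month", length.out = 2)[2]
format(x, "%Y-%m-%d %H:%M:%S %Z")
format(plusMonth, "%Y-%m-%d %H:%M:%S %Z")

# e. Elapsed minutes until the deadline, assumed to be Chicago time
due <- as.POSIXct("2019-11-08 23:59:00", tz = "America/Chicago")
minutesLeft <- as.numeric(difftime(due, x, units = "mins"))
minutesLeft
```

With the assumed timestamp, the elapsed time to the deadline is one hour longer than the wall-clock difference because the clocks fell back on November 3. With lubridate, ymd_hms(myDate, tz = "America/Chicago"), wday(x, label = TRUE) and x + months(1) cover the same steps.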

Kshitij · Accepted Answer

STAT 4410/8416 Homework 4
lastName firstName 
Due on Nov 8, 2019 
1. Exploring XML data; In this problem we will read XML data. For this we will obtain an
XML dataset called olive oils from the link http://www.ggobi.org/book/data/olive.xml.
Please follow the directions in each step and provide your code and output.
a. Parse the XML data from the above link and store it in an object called olive. Obtain the
root of the XML file and display its name.
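A minimal sketch of parts a to d using the xml2 package (rather than the XML package that the answer loads). To stay self-contained it parses a small made-up XML string instead of downloading olive.xml; the element names categoricalvariable, realvariable and record are assumptions about the GGobi layout and should be checked against the real file, and the numbers are invented. An XPath such as //realvariable selects every matching element at any depth, and xml_attr() pulls the name attribute rather than the nickname. For part f: with the XML package, xmlSApply(node, xmlGetAttr, "name") applies xmlGetAttr() to each child of node and collects the children's name attributes; the xml2 analogue is xml_attr(xml_children(node), "name").

```r
library(xml2)

# Made-up miniature document; element and attribute names are assumed to
# mirror the GGobi layout of olive.xml, and the values are invented
txt <- '<ggobidata count="1">
  <data name="olive">
    <variables count="4">
      <categoricalvariable name="Region" nickname="reg"/>
      <categoricalvariable name="Area" nickname="are"/>
      <realvariable name="palmitic" nickname="p1"/>
      <realvariable name="oleic" nickname="o1"/>
    </variables>
    <records count="2">
      <record>1 1 1000 7800</record>
      <record>2 3 1100 7700</record>
    </records>
  </data>
</ggobidata>'

# a. Parse (read_xml() also accepts a URL) and display the root's name
olive <- read_xml(txt)
xml_name(xml_root(olive))

# b./c. Variable names from the name attribute, not the nickname
cvNames <- xml_attr(xml_find_all(olive, "//categoricalvariable"), "name")
rvNames <- xml_attr(xml_find_all(olive, "//realvariable"), "name")

# d. Each record holds one whitespace-separated row of values
recs <- trimws(xml_text(xml_find_all(olive, "//record")))
rows <- lapply(strsplit(recs, "[[:space:]]+"), as.numeric)
oliveDat <- as.data.frame(do.call(rbind, rows))
names(oliveDat) <- c(cvNames, rvNames)
head(oliveDat)
```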
library("XML") 
library("xml2")
library("dplyr") 
library("ggplot2") 
olive ... %>% select(name, leapyears)
## # A tibble: 11 x 2 
##    name       leapyears 
##               
##  1 Eisenhower         2 
##  2 Kennedy            0 
##  3 Johnson            2 
##  4 Nixon              1 
##  5 Ford               1 
##  6 Carter             1 
##  7 Reagan             2 
##  8 Bush               1 
##  9 Clinton            2 
## 10 Bush2              2 
## 11 Obama              2 
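The table above can be reproduced in base R with one simple convention: count the leap calendar years from a president's start year to end year, inclusive. This sketch types in a handful of term dates by hand instead of loading ggplot2's presidential data (which has the same name, start and end columns), relabels the repeated "Bush" with duplicated(), and applies the Gregorian rule (divisible by 4, except century years not divisible by 400). For these six rows it gives the same counts as the output above.

```r
# A few rows typed in by hand, mirroring ggplot2's presidential columns
pres <- data.frame(
  name  = c("Eisenhower", "Kennedy", "Johnson", "Bush", "Clinton", "Bush"),
  start = as.Date(c("1953-01-20", "1961-01-20", "1963-11-22",
                    "1989-01-20", "1993-01-20", "2001-01-20")),
  end   = as.Date(c("1961-01-20", "1963-11-22", "1969-01-20",
                    "1993-01-20", "2001-01-20", "2009-01-20")),
  stringsAsFactors = FALSE
)

# Label the second "Bush" as "Bush2"
dup <- duplicated(pres$name)
pres$name[dup] <- paste0(pres$name[dup], "2")

# Vectorised Gregorian leap-year test
isLeap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0

# Count leap years from the start year to the end year, inclusive
yearOf <- function(d) as.integer(format(d, "%Y"))
pres$leapyears <- mapply(function(s, e) sum(isLeap(s:e)),
                         yearOf(pres$start), yearOf(pres$end))
pres[, c("name", "leapyears")]
```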
b. Consider the Teams dataset from the Lahman package, which provides a series of baseball
statistics over a number of years. Note that the “H” column refers to the number of home
runs. The following outlines a procedure to determine the number of home
runs that occurred during each president’s (adjusted) time in office.
i. First, filter the Teams dataset to only include years between 1953 and 2016.
Teams <- Teams %>% filter(yearID %in% 1953:2016)
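A base R sketch of step i plus the per-season summary that the answer computes with dplyr (group_by and summarise). The tiny data frame is made up and only stands in for Lahman::Teams. One caution: the assignment calls H the home-run column, but in the Lahman package's Teams table H is hits and HR is home runs, so this sketch sums HR; swap the column if H is really intended.

```r
# Made-up team-season rows standing in for Lahman::Teams
teams <- data.frame(
  yearID = c(1952, 1953, 1953, 1960, 2016, 2017),
  teamID = c("AAA", "AAA", "BBB", "AAA", "BBB", "AAA"),
  HR     = c(100, 120, 80, 150, 200, 210)
)

# i. Keep seasons from 1953 through 2016, inclusive
teams <- subset(teams, yearID >= 1953 & yearID <= 2016)

# League-wide home runs per season (one row per yearID)
yearly <- aggregate(HR ~ yearID, data = teams, FUN = sum)
yearly
```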
ii. Next, we will partition the rows of the presidential dataset by only considering the 
year of each president’s start and end dates with the conditions that 1) if a president’s 
term did NOT start in January, then we will not include that year in their time in office, 
and 2) if a president’s term ended in January, then that ending year will also not be 
included. For example, Johnson will be considered as having a starting year of 1964 
and an ending year of 1968. 
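The January rule in step ii reduces to two integer adjustments: add one to the start year unless the term began in January, and subtract one from the end year when the term ended in January. A base R sketch (the helper name adjustYears is hypothetical), checked against Eisenhower's 1953-01-20 to 1961-01-20 term and the Johnson example from the text:

```r
# Hypothetical helper implementing the adjusted-year rule for Date vectors
adjustYears <- function(start, end) {
  startYear <- as.integer(format(start, "%Y"))
  endYear   <- as.integer(format(end, "%Y"))
  startMon  <- as.integer(format(start, "%m"))
  endMon    <- as.integer(format(end, "%m"))
  data.frame(
    startYear = startYear + (startMon != 1),  # skip a first year not begun in January
    endYear   = endYear - (endMon == 1)       # skip a last year that ended in January
  )
}

terms <- adjustYears(start = as.Date(c("1953-01-20", "1963-11-22")),
                     end   = as.Date(c("1961-01-20", "1969-01-20")))
terms
```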
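checkstartyear ... %>% group_by(yearID) %>% summarise(homeruns = sum(H))

homecount <- function(x, y) {
  # sum the per-year home-run totals for years x through y
  temp <- ... %>% filter(yearID %in% x:y)
  hc <- sum(temp$homeruns)
  return(hc)
}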
 
psd$homeruns 
## 1 Bush2 
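Steps ii and iii combine in the same way as the answer's homecount(): sum the yearly totals over each adjusted term, then take the maximum. A base R sketch with made-up yearly totals and hypothetical presidents P1 to P3, so the numbers say nothing about the real answer:

```r
# Made-up league-wide home-run totals per season
yearly <- data.frame(yearID   = 2001:2009,
                     homeruns = c(5, 7, 6, 8, 9, 4, 3, 10, 2))

# Hypothetical presidents with already-adjusted start and end years
terms <- data.frame(name      = c("P1", "P2", "P3"),
                    startYear = c(2001, 2004, 2007),
                    endYear   = c(2003, 2006, 2009),
                    stringsAsFactors = FALSE)

# Total home runs over the inclusive year range x:y
homecount <- function(x, y) sum(yearly$homeruns[yearly$yearID %in% x:y])

terms$homeruns <- mapply(homecount, terms$startYear, terms$endYear)
terms[which.max(terms$homeruns), c("name", "homeruns")]
```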
4. Creating HTML Page; In this problem we would like to create a basic HTML page. 
Please follow each of the steps below and finally submit your HTML file on Canvas. 
Please note that you don’t need to answer these questions here in the .Rmd file. 
a. Open Notepad or any plain text editor.
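For part 5 (Boston hubway data) in the question, the daily, weekday and hourly counts can be sketched in base R. The five timestamps are made up, the column name rentDate is an assumption about bicycle-rents.csv, and the rentals are assumed to be recorded in Boston local time (America/New_York). The question hints at lubridate's wday(); this sketch instead uses as.POSIXlt()$wday (0 = Sunday) with fixed English labels so the output does not depend on the locale.

```r
# Made-up rental timestamps; the real file's column names may differ
rents <- data.frame(rentDate = as.POSIXct(c(
  "2012-07-02 08:15:00", "2012-07-02 08:40:00", "2012-07-02 17:05:00",
  "2012-07-03 08:20:00", "2012-07-07 13:30:00"), tz = "America/New_York"))

# b. Rentals per calendar day (local dates, via the tzone attribute)
daily <- table(rentDay = format(rents$rentDate, "%Y-%m-%d"))
daily

# c. Weekday label and hour of day
dayNames <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
lt <- as.POSIXlt(rents$rentDate)
myDat <- data.frame(rents,
                    weekDay = factor(dayNames[lt$wday + 1], levels = dayNames),
                    hourDay = lt$hour)
myDat$n <- 1  # one row per rental, summed below

# d. Rentals per weekday
weekDat <- aggregate(n ~ weekDay, data = myDat, FUN = sum)
print(weekDat)

# f. Rentals per weekday and hour, ready for a line plot
hourDat <- aggregate(n ~ weekDay + hourDay, data = myDat, FUN = sum)
print(hourDat)
```

For the plots, plot(as.Date(names(daily)), as.vector(daily), type = "l") gives a quick daily time series, and with ggplot2 something like ggplot(hourDat, aes(hourDay, n, colour = weekDay)) + geom_line() draws one line per weekday for part g.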
