STAT 4410/8416 Homework 4
lastName firstName
Due on Nov 8, 2019
1. Exploring XML data; In this problem we will read the xml data. For this we will obtain an xml data set
called olive oils from the link http://www.ggobi.org/book/data/olive.xml. Please follow the directions
in each step and provide your codes and output.
a. Parse the xml data from the above link and store in an object called olive. Obtain the root of the xml
file and display its name.
b. Examine the actual file by going to the link above and identify the path of categorical variables in the
xml tree. Use that path to obtain the categorical variable names. Please keep the names, not nick
names and store them in cvNames. Display cvNames.
c. Now examine the file by going to the link and identify the path of real variables in the xml tree. Use
that path to obtain the real variable names. Please keep the names, not nick names and store them in
rvNames. Display rvNames.
d. Notice the path for the data in xml file. Use that path to obtain the data and store the data in a data
frame called oliveDat. Change the column names as you have obtained the column names. Display
some data.
e. Generate a plot of your choice to display any feature of oliveDat data. Notice that the column names
are different fatty acids. The values are % of fatty acids found in the Italian olive oils coming from
different regions and areas.
f. Explain what these two lines of codes are doing.
r <- xmlRoot(olive)
xmlSApply(r[[1]][[2]], xmlGetAttr, "name")
2. Working with date-time data; The object myDate contains the date and time when this question
was provided to you. Based on this object answer the following questions.
myDate <- "2019-10-30 19:50:21"
a. Convert myDate into a date-time object with Chicago time zone. Display the result.
b. Write your codes so that it displays the week day of myDate.
c. What weekday is it after exactly 100 years from myDate? Show your codes and the answer.
d. Add one month with myDate and display the resulting date time. Explain why the time zone has
changed even though you did not ask for time zone change.
e. Suppose this homework is due on November 8, 2019 by 11:59 PM. Compute and display how many
minutes you have to complete this homework.
3. Data Wrangling and Dates; In this problem, we will be using the mdsr and Lahman packages.
a. Using the presidential dataset, show a simple table that displays the number of leap years that
occurred during each president’s time in office. Please label the second “Bush” as “Bush2”.
b. Consider the Teams dataset from the Lahman package that provides a series of baseball statistics
over a number of years. Note that the “H” column refers to number of home runs. The following
outlines a procedure to follow to determine the number of home runs that occurred during each
president’s (adjusted) time in office.
i. First, filter the Teams dataset to only include years between 1953 and 2016.
ii. Next, we will partition the rows of the presidential dataset by only considering the year
of each president’s start and end dates with the conditions that 1) if a president’s term did
NOT start in January, then we will not include that year in their time in office, and 2) if
a president’s term ended in January, then that ending year will also not be included. For
example, Johnson will be considered as having a starting year of 1964 and an ending year of
1968.
iii. Answer the question: Which president had the most number of home runs occur during their
term? Report this number.
4. Creating HTML Page; In this problem we would like to create a basic HTML page. Please follow
each of the steps below and finally submit your HTML file on Canvas. Please note that you don’t need
to answer these questions here in the .Rmd file.
a. Open a notepad or any plain text editor. Write down some basic HTML codes as shown in online
(year 2014) Lecture 15, slide 6 and modify according to the following questions. Save the file as
hw4.html and upload on Canvas as a separate file.
b. Write “What is data science?” in the first header tag, h1
c. Hw1 solution contains the answer of what is data science. The answer has three paragraphs.
Write the three paragraphs of text about data science in three different paragraph tags <p>.
You can copy the text from hw1 solution.
d. Write “What we learnt from hw1” in second heading under tag h2
e. Copy all the points we learnt in hw1 solution. List all the points under ordered list tag <ol>.
Notice that each item of the list should be inside list item tag <li>.
f. Now we want to make the text beautiful. For this we would write some CSS codes in between
<head></head> tag under <style>. For this please refer to online (year 2014) lecture
15 slide 8. First change the fonts of the body tag to Helvetica Neue.
g. For the paragraph that contains the definition of data science, give an attribute id='dfn' and in
CSS change the color of ‘dfn’ to white, background-color to olive and font to be bold.
h. For other paragraphs, give an attribute [...]
i. Write CSS so that color of h1,h2 becomes orange.
j. Write JavaScript code so that onClick on h1 header, it shows a message ‘Its about data science’.
5. Boston hubway data; This question will explore Boston hubway data. Please carefully answer each
question below including your codes and results.
a. Obtain the compressed data, bicycle-rents.csv.zip, from Canvas and display few data rows.
b. For each day, count the number of bikes rented for that date and show the data in a time series
plot.
c. Based on the rent date column, create two new columns weekDay and hourDay which represent
week day name and hour of the day respectively. Store the data in myDat and display few records
of the data. Hint: For weekday use function wday().
d. Summarize myDat by weekDay based on the number of rents for each weekDay and store the
data in weekDat. Display some data.
e. Create a suitable plot of the data you stored in weekDat so that it displays number of bike rents
for each week day.
f. Now we want to investigate what happens in each day. Summarize myDat again but this time by
weekDay and hourDay and obtain the number of rents. Store the data in hourDat and Display
some data.
g. The dataframe hourDat is now ready for plotting. Generate line plots showing number of bike
rents vs hour of the day and colored by weekDay.
6. Bonus for undergraduate (3 points) mandatory for graduate students: The following link
contains the complete texts of Romeo and Juliet written by Shakespeare. Read the complete text and
generate a plot similar to Romeo and Juliet case study in online(year 2014) lecture 13 (last plot).
http://shakespeare.mit.edu/romeo_juliet/full.html
7. Bonus (2 points) question for all : In the United States, a Consumer Expenditure Survey (CE)
is conducted each year to collect data on expenditures, income, and demographics. These data are
available as public-use microdata (PUMD) files in the following link. Download the data for the year
2016 and explore. Provide some plots and numerical summary that creates some interest about this
data.
https://www.bls.gov/cex/pumd.htm
Answered Same Day Nov 11, 2021

Solution

Kshitij answered on Nov 13 2021
1. Exploring XML data; In this problem we will read the xml data. For this we will obtain
an xml data set called olive oils from the link http://www.ggobi.org/book/data/olive.xml.
Please follow the directions in each step and provide your codes and output.
a. Parse the xml data from the above link and store in an object called olive. Obtain the
root of the xml file and display its name.
library("XML")
library("xml2")
library("dplyr")
library("ggplot2")
# Parse the file with the XML package and inspect the root node
olive<-xmlParse("http://www.ggobi.org/book/data/olive.xml")

root<-xmlRoot(olive)
xmlName(root)
## [1] "ggobidata"
# Re-read the file with xml2; the XPath queries in parts b to d use this object
olive <- read_xml("http://www.ggobi.org/book/data/olive.xml")
b. Examine the actual file by going to the link above and identify the path of categorical
variables in the xml tree. Use that path to obtain the categorical variable names. Please
keep the names, not nick names and store them in cvNames. Display cvNames.
categoricalPath<-"//ggobidata/data/variables/categoricalvariable"

colsc <- xml_find_all(olive, categoricalPath)
cvNames<-xml_attr(colsc,"name")
cvNames
## [1] "region" "area"
c. Now examine the file by going to the link and identify the path of real variables in the
xml tree. Use that path to obtain the real variable names. Please keep the names, not
nick names and store them in rvNames. Display rvNames.
realPath<-"//ggobidata/data/variables/realvariable"
colsr <- xml_find_all(olive, realPath)
rvNames<-xml_attr(colsr,"name")
rvNames
## [1] "palmitic" "palmitoleic" "stearic" "oleic" "linoleic"
## [6] "linolenic" "arachidic" "eicosenoic"
d. Notice the path for the data in xml file. Use that path to obtain the data and store the
data in a data frame called oliveDat. Change the column names as you have obtained
the column names. Display some data.
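# Select every record node and split its text into whitespace-separated values
oliveDat <- xml_find_all(olive, "//record")
values <- strsplit(trimws(xml_text(oliveDat)), "\\s+")
# Build a one-row data frame per record, named with the variable names from parts b and c
oliveDat<-lapply(values,function(x) {
data.frame(rbind(setNames(as.numeric(x),c(cvNames,rvNames))))})
# Stack the rows into a single data frame
oliveDat<-do.call(rbind,oliveDat)
head(oliveDat)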
## region area palmitic palmitoleic stearic oleic linoleic linolenic
## 1 1 1 1075 75 226 7823 672 NA
## 2 1 1 1088 73 224 7709 781 31
## 3 1 1 911 54 246 8113 549 31
## 4 1 1 966 57 240 7952 619 50
## 5 1 1 1051 67 259 7771 672 50
## 6 1 1 911 49 268 7924 678 51
## arachidic eicosenoic
## 1 60 29
## 2 61 29
## 3 63 29
## 4 78 35
## 5 80 46
## 6 70 44
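The record-parsing step could also be written without building one data frame per record. The following is a minimal base R sketch under stated assumptions: `records` is a hypothetical stand-in for the trimmed record text, filled with the first three rows of the output above, and the missing value is assumed to be spelled `na` in the file.

```r
# Hypothetical stand-in for trimws(xml_text(xml_find_all(olive, "//record"))).
# Values are the first three rows of the output above; "na" is an assumed
# spelling of the missing value in the file.
records <- c("1 1 1075 75 226 7823 672 na 60 29",
             "1 1 1088 73 224 7709 781 31 61 29",
             "1 1 911 54 246 8113 549 31 63 29")

cvNames <- c("region", "area")
rvNames <- c("palmitic", "palmitoleic", "stearic", "oleic",
             "linoleic", "linolenic", "arachidic", "eicosenoic")

# Split each record on runs of whitespace, then convert to numeric.
# Non-numeric tokens such as "na" become NA; the coercion warning is expected.
values <- strsplit(records, "\\s+")
mat <- suppressWarnings(do.call(rbind, lapply(values, as.numeric)))

# One data frame for all records, named with the variable names found earlier.
oliveDat <- setNames(as.data.frame(mat), c(cvNames, rvNames))
str(oliveDat)
```

Binding numeric vectors into a matrix first avoids creating hundreds of intermediate data frames, which tends to matter more as the number of records grows.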
e. Generate a plot of your choice to display any feature of oliveDat data. Notice that the
column names are different fatty acids. The values are % of fatty acids found in the
Italian olive oils coming from different regions and areas.
# Mean fatty-acid composition of oils from region 1 (columns 3 to 10 hold the acids)
rg1<-oliveDat[which(oliveDat$region==1),3:10]
data1<-stack(summarise_all(rg1,mean,na.rm=TRUE))
p<- ggplot(data1, aes(x="", y=values/sum(values)*100, fill=ind))+
geom_bar(width = 0.5, stat = "identity")+
labs(y="percentage",fill="fats")+
ggtitle("average percentage of various acids found in the Italian olive oils coming from region 1")+
theme(plot.title = element_text(hjust=0.25))
p
# Same plot for region 2
rg2<-oliveDat[which(oliveDat$region==2),3:10]
data2<-stack(summarise_all(rg2,mean,na.rm=TRUE))
p2<- ggplot(data2, aes(x="", y=values/sum(values)*100, fill=ind))+
geom_bar(width = 0.5, stat = "identity")+
labs(y="percentage",fill="fats")+
ggtitle("average percentage of various acids found in the Italian olive oils coming from region 2")+
theme(plot.title = element_text(hjust=0.25))
p2
# Same plot for region 3
rg3<-oliveDat[which(oliveDat$region==3),3:10]
data3<-stack(summarise_all(rg3,mean,na.rm=TRUE))
p3<- ggplot(data3, aes(x="", y=values/sum(values)*100, fill=ind))+
geom_bar(width = 0.5, stat = "identity")+
labs(y="percentage",fill="fats")+
ggtitle("average percentage of various acids found in the Italian olive oils coming from region 3")+
theme(plot.title = element_text(hjust=0.25))
p3
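The three near-identical blocks above could be collapsed into one grouped computation. The following is a minimal base R sketch, assuming a data frame shaped like oliveDat; the `toy` values are invented for illustration and only two acids are included.

```r
# Toy data shaped like oliveDat; the numbers are invented for illustration.
toy <- data.frame(region   = c(1, 1, 2, 2, 3, 3),
                  palmitic = c(1000, 1200, 1100, 1100, 1000, 1000),
                  oleic    = c(7000, 6800, 7300, 7300, 8000, 7000))

# Mean of every fatty-acid column within each region (NAs ignored).
means <- aggregate(toy[, -1], by = list(region = toy$region),
                   FUN = mean, na.rm = TRUE)

# Express each region's means as percentages of that region's total.
acids <- setdiff(names(means), "region")
pct <- means
pct[acids] <- 100 * means[acids] / rowSums(means[acids])
pct
```

The result has one row per region, so after reshaping to long format a single ggplot call with `facet_wrap(~ region)` might replace p, p2 and p3.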
f. Explain what these two lines of codes are doing.
r <- xmlRoot(olive)
xmlSApply(r[[1]][[2]], xmlGetAttr, "name")
Answer: xmlRoot() returns the top-level node of the parsed document olive, here ggobidata, and
stores it in r. r[[1]] is the first child of the root (the data node), and r[[1]][[2]] is the
second child of that node, which from the paths used in parts b and c should be the variables node.
xmlSApply() applies xmlGetAttr() to each child of that node, so the call returns the value of the
"name" attribute of every categorical and real variable, i.e. the variable names.
2. Working with date-time data; The object myDate contains the date and time when
this question was provided to you. Based on this object answer the following
questions.
myDate <- "2019-10-30 19:50:21"
a. Convert myDate into a date-time object with Chicago time zone. Display the result.
library("lubridate")
myDate<-ymd_hms(myDate,tz="America/Chicago")
myDate
## [1] "2019-10-30 19:50:21 CDT"
b. Write your codes so that it displays the week day of myDate.
weekdays(myDate)
## [1] "Wednesday"
wday(myDate)
## [1] 4
c. What weekday is it after exactly 100 years from myDate? Show your codes and the
answer.
tempDate<-myDate
year(tempDate)=year(myDate)+100
weekdays(tempDate)
## [1] "Monday"
wday(tempDate)
## [1] 2
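The weekday 100 years later can be cross-checked without lubridate. The following is a minimal base R sketch, assuming the same timestamp and time zone as above; the names `start`, `later` and `days` are illustrative and not part of the original answer.

```r
# Same timestamp as myDate above, parsed with base R instead of lubridate.
start <- as.POSIXct("2019-10-30 19:50:21", tz = "America/Chicago")

# seq() with by = "100 years" steps the calendar year, keeping month and day.
later <- seq(start, by = "100 years", length.out = 2)[2]

# POSIXlt$wday counts from 0 = Sunday, so lubridate's wday() is this value + 1.
as.POSIXlt(start)$wday   # 3, Wednesday
as.POSIXlt(later)$wday   # 1, Monday

# Number of calendar days between the two dates, and the weekday shift it implies.
days <- as.numeric(as.Date(format(later, "%Y-%m-%d")) -
                   as.Date(format(start, "%Y-%m-%d")))
days %% 7                # 5
```

The span covers 36,524 days (36,500 plus 24 leap days, since 2100 is not a leap year), and a shift of 5 weekdays moves Wednesday to Monday, which agrees with the output above.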
d. Add one month with myDate and display the resulting date time. Explain why the time
zone has changed even though you did not ask for time zone change.
tempDate<-myDate
tempDate
## [1] "2019-10-30 19:50:21 CDT"...
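The output above stops before the month is added. The following is a minimal base R sketch of parts d and e, offered as a cross-check and not as the answerer's lubridate code. It assumes the deadline is exactly 23:59:00 Chicago time; the names `start`, `plus_month`, `due` and `mins` are illustrative. In 2019, daylight saving time in the United States ended on November 3, so later dates print as CST instead of CDT. The zone, America/Chicago, is unchanged; only the abbreviation and the UTC offset differ.

```r
start <- as.POSIXct("2019-10-30 19:50:21", tz = "America/Chicago")

# Step the calendar month by one; the wall-clock time stays 19:50:21.
plus_month <- seq(start, by = "1 month", length.out = 2)[2]

format(start, "%Y-%m-%d %H:%M:%S %Z")       # "2019-10-30 19:50:21 CDT"
format(plus_month, "%Y-%m-%d %H:%M:%S %Z")  # "2019-11-30 19:50:21 CST"

# Part e: minutes from myDate to the deadline (assumed to be 23:59:00).
# The deadline falls after the switch to standard time, so the interval
# contains one extra hour compared with the wall-clock difference.
due <- as.POSIXct("2019-11-08 23:59:00", tz = "America/Chicago")
mins <- as.numeric(difftime(due, start, units = "mins"))
mins   # 13268.65
```

The wall-clock difference is 9 days, 4 hours, 8 minutes and 39 seconds; the extra hour from the end of daylight saving time brings the elapsed time to 13,268.65 minutes.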