
Question

Data Transformation
Zhichao Jiang XXXXXXXXXX

Visualisation is an important tool for insight generation, but it is rare that you get the data in exactly the form you need. Often you will need to create new variables or summaries, or you may just want to rename variables or reorder observations to make the data easier to work with.

1 Import data

1.1 Working directory

R associates itself with a folder (i.e. directory) on your computer. To see which one, run getwd() at the console.

    This folder is known as your “working directory”
    When you save files, R will save them here
    When you load files, R will look for them here

2 Data transformation

What geoms should be used for this graph?

We will learn the five key dplyr functions that allow you to solve the vast majority of your data manipulation challenges:

    Pick observations by their values (filter()).
    Reorder the rows (arrange()).
    Pick variables by their names (select()).
    Create new variables with functions of existing variables (mutate()).
    Collapse many values down to a single summary (summarize()).

These can all be used in conjunction with group_by(), which changes the scope of each function from operating on the entire dataset to operating on it group-by-group. These six functions provide the verbs for a language of data manipulation.

All verbs work similarly:

    The first argument is a data frame (tibble).
    The subsequent arguments describe what to do with the data frame, using the variable names (without quotes).
    The result is a new data frame.

Together these properties make it easy to chain together multiple simple steps to achieve a complex result.
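The three shared properties listed above can be sketched with a small example. This is only an illustration under stated assumptions: the tibble `toy` and its values are made up (a hypothetical stand-in for babynames), and intermediate variables are used for chaining so that each step visibly takes a data frame and returns a new one.

```r
library(dplyr)

# Hypothetical toy data standing in for babynames (values are invented)
toy <- tibble(
  year = c(2000, 2000, 2001, 2001),
  name = c("Ann", "Bob", "Ann", "Bob"),
  n    = c(10L, 30L, 20L, 20L)
)

# Each verb takes a data frame first and returns a new data frame,
# so the output of one step can be fed into the next.
grouped  <- group_by(toy, year)               # later verbs now work per year
shares   <- mutate(grouped, share = n / sum(n))  # share of each name within its year
shares   <- ungroup(shares)                   # drop the grouping again
trimmed  <- select(shares, year, name, share) # keep only three columns
result   <- arrange(trimmed, desc(share))     # largest share first

# summarize() collapses each group to a single row
by_year <- summarize(grouped, total = sum(n))

print(result)
print(by_year)
```

The original `toy` is never modified; every step produces a new tibble, which is what makes this kind of chaining safe to build up one verb at a time.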
Let’s dive in and see how these verbs work.

2.1 select()

select(babynames,name,prop)
## # A tibble: 1,924,665 x 2
##    name      prop
##  1 Mary XXXXXXXXXX
##  2 Anna XXXXXXXXXX
##  3 Emma XXXXXXXXXX
##  4 Elizabeth 0.0199
##  5 Minnie XXXXXXXXXX
##  6 Margaret XXXXXXXXXX
##  7 Ida XXXXXXXXXX
##  8 Alice XXXXXXXXXX
##  9 Bertha XXXXXXXXXX
## 10 Sarah XXXXXXXXXX
## # … with 1,924,655 more rows

2.2 Select helpers

    use : to select a range of columns

select(babynames,name:prop)
## # A tibble: 1,924,665 x 3
##    name      n   prop
##  1 Mary XXXXXXXXXX
##  2 Anna XXXXXXXXXX
##  3 Emma XXXXXXXXXX
##  4 Elizabeth XXXXXXXXXX
##  5 Minnie XXXXXXXXXX
##  6 Margaret XXXXXXXXXX
##  7 Ida XXXXXXXXXX
##  8 Alice XXXXXXXXXX
##  9 Bertha XXXXXXXXXX
## 10 Sarah XXXXXXXXXX
## # … with 1,924,655 more rows

    use - to select every column except the ones named

select(babynames,-c(name,prop))
## # A tibble: 1,924,665 x 3
##     year sex       n
## XXXXXXXXXXF XXXXXXXXXX
## XXXXXXXXXXF XXXXXXXXXX
## XXXXXXXXXXF XXXXXXXXXX
## XXXXXXXXXXF XXXXXXXXXX
## XXXXXXXXXXF XXXXXXXXXX
## XXXXXXXXXXF XXXXXXXXXX
## XXXXXXXXXXF XXXXXXXXXX
## XXXXXXXXXXF XXXXXXXXXX
## XXXXXXXXXXF XXXXXXXXXX
## XXXXXXXXXXF XXXXXXXXXX
## # … with 1,924,655 more rows

    use starts_with() to select columns whose names start with a given string

select(babynames,starts_with("n"))
## # A tibble: 1,924,665 x 2
##    name      n
##  1 Mary XXXXXXXXXX
##  2 Anna XXXXXXXXXX
##  3 Emma XXXXXXXXXX
##  4 Elizabeth  1939
##  5 Minnie XXXXXXXXXX
##  6 Margaret   1578
##  7 Ida XXXXXXXXXX
##  8 Alice XXXXXXXXXX
##  9 Bertha XXXXXXXXXX
## 10 Sarah XXXXXXXXXX
## # … with 1,924,655 more rows

    use ends_with() to select columns whose names end with a given string

select(babynames,ends_with("e"))
## # A tibble: 1,924,665 x 1
##    name
##  1 Mary
##  2 Anna
##  3 Emma
##  4 Elizabeth
##  5 Minnie
##  6 Margaret
##  7 Ida
##  8 Alice
##  9 Bertha
## 10 Sarah
## # … with 1,924,655 more rows

    use contains() to select columns whose names contain a given string

select(babynames,contains("e"))
## # A tibble: 1,924,665 x 3
##     year sex   name
## XXXXXXXXXXF     Mary
## XXXXXXXXXXF     Anna
## XXXXXXXXXXF     Emma
## XXXXXXXXXXF     Elizabeth
## XXXXXXXXXXF     Minnie
## XXXXXXXXXXF     Margaret
## XXXXXXXXXXF     Ida
## XXXXXXXXXXF     Alice
## XXXXXXXXXXF     Bertha
## XXXXXXXXXXF     Sarah
## # … with 1,924,655 more rows

    use num_range() to select columns named in prefix-plus-number style; babynames has no such columns, so the result here is empty

select(babynames,num_range("x",1:5))
## # A tibble: 1,924,665 x 0

2.3 $ and select()

$ extracts column contents as a vector. select() extracts column contents as a tibble.

select(babynames, n)
babynames$n

2.3.1 Your turn

Which of these is NOT a way to select the name and n columns together?

    select(babynames, -c(year, sex, prop))
    select(babynames, name:n)
    select(babynames, starts_with("n"))
    select(babynames, ends_with("n"))

2.4 filter()

filter() allows you to subset observations based on their values. The first argument is the name of the data frame. The second and subsequent arguments are the expressions that filter the data frame.

filter(babynames, name == "Garret")
## # A tibble: 110 x 5
##     year sex   name       n      prop
## XXXXXXXXXXM     Garret XXXXXXXXXX
## XXXXXXXXXXM     Garret XXXXXXXXXX
## XXXXXXXXXXM     Garret XXXXXXXXXX
## XXXXXXXXXXM     Garret XXXXXXXXXX
## XXXXXXXXXXM     Garret XXXXXXXXXX
## XXXXXXXXXXM     Garret XXXXXXXXXX
## XXXXXXXXXXM     Garret XXXXXXXXXX
## XXXXXXXXXXM     Garret XXXXXXXXXX
## XXXXXXXXXXM     Garret XXXXXXXXXX
## XXXXXXXXXXM     Garret XXXXXXXXXX
## # … with 100 more rows

2.4.1 Missing values

One feature of R that can make comparison tricky is missing values, or NA (“not available”).
NA represents an unknown value, so missing values are “contagious”: almost any operation involving an unknown value will also be unknown.

NA > 5
## [1] NA
NA + 10
## [1] NA
NA == NA
## [1] NA
NA | FALSE
## [1] NA
NA & FALSE
## [1] FALSE
NA*0
## [1] NA
Inf*0
## [1] NaN

If you want to determine whether a value is missing, use is.na().

filter() only includes rows where the condition is TRUE; it excludes both FALSE and NA values. If you want to preserve missing values, ask for them explicitly.

df <- tibble(x = c(1, NA, 3))
filter(df, x > 1)
## # A tibble: 1 x 1
##       x
## XXXXXXXXXX
filter(df, is.na(x) | x > 1)
## # A tibble: 2 x 1
##       x
## 1    NA
## XXXXXXXXXX

2.4.2 Your turn

    Use filter, babynames, and the logical operators to find:
    All of the rows where prop is greater than or equal to 0.08
    All of the children named “Sea”

2.4.3 Boolean operators

Separating conditions with a comma is equivalent to combining them with &: both calls below return the same row.

filter(babynames, name == "Garrett", year == 1880)
## # A tibble: 1 x 5
##    year sex   name        n     prop
## XXXXXXXXXXM     Garrett XXXXXXXXXX
filter(babynames, name == "Garrett" & year == 1880)
## # A tibble: 1 x 5
##    year sex   name        n     prop
## XXXXXXXXXXM     Garrett XXXXXXXXXX
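The interaction between filter(), NA, and the Boolean operators can be checked on a tiny data frame. The sketch below is illustrative only: it assumes dplyr is installed, and the tibble `df` with its extra column `g` is a made-up example, not part of the original handout.

```r
library(dplyr)

# Hypothetical toy data: one missing value in x, plus a grouping column g
df <- tibble(x = c(1, NA, 3), g = c("a", "b", "a"))

# Conditions separated by a comma are combined with AND,
# so these two calls should select the same rows.
and_comma <- filter(df, x > 1, g == "a")
and_amp   <- filter(df, x > 1 & g == "a")
same_rows <- isTRUE(all.equal(and_comma, and_amp))

# filter() drops rows where the condition is NA; ask for them explicitly.
drop_na <- filter(df, x > 1)
keep_na <- filter(df, is.na(x) | x > 1)

# NA | TRUE is TRUE, so the row with missing x survives here
# because its second condition (g == "b") is TRUE.
either <- filter(df, x > 1 | g == "b")

print(same_rows)
print(drop_na)
print(keep_na)
print(either)
```

The last call is worth noting: an NA in one condition does not always remove a row, because | only needs one side to be TRUE.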

Pooja · Accepted Answer

library(tidyr)
library(dplyr)
library(ggplot2)
library(nycflights13)
library(openair)
# Print the flights data from nycflights13
nycflights13::flights
# summary(flights)
# View(flights)

# 2a
# table1 is an example dataset that ships with tidyr
table1
