---

title: "Report"

author: "Untitled"

date: "17 December 2019"

output:
  word_document: default
  pdf_document: default
  html_document:
    df_print: paged

---

```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = TRUE)

```

# Import the original data:

```{r}

load("ACS_2017_MD.Rdata")

acs = ACS_2017_MD

dim(acs)

```

The raw data set is large: it has 59463 observations and 510 variables. Because we are interested in one particular research question, the full data set is not needed for the analysis. We therefore state the research question first, select only the variables it requires, and analyse that smaller data set.

# Research question:

How can the variance in wages across different household types be described by the average age of the people in the household?

# Smaller Subset data:

The question concerns the wage and the age of people across different household types, so only those three variables are needed for the analysis. We use some statistical tests and a regression model to examine the relationships between the variables, with wage as the dependent variable and age and household type as the independent variables. Before any further analysis, we take a look at the data to get a better idea of its contents.
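As a rough illustration of the kind of model described above, the following sketch fits a linear regression of wage on age and household type. It uses simulated toy data only: the column names `wage`, `age` and `hh_type`, the three household codes, and all numbers are assumptions made for the example and are not taken from `ACS_2017_MD`.

```r
# Simulated toy data; names and values are hypothetical, not from the ACS file.
set.seed(1)
n <- 60
toy <- data.frame(
  age     = runif(n, min = 18, max = 70),
  hh_type = factor(sample(1:3, n, replace = TRUE))  # categorical predictor
)
toy$wage <- 20000 + 500 * toy$age + rnorm(n, sd = 5000)

# Wage is the dependent variable; age and household type are the predictors.
fit <- lm(wage ~ age + hh_type, data = toy)

summary(fit)  # coefficient estimates and overall fit
anova(fit)    # sequential F-tests for age and household type
```

Because `hh_type` is a factor, `lm()` estimates one coefficient per household type relative to a baseline level rather than a single slope.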

# Structure of the data:

```{r}

# Keep only the three columns needed for the analysis (selected by position)
acs = acs[,c(10,72,342)]

str(acs)

summary(acs)

```

One can see that the data contains the three required variables, but the household type is stored as a numeric variable rather than a factor. Hence one should keep that in mind and take the factor...
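A minimal sketch of how a numeric code can be turned into a factor is given here. The data frame `toy`, the column name `hh_type` and its codes are hypothetical stand-ins, since the actual column names in `acs` are not shown in this section.

```r
# Toy data frame; column names and codes are hypothetical, not from the ACS file.
toy <- data.frame(
  wage    = c(32000, 45000, 0, 58000),
  age     = c(25, 41, 67, 38),
  hh_type = c(1, 2, 1, 3)   # numeric codes standing in for household types
)

# Convert the numeric codes to a factor so R treats them as categories.
toy$hh_type <- factor(toy$hh_type)

str(toy$hh_type)     # now reported as a factor
levels(toy$hh_type)  # one level per distinct code
```

Doing the conversion once in the data frame means later summaries and models treat household type as categorical without further changes.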

