Overview and Rationale
Being able to ask appropriate questions of data is an important part of the work of data analytics. It is also critical to be able to interpret the results of the analysis. This assignment is intended to familiarize you with the data sets and to get you thinking about key business questions you can ask and answer from this data.
This project will help you measure your understanding of basic analytics concepts.
It will help you measure your ability to obtain basic descriptive statistics from a data set.
It will help you measure your ability to prepare graphical displays of your data analysis results.
It will help you measure your skills in R, R Studio and R Markdown.
It will help you measure your ability to apply critical thinking and make meaningful observations about your data analysis results.
Initial analysis of the data set (Get to know your data)
The Excel data set file Midterm_dataset.xlsx contains 1,000 observations on Global Sales from a Store Company. The variable names are self-explanatory.
Save the data set inside your ALY2010 R Project/DataSets folder.
Open the data set in Excel for an initial overview; observe all variables and the number of observations.
The data contains five numerical variables you will use: Sales, Quantity, Discount, Profit, Shipping Cost.
Important: Use Excel only to observe your data, and be very careful not to introduce changes to the data set.
    
1. Open the ALY2010 R Project you created in R Studio.
If you need a review, see the corresponding section on my website: https://rpubs.com/Dee_Chiluiza/home.
2. Create a new R Markdown file.
3. If you followed my instructions at the beginning of the class, you should be able to access the DataSets folder from the Files tab.
4. In the R Markdown file, create a first R chunk containing the code for all libraries and data sets you use. Do not include code that installs packages. If you need to install packages, do it directly in the console. Below you have an example of that R Chunk:

5. When you import the data set, change its name; choose a short name that is easy to use.

Now you are ready to start working with your data set.
Important: There are five (5) numerical variables in the data set. Before you click “Import,” be sure to select numeric for continuous data and integer for discrete data.
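As a rough illustration of the setup chunk described in steps 4 and 5, the sketch below imports the data under a short name and checks the variable types. The object name `sales`, the helper `load_sales()`, the path `DataSets/Midterm_dataset.xlsx`, and the use of the readxl package are all assumptions made for this example, not requirements of the assignment.

```r
# Hypothetical setup chunk: packages are used here, never installed here.
# Assumption: readxl was already installed from the console.
data_path <- file.path("DataSets", "Midterm_dataset.xlsx")  # assumed location and file name

load_sales <- function(path) {
  # read_excel() imports Excel numbers as doubles
  sales <- readxl::read_excel(path)
  # Quantity is a count, so store it as an integer (discrete) variable
  sales$Quantity <- as.integer(sales$Quantity)
  sales
}

# Import only when the file and the package are available, so the chunk cannot fail
if (file.exists(data_path) && requireNamespace("readxl", quietly = TRUE)) {
  sales <- load_sales(data_path)
  str(sales)  # quick check of variable types and number of observations
}
```

Converting `Quantity` after the import is one way to get an integer column; choosing the type in the R Studio import dialog, as the note above asks, achieves the same thing.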
    
Report starts here
Title
1. Title: Present a good title for your report.
Introduction
2. Introduction: Present an informative introduction section. This will measure your understanding of the topic and of the analytical process for data analysis:
Your introduction needs good information and good organization. This applies to any report you make. Try to give each topic its own paragraph.
· General topic: Show your understanding of the topic related to the data set, in this case, sales, retail marketing, and anything related to the business: corporations, global market, importance of analytics for this industry, etc. You choose the topic you want to present to your audience. As a guide, write a paragraph of 5 to 8 lines for this topic. Here are some example references you can read; they are suggested, not required:
Ali, Fareeha. January 21, 2021. US ecommerce grows 44.0% in 2020. Digital Commerce 360. https://www.digitalcommerce360.com/article/us-ecommerce-sales/
Global powers of retailing 2020. Deloitte. https://www2.deloitte.com/content/dam/Deloitte/fr/Documents/consumer-business/Publications/deloitte_global-powers-of-retailing-2020.pdf
· Analytics: Describe the importance of data analysis in the industry
· Descriptive and inferential statistics: Clearly describe each topic, and explain their differences.
· Data set description: Briefly mention the nature of the data set you are about to use.
· Problem identification: Imagine that you work for this company and you are given this data set. Based on the information, what questions would you ask of the data to improve the company's performance? Would you focus on profits, market size, product category, global sales, shipping cost? Choose the aspect that interests you.
· Plan: Briefly describe your plan to address the problem, in this case, the analytical and visualization tools you plan to use.
· Use references to support each topic.
Remember to be organized, prepare each topic in a separate paragraph. Each topic will be reviewed and graded.
Analysis section
All R codes must be presented on your report. For each task, remember to follow these rules:
1. Add a number and a title to each task.
2. Describe the task.
3. Do the task.
4. Make observations of the task results (you can mention and explain the codes you used, list the results you obtained, and very important, interpret the results; mention anything you consider relevant).
Task 1. Descriptive statistics of numerical variables.
Prepare all codes inside one single R Chunk. Store every result in a named object so that nothing is printed unnecessarily. Present only what is requested.
Present a summary of the five numerical variables.
Create objects to obtain mean, median, standard deviation, and range (calculated using max - min).
Use those objects names to create and present a 5x4 table.
Use Matrix as shown in https://rpubs.com/Dee_Chiluiza/vectors_matrix.
Create a vector for the column names.
Create a vector for the row names.
Present this table in your report. Do not present a raw table; use a function such as kable() from the knitr library, see my website.
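The steps of Task 1 can be sketched as follows. This is only an illustration under stated assumptions: the data frame `sales_data` is simulated (it is not the course data set), the column names are taken from the variable list in this assignment, and the object names are hypothetical.

```r
# Simulated stand-in for the five numerical variables (assumption: not the real data)
set.seed(1)
sales_data <- data.frame(
  Sales        = round(rexp(50, rate = 1 / 250), 2),
  Quantity     = sample(1:10, 50, replace = TRUE),
  Discount     = sample(c(0, 0.1, 0.2, 0.5), 50, replace = TRUE),
  Profit       = round(rnorm(50, mean = 30, sd = 80), 2),
  ShippingCost = round(rexp(50, rate = 1 / 25), 2)
)

# Store each statistic in a named object so nothing is printed by accident
means   <- sapply(sales_data, mean)
medians <- sapply(sales_data, median)
sds     <- sapply(sales_data, sd)
ranges  <- sapply(sales_data, function(x) max(x) - min(x))

# matrix() fills column by column: 5 rows (variables) x 4 columns (statistics)
stats_matrix <- matrix(c(means, medians, sds, ranges), nrow = 5, ncol = 4)

col_names <- c("Mean", "Median", "SD", "Range")
row_names <- names(sales_data)
dimnames(stats_matrix) <- list(row_names, col_names)

# Present a formatted table when knitr is available, a plain one otherwise
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(round(stats_matrix, 2), caption = "Descriptive statistics"))
} else {
  print(round(stats_matrix, 2))
}
```

Because the four statistic vectors are concatenated in order, each one becomes a full column of the matrix, which is why the column-name vector lists the statistics in that same order.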
Task 2. Numerical variables
You can use the following r chunk with par() code to present the two figures together; just fill the boxplot() and hist() codes.
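A minimal sketch of such a chunk is shown here, assuming a simulated numeric vector named `sales_values` in place of the real variable. The titles, colors, and the value `breaks = 30` are illustrative choices, not prescribed ones.

```r
# Simulated stand-in for one numerical variable (assumption: not the real data)
set.seed(42)
sales_values <- round(rexp(200, rate = 1 / 250), 2)

# Place two plots side by side and remember the previous settings
old_par <- par(mfrow = c(1, 2))

boxplot(sales_values,
        horizontal = TRUE,
        main = "Sales: box plot",
        xlab = "Sales",
        col = "lightblue")

# breaks is a suggestion; hist() chooses "pretty" cut points near that count
sales_hist <- hist(sales_values,
                   breaks = 30,
                   main = "Sales: histogram",
                   xlab = "Sales",
                   col = "lightblue")

par(old_par)  # restore the original plotting layout
```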

2.1 Inside an R Chunk, prepare codes to present a horizontal box plot and a histogram to display the data of variable “Sales”. Remember to provide a professional presentation to your graphs. On the histogram, increase the number of breaks.
Write a brief summary of the data results you obtained.
2.2 Inside an R Chunk, prepare codes to present a horizontal box plot and a histogram to display the data of variable “Discount”. Remember to provide a professional presentation to your graphs. On the histogram, increase the number of breaks.
Write a brief summary of the data results you obtained.
2.3 Inside an R Chunk, prepare codes to present a horizontal box plot and a histogram to display the data of variable “Profit”. Remember to provide a professional presentation to your graphs. On the histogram, increase the number of breaks. Notice that profits have negative values.
Write a brief summary of the data results you obtained.
2.4 Inside an R Chunk, prepare codes to present a horizontal box plot and a histogram to display the data of variable “ShippingCost”. Remember to provide a professional presentation to your graphs. On the histogram, increase the number of breaks.
Write a brief summary of the data results you obtained.
Task 3. Categorical variables
For these three tasks you will use tables, bar plots and pie charts to show how many observations there are for each category. For each task, first create a table of the variable, which lets you see the observations per category, then use the table to create the two graphs.
Create the tables but DO NOT present them, present only the bar plot and pie charts.
Use code par(mfrow = c(1,2)) at the beginning of your R Chunk.
Each plot must have a good presentation, including colors and the data values on top of each bar.
See my website for additional information: https://rpubs.com/Dee_Chiluiza/800800

3.1 Create a table then a bar plot and a pie chart to display the counts of each category of variable “Ship mode.”
3.2 Create a table then a bar plot and a pie chart to display the counts of each category of variable “Market.”
3.3 Create a table then a bar plot and a pie chart to display the counts of each category of variable “Company segment.”

Sort the bars. When you enter the data you can use: sort(your_table_name, decreasing = TRUE). Make horizontal bar plots if you prefer.
Use las = 1 to turn y-axis labels horizontal.
Use cex.names = 0.5 to reduce the size of the labels in the y-axis.
Start the R chunk with par(mai=c(0.6,1.4,1,0.4)) to increase left margin (1.4) and display long names.

For each task, write a summary of the results you obtained.
To help you visualize the presentation of this data, observe the image below (created using a different data set). Notice the use of par(mfrow = c(1, 2)). Notice the table: it was stored under a name, but it is not presented. The two graphs (bar, pie) were created using basic codes; you must improve their presentation.
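The pattern described for Task 3 (a stored table, then a bar plot with counts on top and a pie chart) can be sketched as below. The vector `ship_mode`, its category labels, and its counts are made up for illustration; the real categories in the data set may differ.

```r
# Made-up categorical variable; labels and counts are assumptions
ship_mode <- rep(c("Standard Class", "Second Class", "First Class", "Same Day"),
                 times = c(120, 40, 30, 10))

# Store the table in an object (not printed) and sort it from largest to smallest
ship_table <- sort(table(ship_mode), decreasing = TRUE)
bar_colors <- c("steelblue", "darkorange", "seagreen", "firebrick")

old_par <- par(mfrow = c(1, 2))

# barplot() returns the bar midpoints, which text() uses to place the counts
bar_mid <- barplot(ship_table,
                   col = bar_colors,
                   ylim = c(0, max(ship_table) * 1.15),  # headroom for the labels
                   las = 1, cex.names = 0.6,
                   main = "Orders by ship mode", ylab = "Count")
text(x = bar_mid, y = as.vector(ship_table),
     labels = as.vector(ship_table), pos = 3)

pie(ship_table,
    labels = paste0(names(ship_table), " (", as.vector(ship_table), ")"),
    col = bar_colors,
    main = "Share by ship mode")

par(old_par)  # restore the original plotting layout
```

Extending `ylim` slightly above the tallest bar keeps the count labels from being clipped at the top of the plot.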

Task 4. Data analysis.
In this task you will combine one categorical variable with one numerical variable.
Hint: Use this code to combine variables: name = tapply(numerical variable name, INDEX = categorical variable name, FUN = function you want to use). In FUN, use mean, median, sd, sum, etc., depending on the question.
After applying tapply(), you will use the object’s name you chose to present a table and then a bar plot of the data.
4.1 Which Company department has the highest Profit?
Using tapply(), create an object to combine and store Profit (numerical) with Comp_Department (categorical).
In this case, use FUN = sum to calculate total profit per department.
· Transform the object into a data frame and present it as a table using kable().
· Display the data using a horizontal bar plot.
· Finally, make observations of the figure you obtained.
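The steps in 4.1 can be sketched as follows, assuming a small made-up data frame named `orders`. The department names and profit values are invented for illustration; only the column names `Profit` and `Comp_Department` come from the task description.

```r
# Small made-up data set; department names and values are assumptions
orders <- data.frame(
  Comp_Department = c("Furniture", "Technology", "Office Supplies",
                      "Technology", "Furniture", "Office Supplies"),
  Profit = c(120, 300, 45, -80, -150, 60)
)

# Sum Profit within each department; the result is a named one-dimensional array
profit_by_dept <- tapply(orders$Profit, INDEX = orders$Comp_Department, FUN = sum)

# Convert to a data frame so it can be shown as a formatted table
profit_df <- data.frame(Department   = names(profit_by_dept),
                        Total_Profit = as.vector(profit_by_dept))

if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(profit_df, caption = "Total profit per department"))
} else {
  print(profit_df)
}

# Horizontal bar plot; ascending sort puts the largest total at the top
old_par <- par(mai = c(0.8, 1.6, 0.8, 0.4))
barplot(sort(profit_by_dept), horiz = TRUE, las = 1,
        col = "steelblue", xlab = "Total profit",
        main = "Total profit per department")
par(old_par)

# Department with the highest total profit
top_department <- names(profit_by_dept)[which.max(profit_by_dept)]
```

For a question such as 4.2, the same pattern would apply with `FUN = mean` and a different pair of variables.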
4.2 What was the mean sales per Region? Repeat the steps you used in 4.1
Write a
Answered 8 days After Mar 09, 2022

Solution

Suraj answered on Mar 16 2022