ICT110

Introduction to Data Science


Task 2



Semester 1, 2018

ICT110 Introduction to Data Science Assignment 2
Page 2 of 7
Assessment and Submission Details

Marks: 30% of the Total Assessment for the Course

Due Date: 11:59pm Friday, Week 12

Submit your assignment to Blackboard Task 2. Please follow the submission instructions on
Blackboard.

The assignment will be marked out of a total of 100 marks and forms 30% of the total
assessment for the course. ALL assignments will be checked for plagiarism by SafeAssign
system provided by Blackboard automatically.

Refer to your Course Outline or the Course Web Site for a copy of the “Student Misconduct,
Plagiarism and Collusion” guidelines.

Assignment submission extensions will only be made using the official Faculty of Arts,
Business and Law Guidelines.

Requests for an extension to an assignment MUST be made to the course coordinator prior to
the date of submission and requests made on the day of submission or after the submission
date will only be considered in exceptional circumstances.


Background

A research team planned to study the health development of the world over the past 15 years.
The team retrieved a dataset of Health and Population Statistics between 2001 and 2015
from the World Bank (http://databank.worldbank.org).

The dataset contains the following attributes:
• Birth rate, crude (per 1,000 people)
• Fertility rate, total (births per woman)
• Adolescent fertility rate (births per 1,000 women ages 15-19)
• Death rate, crude (per 1,000 people)
• Cause of death, by communicable diseases and maternal, prenatal and nutrition conditions
(% of total)
• Cause of death, by injury (% of total)
• Cause of death, by non-communicable diseases (% of total)
• Mortality caused by road traffic injury (per 100,000 people)
• Health expenditure per capita (current US$)
• GNI per capita, Atlas method (current US$)
• Health expenditure, private (% of GDP)
• Health expenditure, public (% of GDP)
• Health expenditure, total (% of GDP)
• Maternal mortality ratio (national estimate, per 100,000 live births)
• Immunization, BCG (% of one-year-old children)
• Life expectancy at birth, male (years)
• Life expectancy at birth, female (years)
• Life expectancy at birth, total (years)
• School enrollment, primary (% gross)
• School enrollment, secondary (% gross)
• School enrollment, tertiary (% gross)
• School enrollment, tertiary, female (% gross)
• Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years
of age)
• Unemployment, female (% of female labor force) (modeled ILO estimate)
• Unemployment, male (% of male labor force) (modeled ILO estimate)
• Unemployment, total (% of total labor force) (modeled ILO estimate)
More details about the data attributes and data content can be found in the attached
documents.

Assignment Task

You are a member of the team, and need to perform data analysis on countries in the region
of East Asia & Pacific.

The team has not set any specific goal for the analysis. Therefore, you have the freedom to
explore the data, and dig out anything you feel interesting or significant.

You have been requested to prepare a data analysis report about your work and explain your
findings. The potential audiences include other researchers, business representatives, and
government agencies. They may have limited ICT or mathematical knowledge.

To prepare the report, please follow the following outline:

1. Introduction
Provide an introduction to the problem. Include background material as appropriate: who
cares about this problem, what impact it has, where does the data come from.

2. Data Setup
Describe how to load the data, and the libraries needed. Provide an overview of the data,
including its dimensions and structure.

3. Exploratory Data Analysis
Perform 3 one-variable analyses. Plot at least one graph for each variable. Explain why the
selected graph is appropriate.

Perform 2 two-variable analyses. Plot at least one graph for each analysis. Explain why the
selected graph is appropriate.

The analysis can be performed on all years and all countries, or on a subset of your interest.

4. Advanced Analysis
4.1 Clustering
Briefly explain the concept of clustering and k-means.
Try to do a clustering analysis to group countries according to some selected attributes.
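
As a starting point, a k-means grouping in R might look like the sketch below. The data frame `toy` and its values are placeholders for illustration only, not World Bank figures; the column names follow the style used in the posted answer, and the choice of two clusters is an assumption. `kmeans()` is part of base R's `stats` package.

```r
set.seed(42)  # k-means starts from random centres; fixing the seed makes runs repeatable

# Toy data with placeholder values (NOT real World Bank figures)
toy <- data.frame(
  CountryCode = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"),
  BirthrateCrude = c(9, 10, 11, 29, 30, 31),
  LifeExpectancyatBirthTotal = c(82, 81, 80, 66, 65, 64)
)

# Standardise the attributes so that neither dominates the distance calculation
features <- scale(toy[, c("BirthrateCrude", "LifeExpectancyatBirthTotal")])

# Group the countries into 2 clusters (an assumed k); nstart tries several random starts
fit <- kmeans(features, centers = 2, nstart = 10)

# Attach the cluster label to each country and count the group sizes
toy$cluster <- fit$cluster
table(toy$cluster)
```

Scaling the attributes first is a design choice: k-means uses Euclidean distance, so an attribute measured on a larger scale would otherwise dominate the grouping.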

4.2 Linear Regression
Briefly explain the concept of linear regression.
Try to do 2 linear regression analyses. Plot the learned models.
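
A simple linear regression and its plot could be sketched as follows. The data frame `toy`, its column names, and its values are hypothetical placeholders, not World Bank figures; `lm()`, `plot()` and `abline()` are base R functions.

```r
# Toy data with placeholder values (NOT real World Bank figures);
# both column names are hypothetical
toy <- data.frame(
  HealthExpenditure = c(2, 4, 6, 8, 10),
  LifeExpectancy = c(65.5, 69.0, 73.5, 76.0, 81.0)
)

# Fit the model: LifeExpectancy = intercept + slope * HealthExpenditure
fit <- lm(LifeExpectancy ~ HealthExpenditure, data = toy)

# Inspect the estimated intercept and slope
coef(fit)

# Plot the observations and overlay the fitted regression line
plot(toy$HealthExpenditure, toy$LifeExpectancy,
     xlab = "Health expenditure, total (% of GDP)",
     ylab = "Life expectancy at birth, total (years)")
abline(fit, col = "red")
```

`summary(fit)` gives further detail, such as the R-squared value, if a measure of fit quality is needed for the report.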

The analysis can be performed on all years and all countries, or on a subset of your interest.

5. Conclusion

6. Reflections
In this part, discuss any difficulties you had performing the analysis and how you solved
those difficulties. Reflect on how the analysis process went for you, what you learnt, and
what you might do differently next time.

For the data analysis, you need to provide both the R code and an explanation of the code and
the result. For sections 2 to 4, please present each R code snippet in a box with some
comments. For example:
# Draw a boxplot on the attribute “Income”
boxplot(MyData$income)


The following guidelines will be used in marking each section of the assignment:

• 100% (Outstanding): An outstanding attempt, a well formatted and professionally presented piece of work.
• 90% (High Distinction): An excellent piece of work that meets all the specified criteria with very minor omissions or mistakes.
• 75% (Distinction): More than competently meets the criteria specified with only minor mistakes or omissions.
• 65% (Credit): Competently meets the criteria as specified with few minor mistakes or omissions.
• 50% (Pass): Satisfactorily meets the criteria.
• 25% (Fail): Did not sufficiently meet the criteria to pass.
• 0 (Not Submitted): No attempt made or different from what is acceptable.


Report Format

Your report should be no fewer than 1,200 words and ideally no longer than 2,000 words.
All comments and graph titles are counted.

The report MUST be formatted using the following guidelines:
• Paragraph text – 12 point Calibri, single line spacing
• Headings – Arial in an appropriate type size
• Margins – 2.5cm on all margins
• Header – Report title
• Footer – page number (including the word “Page”)
• Page numbering – roman numerals (i, ii, iii, iv) up to and including the Table of
Contents, restart numbering using conventional numerals (1, 2, 3, 4) from the first
page after the Table of Contents.
• Title Page – Must not contain headers or footers. Include your name as the report’s
author but DO NOT include any reference to your student ID, course code or course
name.
• The report is to be created as a single Microsoft Word document. No other format is
acceptable and doing so will result in the deduction of marks.
For advice on report writing, the following book is a good resource:
Summers, J. & Smith, B., 2014, Communication Skills Handbook, 4th Ed, Wiley, Australia.

Referencing

2 references for the explanation of clustering and 2 for linear regression are required. These
references should follow the Harvard method of referencing. Note that ALL references
should be from journal articles, conference papers, technical papers or a recognized expert in
the field. DO NOT use Wikipedia as a reference. The use of unqualified references will result
in the deduction of marks.

Submission

The completed assignment is to be submitted to Blackboard Task 2 by the due date of
11:59pm Friday, Week 12.

The assignment will be assessed according to the marking sheet shown on the last
page. Late submissions will be penalised according to the policy in the course outline. Please
note that Saturday and Sunday are included in the count of days late.

Assignment Return and Release of Grades

Assignment grades will be available on the course website two weeks after the submission date.
An electronic assignment marking sheet will be available at this time.
Where an assignment is undergoing investigation for alleged plagiarism or collusion the
grade for the assignment and the assignment will be withheld until the investigation has
concluded.

Assignment Guidelines

This assignment will take a number of weeks to complete and will require a good
understanding of data science and management for successful completion. It is imperative
that students take heed of the following points in relation to doing this assignment:

1. Ensure that you clearly understand the requirements for the assignment – what has to be
done and what are the deliverables.
2. If you do not understand any of the assignment requirements – Please ASK the course
coordinator or your tutor.
3. Each time you work on any aspect of the assignment reread the assignment requirements
to ensure that what is required is clearly understood.

ICT110 Introduction to Data Science
Answered Same Day May 31, 2020 ICT110 University of the Sunshine Coast

Solution

Abr Writing answered on Jun 05 2020
1. Introduction
We study the health development of the countries in the East Asia & Pacific region. The data set was collected from the World Bank (http://databank.worldbank.org) Health and Population Statistics. The indicators of health development included in the study are: birth rate, crude (per 1,000 people); death rate, crude (per 1,000 people); health expenditure, total (% of GDP); life expectancy at birth, total (years); and unemployment, total.

2. Data Setup
The data set is exported from Excel and loaded into R. We need the cluster package for the k-means cluster analysis. After downloading it from the web, the package can be installed using the “install package from local files” option. The data set has five variables on thirty-seven countries from the East Asia & Pacific region for the year 2014.
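
The loading and overview step could be sketched as below. This is only an illustration: the inline CSV text stands in for the exported spreadsheet, its values are placeholders rather than World Bank figures, and the file name in the comment is hypothetical. The column names match those used in the plots in the next section.

```r
# Toy stand-in for the exported spreadsheet; values are placeholders,
# NOT World Bank figures. With a real export this would be something like
#   dat <- read.csv("health_2014.csv")   # hypothetical file name
csv_text <- "CountryCode,BirthrateCrude,DeathrateCrude,LifeExpectancyatBirthTotal
AAA,10.0,7.0,80.0
BBB,20.0,6.0,70.0
CCC,30.0,5.0,65.0"
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(dat)      # number of rows (countries) and columns (variables)
str(dat)      # column names and data types
summary(dat)  # basic descriptive statistics for each column
```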
3. Exploratory Data Analysis
We perform three one-variable analyses, plotting a bar chart for each variable. Because each variable is plotted against country, which is a categorical attribute, a bar chart is the appropriate choice: it lets us compare the values across countries directly.
# Bar plot of birth rate, crude (per 1,000 people)
barplot(dat$BirthrateCrude, names.arg = dat$CountryCode, ylab = "Birth rate, crude (per 1,000 people)")
# Bar plot of death rate, crude (per 1,000 people)
barplot(dat$DeathrateCrude, names.arg = dat$CountryCode, ylab = "Death rate, crude (per 1,000 people)")
# Bar plot of life expectancy at birth, total (years)
barplot(dat$LifeExpectancyatBirthTotal, names.arg = dat$CountryCode, ylab = "Life expectancy at birth, total (years)")
The crude birth rate ranges from 7.9 to 38.7. Timor-Leste has the highest crude birth rate, whereas Hong Kong SAR, China has the lowest, as expected. The crude death rate ranges from 2.99 to 10. Japan has the highest crude death rate, whereas Brunei Darussalam has the lowest. Life expectancy at birth ranges from 62.6 years to 93.98 years. Hong Kong SAR, China has the highest life expectancy at birth, whereas Papua New Guinea has the lowest, as expected.
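
Ranges and extreme countries like these can be read off programmatically rather than from the plot. A minimal sketch follows, using a toy data frame with placeholder values (not the real figures) and the same column names as the bar plots:

```r
# Toy data with placeholder values (NOT real World Bank figures)
dat <- data.frame(
  CountryCode = c("AAA", "BBB", "CCC"),
  BirthrateCrude = c(10, 30, 20),
  stringsAsFactors = FALSE
)

# Smallest and largest value of the variable
birth_range <- range(dat$BirthrateCrude)

# Country codes at the maximum and minimum
highest <- dat$CountryCode[which.max(dat$BirthrateCrude)]
lowest <- dat$CountryCode[which.min(dat$BirthrateCrude)]

birth_range
highest
lowest
```

If the real data contain missing values, `range()` needs `na.rm = TRUE`; `which.max()` and `which.min()` skip missing values on their own.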
The two-variable analysis is done with scatter plots. We have included three variables in the analysis: health expenditure, total (% of GDP); life expectancy at birth (years); and crude death rate.
# Scatter plot of Health expenditure, total (% of GDP) and Life expectancy at Birth (years) ...