Financial Data Science Task-1 (Session-1, 2021)
Assignment on Financial Data Science
Total Marks: 100
Submission Deadline: 11:59pm, 6 April 2021
General Instructions
• This assignment has two parts.
• Part-I is on theoretical background and descriptive analysis, and Part-II is on machine learning,
specifically classification models.
• You have been assigned a company to work with (see ‘Assignment_afin8015_s1_2021.xlsx’
on ilearn). You must use the company listed against your name as the company in your
analysis.
• The data time period used in the assessment will be from 1 July 2018 to 28 Feb 2021 (see footnote 1).
• Both parts must be documented in one document as Part-I and Part-II.
• The assignment requires submission of your working R code files
– All data files used in the code must be submitted.
– The code must be included in the appendix of the document and an R code file should
be uploaded.
– You are not required to use the R Markdown format for this assignment but you may
use it if you choose to.
• Your individual paper must not exceed 12 A4 pages of 11pt font size with double spacing. This
excludes any appendices, tables and lengthy R output you may elect to incorporate in the
report.
• The word count mentioned in the questions is the maximum word count and excludes any
figures and/or tables.
• Marks will be awarded for depth of coverage, quality of insight, succinctness and accuracy of
answers.
• Marks will be deducted for poorly informed reports which lack proper formatting, referencing,
etc. The following deductions will apply:
– No references (in-text and end-text), including reference to the data source: -10
Footnote 1: Inform the unit convenor ASAP if the data time period or OHLC data is not available for the
company allocated to you.
– No coversheet: -5
– Illegible presentation: -10
– Lack of informed research: -10
– Plagiarism will be dealt with according to the university policy and a high similarity score will
be penalized.
• The discussion must be informed by research and the report must cite all the sources.
• Both in-text and end-text citations are required. End-text references are excluded from the
page limit. Use one citation style, either APA or Harvard.
• FACTSET is the preferred data source for the assignment, along with publicly available
information from the company website and ASX and the sources mentioned in this document.
• Assignment (document) must include a cover sheet.
• A sample coversheet is provided on ilearn, you may choose to use it.
Please contact your unit convenor well before the submission deadline for any clarifications you may
need on the assignment instructions. You may also post your questions on the discussion forum.
Assignment Questions
Scenario: You are an intern data scientist at Shootingformars Corp. and have data analysis and
machine learning skills, particularly in the financial service sector. As it happens, your
mentor Ms Fowler has just been approached by Mr Hofstadter, a new client who has
recently started investing in the Share Market and has been using some past information
to make his trading decisions on a daily basis. Mr Hofstadter has also been researching
during his spare time and has heard that modern Data Science methods such as Machine
Learning can be used to predict the price direction for stocks and other financial assets.
Unfortunately, he has limited understanding of the Data Science process and limited
programming skills, but he does have some background in statistics.
After the initial meeting with Mr Hofstadter, Ms Fowler has decided to treat this as an
educational/proof of concept project and brought you on board to conduct the analysis
and prepare the documentation for the project.
Ms Fowler has assigned you a publicly traded stock listed in ‘Assignment_afin8015_s1_2021.xlsx’
and given you a set of tasks as listed in Part-I and Part-II of this document. Part-I is
aimed to assist Mr Hofstadter in developing a better understanding of Data Science
and descriptive analysis using statistics and visualisation. Part-II of the task is to
use the stock assigned to you and conduct a classification exercise for demonstration.
The task requires you to create a professional standard document to be presented to the
client. You have been given a choice of either using a traditional workflow of creating
a Word document and coding the methods in R separately and then bringing them all
together in one document, or using a reproducible method with an RMarkdown file.
Part I. Data Science Concepts & Descriptive Analysis
1 Task-1 (3+7=10 Marks)
1. Explain the concept of Data Science.
(3 marks)
2. Outline and explain the Life Cycle of a Data Science Project. Use example(s) from the financial
service sector domain.
(7 marks)
Go beyond the text book and in-class resources to include recent developments and explain the concept
with Financial Service Sector as the main domain. All references must be cited.
2 Task-2 (10 Marks)
1. Use FACTSET and download the daily Open, High, Low, Close (OHLC) Prices and Trading Volume
for the company stock assigned to you from 01-July-2018 to 28-Feb-2021.
(2 marks)
2. Use the closing prices and percentage logarithmic returns of the closing prices to generate
descriptive statistics (including Skewness, Kurtosis and Test for Normal Distribution). Present
the statistics in the document and briefly discuss the range, distribution and tail behaviour
of the price and return series. Keep the discussion brief and to the point; remember your
client has some statistical background and understanding of the stock market. (Word limit:
250 words)
(8 marks)
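As a rough illustration of the statistics requested in item 2 above, the sketch below computes percentage log returns and moment-based skewness and kurtosis in base R. The synthetic price series, the helper name `describe` and the choice of the Shapiro-Wilk test are assumptions made for illustration only; the assignment does not prescribe a particular normality test, and the real input would be the closing prices downloaded from FACTSET.

```r
# Synthetic closing prices stand in for the FACTSET download (assumption).
set.seed(1)
close <- 100 * exp(cumsum(rnorm(250, mean = 0, sd = 0.01)))

# Percentage logarithmic returns: r_t = 100 * (log P_t - log P_{t-1})
ret <- 100 * diff(log(close))

# Hypothetical helper: summary statistics with moment-based skewness and
# kurtosis (population form, not bias-corrected).
describe <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  c(n        = length(x),
    mean     = m,
    sd       = sd(x),
    min      = min(x),
    max      = max(x),
    skewness = mean((x - m)^3) / s^3,
    kurtosis = mean((x - m)^4) / s^4)  # equals 3 for a normal distribution
}

stats <- rbind(price = describe(close), return = describe(ret))
print(round(stats, 3))

# Shapiro-Wilk test of normality on the returns (one of several possible tests)
sw <- shapiro.test(ret)
print(sw$p.value)
```

A kurtosis well above 3 on the real return series would point to heavier tails than the normal distribution, which is the kind of tail behaviour the discussion is asked to cover.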
3 Task-3 (10 Marks)
1. Plot and present the closing prices and log returns using ggplot2 package in R. Hint: One way
is to extract the dates and closing prices and returns in a data frame and convert it from wide
to long.
(2 marks)
2. Use the last 6 months of OHLC prices and the Volume data to plot the following charts:
(a) Line Chart
(b) Candlestick Chart
(c) Add the following Technical indicators to the candlestick chart
i. 5 Day Simple Moving Average
ii. 10 Day Exponential Moving Average
iii. 5 Day Momentum
iv. Bollinger Bands
(6 marks)
3. Comment on the trend and price direction based on the plots generated in 1 and 2 above.
(Word limit: 150 words)
(2 marks)
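The hint in item 1 (extract dates, prices and returns into a data frame and convert it from wide to long) might look roughly like the sketch below. The dates and prices are synthetic placeholders and the column names are assumptions; only the wide-to-long step and the ggplot2 calls are the point of the example.

```r
library(ggplot2)

# Synthetic dates and prices stand in for the downloaded data (assumption).
set.seed(2)
wide <- data.frame(
  date  = seq(as.Date("2018-07-02"), by = "day", length.out = 200),
  close = 100 * exp(cumsum(rnorm(200, sd = 0.01)))
)
wide$log_return <- c(NA, 100 * diff(log(wide$close)))

# Wide to long: one row per (date, series) pair
long <- rbind(
  data.frame(date = wide$date, series = "Closing price",  value = wide$close),
  data.frame(date = wide$date, series = "Log return (%)", value = wide$log_return)
)
long <- long[!is.na(long$value), ]  # the first return is undefined

# One panel per series, each with its own y-axis scale
p <- ggplot(long, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~ series, ncol = 1, scales = "free_y") +
  labs(x = "Date", y = NULL)
print(p)
```

For the candlestick chart in item 2, one possible route is quantmod's `chartSeries()` together with `addSMA()`, `addEMA()`, `addMomentum()` and `addBBands()`.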
Part II. Classification Models & Application
4 Task-4 (10 Marks)
1. As Mr Hofstadter has limited exposure to Machine Learning (ML) and various methods in ML,
you are tasked to conduct a short review of ML and ML methods with a focus on Classification
models. Your review should also include the following
(a) An overview of Machine Learning.
(b) Discussion on three (at least) different classification methods.
(c) As the modelling task requires a price direction forecast exercise, the review
should also include examples of previous research using ML for stock price
movement/direction prediction.
Go beyond the text book and in-class resources to include recent developments and research. All
references must be cited. (Word Limit: 300 words)
(10 marks)
5 Task-5 (60 Marks)
Your final task is to conduct a proof of concept comparative analysis of two classification methods to
demonstrate classification and predictive ability of ML methods in modelling and predicting the price
direction based on various technical indicators. Specifically, the task should conduct the following:
1. Select the closing prices from the OHLC stock price data downloaded from FACTSET (same
as in Task-2) and create the following Technical Indicators (see footnote 2).
(a) Moving Average: 5 day and 10 day and their one period lag
(b) Log returns and its one period lag
(c) MACD (default values for nFast, nSlow and nSig)
(d) Exponential Moving Average: 5 day and 10 day
(e) Momentum: 5 day
(f) Volatility: 5 day
(g) A price direction indicator based on the 3 day lagged price:
1 if $P_t \ge P_{t-3}$, 0 otherwise
(14 marks)
2. Combine the indicators in a data frame and visualise the data using
(a) A time series plot, and
(b) Box plots of indicators categorised by price direction
(6 marks)
3. Create a 70:30 training and testing sample from the dataset and conduct a classification exercise
using Logistic Regression. The analysis should include the following:
(a) Training on the training sample using ‘timeslice’ sampling. Use at least 250 days as the
window size and 14 days as the prediction horizon in a fixed window.
(b) Data pre-processing to standardise the data.
(c) Prediction on the test set and corresponding confusion matrix.
(d) Brief discussion on the accuracy of the prediction based on the confusion matrix.
(20 marks)
Footnote 2: Hint: Use the TTR and quantmod packages.
4. Conduct the classification exercise (in 3 above) using the k-Nearest Neighbours algorithm. The
analysis should include the following:
(a) An odd-number grid search for the ‘k’ parameter from 1 to 99.
(b) Prediction on the test set and corresponding confusion matrix.
(c) Brief discussion on the accuracy of the prediction based on the confusion matrix.
(15 marks)
5. Compare the performance of the Logistic Regression Model and k-NN model based on their
accuracy and provide a recommendation for Mr Hofstadter. (Word Limit: 150 words)
(5 marks)
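The indicators listed in item 1 of Task-5 could be assembled along the following lines with the TTR package mentioned in the hint. This is a sketch under stated assumptions: the prices are synthetic, `lag_k` is a hypothetical helper, the column names are illustrative, and "5 day volatility" is interpreted here as a rolling 5 day standard deviation of log returns, which is only one possible reading.

```r
library(TTR)  # package suggested in the assignment hint

# Synthetic closing prices stand in for the FACTSET series (assumption).
set.seed(42)
close <- 50 * exp(cumsum(rnorm(300, sd = 0.015)))

# Hypothetical helper: shift a vector back by k periods, padding with NA
lag_k <- function(x, k = 1) c(rep(NA, k), head(x, -k))

log_ret <- c(NA, diff(log(close)))
macd    <- MACD(close)  # defaults: nFast = 12, nSlow = 26, nSig = 9

ind <- data.frame(
  sma5  = SMA(close, n = 5),
  sma10 = SMA(close, n = 10),
  ret   = log_ret,
  macd  = macd[, "macd"],
  ema5  = EMA(close, n = 5),
  ema10 = EMA(close, n = 10),
  mom5  = momentum(close, n = 5),
  vol5  = runSD(log_ret, n = 5)  # assumed definition of 5 day volatility
)

# One-period lags of the moving averages and of the log returns
ind$sma5_lag1  <- lag_k(ind$sma5)
ind$sma10_lag1 <- lag_k(ind$sma10)
ind$ret_lag1   <- lag_k(ind$ret)

# Price direction: 1 if P_t >= P_{t-3}, 0 otherwise
ind$direction <- as.integer(close >= lag_k(close, 3))

# Drop the warm-up rows where any indicator is still undefined
ind <- na.omit(ind)
str(ind)
```

Removing the incomplete rows at the end, rather than indicator by indicator, keeps every column aligned on the same dates before the data frame is passed to the plots and models.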
Your final report must include both Part-I and Part-II and must contain the output from the
analysis conducted in R. Final code and data files must be submitted on the relevant links on
ilearn.
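The timeslice training and the odd-k grid search described in items 3 and 4 of Task-5 could be set up roughly as below. The use of the caret package is an assumption (the assignment names the ‘timeslice’ method but not a package), as are the synthetic features `x1` and `x2`, the "up"/"down" labels and the chronological 70:30 split. The confusion matrices are built with base `table()`; caret's `confusionMatrix()` is an alternative that reports further statistics.

```r
library(caret)  # assumed modelling package

# Synthetic features and direction label stand in for the indicator data
# frame built from the stock prices (assumption).
set.seed(7)
n   <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$direction <- factor(ifelse(dat$x1 + rnorm(n) > 0, "up", "down"),
                        levels = c("down", "up"))

# Chronological 70:30 split (assumed), so the test period follows training
n_train   <- floor(0.7 * n)
train_set <- dat[1:n_train, ]
test_set  <- dat[(n_train + 1):n, ]

# Rolling-origin resampling: fixed 250 day window, 14 day horizon
ctrl <- trainControl(method = "timeslice", initialWindow = 250,
                     horizon = 14, fixedWindow = TRUE)

# Logistic regression with standardised predictors
fit_glm <- train(direction ~ ., data = train_set, method = "glm",
                 family = "binomial",
                 preProcess = c("center", "scale"), trControl = ctrl)

# k-NN with an odd-valued grid for k from 1 to 99
fit_knn <- train(direction ~ ., data = train_set, method = "knn",
                 preProcess = c("center", "scale"), trControl = ctrl,
                 tuneGrid = data.frame(k = seq(1, 99, by = 2)))

# Confusion matrices and accuracy on the held-out test set
pred_glm <- predict(fit_glm, newdata = test_set)
pred_knn <- predict(fit_knn, newdata = test_set)
cm_glm   <- table(predicted = pred_glm, actual = test_set$direction)
cm_knn   <- table(predicted = pred_knn, actual = test_set$direction)
print(cm_glm)
print(cm_knn)
print(c(glm = mean(pred_glm == test_set$direction),
        knn = mean(pred_knn == test_set$direction)))
```

Passing `preProcess` to `train()` means the centring and scaling parameters are estimated on each training slice and then applied to the held-out rows, which avoids leaking test-period information into the standardisation.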
**End of Assignment Questions**