Microsoft Word - Assessment-4-T2-2022.docx
Assessment 4 XXXXXXXXXX
Submission Instructions
a) Submit your solution code in a notebook file with the “.ipynb” extension. Write discussions and
explanations, including outputs and figures, in a separate file and submit it as a PDF file.
b) Submissions in any other file format will not be assessed and will receive zero for the
entire submission.
c) Insert your Python code responses into cells of your submitted “.ipynb” file, each preceded by the
question, i.e., copy the question by adding a cell before the solution cell. If you need multiple cells for
better presentation of the code, add the question only before the first solution cell.
d) Your submitted code should be executable. If your code does not generate the submitted solution,
then you will get zero for that part of the marks.
e) Answers must be relevant and precise.
f) No hard coding is allowed. Avoid using specific values that can be calculated from the data provided.
g) Use all the topics covered in the unit in answering this assignment.
h) Submit your assignment after running each cell individually.
________________________________________________________________________________
Questions
________________________________________________________________________________
Background
In this project you are given a dataset and an article that uses this dataset. The authors have developed eight ML
models for cyber security intrusion detection and compared their performance. You must read the article to
understand the problem, the dataset, and the methodology to complete the following tasks.

Dataset
The NSL-KDD dataset was developed to solve problems in the KDD 99 challenge dataset. Unlike the original
KDD 99 data set, it does not contain redundant and duplicate records. A detailed description of the dataset can be found in
the Dataset section of the provided article. You can also use other sources to better understand the dataset and answer
the questions.
Please use the provided dataset “Intrusion_detection_NSL_KDD.csv” for answering the questions and DO NOT
DOWNLOAD OR USE the dataset from any other source. Use the file “FieldNames.pdf” for pre-processing the independent
and target variables BEFORE ANSWERING any questions.
Tasks:
1. Read the article and reproduce the results (Accuracy, Precision, Recall, F-Measure) for the NSL-KDD dataset
using the following classification methods: XXXXXXXXXX marks)
o SVM Linear
o SVM Quadratic
o SVM Cubic
o KNN Fine
o KNN Medium
o KNN Cubic
o TREE Fine
o TREE Medium
These results can be found in Table 4 of the manuscript and should be used for comparison purposes, if required.
Write a report summarising the dataset, the ML methods used, the experiment protocol and the results, including
variations, if any. While reproducing the results:
i) you should use the same set of features used by the authors.
ii) you should use the same classifiers with the exact parameter values.
iii) you should use the same training/test splitting approach as used by the authors.
iv) you should use the same pre/post processing, if any, used by the authors.
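
The four measures named in Task 1 can be computed directly from the true and predicted labels. The sketch below is a minimal pure-Python version that macro-averages precision, recall and F-measure over the classes. The averaging scheme is an assumption, because this brief does not say which one the article uses, and `macro_metrics` is a hypothetical helper name.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F-measure."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f_measures = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Made-up labels, for illustration only.
metrics = macro_metrics(
    ["dos", "dos", "probe", "probe"],
    ["dos", "probe", "probe", "probe"],
)
print(metrics)
```

If the article reports weighted or per-class figures instead, only the final averaging step needs to change.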
[N.B. Definitions of the algorithms used can be found at this link:
https://au.mathworks.com/help/stats/choose-a-classifier.html. However, your submission must be in Python, not in MATLAB.]
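
Because the classifier names above are MATLAB presets, each one needs a Python counterpart. The sketch below shows one possible scikit-learn approximation. Every parameter value in it is an assumption based on the MathWorks preset descriptions (1 neighbour for Fine KNN, 10 for Medium and Cubic KNN, at most 100 splits for a Fine tree and 20 for a Medium tree) and should be checked against the linked page and the article. `build_models` is a hypothetical helper name, and standardising the inputs is also an assumption.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """Return scikit-learn approximations of the eight named classifiers."""

    def scaled(estimator):
        # Assumption: features are standardised before SVM and KNN.
        return make_pipeline(StandardScaler(), estimator)

    return {
        "SVM Linear": scaled(SVC(kernel="linear")),
        # coef0=1 gives an inhomogeneous polynomial kernel (assumed choice).
        "SVM Quadratic": scaled(SVC(kernel="poly", degree=2, coef0=1)),
        "SVM Cubic": scaled(SVC(kernel="poly", degree=3, coef0=1)),
        "KNN Fine": scaled(KNeighborsClassifier(n_neighbors=1)),
        "KNN Medium": scaled(KNeighborsClassifier(n_neighbors=10)),
        # "Cubic" is taken to mean the Minkowski distance with exponent 3.
        "KNN Cubic": scaled(
            KNeighborsClassifier(n_neighbors=10, metric="minkowski", p=3)
        ),
        # A binary tree with s splits has s + 1 leaves, so a split limit
        # of 100 (or 20) is approximated with max_leaf_nodes=101 (or 21).
        "TREE Fine": DecisionTreeClassifier(
            max_leaf_nodes=101, random_state=random_state
        ),
        "TREE Medium": DecisionTreeClassifier(
            max_leaf_nodes=21, random_state=random_state
        ),
    }


models = build_models()
for name in models:
    print(name)
```

Any remaining gap between these estimators and the MATLAB ones is the kind of implementation difference that N.B. (ii) below asks you to explain in the report.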

N.B.
(i) If you find any issue in reproducing the results due to an incomplete description of a model in the provided
article, then make your own assumption and explain the reason. If your justification is correct, then your solution
will be considered correct and assessed accordingly.
(ii) If you find some subtle variations in results due to implementation differences of methods used in the study,
i.e., packages and modules in Python vs the Matlab implementation, then an appropriate explanation of them will be
considered during evaluation of your submission.
(iii) Similarly, variation in results due to randomness of data splitting will also be considered during
evaluation based on your explanation.
(iv) Obtained marks will be proportional to the number of ML methods that you report in your submission
with correctly reproduced results.
(v) Make sure your submitted Python code segment generates the reported results, otherwise you will receive
zero marks for this task.
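
One way to document the effect of split randomness mentioned in the notes above is to repeat the experiment over several seeds and report the spread. The sketch below is illustrative only: it uses synthetic data from `make_classification` in place of the NSL-KDD features, and the 70/30 stratified split is a placeholder assumption, not the splitting approach of the article.

```python
import statistics

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; replace with the pre-processed NSL-KDD features.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_classes=3, random_state=0
)

scores = []
for seed in range(5):
    # Only the split changes between runs; the model seed stays fixed.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model = DecisionTreeClassifier(max_leaf_nodes=21, random_state=0)
    model.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, model.predict(X_test)))

print(
    f"accuracy over {len(scores)} splits: "
    f"mean={statistics.mean(scores):.3f}, std={statistics.stdev(scores):.3f}"
)
```

Fixing `random_state` also keeps each notebook cell reproducible when it is rerun before submission.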

Marking criteria:
i) Unsatisfactory (x<4): tried to implement the methods but unable to follow the approach presented in
the article. Variation of marks in this group will depend on the quality of report.
ii) Fair (4<=x<5): appropriately implemented 50% of the methods presented in the article. Variation of
marks in this group will depend on the quality of report.
iii) Good (5<=x<7): appropriately implemented 70% of the methods presented in the article. Variation of
marks in this group will depend on the quality of report.
iv) Excellent (x>=7): appropriately implemented >=90% of the methods presented in the article. Variation of
marks in this group will depend on the quality of report.
2. Design and develop your own ML solution for this problem. The proposed solution should be
different from all approaches mentioned in the provided article. This does not mean that you must
choose a new ML algorithm. You can develop a novel solution by changing the feature selection approach
or the parameter optimisation process of the ML methods used, or by using different ML methods or different
combinations of them. That is, the proposed system should be substantially different from the methods
presented in the article, but the difference is not limited to a change of ML method. Compare the results with the
methods reported in the article. Write a technical report summarising your solution design and outcomes. The report
should include: XXXXXXXXXXmarks)
i) Motivation behind the proposed solution.
ii) How the proposed solution is different from existing ones.
iii) Detailed description of the model including all parameters so that any reader can implement your model.
iv) Description of experimental protocol.
v) Evaluation metrics.
vi) Present results using tables and graphs.
vii) Compare and discuss results with respect to the existing literature.
viii) Appropriate references (IEEE numbered).
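
As an illustration of what changing the feature selection approach and the parameter optimisation process could look like, the sketch below chains univariate feature selection with a random forest and tunes both by cross-validated grid search. It is one possible direction, not a prescribed solution: the estimator, the grid values and the synthetic data are all assumptions, and you should confirm against the article that the chosen combination really differs from its approaches.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data; replace with the pre-processed NSL-KDD features.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

pipeline = Pipeline(
    [
        ("select", SelectKBest(score_func=f_classif)),
        ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ]
)

# Hypothetical grid; widen or narrow it to suit the time budget.
param_grid = {
    "select__k": [5, 10, 20],
    "forest__max_depth": [None, 10],
}

search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=3)
search.fit(X_train, y_train)

# score() reuses the scorer given above, so this is a macro F-measure.
test_f1_macro = search.score(X_test, y_test)
print(search.best_params_, round(test_f1_macro, 3))
```

Reporting `search.best_params_` alongside the grid covers item iii) of the list above, since a reader needs both to reimplement the model.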

N.B. This is an HD (High Distinction) level question. Students who are targeting an HD grade should answer this
question (in addition to answering all the above questions). For others, this question is optional. This question aims
to demonstrate your expertise in the subject area and your ability to do your own research in the related area.

Marking criteria:

Quality of solution    Quality of report    Overall score
Unsatisfactory         Unsatisfactory       Unsatisfactory; Score < 5
Unsatisfactory         Fair                 Unsatisfactory; Score < 7
Unsatisfactory         Good                 Unsatisfactory; Score < 10
Fair                   Unsatisfactory       Unsatisfactory; Score < 10
Fair                   Fair                 Fair; Score < 12
Fair                   Good                 Fair; Score < 14
Good                   Unsatisfactory       Fair; Score < 14
Good                   Fair                 Good; Score < 16
Good                   Good                 Good; Score >= 16

Quality of solution
• Unsatisfactory: an appropriate solution presented whose performance is lower than the reported lowest
performance in the article.
• Fair: an appropriate solution presented whose performance is at least better than the lowest performance
reported in the article.
• Good: an appropriate solution presented whose performance is better than the best reported performances
in the article.

Quality of report
• Unsatisfactory: either the report does not include all criteria mentioned above or the quality of description
is poor.
• Fair: the report has included all criteria mentioned above with an average quality of description.
• Good: the report can be considered as a first draft for a publication.
3. Present your results in a 3-minute video using PowerPoint slides/animation. (5 marks)
Marking criteria:
(i) Quality of audio presentation.
(ii) Quality of slides/animation.
(iii) Completeness of the information.

Microsoft Word - FieldNames.docx
The data file contains the following independent attributes.

Field Names

duration: continuous.
protocol_type: symbolic.
service: symbolic.
flag: symbolic.
src_bytes: continuous.
dst_bytes: continuous.
land: symbolic.
wrong_fragment: continuous.
urgent: continuous.
hot: continuous.
num_failed_logins: continuous.
logged_in: symbolic.
num_compromised: continuous.
root_shell: continuous.
su_attempted: continuous.
num_root: continuous.
num_file_creations: continuous.
num_shells: continuous.
num_access_files: continuous.
num_outbound_cmds: continuous.
is_host_login: symbolic.
is_guest_login: symbolic.
count: continuous.
srv_count: continuous.
serror_rate: continuous.
srv_serror_rate: continuous.
rerror_rate: continuous.
srv_rerror_rate: continuous.
same_srv_rate: continuous.
diff_srv_rate: continuous.
srv_diff_host_rate: continuous.
dst_host_count: continuous.
dst_host_srv_count: continuous.
dst_host_same_srv_rate: continuous.
dst_host_diff_srv_rate: continuous.
dst_host_same_src_port_rate: continuous.
dst_host_srv_diff_host_rate: continuous.
dst_host_serror_rate: continuous.
dst_host_srv_serror_rate: continuous.
dst_host_rerror_rate: continuous.
dst_host_srv_rerror_rate: continuous.
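
The field list above can drive the pre-processing directly, which avoids hard-coding which columns are symbolic. The sketch below parses `name: kind.` lines and one-hot encodes the symbolic columns with pandas. It is a minimal illustration under stated assumptions: only the first five fields are included, the two data rows are invented, and `parse_field_spec` is a hypothetical helper name. Whether the provided CSV already carries a header row is not stated here, so loading it is left out.

```python
import pandas as pd

# First five entries of the field list, copied for illustration;
# the full list has 41 entries.
FIELD_SPEC = """\
duration: continuous.
protocol_type: symbolic.
service: symbolic.
flag: symbolic.
src_bytes: continuous.
"""


def parse_field_spec(spec):
    """Split 'name: kind.' lines into (all names, symbolic names)."""
    names, symbolic = [], []
    for line in spec.splitlines():
        line = line.strip()
        if not line:
            continue
        name, kind = line.rstrip(".").split(":")
        names.append(name.strip())
        if kind.strip() == "symbolic":
            symbolic.append(name.strip())
    return names, symbolic


names, symbolic = parse_field_spec(FIELD_SPEC)

# Invented sample rows standing in for records of the provided CSV.
frame = pd.DataFrame(
    [[0, "tcp", "http", "SF", 181], [0, "udp", "domain_u", "SF", 105]],
    columns=names,
)

# One-hot encode the symbolic columns; continuous ones pass through.
encoded = pd.get_dummies(frame, columns=symbolic)
print(list(encoded.columns))
```

Symbolic fields that are already 0/1 flags, such as `land` or `logged_in`, may not need encoding; whichever choice you make should match the pre-processing described in the article.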
In addition, there is a target attribute named “attack_type”, which is a categorical variable with 22
different values. Before it is used as a target variable, it needs to be mapped into four types (dos,
u2r, r2l and probe), as mentioned in Table 3 of the article, using the following mapping
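
The mapping table itself is not reproduced in this text. The sketch below therefore uses the grouping of the 22 training attack names that is commonly published for KDD Cup 99, which is an assumption and must be checked against Table 3 of the article and “FieldNames.pdf” before use. `map_attack_type` is a hypothetical helper name.

```python
# Assumed grouping (commonly published for KDD Cup 99); verify against
# Table 3 of the article before relying on it.
ATTACK_CATEGORIES = {
    "dos": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "u2r": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
    "r2l": [
        "ftp_write", "guess_passwd", "imap", "multihop",
        "phf", "spy", "warezclient", "warezmaster",
    ],
    "probe": ["ipsweep", "nmap", "portsweep", "satan"],
}

# Invert the table so each attack name points at its category.
ATTACK_TO_CATEGORY = {
    attack: category
    for category, attacks in ATTACK_CATEGORIES.items()
    for attack in attacks
}


def map_attack_type(label):
    """Map a raw attack_type value to dos, u2r, r2l or probe.

    Labels that are not in the table (for example a non-attack class)
    are returned unchanged after cleaning, so nothing is silently lost.
    """
    cleaned = label.strip().rstrip(".").lower()
    return ATTACK_TO_CATEGORY.get(cleaned, cleaned)


for raw in ["neptune", "Satan.", "guess_passwd", "rootkit"]:
    print(raw, "->", map_attack_type(raw))
```

With pandas, the same function can be applied to the target column, for example `frame["attack_type"].map(map_attack_type)`.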


Answered 4 days After Oct 02, 2022

Solution

Raavikant answered on Oct 06 2022
