SIT720 Machine Learning Assessment Task 5: Machine Learning Project. ©Deakin University ...

Question


SIT720 Machine Learning
Assessment Task 5: Machine Learning Project.

©Deakin University SIT720
This document supplies detailed information on Assessment Task 5 for this unit.
Key information
• Due: Sunday 10 October 2021 by 8.00 pm (AEST)
• Weighting: 35%
Learning Outcomes
This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning
Outcomes (GLO):
Unit Learning Outcomes (ULO):
• ULO6 - Perform model selection and compute relevant evaluation measure for a given problem.
• ULO7 - Use concepts of machine learning algorithms to design solution and compare multiple solutions.
Graduate Learning Outcomes (GLO):
• GLO1 - Discipline-specific knowledge and capabilities
• GLO2 - Communication
• GLO4 - Critical thinking
• GLO5 - Problem solving
• GLO6 - Self-management

Purpose
This assessment is an extensive machine learning project. The task is open in nature, where students should
make all design decisions to solve a problem and justify their decisions. In addition, they have to design and
develop solutions that are better than any existing solutions.

Assessment 5: Total marks = 35

Submission Instructions
a) Submit your solution codes into a notebook file with “.ipynb” extension. Write discussions and
explanations including outputs and figures into a separate file and submit as a PDF file.
b) Submissions in any other file format will not be assessed and will receive zero for the
entire submission.
c) Insert your Python code responses into the cells of your submitted “.ipynb” file, each preceded by the
question, i.e., copy the question into a cell added before the solution cell. If you need multiple cells for
better presentation of the code, add the question only before the first solution cell.
d) Your submitted code should be executable. If your code does not generate the submitted solution,
then you will get zero for that part of the marks.
e) Answers must be relevant and precise.
f) No hard coding is allowed. Avoid using specific values that can be calculated from the data provided.
g) Use all the topics covered in the unit for answering this assignment.
h) Submit your assignment after running each cell individually.
i) The submitted notebook file name should be of this form “SIT720_A5_studentID.ipynb”. For example, if
your student ID is 1234, then the submitted file name should be “SIT720_A5_1234.ipynb”.
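
As an illustration of instruction f), the sketch below derives the target column, the feature count and the class balance from the data instead of typing them in. The small frame is made up for the example (only the column names come from the data set), and the variable names are assumptions.

```python
import pandas as pd

# Tiny made-up frame standing in for the real data set (values are illustrative only)
df = pd.DataFrame({
    "age": [75.0, 55.0, 65.0, 50.0],
    "ejection_fraction": [20, 38, 20, 20],
    "serum_creatinine": [1.9, 1.1, 1.3, 1.9],
    "DEATH_EVENT": [1, 1, 0, 0],
})

target = "DEATH_EVENT"
X = df.drop(columns=[target])    # every column except the target, selected by name
y = df[target]                   # target selected by name, not by position

n_samples, n_features = X.shape  # sizes read from the data
death_rate = y.mean()            # class balance computed, not typed in

print(n_samples, n_features, death_rate)
```

Selecting columns by name keeps the code valid if the column order or the number of rows changes.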

________________________________________________________________________________

Questions
________________________________________________________________________________

Background
In this project you are given a dataset and an article that uses this dataset. The authors have developed ten ML
models for predicting survival of patients with heart failure and compared their performance. You must read the
article to understand the problem, the dataset, and the methodology to complete the following tasks.
Dataset
The dataset contains the medical records of patients who had heart failure, collected during their follow-up period.
Each patient profile has 13 clinical features. A detailed description of the dataset can be found in the Dataset
section of the provided article (patient_survival_prediction.pdf).
Tasks:
1. Read the article and reproduce the results presented in Table-4 using Python modules and packages (including
your own script or customised codes). Write a report summarising the dataset, the ML methods used, the experiment
protocol and the results, including variations, if any. While reproducing the results: (10 marks)
i) you should use the same set of features used by the authors.
ii) you should use the same classifier with exact parameter values.
iii) you should use the same training/test splitting approach as used by the authors.
iv) you should use the same pre/post processing, if any, used by the authors.
v) you should report the same performance metrics as shown in Table-4.
N.B.
(i) Some of the ML methods are not covered in the current unit. Consider them as HD tasks i.e., based on the
knowledge gained in the unit you should be able to find necessary packages and modules to reproduce the results.
(ii) If you find any issue in reproducing results or some subtle variations are found due to implementation
differences of packages and modules in Python then appropriate explanation of them will be considered during
evaluation of your submission.
(iii) Similarly, variation in results due to randomness of data splitting will also be considered during evaluation
based on your explanation.
(iv) Obtained marks will be proportional to the number of ML methods that you will report in your submission
with correctly reproduced results.
(v) Make sure your Python code segment generates the reported results, otherwise you will receive zero marks
for this task.
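
The metrics in task 1(v) can all be derived from a confusion matrix plus a score per sample. The sketch below is one way to do that, assuming the columns are the ones used in the solution's results table (TP rate, TN rate, F1 score, accuracy, ROC AUC, MCC) and that class 1 is the positive class; check both against the article. The labels and scores are made up, and `table_metrics` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             matthews_corrcoef, roc_auc_score)


def table_metrics(y_true, y_pred, y_score):
    """Return the metrics as a dict; class 1 is treated as the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "TP rate": tp / (tp + fn),   # sensitivity (recall of class 1)
        "TN rate": tn / (tn + fp),   # specificity (recall of class 0), not precision
        "F1 score": f1_score(y_true, y_pred),
        "Accuracy": accuracy_score(y_true, y_pred),
        "ROC AUC": roc_auc_score(y_true, y_score),  # uses scores, not hard labels
        "MCC": matthews_corrcoef(y_true, y_pred),
    }


# Made-up labels and scores, for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.6, 0.3, 0.8, 0.4, 0.9, 0.7])
y_pred = (y_score >= 0.5).astype(int)

metrics = table_metrics(y_true, y_pred, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that TN rate is the recall of the negative class; scikit-learn's `precision_score` measures something different and should not be reported under that heading.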
Marking criteria:
i) Unsatisfactory (x<5): tried to implement the methods but unable to follow the approach presented in
the article. Variation of marks in this group will depend on the quality of report.
ii) Fair (5<=x<6): appropriately implemented 50% of the methods presented in the article. Variation of
marks in this group will depend on the quality of report.
iii) Good (7<=x<8): appropriately implemented 70% of the methods presented in the article. Variation
of marks in this group will depend on the quality of report.
iv) Excellent(x>=8): appropriately implemented >=90% of the methods presented in the article.
Variation of marks in this group will depend on the quality of report.

2. Design and develop your own ML solution for this problem. The proposed solution should be different from
all approaches mentioned in the provided article. This does not mean that you must choose a new ML
algorithm. You can develop a novel solution by changing the feature selection approach or the parameter
optimisation process of the ML methods used, or by using different ML methods or different combinations of them.
This means the proposed system should be substantially different from the methods presented in the article, but
the difference is not limited to a change of ML methods. Compare the result with the methods reported in the article.
Write a technical report summarising your solution design and outcomes. The report should include: (20 marks)
i) Motivation behind the proposed solution.
ii) How the proposed solution is different from existing ones.
iii) Detailed description of the model including all parameters so that any reader can implement your model.
iv) Description of experimental protocol.
v) Evaluation metrics.
vi) Present results using tables and graphs.
vii) Compare and discuss results with respect to the existing literature.
viii) Appropriate references (IEEE numbered).
N.B. This is an HD (High Distinction) level question. Students who target an HD grade should answer this
question (in addition to answering all the above questions). For others, this question is optional. This question aims
to demonstrate your expertise in the subject area and the ability to do your own research in the related area.
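
One possible shape for such a solution is sketched below: a pipeline that scales the features, selects a subset of them, and tunes both the subset size and the classifier with a grid search scored by MCC. This is only an illustration under stated assumptions. It is not taken from the article, it is not claimed to outperform it, and it runs on synthetic data generated with `make_classification` as a stand-in for the real data set. The step names, the grid values and the choice of logistic regression are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart failure data (299 rows, 12 predictors, imbalanced classes)
X, y = make_classification(n_samples=299, n_features=12, n_informative=4,
                           weights=[0.68, 0.32], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=0)

# Scaling and feature selection live inside the pipeline, so they are refit on
# each training fold and never see the validation fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical search grid: number of kept features and regularisation strength
param_grid = {"select__k": [2, 4, 8, 12], "clf__C": [0.1, 1.0, 10.0]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, scoring=make_scorer(matthews_corrcoef), cv=cv)
search.fit(X_train, y_train)

# Final check on the held-out split, which the search never used
test_mcc = matthews_corrcoef(y_test, search.predict(X_test))
print(search.best_params_, round(test_mcc, 3))
```

Reporting the full grid, the cross-validation scheme and the random seeds, as done here, is what lets a reader re-implement the model as item iii) requires.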
Marking criteria:
(i) Unsatisfactory (<10): an appropriate solution presented whose performance is lower than the reported
performances in the article (Table 11). The variation in the marking in this group will depend on the quality of
the report.
(ii) Fair (10 - <14): an appropriate solution presented whose performance is at least equal to the lowest
performance reported in the article (Table 11). The variation in the marking in this group will depend on the
quality of the report.
(iii) Good (>=14): an appropriate solution presented whose performance is better than the best reported
performances in the article

assessment-5-t2-2021-mpdsn1h2.pdf heartfailureclinicalrecordsdataset-oxi2tbt2-emkxzxq3.csv patientsurvivalprediction-v505445b-opniw00o.pdf

Answered 4 days After Oct 05, 2021 Deakin University

Solution

Suraj answered on Oct 09 2021


Suraj · Accepted Answer

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 1: Reproducing the models from the article"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "import os\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.naive_bayes import GaussianNB\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.svm import SVC\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.ensemble import GradientBoostingClassifier\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import make_scorer,precision_score,recall_score,f1_score,accuracy_score,roc_auc_score,matthews_corrcoef\n",
    "from sklearn.model_selection import cross_validate"
   ]
  },
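  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "warnings.filterwarnings(\"ignore\")  # silence library warnings in the notebook output\n",
    "os.chdir(\"C:/Users/Hp/Desktop\")  # machine-specific working directory; change it to the folder holding the CSV"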
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "    age  anaemia  creatinine_phosphokinase  diabetes  ejection_fraction  \\\n",
       "0  75.0        0                       582         0                 20   \n",
       "1  55.0        0                      7861         0                 38   \n",
       "2  65.0        0                       146         0                 20   \n",
       "3  50.0        1                       111         0                 20   \n",
       "4  65.0        1                       160         1                 20   \n",
       "\n",
       "   high_blood_pressure  platelets  serum_creatinine  serum_sodium  sex  \\\n",
       "0                    1  265000.00               1.9           130    1   \n",
       "1                    0  263358.03               1.1           136    1   \n",
       "2                    0  162000.00               1.3           129    1   \n",
       "3                    0  210000.00               1.9           137    1   \n",
       "4                    0  327000.00               2.7           116    0   \n",
       "\n",
       "   smoking  time  DEATH_EVENT  \n",
       "0        0     4            1  \n",
       "1        0     6            1  \n",
       "2        1     7            1  \n",
       "3        0     7            1  \n",
       "4        0     8            1  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv(\"heart.csv\")  # heart failure clinical records, read from the working directory\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(299, 13)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape   # for checking dimension of the data frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "              age     anaemia  creatinine_phosphokinase    diabetes  \\\n",
       "count  299.000000  299.000000                299.000000  299.000000   \n",
       "mean    60.833893    0.431438                581.839465    0.418060   \n",
       "std     11.894809    0.496107                970.287881    0.494067   \n",
       "min     40.000000    0.000000                 23.000000    0.000000   \n",
       "25%     51.000000    0.000000                116.500000    0.000000   \n",
       "50%     60.000000    0.000000                250.000000    0.000000   \n",
       "75%     70.000000    1.000000                582.000000    1.000000   \n",
       "max     95.000000    1.000000               7861.000000    1.000000   \n",
       "\n",
       "       ejection_fraction  high_blood_pressure      platelets  \\\n",
       "count         299.000000           299.000000     299.000000   \n",
       "mean           38.083612             0.351171  263358.029264   \n",
       "std            11.834841             0.478136   97804.236869   \n",
       "min            14.000000             0.000000   25100.000000   \n",
       "25%            30.000000             0.000000  212500.000000   \n",
       "50%            38.000000             0.000000  262000.000000   \n",
       "75%            45.000000             1.000000  303500.000000   \n",
       "max            80.000000             1.000000  850000.000000   \n",
       "\n",
       "       serum_creatinine  serum_sodium         sex    smoking        time  \\\n",
       "count         299.00000    299.000000  299.000000  299.00000  299.000000   \n",
       "mean            1.39388    136.625418    0.648829    0.32107  130.260870   \n",
       "std             1.03451      4.412477    0.478136    0.46767   77.614208   \n",
       "min             0.50000    113.000000    0.000000    0.00000    4.000000   \n",
       "25%             0.90000    134.000000    0.000000    0.00000   73.000000   \n",
       "50%             1.10000    137.000000    1.000000    0.00000  115.000000   \n",
       "75%             1.40000    140.000000    1.000000    1.00000  203.000000   \n",
       "max             9.40000    148.000000    1.000000    1.00000  285.000000   \n",
       "\n",
       "       DEATH_EVENT  \n",
       "count    299.00000  \n",
       "mean       0.32107  \n",
       "std        0.46767  \n",
       "min        0.00000  \n",
       "25%        0.00000  \n",
       "50%        0.00000  \n",
       "75%        1.00000  \n",
       "max        1.00000  "
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()     # descriptive statistics"
   ]
  },
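  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split the data set into predictors and target, selecting columns by name\n",
    "dep_var = df[\"DEATH_EVENT\"]                 # 1-D target, as scikit-learn estimators expect\n",
    "ind_var = df.drop(columns=[\"DEATH_EVENT\"])  # the 12 predictor columns\n",
    "# 80% of the rows for training, 20% held out for testing\n",
    "x_train, x_test, y_train, y_test = train_test_split(ind_var, dep_var, train_size=0.80, random_state=0)"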
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Decision tree\n",
    "tree = DecisionTreeClassifier()\n",
    "tree = tree.fit(x_train, y_train)\n",
    "y_pred = tree.predict(x_test)  # hold-out predictions; the scores below come from cross-validation\n",
    "# Metrics scored on each fold; make_scorer passes predicted labels, so Roc is the AUC of hard predictions\n",
    "scores = {\"precision\": make_scorer(precision_score), \"recall\": make_scorer(recall_score),\n",
    "          \"f1_score\": make_scorer(f1_score), \"Accuracy\": make_scorer(accuracy_score),\n",
    "          \"Roc\": make_scorer(roc_auc_score), \"MCC\": make_scorer(matthews_corrcoef)}\n",
    "kfold = cross_validate(tree, x_train, y_train, scoring=scores, cv=10)  # 10-fold CV on the training split\n",
    "# Mean of each metric over the folds, looked up by key instead of by position in the result dict\n",
    "l1 = [np.mean(kfold[\"test_\" + name]) for name in scores]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Gradient boosting\n",
    "\n",
    "gb = GradientBoostingClassifier(learning_rate=0.1)\n",
    "gb.fit(x_train, y_train)\n",
    "y_pred = gb.predict(x_test)  # hold-out predictions; the scores below come from cross-validation\n",
    "# Metrics scored on each fold; make_scorer passes predicted labels, so Roc is the AUC of hard predictions\n",
    "scores = {\"precision\": make_scorer(precision_score), \"recall\": make_scorer(recall_score),\n",
    "          \"f1_score\": make_scorer(f1_score), \"Accuracy\": make_scorer(accuracy_score),\n",
    "          \"Roc\": make_scorer(roc_auc_score), \"MCC\": make_scorer(matthews_corrcoef)}\n",
    "kfold = cross_validate(gb, x_train, y_train, scoring=scores, cv=10)  # 10-fold CV on the training split\n",
    "# Mean of each metric over the folds, looked up by key instead of by position in the result dict\n",
    "l2 = [np.mean(kfold[\"test_\" + name]) for name in scores]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Logistic regression\n",
    "LR = LogisticRegression()\n",
    "LR.fit(x_train, y_train)\n",
    "y_pred = LR.predict(x_test)  # hold-out predictions; the scores below come from cross-validation\n",
    "# Metrics scored on each fold; make_scorer passes predicted labels, so Roc is the AUC of hard predictions\n",
    "scores = {\"precision\": make_scorer(precision_score), \"recall\": make_scorer(recall_score),\n",
    "          \"f1_score\": make_scorer(f1_score), \"Accuracy\": make_scorer(accuracy_score),\n",
    "          \"Roc\": make_scorer(roc_auc_score), \"MCC\": make_scorer(matthews_corrcoef)}\n",
    "kfold = cross_validate(LR, x_train, y_train, scoring=scores, cv=10)  # 10-fold CV on the training split\n",
    "# Mean of each metric over the folds, looked up by key instead of by position in the result dict\n",
    "l3 = [np.mean(kfold[\"test_\" + name]) for name in scores]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Random forest\n",
    "rf = RandomForestClassifier()\n",
    "rf.fit(x_train, y_train)\n",
    "y_pred = rf.predict(x_test)  # hold-out predictions; the scores below come from cross-validation\n",
    "# Metrics scored on each fold; make_scorer passes predicted labels, so Roc is the AUC of hard predictions\n",
    "scores = {\"precision\": make_scorer(precision_score), \"recall\": make_scorer(recall_score),\n",
    "          \"f1_score\": make_scorer(f1_score), \"Accuracy\": make_scorer(accuracy_score),\n",
    "          \"Roc\": make_scorer(roc_auc_score), \"MCC\": make_scorer(matthews_corrcoef)}\n",
    "kfold = cross_validate(rf, x_train, y_train, scoring=scores, cv=10)  # 10-fold CV on the training split\n",
    "# Mean of each metric over the folds, looked up by key instead of by position in the result dict\n",
    "l4 = [np.mean(kfold[\"test_\" + name]) for name in scores]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Gaussian Naive Bayes\n",
    "gnb = GaussianNB()\n",
    "gnb.fit(x_train, y_train)\n",
    "y_pred = gnb.predict(x_test)  # hold-out predictions; the scores below come from cross-validation\n",
    "# Metrics scored on each fold; make_scorer passes predicted labels, so Roc is the AUC of hard predictions\n",
    "scores = {\"precision\": make_scorer(precision_score), \"recall\": make_scorer(recall_score),\n",
    "          \"f1_score\": make_scorer(f1_score), \"Accuracy\": make_scorer(accuracy_score),\n",
    "          \"Roc\": make_scorer(roc_auc_score), \"MCC\": make_scorer(matthews_corrcoef)}\n",
    "kfold = cross_validate(gnb, x_train, y_train, scoring=scores, cv=10)  # 10-fold CV on the training split\n",
    "# Mean of each metric over the folds, looked up by key instead of by position in the result dict\n",
    "l5 = [np.mean(kfold[\"test_\" + name]) for name in scores]"
   ]
  },
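  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "# SVM with a linear kernel\n",
    "clf = SVC(kernel='linear')\n",
    "clf = clf.fit(x_train, y_train)\n",
    "y_pred = clf.predict(x_test)  # hold-out predictions; the scores below come from cross-validation\n",
    "# Metrics scored on each fold; make_scorer passes predicted labels, so Roc is the AUC of hard predictions\n",
    "scores = {\"precision\": make_scorer(precision_score), \"recall\": make_scorer(recall_score),\n",
    "          \"f1_score\": make_scorer(f1_score), \"Accuracy\": make_scorer(accuracy_score),\n",
    "          \"Roc\": make_scorer(roc_auc_score), \"MCC\": make_scorer(matthews_corrcoef)}\n",
    "kfold = cross_validate(clf, x_train, y_train, scoring=scores, cv=10)  # 10-fold CV on the training split\n",
    "# Mean of each metric over the folds, looked up by key instead of by position in the result dict\n",
    "l6 = [np.mean(kfold[\"test_\" + name]) for name in scores]"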
   ]
  },
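  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# SVM with a radial (RBF) kernel; the features are not scaled here, which RBF kernels are sensitive to\n",
    "clf1 = SVC(kernel='rbf')\n",
    "clf1 = clf1.fit(x_train, y_train)\n",
    "y_pred = clf1.predict(x_test)  # hold-out predictions; the scores below come from cross-validation\n",
    "# Metrics scored on each fold; make_scorer passes predicted labels, so Roc is the AUC of hard predictions\n",
    "scores = {\"precision\": make_scorer(precision_score), \"recall\": make_scorer(recall_score),\n",
    "          \"f1_score\": make_scorer(f1_score), \"Accuracy\": make_scorer(accuracy_score),\n",
    "          \"Roc\": make_scorer(roc_auc_score), \"MCC\": make_scorer(matthews_corrcoef)}\n",
    "kfold = cross_validate(clf1, x_train, y_train, scoring=scores, cv=10)  # 10-fold CV on the training split\n",
    "# Mean of each metric over the folds, looked up by key instead of by position in the result dict\n",
    "l7 = [np.mean(kfold[\"test_\" + name]) for name in scores]"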
   ]
  },
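  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "# KNN with 3 neighbours; distances are computed on unscaled features\n",
    "knn = KNeighborsClassifier(n_neighbors=3)\n",
    "knn = knn.fit(x_train, y_train)\n",
    "y_pred = knn.predict(x_test)  # hold-out predictions; the scores below come from cross-validation\n",
    "# Metrics scored on each fold; make_scorer passes predicted labels, so Roc is the AUC of hard predictions\n",
    "scores = {\"precision\": make_scorer(precision_score), \"recall\": make_scorer(recall_score),\n",
    "          \"f1_score\": make_scorer(f1_score), \"Accuracy\": make_scorer(accuracy_score),\n",
    "          \"Roc\": make_scorer(roc_auc_score), \"MCC\": make_scorer(matthews_corrcoef)}\n",
    "kfold = cross_validate(knn, x_train, y_train, scoring=scores, cv=10)  # 10-fold CV on the training split\n",
    "# Mean of each metric over the folds, looked up by key instead of by position in the result dict\n",
    "l8 = [np.mean(kfold[\"test_\" + name]) for name in scores]"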
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "                      TN Rate   TP Rate  F1 score  Accuracy   ROC AUC       MCC\n",
       "Decision Tree        0.669744  0.701786  0.675712  0.794928  0.769827  0.536673\n",
       "Gradiant Boosting    0.730000  0.689286  0.698099  0.824094  0.787474  0.584983\n",
       "Logistic Regression  0.788095  0.701786  0.714206  0.841304  0.802915  0.628675\n",
       "Random forest        0.831429  0.705357  0.746942  0.857790  0.816833  0.668407\n",
       "Naive Bayes          0.806667  0.482143  0.588065  0.807609  0.717174  0.512565\n",
       "SVM linear           0.831786  0.566071  0.641754  0.820290  0.750131  0.567889\n",
       "SVM Radial           0.000000  0.000000  0.000000  0.694565  0.500000  0.000000\n",
       "KNN                  0.335000  0.203571  0.249394  0.627355  0.508955  0.025294"
      ],
      "text/plain": [
       "                      TN Rate   TP Rate  F1 score  Accuracy   ROC AUC  \
",
       "Decision Tree        0.669744  0.701786  0.675712  0.