Homework: This homework will test and develop your knowledge on predictive model construction and using their predictive performance for decision making. Please submit your answers for each section in a word document with your explanations with a supporting R script file you used for this assignment.
Use the BankruptcyData.csv file and the BankruptcyNew.csv data files for this assignment. The attributes to develop a predictive model for predicting bankruptcy of firms. Data dictionary is provided below with the variables’ labels in the data sets and the long descriptive names.
Attribute
No
Short
Name
Long
Name
1
fyea
Fiscal yea
2
at
Assets-total
3
bkvlps
Book value per share
4
invt
Inventories- total
5
Lt
Liabilities-total
6
rect
Receivables-total
7
cogs
Cost of goods sold
8
dvt
Dividends-Total
9
ebit
Earnings before interests and taxes
10
gp
Gross profit
11
ni
Net income (loss)
12
oiadp
Operating income after depreciation
13
revt
Revenue- total
14
dvpsx_f
Dividends per share -ex-date fiscal
15
mkvalt
Market value -Total -Fiscal
16
prch_f
Price High -Annual-Fiscal
17
bankruptcy
Output (Y/N)
1. Are there any variables with missing values? If so, remove records with any missing value.
Write your script for checking and removing records with missing values.
2. Build a logistic regression model using the 70% of the data for training and 30% for the testing to predict bankruptcy in R. Present a summary table of the odds ratios for the significant predictors (p<0.05).
3. Use a variable selection algorithm to remove multicollinearity. Write the final model with the list of final predictors in the model.
4. Report the confusion matrix using the final logistic regression model and the test data. Calculate model accuracy, precision, sensitivity and F1 score.
5. Plot the ROC curve and report the AUC statistic.
6. Construct a decision tree model (called DT-1) to predict the bankruptcy with 70% of the data for training and 30% for the testing.
7. Calculate model accuracy, precision, sensitivity and F1 score. Plot the ROC curve with the AUC statistic reported.
8. For pruning the decision tree what complexity parameter value would you use? Prune the tree and construct the confusion matrix for the pruned model (called DT-2). Calculate model accuracy, precision, sensitivity and F1 score for DT-2. Plot the ROC curve with the AUC statistic reported.
9. Which model would you use for predicting the new applications? Explain why.
10. The credit organization received new applications with the attribute values in the new Bankrupcynew.csv data file. Use one of the models you constructed to predict if the companies will undergo bankruptcy or not. Report the resulting table.
BankruptcyData.csv
BankruptcyData
fyear at bkvlps invt Lt rectr cogs dvt ebit gp ni oiadp revt dvpsx_f mkvalt prch_f Bankrupt
2007 1.733 -4.3373 0 65.444 0 0 0.168 -1.009 0 -1.181 -1.009 0 0 4.4088 0.5 yes
2006 2.614 -4.2455 0 64.976 0 0 0.179 -0.534 0.002 -0.121 -0.534 0.002 0 7.348 0.75 yes
2005 4.193 -4.2252 0 66.256 0 0 0.113 -0.406 0 -0.448 -0.406 0 0 4.1149 0.65 yes
2004 1.841 -4.1869 0 63.342 0 0 0.113 -0.357 0 -0.395 -0.357 0 0 4.4088 0.75 yes
2008 0 -1.4509 0 0.062 0 0 0 -0.062 0 -0.062 -0.062 0 0 0.0256 0.09 no
2007 0 -1.4266 0 0 0 0 0 0 0 -1.343 0 0 0 0.1559 0.1 no
2008 2.406 -1.1355 0 62.004 0 0 0.146 -2.343 0 -2.791 -2.343 0 0 10.5026 0.6 yes
2006 0 -1.0736 0 0.321 0 0 0 -0.142 0 -0.142 -0.142 0 0 0.1196 1.01 yes
2002 3 -1.007 0 3.587 0 0 0 -1.154 0 -1.217 -1.154 0 0 2.997 3.75 no
2002 2.643 -0.9861 0 12.25 0 0 0 -0.133 0.737 -1.278 -0.133 0.737 0 0.0974 0.02 yes
2004 0 -0.7597 0 0.098 0 0 0 -0.061 0 -0.021 -0.061 0 0 0.1419 6.9231 yes
2005 0 -0.6595 0 0.184 0 0 0 -0.198 0 -0.198 -0.198 0 0 0.0698 3.5 yes
2005 2.137 -0.537 0 7.501 0 0 0 -1.041 0.074 -0.676 -1.041 0.074 0 1.9978 0.44 yes
2004 3.367 -0.4692 0 8.055 0 0 0 -1.231 0.048 -1.231 -1.231 0.048 0 1.3989 0.45 yes
2002 1.009 -0.4164 0 5.644 0 0.548 0 -8.807 0.396 -29.56 -8.807 0.944 0 1.6967 214 no
2003 5.386 -0.346 0 8.843 0 0 0 -0.596 0.091 2.742 -0.596 0.091 0 2.9976 0.47 yes
2002 0 -0.2529 0 1.446 0 0 0 -0.11 0 -0.097 -0.11 0 0 0.343 1.65 no
2003 0.001 -0.1581 0 0.376 0 0 0 -0.821 0 -0.815 -0.821 0 0 0.3605 1.75 no
2004 0.023 -0.1276 0 0.417 0 0 0 -0.065 0 -0.079 -0.065 0 0 0.9261 2.75 no
2009 0.649 -0.1265 0 7.444 0 1.529 0 0.155 3.014 -4.551 0.155 4.543 0 0.7795 0.06 no
2006 0.007 -0.1177 0 5.022 0 0 0 -0.523 0 -1.314 -0.523 0 0 0.4262 0.035 no
2008 0.386 -0.0776 0 10.267 0 0 0 -0.449 0 -6.383 -0.449 0 0 0.1909 0.5 no
2007 0.005 -0.075 0 0.792 0 0 0 -0.037 0 -0.037 -0.037 0 0 2.6248 0.8 no
2009 0.007 -0.0526 0 1.335 0 0 0 -1.316 0 -1.363 -1.316 0 0 6.8132 2.99 no
2003 0.009 -0.0521 0 0.614 0 0 0 -0.182 0 0.514 -0.182 0 0 2.9043 0.27 no
2002 0 -0.0345 0 0.03 0 0.014 0 -0.012 -0.002 -0.014 -0.012 0.012 0 0.0608 7 no
2002 0.128 -0.0308 0 1.011 0 0 0 -0.469 0 -0.505 -0.469 0 0 0.717 0.24 no
2005 0.144 -0.0278 0 0.387 0 0 0 -1.089 0 -0.957 -1.089 0 0 3.6662 1.7 no
2009 0 -0.0273 0 18.523 0 0 0 -4.671 0 -6.52 -4.671 0 0 19.1323 0.11 no
2002 0.001 -0.0267 0 0.299 0 0 0 -0.777 0 -4.265 -0.777 0 0 0.3409 0.63 no
2006 0.062 -0.0236 0 0.45 0 0 0 -1.539 0.012 -1.589 -1.539 0.012 0 5.7624 1.01 no
2008 0 -0.0218 0 13.91 0 4.677 0 -4.625 -4.625 -5.361 -4.625 0.052 0 58.9719 0.155 no
2007 0.5 -0.0217 0 2.517 0 2.31 0 -4.261 1.363 -5.116 -4.261 3.673 0 7.4292 1.25 yes
2009 1.652 -0.0213 0 11.933 0 0 0 -0.441 0 3.736 -0.441 0 0 0.1931 0.02 no
2004 0.002 -0.0213 0 0.411 0 0.296 0 -0.293 -0.293 -0.33 -0.293 0.003 0 5.3774 2 no
2007 0.856 -0.0185 0 1.858 0 0 0 -3.172 0 -3.275 -3.172 0 0 5.4191 0.375 yes
2003 0 -0.0175 0 0.309 0 0 0 -0.376 0 -0.468 -0.376 0 0 26.4795 5 no
2007 1.21 -0.0175 0 1.658 0 2.764 0 -2.519 -2.484 -2.463 -2.519 0.28 0 30.2835 1.95 no
2008 0.003 -0.0161 0 0.403 0 0 0 -1.513 0 -1.513 -1.513 0 0 77.8932 22.75 no
2007 0.001 -0.0155 0 9.788 0 0 0 -4.022 0 XXXXXXXXXX -4.022 0 0 42.2333 0.23 no
2002 0 -0.013 0 0.026 0 0 0 -0.098 0 -0.098 -0.098 0 0 0.3 32 no
2007 3.248 -0.0094 0 5.033 0 0 0 -4.777 0 -43.072 -4.777 0 0 4.2016 0.145 no
2009 0.409 -0.0091 0 0.814 0 0 0 -0.356 0 -3.626 -0.356 0 0 19.9976 0.45 no
2009 0.287 -0.0078 0 0.64 0 0.09 0 -1.014 -0.09 -1.222 -1.014 0 0 5.211 0.16 no
2004 0.108 -0.0073 0 0.368 0 0 0 -1.627 0 -1.627 -1.627 0 0 11.8206 0.78 no
2007 0 -0.0055 0 0.092 0 0 0 -0.213 0 -0.04 -0.213 0 0 0.8361 0.21 no
2008 0.154 -0.0045 0 0.875 0 0 0 -0.088 0 -0.088 -0.088 0 0 8.05 1 no
2008 0.56 -0.0028 0 0.536 0 0.398 0 0.012 0.215 -0.931 0.012 0.613 0 2.2099 0.1 no
2009 0.005 -0.0026 0 0.121 0 0 0 -0.057 0 -0.057 -0.057 0 0 0.4892 0.02 no
2002 0 -0.0023 0 0.197 0 0 0 -0.129 0 12 -0.129 0 0 0.3554 0.014 yes
2007 0.001 -0.0014 0 0.489 0 0 0.03 -0.842 0 -1.51 -0.842 0 0 1.8811 0.036 no
2008 0.002 -0.0013 0 0.062 0 0 0 -0.117 0 -0.107 -0.117 0 0 0.4448 0.19 no
2003 0 -0.0011 0 0.095 0 0 0 -0.01 0 0.102 -0.01 0 0 0.1692 0.005 yes
2005 0.011 -0.0007 0 0.017 0 0 0 -0.037 0 -0.037 -0.037 0 0 0.4199 0.15 no
2005 0.027 -0.0004 0 0.111 0 0.007 0.052 -5.266 -0.004 -21.573 -5.266 0.003 0 5.643 0.37 no
2004 0.055 -0.0003 0 0.086 0 2.841 0.166 -2.881 -2.833 -3.382 -2.881 0.008 0 11.5321 1.1136 no
2009 0.315 -0.0001 0 0.325 0 1.79 0 -1.531 -1.341 -1.531 -1.531 0.449 0 4.3334 0.4 no
2006 0.776 0.0011 0 0.462 0 0 0.03 -1.676 0 -1.703 -1.676 0 0 9.2324 0.149 no
2009 0.022 0.0019 0 0.001 0 0 0 -0.023 0 -0.023 -0.023 0 0 0.3455 0.15 no
2003 4.18 0.0019 0 4.094 0 0.65 0 0.595 0.595 0.514 0.595 1.245 0 2.4268 0.09 no
2003 0.127 0.0034 0 0.059 0 0 0 -2.409 0 -2.409 -2.409 0 0 74.1955 4.15 no
2007 5.913 0.0034 0 5.882 0 0 26.745 26.532 27.484 26.74 26.532 27.484 3.05 XXXXXXXXXX 41 no
2004 0.042 0.0036 0 0.012 0 0 0 -0.05 0 3.124 -0.05 0 0 0.8397 1.4 no
2006 7.204 0.0039 0 7.168 0 0 30.288 30.095 31.079 30.259 30.095 31.079 2.92 XXXXXXXXXX 43.75 no
2008 0.047 0.0042 0 0.002 0 0 0 -0.025 0 -0.025 -0.025 0 0 0.4319 0.68 no
2003 4.064 0.005 0 4.019 0 0 17.418 17.367 18.169 17.398 17.367 18.169 1.88 XXXXXXXXXX 24.62 no
2008 9.525 0.0063 0 9.467 0 0 33.638 33.569 34.645 33.665 33.569 34.645 3.27 XXXXXXXXXX 40.39 no
2007 0.071 0.0064 0 0.002 0 0 0 -0.03 0 -0.03 -0.03 0 0 1.2956 0.51 no
2005 3.92 0.007 0 3.856 0 0 20.225 20.164 21.085 20.223 20.164 21.085 2.13 XXXXXXXXXX 34.4 no
2002 3.459 0.0073 0 3.394 0 0 16.88 16.825 17.436 16.886 16.825 17.436 2.11 196.482 25.5 no
2004 3.014 0.0074 0 2.948 0 0 14.286 14.286 15.061 14.308 14.286 15.061 1.71 XXXXXXXXXX 25.99 no
2006 0.1 0.0092 0 0.001 0 0 0 -0.014 0 -0.014 -0.014 0 0 1.9435 0.25 no
2009 3.586 0.0102 0 3.492 0 0 27.664 27.688 28.724 27.699 27.688 28.724 3.66 XXXXXXXXXX 36.939 no
2005 4.166 0.0134 0 3.57 0 0 0 -0.259 0 -0.259 -0.259 0 0 0.2851 0.02 no
2005 1.444 0.0173 0 0.567 0 0.191 0 -0.188 -0.188 -0.075 -0.188 0.003 0 2.0272 0.65 no
2004 4.175 0.0192 0 3.32 0 0.55 0 -0.525 -0.525 0.485 -0.525 0.025 0 0.6682 0.08 no
2009 7.823 0.0413 0 7.633 0 3.642 0 -3.312 -3.12 -5.469 -3.312 0.522 0 3.68 1.35 no
2007 1.213 0.0868 0 0.738 0 0.011 0 -0.038 -0.011 -0.061 -0.038 0 0 25.2988 8 no
2009 16.114 0.1072 0 7.766 0 0 0 -1.105 0 -1.783 -1.105 0 0 26.9288 2.4 no
2010 21.495 0.1547 0 7.317 0 0.459 0 -1.314 0.141 -2.104 -1.314 0.6 0 11.2049 0.72 no
2006 125 0.1839 0 5.562 0 0 0 -3.91 0 -3.514 -3.91 0 0 XXXXXXXXXX 0.51 no
2005 125.003 0.1878 0 3.036 0 0 0 -7.372 0 -14.928 -7.372 0 0 XXXXXXXXXX 0.65 no
2008 6.138 0.6613 0 2.779 0 0.03 0 -7.916 0.126 -9.202 -7.916 0.156 0 1.5346 3.2488 no
2006 17.136 1.0083 0 1.034 0 0.307 0 -6.217 0.539 -5.615 -6.217 0.846 0 19.3225 2.05 no
2003 28.51 1.291 0 6.99 0 0.223 0 -6.871 0.488 -10.704 -6.871 0.711 0 22.5986 1.65 no
2005 22.46 1.3138 0 1.284 0 0.345 0 -5.895 0.693 -5.468 -5.895 1.038 0 18.9709 2.92 no
2008 11.813 1.3545 0 6.273 0 2.089 0 -1.302 -1.121 -0.195 -1.302 0.968 0 2.0859 2.3 no
2004 28.682 1.6555 0 1.673 0 0.282 0 -6.312 0.635 6.004 -6.312 0.917 0 46.3863 4.28 no
2002 50.247 1.9108 0 16.237 0 0.516 0 -6.823 0.391 -11.108 -6.823 0.907 0 13.0808 1.37 no
2007 13.29 2.9689 0 1.438 0 0.297 0 -5.271 0.436 -4.418 -5.271 0.733 0 7.0658 5.96 no
2002 248.497 7.7239 0 229.952 0 5.233 0.228 3.658 11.486 1.052 3.658 16.719 0.095 22.5694 10.66 no
2003 342.469 7.7958 0 323.689 0 6.151 0.276 4.574 13.877 1.341 4.574 20.028 0.085 30.1125 12.5 no
2004 405.047 7.9469 0 385.291 0 7.565 0.332 4.028 13.816 0.926 4.028 21.381 0.165 45.7424 18.6 no
2004 254.115 8.5196 0 238.26 0 3.306 0 3.919 9.094 1.218 3.919 12.4 0 63.3671 34.05 no
2003 222.613 9.6061 0 207.714 0 2.762 0.008 2.462 7.129 0.482 2.462 9.891 0 40.0934 26.5 no
2009 1098.14 9.6977 0 202.855 0 256.774 0 XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX 138.257 0 1641.81 34.28 yes
2005 470.641 9.7106 0 439.266 0 11.38 0.445 5.315 16.998 1.407 5.315 28.378 0.155 67.0433 21.64 no
2008 75.464 10.1891 0 34.412 0 0 0 -13.253 0 -4.884 -13.253 0 0 38.3158 11.75 yes
2008 XXXXXXXXXX 10.7222 0 226.877 0 233.989 0 -15.36 30.001 -35.883 -15.36 263.99 0 XXXXXXXXXX 46.82 yes
2009 76.89 10.788 0 33.425 0 0 0 -0.698 0 2.084 -0.698 0 0 33.199 10.01 yes
2005 845.108 11.4069 0 786.431 0 14.961 2.665 14.867 38.968 6.511 14.867 53.929 0.5245 98.5076 26.8182 no
2007 64 11.6654 0 17 0 0 19 -4 0 7 -4 0 6.39 48.348 19.25 yes
2006 872.094 12.2791 0 808.832 0 20.04 2.787 15.806 41.374 7.335 15.806 61.414 0.54 108.192 23 no
2008 913.359 12.3917 0 854.585 0 39.829 0.854 -3.499 15.772 -7.112 -3.499 55.601 0.135 32.4896 16.55 no
2007 914.781 13.1717 0 846.802 0 25.053 2.808 15.07 41.745 7.669 15.07 66.798 0.545 89.0273 21.35 no
2006 645.983 13.5647 0 581.7 0 19.11 0.701 7.379 21.259 2.415 7.379 40.369 0.175 XXXXXXXXXX 30.5 no
2007 807.53 14.0794 0 740.695 0 27.842 0.854 8.91 26.432 2.678 8.91 54.274 0.18 75.8096 26.52 no
2006 110 15.8848 0 46 0 0 10 -13 0 15 -13 0 5 74.1336 21 yes
2005 125 18.0962 0 52 0 3 49 -22 1 29 -22 4 13 70.595 32 yes
2008 0.306 -0.0109 0.181 3.845 0 0.736 0 -1.796 -0.54 -1.962 -1.796 0.196 0 1.0331 0.048 no
2005 0.905 -0.059 0.202 3.813 0 0.561 0 -2.396 0.201 -3.806 -2.396 0.762 0 6.922 0.155 no
2003 220.21 23.0332 0.214 200.793 0 3.862 0.303 3.708 10.552 1.275 3.708 14.414 0.36 16.6493 19.9 no
2006 516.299 7.934 0.242 450.804 0 12.956 0 7.02 18.203 2.466 7.02 31.159 0 68.5165 12.5143 no
2002 179.537 10.109 0.302 164.606 0 3.162 0.002 2.272 5.774 0.784 2.272 8.936 0 21.7267 16.0952 no
2002 578.359 4.0905 0.331 527.171 0 7.337 3.244 22.908 40.223 11.715 22.908 47.56 0.26 XXXXXXXXXX 19.75 no
2007 222.218 13.1435 0.338 200.058 0 6.053 0.337 4.588 11.733 0.927 4.588 17.786 0.25 7.7725 10.75 no
2002 640.01 10.9734 0.368 591.398 0 12.904 2.323 11.889 33.351 5.702 11.889 46.255 0.525 73.981 17.5 no
2008 599.385 8.1279 0.4 516.933 0 11.983 0.016 8.165 19.835 2.91 8.165 31.818 0 55.5441 8.33 no
2009 42.967 7.3097 0.51 5.388 0 23.292 3.132 3.755 6.42 3.057 3.755 29.712 0.605 62.7202 14.96 no
2010 41.41 7.0729 0.52 5.006 0 22.211 3.188 2.29 4.873 1.851 2.29 27.084 0.62 72.5727 16.5 no
2002 XXXXXXXXXX 7.1159 0.533 XXXXXXXXXX 0 71.851 6.459 27.093 54.878 16.308 27.093 126.729 0.4 XXXXXXXXXX 22.5 no
2002 36.563 6.346 0.541 3.881 0 21.754 2.371 5.394 8.056 3.819 5.394 29.81 0.46 58.401 12.25 no
2003 37.537 6.4123 0.565 4.584 0 21.731 3.112 5.113 7.645 3.583 5.113 29.376 0.48 60.3833 12.15 no
2007 45.835 7.6591 0.582 6.498 0 23.647 2.209 5.493 8.327 4.189 5.493 31.974 0.57 87.5688 17.1 no
2004 40.58 6.7919 0.583 5.683 0 21.218 2.723 4.895 7.216 4.702 4.895 28.434 0.52 73.2165 15.25 no
2006 43.13 7.22 0.625 6.041 0 22.806 2.877 4.91 7.514 3.64 4.91 30.32 0.56 74.7434 14.8 no
2005 42.549 7.044 0.626 6.357 0 21.169 2.8 5.171 7.438 3.849 5.171 28.607 0.54 71.7779 14.6 no
2007 503.506 7.8763 0.658 434.62 0 13.947 0.004 5.474 16.324 0.844 5.474 30.271 0 69.5307 10.0381 no
2004 805.475 12.1442 0.757 750.073 0 9.027 2.529 15.036 37.77 6.161 15.036 46.797 0.555 XXXXXXXXXX 28 no
2008 44.057 7.4406 0.801 5.842 0 23.16 3.569 4.313 6.944 3.535 4.313 30.104 0.69 71.6472 17.43 no
2004 1.423 -0.0526 0.836 3.312 0 1.697 0 -1.502 1.083 -1.658 -1.502 2.78 0 7.446 0.173 no
2006 268.008 12.9075 0.951 246.246 0 5.64 0.337 5.183 12.402 1.786 5.183 18.042 0.2 16.6914 13 no
2003 689.21 11.4567 0.965 637.46 0 9.673 2.434 11.869 35.06 5.805 11.869 44.733 0.5425 XXXXXXXXXX 25.05 no
2009 577.658 4.7768 1.055 522.139 0 14.393 0.743 6.414 17.043 -25.768 6.414 31.436 0 27.6784 7.63 no
2004 230.541 12.0641 1.099 210.201 0 3.663 0.337 4.223 11.732 1.139 4.223 15.395 0.4 42.15 25 no
2005 245