CISC 5790: Data Mining, Prof. Yijun Zhao, Fordham University, Spring 2023, Course Project (Due: May 8)

Question

CISC 5790: Data Mining
Prof. Yijun Zhao
Fordham University, Spring 2023
Course Project
Due: May 8

1 Introduction

This project requires you to explore classification algorithms on a real-world dataset and write a report explaining your experimental results. The language of implementation is up to you; the only requirement is that your program be able to interpret the data format specified below, classify instances, and produce interesting statistics such as accuracy, false positive rate, false negative rate, etc. You are free to construct whatever user interface you like for your program, but you must fully document your interface.

2 Algorithm

• Your algorithm should be based on the classification algorithms learned during the course. Usually a straightforward implementation of one method will not lead to satisfactory performance. Your algorithm can be a combination of methods and should incorporate one or more data mining techniques when the situation arises. These techniques include (and are certainly not limited to):
– Handling imbalanced datasets
– Proper imputation methods for missing values
– Different treatment of various types of features: continuous, discrete, categorical, etc.

3 Data

You’ll be examining the behavior of your model on a dataset from the UCI machine learning lab. The dataset is represented in a standard format, consisting of 3 files. The first file, census-income.names, describes the categories and features of the dataset. It also has some empirical results for your reference. The other two files are census-income.data and census-income.test, containing the actual data instances, formatted at one instance per line, as follows:

F_1^1, F_1^2, ..., F_1^k, label_1
F_2^1, F_2^2, ..., F_2^k, label_2
...
F_n^1, F_n^2, ..., F_n^k, label_n

where F_i^j and label_i (i = 1, ..., n; j = 1, ..., k) represent the value of the jth feature and the class category for the ith instance, respectively.

The data you will be examining was extracted from the census bureau database. Each instance contains an individual’s educational, demographic and family information. The prediction task is to determine whether a person makes over 50K a year. You should use census-income.data to train your classifier and use census-income.test to evaluate the performance of your learning algorithm.

4 Your Mission...

Deliverables for this project are:
• Code to implement the classification algorithm for the data file formats given above
• A README file, with simple, clear instructions on how to compile and run your code
• Testing statistics for the application of your learning algorithm. At a minimum you should provide training set accuracy and test set accuracy
• A discussion of data mining techniques employed in your algorithm
• A report analyzing the behavior of your algorithm on the dataset, including any unusual or anomalous (in your opinion) behavior

5 How to turn in your code

• Create a README file, with simple, clear instructions on how to compile and run your project. If the TA cannot run your program by following the instructions, you will receive 50% of the programming score.
• Zip all your files (code, README, written report, etc.) in a zip file named {firstname} {lastname} CS5790 project.zip and upload it to Blackboard
• Only one person in your group needs to turn in the code and the report. Make sure every team member’s name is listed on the cover of the report
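As a rough illustration of the data format and the statistics the assignment asks for, here is a minimal Python sketch. It is not the required implementation: the function names `parse_instances` and `evaluate` are hypothetical, the sketch assumes field values contain no embedded commas, and it assumes that a trailing period on a label (which some copies of the test file may carry) should be stripped so train and test labels compare equal.

```python
def parse_instances(lines):
    """Parse 'F1, F2, ..., Fk, label' lines into (features, label) pairs."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:                      # skip blank lines
            continue
        fields = [f.strip() for f in line.split(",")]
        # Assumption: a trailing '.' on the label is noise, not part of the class name.
        rows.append((fields[:-1], fields[-1].rstrip(".")))
    return rows


def evaluate(y_true, y_pred, positive):
    """Return accuracy, false positive rate and false negative rate."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # FPR: share of true negatives that were predicted positive
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # FNR: share of true positives that were predicted negative
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }
```

Reporting the false positive and false negative rates alongside accuracy matters when the classes are imbalanced, because a classifier that always predicts the majority class can still score a high accuracy.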

Mukesh · Accepted Answer

PowerPoint Presentation
Income Classification
About Dataset
An individual’s annual income results from various factors. Intuitively, it is influenced by the individual’s education level, age, gender, occupation, etc.
This is a widely cited KNN dataset. I encountered it during my course, and I wish to share it here because it is a good starter example for data pre-processing and machine learning practices.
Fields
The dataset contains 16 columns
Target field: Income
-- The income is divided into two classes: <=50K and >50K
Number of attributes: 14
-- These are the demographics and other features that describe a person
We can explore the possibility of predicting income level based on the individual’s personal information.
Independent features
age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales,
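The fields listed above mix continuous and categorical columns, so any imputation step has to treat the two kinds differently. The sketch below shows one simple option, under stated assumptions: missing values are marked with "?" (an assumption about the file encoding), continuous columns are filled with the median, and categorical columns with the most frequent value. The names `fit_imputer` and `impute` are hypothetical, and this is only one of many reasonable imputation strategies.

```python
from collections import Counter
from statistics import median

MISSING = "?"  # assumption: the marker used for missing values in the data files


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def fit_imputer(rows):
    """Learn one fill value per column from the training rows."""
    fills = []
    for column in zip(*rows):
        present = [v for v in column if v != MISSING]
        if present and all(is_number(v) for v in present):
            # Continuous column: fill with the median of the observed values
            fills.append(str(median(float(v) for v in present)))
        elif present:
            # Categorical column: fill with the most frequent observed value
            fills.append(Counter(present).most_common(1)[0][0])
        else:
            fills.append(MISSING)         # nothing observed, leave the marker
    return fills


def impute(rows, fills):
    """Replace missing markers with the learned per-column fill values."""
    return [[fills[j] if v == MISSING else v for j, v in enumerate(row)]
            for row in rows]
```

The fill values are learned from the training rows only and then applied to both training and test rows, so no information from the test set leaks into preprocessing.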
