
Question

Assignment 1: The development of a data mining or rule visualisation routine

There are two parts to this assignment. You are required to answer EITHER part.

Part A
You will be provided with various sets of data for mining (and you can create your own). The assignment is to develop and implement a data mining algorithm (of any kind) such that:
• it does not already exist in any commercially available system (although a significant extension to one that does is acceptable),
• it is backed by appropriate research.
Documentation: a 3-5 page description of the research behind the algorithm and a discussion of anything that is novel/useful about your algorithm. Note that I do not require "formal" documentation.

Part B
You will be provided with various rulesets that require appropriate visualisation tools. The assignment is to develop and implement a visualisation algorithm (of any kind) such that:
• it does not already exist in any commercially available system,
• it is backed by appropriate research.
Documentation: a 3-5 page description of the research behind the algorithm and a discussion of anything that is novel/useful about your algorithm. Note that I do not require "formal" documentation.

Extensions
Other extensions (or undertaking both parts of the assignment!) would be looked on favourably, and marks will be awarded up to the maximum mark available for the assignment, i.e. a nice extension can make up for lost marks.

Marking Criteria for Both Parts
• Basic algorithm coded in any language: 16 marks.
• Bonus for extensions: 4 marks.
• Documentation: 10 marks.

As far as the algorithm is concerned, you will be marked on the quality of your solution as follows:
  a. computational complexity of your algorithm.
  b. elegance of your programming.
  c. accuracy and configurability (i.e. setting thresholds).

As far as the documentation is concerned, you will be marked on:
  a. your research into methods available and the novelty of your solution.
  b. your explanation of your algorithm.

Submission of Assignment
All assignment 1's should be zipped into an archive (using your favourite zip package) and uploaded to FLO. It should include everything, including documentation, the source, the executable and any test data you developed for yourself. Name the archive surname.zip, where surname is your surname.

Data Mining and Knowledge Discovery
COMP7707 Advanced Data Mining (and Knowledge Discovery)
Prof. John Roddick XXXXXXXXXX
With contributions from Aaron Ceglar, Carl Mooney and Mark Lethidge.
Naturally occurring Cubic Pyrite
COMP7707 Advanced Data Mining, Semester 1, 2018
John F. Roddick, Flinders University
© 2018, Flinders University

Overview of Topic
Topics
• Introduction
  - The Role of Common Sense
  - Trends in Information Management
  - Fundamental Ideas
  - Developing Data Mining Algorithms
  - Applications of Knowledge Discovery
  - Future Directions in DMKD
• Data Mining Techniques
  - Association Rule Mining
  - Clustering Algorithms
  - Classification and Prediction
  - Sequential Pattern Mining
  - Text Mining
  - Higher Order Data Mining
  - Visualisation Techniques
• Including Higher Semantics
  - Spatial Data Mining
  - Temporal and Longitudinal Data Mining
  - Interestingness
  - Web Mining
• Knowledge Discovery
  - Ethics in Data Mining
  - Knowledge Discovery Frameworks

DMKD - the discipline
• A merger of (at least) four disciplines:
  - Artificial Intelligence: Decision Tree Induction, Clustering, Inductive Logic, …
  - Database Systems: VLDB, data warehousing, data modelling, data semantics, …
  - Statistics: Validity, Confidence, Autocorrelation, …
  - Visualisation: Data Visualisation, Dimension Reduction, …

Where it fits in ICT
• Database queries can be considered to confirm answers to fairly well-formed questions or to provide simple answers to (relatively) simple questions.
• Data Analysis is used to give answers to questions which might require some discussion or where the answer is at first vague.
• Data Mining allows the question itself to be ill-formed: “Tell me something interesting about …”

Terminology
• Data Mining is the term used to describe the algorithms/routines used to discover interesting aspects of a dataset.
• Knowledge Discovery is the term used to describe the overarching discovery process.
• The difference is similar to the difference between programming and software engineering.
• The terminology is misused (and misappropriated) quite a bit.
• DMKD is one of the hottest research topics to emerge in the database research area in some years.

Research Sources
• Major Conferences
  - ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD
  - IEEE International Conference on Data Mining, ICDM
  - European Conference on Principles of Data Mining and Knowledge Discovery, PKDD
  - Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD
  - SIAM International Conference on Data Mining
  - International Conference on Data Warehousing and Knowledge Discovery, DaWaK
  - … plus local conferences such as AusDM
• Conferences that have many DMKD papers
  - ACM SIGMOD International Conference on the Management of Data, SIGMOD
  - International Conference on Information and Knowledge Management, CIKM
  - International Conference on Very Large Data Bases, VLDB
  - IEEE International Conference on Data Engineering, ICDE
• Journals
  - Data Mining and Knowledge Discovery, DMKD
  - ACM Transactions on Knowledge Discovery from Data, TKDD
  - ACM Transactions on Database Systems, TODS
  - IEEE Transactions on Knowledge and Data Engineering, TKDE
  - Knowledge and Information Systems, KAIS
  - Data and Knowledge Engineering, DKE

About ADM - the topic
• Knowledge of Database Systems, Artificial Intelligence, Statistics and Visualisation is not required for this topic.
• HOWEVER, if you find something a little difficult as a result of not having studied it, do read up on it. I will try and provide references.
• Being such a new area, some of the subject matter will come direct from research material, i.e. do not expect to find all of the things we talk about implemented in commercial systems yet.
• Enormous scope to join the team at Flinders in doing postdoctoral, postgraduate or adjunct research.

Topic Organisation
• SAM has important details - please read.
• Assignments
  - I’ve kept it simple.
  - You can do all of them and have the best of them count - but be strategic.
• Tutorial/Discussion Sessions
  - Will start in week 3.

Topic Organisation 2
• Timetable: Thursdays for 13 weeks
  - Lectures: 3pm – 5pm (1 hr 50 mins), Tonsley 1.03
  - Tutorial (starting week 3): noon – 1pm (50 mins), Tonsley 1.14
• Text Book
  - Tan, Steinbach and Kumar - worth the investment but not critical to buy
  - Other resources available in various University libraries

Assessment
• Any two of…
  - Assignment 1 - The development of a data mining or rule visualisation routine
  - Assignment 2 - A research-based paper
  - Assignment 3 - A critique of a seminal DMKD paper

Topic 1: The Role of Common Sense

Benford’s Law
• In 1938 Benford noticed that the pages of logarithm tables corresponding to numbers starting with the numeral 1 were much dirtier than other pages.
• The Theory …
  - Ask anyone to choose numbers randomly and, over a largish number of numbers, there will be 1/9th starting with 1, 1/9th starting with 2, etc.
  - However, naturally occurring numbers do not follow this pattern. They generally have 30% starting with 1, 18% starting with 2, etc.

Benford’s Law, cont.
• We can therefore tell if something that was supposed to be naturally occurring has been faked. For example:
  - the numbers in an audited set of accounts …
  - random samples from a day's stock quotations
  - a tournament's tennis scores
  - the numbers on the front page of The New York Times
  - the populations of towns
  - the molecular weights of compounds
  - the half-lives of radioactive atoms …
• It has been applied to:
  - fraud cases in Brooklyn
  - income tax fraud in California
(From "The First-Digit Phenomenon" by T. P. Hill, American Scientist, July-August 1998)

Topic 2: Trends in Information Management
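The 30% and 18% figures on the Benford’s Law slides in Topic 1 follow from the formula P(d) = log10(1 + 1/d) for a leading digit d (log10 2 ≈ 0.301, log10 1.5 ≈ 0.176). Below is a minimal Python sketch of checking a dataset against that distribution. It assumes mean absolute deviation (MAD) as the conformity score, which is an illustrative choice and not something the slides specify; all function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's Law: expected share of leading digit d is log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a non-zero number (sign is ignored)."""
    if x == 0:
        raise ValueError("0 has no leading digit")
    if isinstance(x, int):
        return int(str(abs(x))[0])
    # Scientific notation puts the first significant digit first, e.g. 0.0314 -> "3.14...e-02".
    return int(f"{abs(x):.15e}"[0])


def first_digit_distribution(values):
    """Observed share of each leading digit 1..9, ignoring zeros."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def benford_mad(values):
    """Mean absolute deviation between observed and Benford leading-digit shares
    (an assumed conformity score; larger means further from Benford's Law)."""
    observed = first_digit_distribution(values)
    expected = benford_expected()
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    expected = benford_expected()
    print(f"P(1) = {expected[1]:.3f}, P(2) = {expected[2]:.3f}")  # P(1) = 0.301, P(2) = 0.176
    powers_of_two = [2 ** k for k in range(1, 1001)]  # a classic Benford-like sequence
    uniform_digits = list(range(1, 10)) * 100  # every leading digit equally often (1/9th each)
    print(f"MAD, powers of two:   {benford_mad(powers_of_two):.4f}")
    print(f"MAD, uniform digits:  {benford_mad(uniform_digits):.4f}")
```

The "1/9th each" list plays the role of numbers people pick "randomly", and it scores a much larger MAD than the powers of two. What MAD value should count as suspicious is a threshold the analyst has to choose; the slides do not give one.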

Abr Writing · Accepted Answer

DataMining
May 25, 2018
Decision trees are very often used for prediction tasks and are extremely useful for the following
reasons:
1. Decision trees implicitly perform feature selection. Feature selection is one of the most
important tasks in data analysis. When a decision tree classifier is fitted to a dataset, it is
easy to identify the most important features in the data from the top few nodes. The higher a
node is in the hierarchy, the more important its feature tends to be, because the tree is grown
greedily and the splits near the root are the ones that best separate the classes over the most
data. Why feature selection is important in analytics is described elsewhere.
2. A decision tree classifier can be trained more easily by users than many other classifiers. The
various kinds of data normalization and transformation are not necessary for a decision tree,
because each split only compares a feature against a threshold: a monotonic rescaling (such as
min-max normalization) changes the threshold values but not the resulting partitions, so the
structure of the tree remains the same. For example,
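Both points above can be illustrated with a small Python sketch. It assumes a greedy, ID3/C4.5-style tree that picks binary threshold splits by information gain, and it only computes the root split rather than a full tree; the toy data and the helper names (`entropy`, `best_threshold_split`, `root_feature`) are hypothetical. It shows (1) that the feature tested at the root is the one with the highest information gain, and (2) that min-max normalising a feature leaves the chosen partition unchanged.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold_split(values, labels):
    """Best binary split 'value <= t' on one numeric feature.

    Returns (information_gain, frozenset of row indices sent to the left branch).
    """
    n = len(labels)
    base = entropy(labels)
    best_gain, best_left = 0.0, frozenset()
    # Candidate thresholds are the observed values; splitting at the maximum would
    # send every row left, so it is skipped.
    for t in sorted(set(values))[:-1]:
        left = [i for i, v in enumerate(values) if v <= t]
        right = [i for i, v in enumerate(values) if v > t]
        remainder = (len(left) / n) * entropy([labels[i] for i in left]) + (
            len(right) / n
        ) * entropy([labels[i] for i in right])
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_left = gain, frozenset(left)
    return best_gain, best_left


def root_feature(rows, labels):
    """Index of the feature a greedy tree would test at the root (highest gain)."""
    gains = [
        best_threshold_split([row[j] for row in rows], labels)[0]
        for j in range(len(rows[0]))
    ]
    return max(range(len(gains)), key=gains.__getitem__)


if __name__ == "__main__":
    # Hypothetical toy data: (age, income) -> did the customer buy?
    rows = [(22, 30), (25, 80), (30, 45), (35, 60), (40, 35), (45, 90), (50, 50), (55, 70)]
    labels = ["no"] * 4 + ["yes"] * 4
    print(root_feature(rows, labels))  # 0: age separates the classes best, so it is the root

    # Min-max normalising age is a monotonic rescaling: thresholds move, partitions do not.
    ages = [row[0] for row in rows]
    lo, hi = min(ages), max(ages)
    scaled = [(a - lo) / (hi - lo) for a in ages]
    print(best_threshold_split(ages, labels) == best_threshold_split(scaled, labels))  # True
```

Because a threshold split depends only on the ordering of a feature's values, the same argument covers any strictly increasing transformation (for example a log transform of positive values); a non-monotonic transformation could change the tree.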
