Question 1.
The wine data set at https://archive.ics.uci.edu/ml/datasets/wine has 13 features. Develop in
Python and apply your own version of the PCA algorithm to this data set, to visualize how PCA
helps with dimensionality reduction. Explain how many Principal Components you will choose
and why. What percent of the variance in the data do the selected Principal Components
cover?
For the implementation, you may use any objects, modules, and functions in NumPy, SciPy, and
other Python libraries for operations such as computing eigenvalues and eigenvectors or
performing any other math / linear algebra operation, but do not use the PCA function available in
scikit-learn directly.
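As a starting point, the PCA steps can be sketched with NumPy alone. This is a minimal sketch, not a reference solution. The function names `pca` and `components_for_variance`, the choice to standardize the features, and the 95% threshold are all assumptions made for illustration. The random matrix only stands in for the wine data, which you would load yourself.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns the projected scores and the explained-variance ratio of
    every component, sorted from largest to smallest.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each feature (assumption: features may be on very
    # different scales, so unscaled columns would dominate the covariance).
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    Z = (X - X.mean(axis=0)) / std
    # Covariance matrix of the standardized features.
    cov = np.cov(Z, rowvar=False)
    # eigh is meant for symmetric matrices; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    scores = Z @ eigvecs[:, :n_components]
    return scores, ratio


def components_for_variance(ratio, threshold=0.95):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratio))


# Synthetic stand-in with 13 features; replace with the wine feature matrix.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 13)) @ rng.normal(size=(13, 13))
scores, ratio = pca(X_demo, n_components=2)
k = components_for_variance(ratio, threshold=0.95)
print(f"First two components cover {ratio[:2].sum():.1%} of the variance")
print(f"{k} components are needed to reach 95%")
```

The two columns of `scores` can be scatter-plotted for the visualization, and the cumulative sum of `ratio` gives the percent of variance covered by any number of components.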
Question 2. AutoML
Follow the Automated Machine Learning (AutoML) Walkthrough using DataRobot given here:
https://community.datarobot.com/t5/knowledge-base/automated-machine-learning-automl-walkthrough/tac-p/8251
Use the free trial version of DataRobot on a prediction or classification problem that you solved
in an ML domain in the past. By what % does the accuracy increase when you use DataRobot?
Which models listed on the leaderboard perform the best and why? List 5 takeaways from this
exercise.
Question 3. Graph Databases for ML
Here's an implementation of the K-means algorithm using Neo4j’s Cypher query language:
https://medium.com/neo4j/k-means-clustering-with-neo4j-b0ec54bf0103
Implement K-means for the same dataset in python and using similar visualizations, compare its
performance with the implementation in Cypher.
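For the Python side of the comparison, a minimal sketch of K-means (Lloyd's algorithm) in NumPy might look like the following. The function name `kmeans`, its parameters, and the random initialization are assumptions for illustration and are not taken from the article. The synthetic points only stand in for the article's dataset.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep the old
        # centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so that labels match the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


# Synthetic stand-in data: two groups of 2-D points.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(30, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(30, 2)),
])
labels, centroids = kmeans(X_demo, k=2)
# Within-cluster sum of squares, one possible measure for the comparison.
inertia = float(((X_demo - centroids[labels]) ** 2).sum())
print("centroids:", centroids)
print("inertia:", inertia)
```

Because the result depends on the initial centroids, running it with several values of `seed` and comparing the inertia is one way to make the comparison with the Cypher implementation fairer.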