Python Code for Data Wrangling and Machine Learning with Report. All information on the task has...

Question

Python Code for Data Wrangling and Machine Learning with Report. All information on the task has been included in the file named 'Task Info' in the zipped file attached. A marking rubric has also been included for reference.

Sandeep Kumar · Accepted Answer

Introduction
With the boom in services that can be easily accessed through mobile applications, testing their reliability has become vital. Traditionally, reviews and ratings have been the most robust way to assess a service's value. With tens of thousands of users and their reviews, however, it has become difficult to track the reliability and helpfulness of those reviews. Such a task would take a human years to complete and therefore needs to be automated. In this project, I apply various machine learning algorithms to train a model that performs sentiment analysis on reviews and predicts their ratings.
Data Source
For this project, the Yelp dataset is used. It comprises 28068 training records and 7018 testing records, each with a review and its review metadata. The review metadata contains the columns date, business_id, review_id, reviewer_id, vote_cool, vote_useful, vote_cool, while the review dataset has only a review column.
Features and Preprocessing
Initially, the model was trained on the review texts and review ratings. We combined the separate review and review metadata datasets and created a new column holding the number of characters in each review. The review text was also cleaned: formatting, style and extra whitespace were discarded. Word suffixes were stripped to reduce words to their roots, and stopwords, which are words that appear regularly in the language but carry no informative value, were discarded as well. The text collection was then vectorized into a matrix of token counts with over 75657 units: the frequency of every word was counted and the resulting sparse matrix was extracted.
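The preprocessing steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code submitted with the report. The sample rows, the row-aligned way of combining reviews with metadata, the `review_length` column name, the tiny stopword list and the naive suffix stripper are all assumptions made for the example. A real run would more likely use a full stopword list, a proper stemmer and a library vectorizer.

```python
import re
from collections import Counter

# Hypothetical miniature inputs; the real files and their layout are not shown in the report.
reviews = [
    {"review": "The food was AMAZING!!   Loved the friendly waiters."},
    {"review": "Waited an hour... the burgers were cold and the waiter ignored us."},
]
metadata = [
    {"review_id": "r1", "vote_useful": 2},
    {"review_id": "r2", "vote_useful": 0},
]

# Tiny illustrative stopword list (assumption; a real run would use a full list).
STOPWORDS = {"the", "was", "were", "and", "an", "a", "us", "of", "to"}
# Naive stand-in for a real stemmer (assumption): strip one common suffix.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def clean(text):
    """Lowercase, replace non-letters with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Clean the text, drop stopwords, and stem the remaining words."""
    return [stem(w) for w in clean(text).split() if w not in STOPWORDS]


# Combine review text with metadata (assumed row-aligned) and add a character-count column.
rows = [
    {**meta, **rev, "review_length": len(rev["review"])}
    for rev, meta in zip(reviews, metadata)
]

# Build the vocabulary, then a sparse count matrix: one {column_index: count} dict per review.
token_lists = [tokenize(row["review"]) for row in rows]
vocabulary = {
    tok: i for i, tok in enumerate(sorted({t for toks in token_lists for t in toks}))
}
sparse_counts = [
    {vocabulary[tok]: n for tok, n in Counter(toks).items()} for toks in token_lists
]
```

Storing only the non-zero counts per review is what keeps a matrix with tens of thousands of token columns manageable, since each review uses only a small fraction of the vocabulary.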
Models
The models focused on positive, neutral and negative review ratings, so the models used were oriented towards sentiment analysis and classification, such as random forest, decision tree,
