- Sentiment analysis is the process of determining whether a piece of writing is positive, negative or neutral.
 - In this project, I demonstrate how various Machine Learning and Deep Learning models can be used for sentiment analysis.
 
- The dataset used is the "Sentiment Labelled Sentences Dataset" from the UC Irvine Machine Learning Repository.
- The sentences come from three different websites/fields:
  - amazon.com
  - imdb.com
  - yelp.com
- Each sentence is labelled as either 1 (positive) or 0 (negative).
- For each website, there are 500 positive and 500 negative sentences.
- This dataset was created for the paper 'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015. (Please cite the paper if you want to use it :))
- Link to the dataset: Sentiment Labelled Sentences Data Set
- The dataset is present in the Dataset folder.
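
As a rough illustration of how the labelled sentences could be read, here is a minimal sketch. It assumes each file holds one `sentence<TAB>label` pair per line; `load_labelled_sentences` is a hypothetical helper, not a function from the notebooks, and the two sample lines are made up for the demo.

```python
import os
import tempfile


def load_labelled_sentences(path):
    """Read a 'sentence<TAB>label' file and return (sentences, labels)."""
    sentences, labels = [], []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore blank lines
            # Split on the last tab so a tab inside the sentence stays intact.
            sentence, label = line.rsplit("\t", 1)
            sentences.append(sentence.strip())
            labels.append(int(label))
    return sentences, labels


# Made-up sample lines in the assumed format (not taken from the dataset).
sample = "Great phone, works well.\t1\nThe battery died in a week.\t0\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

sentences, labels = load_labelled_sentences(tmp.name)
os.remove(tmp.name)
print(sentences, labels)
```

To use it on the real data, pass the path of one of the files in the Dataset folder instead of the temporary file.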
 
- I have used the following Machine Learning models:
 
- Multinomial Naive Bayes
 - Random Forest
 - LinearSVC
 
- The code implementing these models is in 'modules/Sentiment_Analysis_ML.ipynb'.
 - All the trained models are stored at 'models/ML'. Within that folder, the models are grouped by dataset (Amazon, IMDB, Yelp).
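
The three classifiers listed above could be trained along these lines with scikit-learn. This is only a sketch: the TF-IDF features, the hyperparameters, and the toy sentences are assumptions made for illustration, and the notebook may well use a different feature extraction step.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy sentences standing in for one of the three datasets.
train_sentences = [
    "I love this product",
    "Absolutely wonderful experience",
    "Works great and looks good",
    "Terrible quality, very disappointed",
    "Worst purchase I have made",
    "It broke after one day",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_sentences = ["wonderful product", "very disappointed"]

classifiers = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "LinearSVC": LinearSVC(),
}

predictions = {}
for name, classifier in classifiers.items():
    # Assumption: TF-IDF bag-of-words features feed every classifier.
    model = make_pipeline(TfidfVectorizer(), classifier)
    model.fit(train_sentences, train_labels)
    predictions[name] = model.predict(test_sentences)
    print(name, predictions[name])
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on the training sentences only, which avoids leaking test data into the features.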
 
- I have used the following Deep Learning models:
 
- Feed Forward Neural Network (FFNN)
 - Convolutional Neural Network (CNN)
 - Recurrent Neural Network (LSTM)
 
- As the dataset consists of three different sets of data, I have created a separate implementation for each of them.
 
- Amazon Product Review Dataset ('modules/Amazon_Sentiment_Analysis_DL.ipynb')
 - IMDB Movie Review Dataset ('modules/IMDB_Sentiment_Analysis_DL.ipynb')
 - Yelp Restaurant Review Dataset ('modules/Yelp_Sentiment_Analysis_DL.ipynb')
 
- All the trained models are stored at 'models/DL'. Within that folder, the models are grouped by dataset (Amazon, IMDB, Yelp).
 
- All the Deep Learning architectures use the GloVe Word Embeddings.
 - To download them, click here (please download them before running the code).
 - The variant trained on 6 billion tokens with 100-dimensional vectors is used.
 - They are stored at 'Dataset/GloVe_Word_Embeddings'.
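
For readers unfamiliar with the GloVe text format (one word per line followed by its vector components, separated by spaces), the sketch below shows one way to parse it and build an embedding matrix for a vocabulary. `load_glove`, `build_embedding_matrix`, and the tiny 3-dimensional sample are hypothetical and only illustrate the idea; the notebooks may do this differently.

```python
import io


def load_glove(handle, dim=100):
    """Parse GloVe text lines ('word v1 v2 ... v_dim') into a dict."""
    embeddings = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the expected width
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


def build_embedding_matrix(word_index, embeddings, dim=100):
    """Return one row per word index; row 0 is left for padding."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, index in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[index] = vector  # words without a vector stay all zeros
    return matrix


# Made-up 3-dimensional vectors standing in for the real 100-dimensional file.
toy_file = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
embeddings = load_glove(toy_file, dim=3)
word_index = {"good": 1, "bad": 2, "unseen": 3}
matrix = build_embedding_matrix(word_index, embeddings, dim=3)
print(matrix)
```

With the real embeddings, the same functions would be called with `dim=100` on an open handle to the downloaded file, assumed here to be `glove.6B.100d.txt` inside 'Dataset/GloVe_Word_Embeddings'. The resulting matrix can then initialise the embedding layer of a neural network.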
 
After trying various machine learning and deep learning models, I got the following results:

| Model | Amazon Reviews | IMDB Reviews | Yelp Reviews | 
|---|---|---|---|
| Multinomial Naive Bayes | 85% | 85% | 78% | 
| Random Forest | 80% | 79% | 79% | 
| Linear SVC | 84% | 81.50% | 80% | 
| FFNN | 81.50% | 84% | 82% | 
| CNN | 87% | 85.50% | 82.50% | 
| LSTM | 87% | 85% | 83% |