In making this project I aimed to accomplish two things:
- To compare and contrast these two architectures, seeking insight into how each works and how the two approaches perform against each other
- To outline the full build and benchmarking process in a single, comprehensive Jupyter notebook
├── #1. Data Preprocessing
│ ├── Load and inspect dataset
│ ├── Split data into train, val, and test sets
│
├── #2. ML (Logistic Regression) Approach
│ ├── Extract features with TF-IDF
│ ├── Train LR model
│ ├── Evaluate model on validation and test sets
│ ├── Check performance
│
├── #3. DL (LSTM) Approach
│ ├── Tokenize and vectorize features
│ ├── Create dataset and dataloaders
│ ├── Build and train LSTM model
│ ├── Test on validation and test sets
│
├── #4. Comparisons and Conclusions
│ ├── Analyze classification reports
│ ├── Create, compare, and analyze ROC curves
│ ├── Analyze confusion matrices
│ ├── Check error overlap
│ ├── Analyze and compare model training and inference times
│ ├── Compare model memory usage
│ ├── Observe and analyze model sizes
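The ML approach above (steps in section #2) can be sketched roughly as follows. This is an illustrative snippet, not the notebook's exact code — the toy texts/labels and the vectorizer parameters are stand-ins:

```python
# Hypothetical sketch of the TF-IDF + Logistic Regression pipeline;
# texts and labels here are toy stand-ins for the real dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["a great movie", "terrible plot", "loved it", "awful acting"]
train_labels = [1, 0, 1, 0]

# Fit TF-IDF on the training texts only
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)  # sparse TF-IDF matrix

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_labels)

# At evaluation time, only transform (never fit) on val/test texts
X_val = vectorizer.transform(["a great plot"])
pred = clf.predict(X_val)
```

Fitting the vectorizer only on the training split avoids leaking vocabulary statistics from the validation/test sets into the model.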
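The DL approach (section #3) builds an LSTM classifier on tokenized sequences. A minimal sketch of such a model in PyTorch is below — the class name and the vocab/embedding/hidden sizes are illustrative assumptions, not the notebook's actual values:

```python
import torch
import torch.nn as nn

# Minimal sketch of an LSTM sentiment classifier; vocab_size,
# embed_dim, and hidden_dim are illustrative, not the notebook's values.
class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                 # x: (batch, seq_len) of token ids
        embedded = self.embedding(x)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)
        return self.fc(hidden[-1])        # logits from the last hidden state

model = LSTMClassifier()
dummy_batch = torch.randint(1, 5000, (4, 20))  # 4 sequences of 20 token ids
logits = model(dummy_batch)                    # shape: (batch, num_classes)
```

Classifying from the final hidden state is one common design choice; pooling over all time steps is another.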
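For the ROC comparison in section #4, the core scikit-learn pattern looks like this. The scores below are made-up probabilities, not real model outputs:

```python
# Sketch of comparing ROC/AUC for the two models with scikit-learn;
# the score arrays are illustrative, e.g. clf.predict_proba(X)[:, 1]
# for LR and softmaxed logits for the LSTM.
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1, 1, 0])
lr_scores = np.array([0.1, 0.4, 0.8, 0.9, 0.6, 0.3])
lstm_scores = np.array([0.2, 0.3, 0.7, 0.95, 0.55, 0.25])

aucs = {}
for name, scores in [("LR", lr_scores), ("LSTM", lstm_scores)]:
    fpr, tpr, _ = roc_curve(y_true, scores)
    aucs[name] = auc(fpr, tpr)
    print(f"{name} AUC: {aucs[name]:.3f}")
```

Plotting each model's `fpr` against its `tpr` on the same axes gives the side-by-side ROC comparison.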
```
git clone https://github.com/SawyerAlston/ML-vs-DL-Sentiment-Classification.git
cd ML-vs-DL-Sentiment-Classification
```

This project uses a variety of libraries; for compatibility, please see the requirements.txt file.

```
pip install -r requirements.txt
```

Replace cell 3 of the Data Preprocessing section of the notebook with the following updated code:
```python
from sklearn.model_selection import train_test_split

print(df["sentiment"].value_counts())
df = df.drop_duplicates(subset="text")
print(df["sentiment"].value_counts())

# Sample 10% of the data to keep runtimes manageable
df = df.sample(frac=0.1, random_state=1)

# Split into train&val (90%) and the test set (10%)
train_val_texts, test_texts, train_val_labels, test_labels = train_test_split(
    df["text"].values, df["sentiment"].values, test_size=0.1, random_state=1
)

# Now split train&val into train (80% of 90% --> 72%) and val (20% of 90% --> 18%)
train_texts, val_texts, train_labels, val_labels = train_test_split(
    train_val_texts, train_val_labels, test_size=0.2, random_state=1
)

print(f"train split: {train_texts.shape[0]}")
print(f"validation split: {val_texts.shape[0]}")
print(f"test split: {test_texts.shape[0]}")
```

Feel free to look around, tinker, modify, or play with and learn from the project however you would like :)
├── notebook(s)
│ ├── ML_vs_DL_classification_comparison.ipynb
├── plots
│ ├── (various model benchmark images)
├── README.md
├── requirements.txt

This project is licensed under the MIT License.



