In making this project I aimed to accomplish two things:
- To compare and contrast these two architectures, seeking insight into how each works and how the two approaches perform against each other
- To outline the full build and benchmarking process in a single, comprehensive Jupyter notebook
├── #1. Data Preprocessing
│ ├── Load and inspect dataset
│ ├── Split data into train, val, and test sets
│
├── #2. ML (Logistic Regression) Approach
│ ├── Extract features with TF-IDF
│ ├── Train LR model
│ ├── Evaluate model on validation and test sets
│ ├── Check performance
│
├── #3. DL (LSTM) Approach
│ ├── Tokenize and vectorize features
│ ├── Create dataset and dataloaders
│ ├── Build and train LSTM model
│ ├── Test on validation and test sets
│
├── #4. Comparisons and Conclusions
│ ├── Analyze classification reports
│ ├── Create, compare, and analyze ROC curves
│ ├── Analyze confusion matrices
│ ├── Check error overlap
│ ├── Analyze and compare model training and inference times
│ ├── Compare model memory usage
│ ├── Observe and analyze model sizes
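The ML approach above (steps in section #2) can be sketched roughly as follows. This is an illustrative snippet, not the notebook's exact code — the toy texts/labels and the vectorizer parameters are stand-ins:

```python
# Hypothetical sketch of the TF-IDF + Logistic Regression pipeline;
# texts and labels here are toy stand-ins for the real dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["a great movie", "terrible plot", "loved it", "awful acting"]
train_labels = [1, 0, 1, 0]

# Fit TF-IDF on the training texts only
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)  # sparse TF-IDF matrix

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_labels)

# At evaluation time, only transform (never fit) on val/test texts
X_val = vectorizer.transform(["a great plot"])
pred = clf.predict(X_val)
```

Fitting the vectorizer only on the training split avoids leaking vocabulary statistics from the validation/test sets into the model.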
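The DL approach (section #3) builds an LSTM classifier on tokenized sequences. A minimal sketch of such a model in PyTorch is below — the class name and the vocab/embedding/hidden sizes are illustrative assumptions, not the notebook's actual values:

```python
import torch
import torch.nn as nn

# Minimal sketch of an LSTM sentiment classifier; vocab_size,
# embed_dim, and hidden_dim are illustrative, not the notebook's values.
class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                 # x: (batch, seq_len) of token ids
        embedded = self.embedding(x)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)
        return self.fc(hidden[-1])        # logits from the last hidden state

model = LSTMClassifier()
dummy_batch = torch.randint(1, 5000, (4, 20))  # 4 sequences of 20 token ids
logits = model(dummy_batch)                    # shape: (batch, num_classes)
```

Classifying from the final hidden state is one common design choice; pooling over all time steps is another.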
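For the ROC comparison in section #4, the core scikit-learn pattern looks like this. The scores below are made-up probabilities, not real model outputs:

```python
# Sketch of comparing ROC/AUC for the two models with scikit-learn;
# the score arrays are illustrative, e.g. clf.predict_proba(X)[:, 1]
# for LR and softmaxed logits for the LSTM.
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1, 1, 0])
lr_scores = np.array([0.1, 0.4, 0.8, 0.9, 0.6, 0.3])
lstm_scores = np.array([0.2, 0.3, 0.7, 0.95, 0.55, 0.25])

aucs = {}
for name, scores in [("LR", lr_scores), ("LSTM", lstm_scores)]:
    fpr, tpr, _ = roc_curve(y_true, scores)
    aucs[name] = auc(fpr, tpr)
    print(f"{name} AUC: {aucs[name]:.3f}")
```

Plotting each model's `fpr` against its `tpr` on the same axes gives the side-by-side ROC comparison.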
```
git clone https://github.com/SawyerAlston/ML-vs-DL-Sentiment-Classification.git
cd ML-vs-DL-Sentiment-Classification
```

This project uses a variety of libraries; for compatibility, please see the requirements.txt file.

```
pip install -r requirements.txt
```

Replace cell 3 of the Data Preprocessing section of the notebook with the following updated code:
```python
from sklearn.model_selection import train_test_split

print(df["sentiment"].value_counts())
df = df.drop_duplicates(subset="text")
print(df["sentiment"].value_counts())

# Sample 10% of the data to keep runtimes manageable
df = df.sample(frac=0.1, random_state=1)

# Split into train&val (90%) and the test set (10%)
train_val_texts, test_texts, train_val_labels, test_labels = train_test_split(
    df["text"].values, df["sentiment"].values, test_size=0.1, random_state=1
)

# Now split train&val into train (80% of 90% --> 72%) and val (20% of 90% --> 18%)
train_texts, val_texts, train_labels, val_labels = train_test_split(
    train_val_texts, train_val_labels, test_size=0.2, random_state=1
)

print(f"train split: {train_texts.shape[0]}")
print(f"validation split: {val_texts.shape[0]}")
print(f"test split: {test_texts.shape[0]}")
```

Feel free to look around, tinker, modify, or play with and learn from the project however you would like :)
├── notebook(s)
│ ├── ML_vs_DL_classification_comparison.ipynb
├── plots
│ ├── (various model benchmark images)
├── README.md
├── requirements.txt

This project is licensed under the MIT License.



