Project Insight provides NLP as a service, with a code base for both a front end GUI (Streamlit) and a backend server (FastAPI), applying transformer models to various downstream NLP tasks.
The downstream NLP tasks covered:

- News Classification
- Entity Recognition
- Sentiment Analysis
- Summarization
- Information Extraction (To Do)
The user can select different models from the drop-down to run inference.
Users can also call the FastAPI backend directly for command-line inference.
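As a minimal sketch of such command-line use, the helper below builds a POST request to one of the backend services with Python's standard library. The endpoint path `/predict` and the payload field names (`text`, `model`) are assumptions, not taken from the repo; check the task's FastAPI docs page (e.g. `/api/v1/sentiment/docs`) for the real schema.

```python
import json
from urllib import request

BASE = "http://localhost:8080/api/v1"

def build_request(task: str, text: str, model: str = "distilbert") -> request.Request:
    # Field names "text" and "model" are assumptions -- verify them against
    # the service's auto-generated FastAPI documentation.
    payload = json.dumps({"text": text, "model": model}).encode()
    return request.Request(
        f"{BASE}/{task}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# With the backend running:
# with request.urlopen(build_request("sentiment", "Great product!")) as resp:
#     print(json.load(resp))
```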
- **Python Code Base**: Built using FastAPI and Streamlit, making the complete code base Python.
- **Expandable**: The backend is designed so that more transformer-based models can be added, and they become available in the front end app automatically.
- **Micro-Services**: The backend uses a microservices architecture, with a Dockerfile for each service and Nginx as a reverse proxy to each independently running service. This makes it easy to update, maintain, start, and stop individual NLP services.
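A rough sketch of what that reverse-proxy layer could look like; the upstream service names and ports here are assumptions for illustration, not taken from the repo:

```nginx
# Hypothetical nginx.conf fragment: route each /api/v1/<task>/ prefix
# to its independently running microservice container.
http {
    server {
        listen 8080;

        location /api/v1/classification/ {
            proxy_pass http://classification:8000/;
        }
        location /api/v1/sentiment/ {
            proxy_pass http://sentiment:8000/;
        }
        # ...one location block per NLP service
    }
}
```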
 
 
- Clone the repo.
- Run `docker-compose` to spin up the FastAPI based backend services.
- Run the Streamlit app with the `streamlit run` command.
- **Download the models**
  - Download the models from here.
  - Save them in the specific model folders inside the `src_fastapi` folder.
- **Running the backend service**
  - Go to the `src_fastapi` folder.
  - Run the `docker-compose` command:

```shell
$ cd src_fastapi
src_fastapi:~$ sudo docker-compose up -d
```
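For orientation, the Compose file that command reads would have roughly the shape below; this is a hypothetical sketch (service names and build paths are illustrative, not copied from the repo), with Nginx exposing port 8080 in front of the per-task services as described above:

```yaml
# Hypothetical docker-compose.yml shape -- one container per NLP
# microservice, fronted by the Nginx reverse proxy on port 8080.
version: "3"
services:
  nginx:
    build: ./nginx
    ports:
      - "8080:8080"
  classification:
    build: ./classification
  sentiment:
    build: ./sentiment
  # ...one service per NLP task
```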
- **Running the frontend app**
  - Go to the `src_streamlit` folder.
  - Run the app with the `streamlit run` command:

```shell
$ cd src_streamlit
src_streamlit:~$ streamlit run NLPfily.py
```
- **Access to FastAPI Documentation**: Since this is a microservice-based design, every NLP task has its own separate documentation:
  - News Classification: http://localhost:8080/api/v1/classification/docs
  - Sentiment Analysis: http://localhost:8080/api/v1/sentiment/docs
  - NER: http://localhost:8080/api/v1/ner/docs
  - Summarization: http://localhost:8080/api/v1/summary/docs
 
 
- **Front End**: Front end code is in the `src_streamlit` folder, along with the `Dockerfile` and `requirements.txt`.
- **Back End**: Back end code is in the `src_fastapi` folder.
  - This folder contains a directory for each task: `classification`, `ner`, `summary`, etc.
  - Each NLP task is implemented as a microservice, with its own FastAPI server, requirements, and Dockerfile, so they can be independently maintained and managed.
- Each NLP task has its own folder, and within each folder each trained model has a folder of its own. For example:

```
sentiment
└── app
    └── api
        ├── distilbert
        │   ├── model.bin
        │   ├── network.py
        │   └── tokeniser files
        └── roberta
            ├── model.bin
            ├── network.py
            └── tokeniser files
```
- For each new model under each service, a new folder has to be added.
- Each model folder needs the following files:
  - Model `bin` file
  - Tokenizer files
  - `network.py` defining the model class, if a customized model is used
- `config.json`: This file contains the details of the models in the backend and the datasets they are trained on.
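Since `config.json` drives which models the app knows about, here is a minimal sketch of reading it to list the models for one task, e.g. to populate the front-end drop-down. The helper name is hypothetical, not part of the project code; the structure mirrors the `classification` example in this README.

```python
import json

# Hypothetical helper: list model names for one task from config.json.
def models_for_task(config: dict, task: str) -> list:
    return [entry["name"] for entry in config[task].values()]

# Structure follows the "classification" example in this README.
config = json.loads("""
{
  "classification": {
    "model-1": {"name": "DistilBERT", "info": "..."},
    "model-2": {"name": "BERT", "info": "..."}
  }
}
""")
print(models_for_task(config, "classification"))  # → ['DistilBERT', 'BERT']
```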
- Fine-tune a transformer model for the specific task. You can leverage the transformers-tutorials.
- Save the model files and tokenizer files, and also create a `network.py` script if using a customized training network.
- Create a directory within the NLP task with the directory name as the model name, and save all the files in this directory.
- Update `config.json` with the model details and dataset details.
- Update `<service>pro.py` with the correct imports and conditions where the model is imported. For example, for a new BERT model in the Classification task, do the following:
  - Create a new directory in `classification/app/api/`, named `bert`.
  - Update `config.json` with the following:

```json
"classification": {
    "model-1": {
        "name": "DistilBERT",
        "info": "This model is trained on News Aggregator Dataset from UC Irvin Machine Learning Repository. The news headlines are classified into 4 categories: **Business**, **Science and Technology**, **Entertainment**, **Health**. [New Dataset](https://archive.ics.uci.edu/ml/datasets/News+Aggregator)"
    },
    "model-2": {
        "name": "BERT",
        "info": "Model Info"
    }
}
```

  - Update `classificationpro.py` with the following snippets:

Only if a customized class is used:

```python
from classification.bert import BertClass
```

Section where the model is selected:

```python
if model == "bert":
    self.model = BertClass()
    self.tokenizer = BertTokenizerFast.from_pretrained(self.path)
```
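As a design note, the `if model == ...` chain grows with every model added. A registry-based dispatch is one alternative sketch (names here are hypothetical stand-ins, not project code) that keeps each addition to a single line:

```python
# Hypothetical sketch: map model names to loader callables instead of
# extending an if/elif chain in <service>pro.py for every new model.
# The lambdas return placeholder strings standing in for the real
# torch model constructors (DistilBertClass(), BertClass(), ...).
MODEL_REGISTRY = {
    "distilbert": lambda: "DistilBertClass instance",
    "bert": lambda: "BertClass instance",
}

def load_model(name: str):
    try:
        return MODEL_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown model: {name!r}")
```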
This project is licensed under the GPL-3.0 License - see the LICENSE.md file for details

