# Diagram Detector

The Diagram Detector project identifies and classifies diagrams in questions and images. It uses machine learning models for text and image classification to detect diagram-related content in educational and scientific material.
## Project structure

```
data/                          # datasets
├── raw/                       # original datasets
│   ├── text_dataset.csv
│   └── images/
│       ├── diagram/           # subfolders: biology, botany, chemistry,
│       │                      #   mathematics, physics, zoology
│       └── no_diagram/
└── processed/                 # processed datasets
    ├── processed_text_dataset.csv
    └── processed_images/
models/                        # trained models
├── text_detector/
└── image_detector.pt
src/                           # training / preprocessing / utilities
├── preprocess.py
├── text_detector.py
└── image_detector.py
api/                           # FastAPI application
└── main.py
tests/                         # unit tests
└── test_api.py
```

Other files: `Dockerfile`, `requirements.txt`, `setup.sh`, `start.py`.

## Features
- Text detection using transformer-based models (BERT or similar); a minimal inference sketch follows this list
- Image detection using a ResNet-based classifier
- FastAPI REST endpoints with automatic docs
- Health checks and basic monitoring endpoints
- Configurable via environment variables
- Tests with pytest and included example scripts
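For a rough idea of how the transformer-based text detection works, here is a minimal inference sketch using the Hugging Face `pipeline` API. The checkpoint path and label names are illustrative assumptions, not artifacts shipped with this project:

```python
# Sketch: classify whether a question's wording implies a diagram.
from transformers import pipeline

# Assumes a fine-tuned sequence-classification checkpoint saved under
# models/text_detector/ (placeholder path, not a published model).
classifier = pipeline("text-classification", model="models/text_detector")

result = classifier("Draw the structure of benzene")[0]
print(result["label"], result["score"])  # e.g. DIAGRAM 0.97 (labels assumed)
```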
## Requirements

- Python 3.8+
- Recommended: create and use a virtual environment (instructions below)
## Virtual environment

Use a virtual environment to isolate project dependencies.

- Verify Python is available (use `python3` on many Linux/macOS systems):

  ```bash
  python3 --version
  ```

- Create the virtual environment in the project root:

  ```bash
  python3 -m venv .venv
  ```

- Activate the virtual environment:

  - Linux / macOS (bash/zsh):

    ```bash
    source .venv/bin/activate
    ```

  - Windows (PowerShell):

    ```powershell
    .\.venv\Scripts\Activate.ps1
    ```

- Upgrade packaging tools and install requirements:

  ```bash
  pip install --upgrade pip setuptools wheel
  pip install -r requirements.txt
  ```

Notes:

- If you need CUDA-enabled PyTorch, use the selector at https://pytorch.org/get-started/locally/ to obtain the correct install command (it may use a custom `--index-url`). Example CPU-only command:

  ```bash
  pip install torch --index-url https://download.pytorch.org/whl/cpu
  ```
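After installing, you can confirm which PyTorch build is active with a quick check from Python:

```python
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # False on the CPU-only build
```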
## Quick start

- Clone the repository:

  ```bash
  git clone https://github.com/ankitrajsh/ML-Detect-Diagram-in-Question-convert-into-Mathjax.git
  cd ML-Detect-Diagram-in-Question-convert-into-Mathjax
  ```

- Create and activate a virtual environment (see the "Virtual environment" section above for commands).

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- (Optional) Copy the environment example and edit values if present:

  ```bash
  cp .env.example .env  # only if .env.example exists
  ```

- Start the API (two equivalent options; a sketch of the helper script follows this list):

  ```bash
  # recommended: helper script
  python start.py

  # or run the FastAPI app directly
  python api/main.py
  ```

- Open the API docs in your browser: http://localhost:8000/docs
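For reference, a helper script of this kind usually just launches Uvicorn with the configured host and port. A minimal sketch, assuming the FastAPI app object lives at `api.main:app` (the repository's actual `start.py` may differ):

```python
# start.py (sketch): launch the FastAPI app with Uvicorn, honoring
# the environment variables listed in the Configuration section below.
import os

import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "api.main:app",  # assumed app location
        host=os.getenv("API_HOST", "0.0.0.0"),
        port=int(os.getenv("API_PORT", "8000")),
        log_level=os.getenv("LOG_LEVEL", "info"),
    )
```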
## Docker

- Build the image:

  ```bash
  docker build -t diagram-detector .
  ```

- Run the container (map the port):

  ```bash
  docker run -p 8000:8000 diagram-detector
  ```
## API endpoints

- Documentation: `GET /docs`
- Health: `GET /health`
- Info: `GET /`

Example: Text detection

```bash
curl -X POST "http://localhost:8000/detect_text" \
  -H "Content-Type: application/json" \
  -d '{"question": "Draw the structure of benzene"}'
```

Example: Image detection

```bash
curl -X POST "http://localhost:8000/detect_image" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_image.jpg"
```
## Configuration

Configuration is controlled via environment variables. If a `.env.example` is provided, copy it to `.env` and update values.

Important variables (defaults shown where applicable):

- `API_HOST` (default: `0.0.0.0`)
- `API_PORT` (default: `8000`)
- `LOG_LEVEL` (default: `info`)
- `MAX_FILE_SIZE_MB` (default: `10`)
- `TEXT_MODEL_PATH` - path to a saved text model (if required)
- `IMAGE_MODEL_PATH` - path to the image model (e.g., `models/image_detector.pt`)
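For example, a `.env` populated with the documented defaults (the two model paths are illustrative; point them at your actual artifacts):

```
API_HOST=0.0.0.0
API_PORT=8000
LOG_LEVEL=info
MAX_FILE_SIZE_MB=10
TEXT_MODEL_PATH=models/text_detector
IMAGE_MODEL_PATH=models/image_detector.pt
```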
## Testing

- Run tests:

  ```bash
  pytest
  ```

- Run a single test file:

  ```bash
  pytest tests/test_api.py
  ```

- Run basic tests without pytest (if the file is executable as a script):

  ```bash
  python tests/test_api.py
  ```
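If you add tests of your own, the usual FastAPI pattern is to exercise the app in-process with `TestClient`. A minimal sketch, assuming the app object is importable from `api.main` (adjust to the project's actual layout):

```python
# Sketch: in-process test of the health endpoint.
from fastapi.testclient import TestClient

from api.main import app  # assumed import path

client = TestClient(app)


def test_health():
    response = client.get("/health")
    assert response.status_code == 200
```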
## Data and training

- Create sample data (if `preprocess.py` supports it):

  ```bash
  python src/preprocess.py --create-sample
  ```

- Train models (if training scripts are implemented; a rough sketch of the image-training loop follows this list):

  ```bash
  python src/text_detector.py
  python src/image_detector.py
  ```
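For orientation, fine-tuning a ResNet on the two-class diagram/no_diagram task generally follows the shape below. This is a sketch using torchvision; the data root, ResNet variant, and hyperparameters are assumptions, not this repository's actual training code:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing for a ResNet backbone.
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Assumes class subfolders (diagram/, no_diagram/) under this root.
train_ds = datasets.ImageFolder("data/processed/processed_images", transform=tfm)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)

# Replace the classification head for the two classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for images, labels in train_dl:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

torch.save(model.state_dict(), "models/image_detector.pt")
```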
## Troubleshooting

- If the API fails to start, check that dependencies from `requirements.txt` are installed and that the configured model paths exist (a quick check follows this list).
- Check logs (the application respects `LOG_LEVEL`).
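A quick check for the model-path item above (the defaults shown are illustrative, mirroring the Configuration section):

```python
import os
from pathlib import Path

# Hypothetical defaults mirroring the Configuration section.
for var, default in [
    ("TEXT_MODEL_PATH", "models/text_detector"),
    ("IMAGE_MODEL_PATH", "models/image_detector.pt"),
]:
    path = Path(os.getenv(var, default))
    print(f"{var}: {path} -> {'found' if path.exists() else 'MISSING'}")
```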
## Contributing

Contributions are welcome. Please open issues or submit pull requests.
## License

This project is licensed under the MIT License. See the LICENSE file for details.