RealTimeOCR is a computer vision project that combines YOLO (You Only Look Once) for object detection and PaddleOCR for optical character recognition (OCR) to identify and read text from objects in real-time video feeds. This project can be used for various applications, such as automated text extraction from documents, license plates, or any text-containing objects in videos.
- Real-time object detection using YOLO
- Text extraction from detected objects using PaddleOCR
- Customizable Region of Interest (ROI) for focused detection
- Easy integration with video streams
Make sure you have the following dependencies installed:
- Python 3.7+
- OpenCV
- Pandas
- NumPy
- PaddleOCR
- Ultralytics YOLO
- CVZone
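To confirm that the dependencies above can be imported, you can run a small check like the following. It is a hypothetical helper and not part of the repository. It assumes the usual import names: `cv2` for opencv-python, and `paddle` for the PaddlePaddle framework that PaddleOCR runs on.

```python
import importlib.util

# Import names, which differ from pip package names in places
# (opencv-python -> cv2, paddlepaddle -> paddle).
REQUIRED_MODULES = ["cv2", "pandas", "numpy", "paddle", "paddleocr", "ultralytics", "cvzone"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

`find_spec` only locates each module. It does not import it, so the check runs quickly even for heavy packages such as PaddleOCR.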
You can install the necessary packages using pip:
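`pip install opencv-python pandas numpy paddlepaddle paddleocr ultralytics cvzone`

PaddleOCR runs on the PaddlePaddle framework, which is installed separately as the `paddlepaddle` package.

To set up and run the project:

- Clone the repository:
  `git clone https://github.com/AmmarMohamed0/RealTimeOCR.git && cd RealTimeOCR`
- Download the YOLO weights: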
  - Make sure to place your `best.pt` weights file in the project directory.
- Prepare the class labels:
  - Create a `coco.txt` file in the project directory with the class labels (one per line).
- Prepare a video:
  - Place a sample video file named `nr.mp4` in the project directory, or modify the code to use your own video file.
- Run the project:
  `python YOLO10_and_PaaddleOCR.py`

The pipeline works as follows:

- The video feed is captured using OpenCV.
- YOLO model predicts the bounding boxes for objects in the video.
- The detected objects' bounding boxes are checked against a defined polygonal Region of Interest (ROI).
- If an object is detected within the ROI, it is cropped, resized, and processed using PaddleOCR to extract any text.
- The detected text is displayed on the video frame.
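The detection step pairs each predicted box with a class name from `coco.txt`. The minimal sketch below shows that bookkeeping. It assumes that each detection row has the layout `(x1, y1, x2, y2, confidence, class_id)`, which is what Ultralytics exposes as `results[0].boxes.data` for a plain detection model. It also assumes that the i-th non-blank line of `coco.txt` names class id i. `Detection`, `parse_class_labels`, `load_class_labels`, `to_detections`, and the `min_conf` threshold are hypothetical names and do not come from the repository.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class Detection:
    label: str
    confidence: float
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def parse_class_labels(lines: Iterable[str]) -> List[str]:
    """One label per line; blank lines are skipped, so the i-th non-blank line is class id i."""
    return [line.strip() for line in lines if line.strip()]


def load_class_labels(path: str) -> List[str]:
    """Read labels from a file such as coco.txt."""
    with open(path, encoding="utf-8") as f:
        return parse_class_labels(f)


def to_detections(rows: Iterable[Sequence[float]], labels: Sequence[str],
                  min_conf: float = 0.0) -> List[Detection]:
    """Turn raw (x1, y1, x2, y2, conf, class_id) rows into labelled boxes."""
    detections = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf < min_conf:
            continue  # skip low-confidence predictions
        box = (int(x1), int(y1), int(x2), int(y2))
        detections.append(Detection(labels[int(cls)], float(conf), box))
    return detections


# Hypothetical wiring with Ultralytics (requires the package and best.pt):
# from ultralytics import YOLO
# model = YOLO("best.pt")
# rows = model.predict(frame)[0].boxes.data.tolist()
# dets = to_detections(rows, load_class_labels("coco.txt"), min_conf=0.5)
```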
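The ROI check decides whether a detection lies inside a polygon. OpenCV provides `cv2.pointPolygonTest` for this check. The sketch below instead uses a pure-Python ray-casting test, so it runs without OpenCV. It makes one hypothetical assumption: a box counts as inside when its centre does. The repository may use a different reference point, such as a box corner. `point_in_polygon` and `box_in_roi` are illustrative names.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a horizontal ray from
    `point` (going right) crosses; an odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed
        # (this also rules out horizontal edges, so y2 - y1 is never zero below).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def box_in_roi(box, roi: Sequence[Point]) -> bool:
    """Hypothetical rule: the box (x1, y1, x2, y2) is in the ROI if its centre is."""
    x1, y1, x2, y2 = box
    centre = ((x1 + x2) / 2, (y1 + y2) / 2)
    return point_in_polygon(centre, roi)
```

For example, take a made-up trapezoidal ROI `[(100, 200), (400, 200), (450, 300), (50, 300)]`. A box centred at (250, 250) is inside it, and a box centred at (10, 10) is not. Ray casting also works for non-convex polygons, which matters if the ROI is drawn by hand.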
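Cropping and resizing a detection comes down to array slicing followed by an interpolation step. The sketch below clamps the box to the frame before slicing, as a safeguard in case a box extends past the image border. It then applies a nearest-neighbour resize written in NumPy, so it has no OpenCV dependency. In practice, `cv2.resize(crop, (width, height))` is the usual call; note that its size argument puts the width first. The function names and target sizes here are hypothetical.

```python
import numpy as np


def crop_box(frame: np.ndarray, box) -> np.ndarray:
    """Crop (x1, y1, x2, y2) from an H x W (x C) image, clamping to the frame."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    x1, x2 = max(0, int(x1)), min(w, int(x2))
    y1, y2 = max(0, int(y1)), min(h, int(y2))
    return frame[y1:y2, x1:x2]


def resize_nearest(image: np.ndarray, width: int, height: int) -> np.ndarray:
    """Nearest-neighbour resize: map each output pixel to a source row/column."""
    h, w = image.shape[:2]
    if h == 0 or w == 0:
        raise ValueError("cannot resize an empty image")
    rows = np.arange(height) * h // height  # source row for each output row
    cols = np.arange(width) * w // width    # source column for each output column
    # Broadcasting (height, 1) against (width,) selects a (height, width) grid;
    # any trailing channel axis is carried through unchanged.
    return image[rows[:, None], cols]
```

The cropped and resized patch is what would then be passed to PaddleOCR for text recognition.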
This project is licensed under the MIT License - see the LICENSE file for details.