Resume Parser

Overview

This project is a Resume Parsing System built in Google Colab. It extracts key information from resumes in PDF and DOCX formats using Hugging Face Named Entity Recognition (NER) models, regex-based rules, and a predefined skills dataset. The extracted fields are written to an Excel sheet for further analysis.

Features 🚀

File Handling:
- Supports PDF and DOCX resume files.
- Reads files directly from the /content/ directory.
Data Extraction & Preprocessing:
- Extracts raw text from uploaded resumes.
- Cleans and formats extracted text for analysis.
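
The extraction and cleaning steps above could look roughly like the sketch below. It is a minimal illustration that assumes pdfplumber and python-docx (both in the dependency list) are used for reading; the function names `extract_text` and `clean_text` are hypothetical and not taken from the notebook.

```python
import re
from pathlib import Path


def extract_text(path):
    """Return the raw text of a PDF or DOCX resume (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        import pdfplumber  # imported lazily so clean_text works without it

        with pdfplumber.open(path) as pdf:
            # extract_text() can return None for pages without a text layer
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if suffix == ".docx":
        import docx  # provided by the python-docx package

        document = docx.Document(str(path))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    raise ValueError(f"Unsupported file type: {suffix}")


def clean_text(raw):
    """Collapse runs of spaces and tabs, trim each line, drop empty lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

In Colab this might be called as `clean_text(extract_text("/content/resume.pdf"))`, where the file name is only a placeholder.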
Named Entity Recognition (NER) with Hugging Face Models:
- Identifies key details like Name, Father’s Name, Date of Birth, Gender, Phone Number, Education, Experience, and Key Skills.
- Uses a hybrid model combining pre-trained NER models and custom-trained models.
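
As a rough sketch of the pre-trained half of this step, the Hugging Face `pipeline` API can be used as shown below. The checkpoint `dslim/bert-base-NER`, the score threshold, and the helper names are assumptions chosen for illustration; the notebook may use different models. A general-purpose checkpoint like this one labels only persons, organizations, locations, and miscellaneous entities, so fields such as Date of Birth or Gender would presumably come from the custom-trained models or other rules.

```python
from collections import defaultdict

# Assumption: any token-classification checkpoint could be substituted here.
MODEL_NAME = "dslim/bert-base-NER"


def run_ner(text, model_name=MODEL_NAME):
    """Run a pre-trained NER model over the text (downloads it on first use)."""
    from transformers import pipeline  # lazy import keeps group_entities standalone

    ner = pipeline("ner", model=model_name, aggregation_strategy="simple")
    return ner(text)


def group_entities(entities, min_score=0.5):
    """Group aggregated entities by label, skipping low-confidence ones."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)
```

A typical call would be `group_entities(run_ner(cleaned_text))`, which returns a dictionary keyed by entity label.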
Key Skills Matching:
- Uses a predefined dataset for extracting soft skills, technical skills, and programming languages.
- Leaves the field blank if no match is found.
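
One simple way to implement this kind of dataset lookup is sketched below. The skill lists are made-up placeholders (the project's actual dataset is not shown here), and `match_skills` is a hypothetical name.

```python
import re

# Hypothetical skill lists standing in for the project's predefined dataset.
SKILLS = {
    "technical": ["machine learning", "sql", "excel"],
    "programming": ["python", "java", "c++"],
    "soft": ["communication", "teamwork"],
}


def match_skills(text, skills=SKILLS):
    """Return matched skills per category; a category with no match stays blank."""
    lowered = text.lower()
    found = {}
    for category, names in skills.items():
        hits = []
        for name in names:
            # Lookarounds instead of \b so that names such as "c++" still match
            # and "java" does not match inside "javascript".
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, lowered):
                hits.append(name)
        found[category] = ", ".join(hits)
    return found
```

Matching on whole tokens rather than plain substrings avoids false positives, at the cost of missing spelling variants that are not in the dataset.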
Education & Experience Extraction:
- Extracts university name, degree, and time period.
- Falls back to an alternative approach when regex-based extraction fails.
User Corrections & Data Labeling:
- Allows users to correct misidentified labels.
- Saves corrections and applies them in subsequent extractions.
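
How corrections are stored is not described here. One possible approach, shown purely as an assumption, is a small JSON file that maps a field name and a wrongly extracted value to its corrected value. All names in this sketch are hypothetical, and it assumes the parsed field values are strings.

```python
import json
from pathlib import Path


def load_corrections(path):
    """Load saved corrections, or an empty mapping if none exist yet."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_correction(path, field, wrong, right):
    """Record that `wrong` in `field` should be replaced by `right`."""
    corrections = load_corrections(path)
    corrections.setdefault(field, {})[wrong] = right
    Path(path).write_text(json.dumps(corrections, indent=2), encoding="utf-8")


def apply_corrections(parsed, corrections):
    """Return a copy of the parsed record with known corrections applied."""
    fixed = dict(parsed)
    for field, mapping in corrections.items():
        if fixed.get(field) in mapping:
            fixed[field] = mapping[fixed[field]]
    return fixed
```

With this layout, a correction saved once is picked up by every later call to `apply_corrections` that reads the same file.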
Excel Storage & Structured Output:
- Stores parsed data in an Excel file (SampleResumeParsing.xlsx).
- Ensures structured and indexed data output.
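
A minimal sketch of the Excel step with pandas and openpyxl follows. The column list mirrors the fields named under NER above, but the exact column names, their order, and the index label `S.No` are assumptions, as are the helper names.

```python
# Assumed column order, based on the fields listed in the NER feature.
COLUMNS = [
    "Name", "Father's Name", "Date of Birth", "Gender",
    "Phone Number", "Education", "Experience", "Key Skills",
]


def to_row(parsed):
    """Align one parsed resume (a dict) with the fixed columns; gaps stay blank."""
    return {column: parsed.get(column, "") for column in COLUMNS}


def save_to_excel(records, path="SampleResumeParsing.xlsx"):
    """Write all parsed resumes to one sheet with a 1-based index column."""
    import pandas as pd  # lazy import; writing .xlsx also needs openpyxl

    frame = pd.DataFrame([to_row(record) for record in records], columns=COLUMNS)
    frame.index = range(1, len(frame) + 1)
    frame.index.name = "S.No"  # hypothetical label for the index column
    frame.to_excel(path)
```

Fixing the column order up front keeps the sheet layout stable even when some fields are missing from a given resume.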

Workflow 📌

1. Upload Resumes → Place resume files in /content/.
2. Run Google Colab Notebook → Extract and process data.
3. User Corrections → Adjust incorrect labels if needed.
4. Save & Export Data → Outputs structured data to an Excel sheet.

Dependencies 🔧

Ensure the following Python libraries are installed in Google Colab:

!pip install pdfplumber python-docx pandas openpyxl transformers

Future Enhancements ✨

- Integrate FastAPI for a web-based API service.
- Add more datasets for improved accuracy in skill and education extraction.
- Improve multi-language support for parsing non-English resumes.

License 📜

This project is licensed under the MIT License.
