A Python desktop application for scraping university websites to find professors and their contact information based on research keywords.
- Keyword-based professor search across multiple universities
- University configuration management with custom selectors
- SQLite database storage for universities and professors
- Export data to CSV format
- Filter professors by research areas
- Dark theme UI using CustomTkinter
- Email template generation for contacting professors
- Clone or download the project:
    git clone https://github.com/MagusDev/university-scraper.git
    cd university-scraper
- Create a virtual environment (recommended):
    python -m venv .venv
    .venv\Scripts\activate  # On Windows
- Install dependencies:
    pip install -r requirements.txt
- Activate your virtual environment (if using one):
    .venv\Scripts\activate  # On Windows
- Run the application:
    python UI.py
The application will open with a dark-themed GUI containing two main tabs: "Professors" and "Universities".
Before scraping, you need to configure universities with the correct HTML selectors:
- Click on the "Universities" tab in the application
- Name: University name (e.g., "MIT")
- Department: Department name (e.g., "Computer Science")
- URL: The faculty/staff listing page URL
You need to inspect the university's website to find the correct HTML elements:
- Open the university's faculty page in your browser
- Right-click on a professor's card/item and select "Inspect Element"
- Look for the container element that wraps each professor's information
- Note the tag name (e.g., div, article, li) → This is your Modal Tag
- Note the class name (e.g., faculty-card, professor-item) → This is your Modal Class
Example:
<div class="faculty-member-card">
  <!-- Modal Tag: div, Modal Class: faculty-member-card -->
  <a href="/professor/john-doe">
    <h3>Dr. John Doe</h3>
    <p>Associate Professor</p>
  </a>
</div>
- Click on a professor's profile link to go to their individual page
- Right-click on the professor's name and select "Inspect Element"
- Note the tag name (e.g., h1, h2, span) → This is your Name Tag
- Note the class name (e.g., professor-name, page-title) → This is your Name Class
Example:
<h1 class="professor-title">Dr. John Doe</h1>
<!-- Name Tag: h1, Name Class: professor-title -->
- On the professor's profile page, look for their email
- Right-click on the email and select "Inspect Element"
- Note the tag name (e.g., a, span, div) → This is your Email Tag
- Note the class name (e.g., email-link, contact-email) → This is your Email Class
Example:
<a class="contact-email" href="mailto:john.doe@university.edu">john.doe@university.edu</a>
<!-- Email Tag: a, Email Class: contact-email -->
- Click "Add New University" to save the configuration
- The university will appear in the universities list
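Before saving a configuration, it can help to try the three selector pairs against a saved copy of the pages. The following is a minimal sketch using BeautifulSoup (listed in requirements.txt). The helper names `profile_links` and `text_of` and the two HTML snippets are assumptions made for illustration, not the application's own code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for a faculty listing page and a profile page.
LISTING_HTML = """
<div class="faculty-member-card">
  <a href="/professor/john-doe"><h3>Dr. John Doe</h3></a>
</div>
"""
PROFILE_HTML = """
<h1 class="professor-title">Dr. John Doe</h1>
<a class="contact-email" href="mailto:john.doe@university.edu">john.doe@university.edu</a>
"""


def profile_links(html, modal_tag, modal_class):
    """Return the href of the first link inside each professor container."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for card in soup.find_all(modal_tag, class_=modal_class):
        anchor = card.find("a", href=True)
        if anchor is not None:
            links.append(anchor["href"])
    return links


def text_of(html, tag, css_class):
    """Return the stripped text of the first matching element, or None."""
    element = BeautifulSoup(html, "html.parser").find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else None


links = profile_links(LISTING_HTML, "div", "faculty-member-card")  # Modal Tag / Modal Class
name = text_of(PROFILE_HTML, "h1", "professor-title")              # Name Tag / Name Class
email = text_of(PROFILE_HTML, "a", "contact-email")                # Email Tag / Email Class
print(links, name, email)
```

If `links` comes back empty or `name`/`email` is `None`, the corresponding tag or class is probably wrong for that page.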
- Select universities from the top list by clicking on them
- Click "Add to Selection" to move them to the "Selected Universities" list
- Only universities in the selection list will be scraped
- Click on the "Professors" tab
- In the text box, enter research keywords separated by:
  - Commas (,)
  - Semicolons (;)
  - New lines
Example keywords:
machine learning, artificial intelligence
computer vision; natural language processing
deep learning
neural networks
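One way to read the separator rules above is as a single split on commas, semicolons, and new lines. This is a sketch under that assumption; `split_keywords` is a hypothetical helper, and the application may parse its input differently.

```python
import re


def split_keywords(raw):
    """Split on commas, semicolons, and new lines; drop blanks and extra whitespace."""
    parts = re.split(r"[,;\n]+", raw)
    return [part.strip() for part in parts if part.strip()]


raw_text = """machine learning, artificial intelligence
computer vision; natural language processing
deep learning
neural networks"""

keywords = split_keywords(raw_text)
print(keywords)
```

With the example input above, this yields six separate keywords.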
- Click the "Scrape" button
- The application will search through selected universities
- Progress will be shown in the log panel on the left
- Found professors will appear in the professors table
- Click "Filter" to show only professors with "research" or "lab" in their content
- Select rows in either table
- Click "Delete Selected" to remove them
- Use the "Export Universities" or "Export Professors" buttons in the sidebar
- Data will be saved as CSV files
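For readers who want to post-process the exported data, the sketch below shows roughly what a CSV export of professor rows could look like with Python's standard `csv` module. The column names (`name`, `email`, `university`) and the function `professors_to_csv` are assumptions; the actual export may use different columns.

```python
import csv
import io


def professors_to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical row, reusing the example professor from this README.
rows = [
    {"name": "Dr. John Doe", "email": "john.doe@university.edu", "university": "MIT"},
]
csv_text = professors_to_csv(rows, ["name", "email", "university"])
print(csv_text)
```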
Name: MIT
Department: EECS
URL: https://www.eecs.mit.edu/people/faculty-advisors/
Modal Tag: div
Modal Class: views-row
Name Tag: h1
Name Class: page-title
Email Tag: a
Email Class: email-link
Name: Stanford
Department: Computer Science
URL: https://cs.stanford.edu/directory/faculty
Modal Tag: div
Modal Class: person-card
Name Tag: h2
Name Class: person-name
Email Tag: span
Email Class: email-address
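The nine fields in each example map naturally onto a small record type. The dataclass below is only an illustration of that shape, with a check for blank fields; `UniversityConfig` and `missing_fields` are hypothetical names and do not describe how the application stores its configuration.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class UniversityConfig:
    name: str
    department: str
    url: str
    modal_tag: str
    modal_class: str
    name_tag: str
    name_class: str
    email_tag: str
    email_class: str

    def missing_fields(self):
        """Return the names of fields that were left blank."""
        return [key for key, value in asdict(self).items() if not value.strip()]


mit = UniversityConfig(
    name="MIT",
    department="EECS",
    url="https://www.eecs.mit.edu/people/faculty-advisors/",
    modal_tag="div",
    modal_class="views-row",
    name_tag="h1",
    name_class="page-title",
    email_tag="a",
    email_class="email-link",
)
incomplete = replace(mit, email_class="")
print(mit.missing_fields(), incomplete.missing_fields())
```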
- No professors found: Check if your HTML selectors are correct
- Application crashes during scraping: Some universities may have anti-bot protection
- Empty email fields: The email selector might be incorrect or emails are not publicly displayed
- Test selectors on a few professor pages manually first
- Some universities load content dynamically with JavaScript. This scraper works with static HTML only, so such pages may return no professors
- Be respectful with scraping frequency to avoid being blocked
- Check if the university has an official API or directory
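The advice about scraping frequency can be sketched with the standard library's `urllib.robotparser`: check each URL against the site's robots.txt rules and pause between requests. The robots.txt text, the URLs, and the helper `fetch_politely` are assumptions for illustration. The demo passes a stand-in `fetch` and records the waits instead of sleeping; a real run would keep the default `time.sleep`.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real run would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    return rules


def fetch_politely(urls, fetch, rules, user_agent="*", sleep=time.sleep):
    """Fetch only allowed URLs, pausing after each request."""
    delay = rules.crawl_delay(user_agent) or 1  # fall back to 1 second
    pages = {}
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # skip paths the site asks crawlers to avoid
        pages[url] = fetch(url)
        sleep(delay)
    return pages


rules = load_rules(ROBOTS_TXT)
urls = [
    "https://example.edu/people/faculty",
    "https://example.edu/private/staff",
]
waits = []  # records the requested pauses instead of actually sleeping
pages = fetch_politely(urls, fetch=lambda url: "<html></html>", rules=rules, sleep=waits.append)
print(list(pages), waits)
```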
The application uses a SQLite database (scaper_data.db) with two tables:
- universities: Stores university configurations
- professors: Stores scraped professor information
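To give a feel for the two-table layout, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. Only the table names come from this README; every column shown is an assumption, so inspect the real database file for the actual schema.

```python
import sqlite3

# In-memory database so the sketch does not touch the application's file.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE universities (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT,
        url TEXT
    );
    CREATE TABLE professors (
        id INTEGER PRIMARY KEY,
        university_id INTEGER REFERENCES universities(id),
        name TEXT,
        email TEXT
    );
    """
)

# Hypothetical rows, reusing the examples from this README.
cursor = conn.execute(
    "INSERT INTO universities (name, department, url) VALUES (?, ?, ?)",
    ("MIT", "EECS", "https://www.eecs.mit.edu/people/faculty-advisors/"),
)
conn.execute(
    "INSERT INTO professors (university_id, name, email) VALUES (?, ?, ?)",
    (cursor.lastrowid, "Dr. John Doe", "john.doe@university.edu"),
)

# List each professor together with the university they were scraped from.
result = conn.execute(
    """
    SELECT p.name, p.email, u.name
    FROM professors AS p
    JOIN universities AS u ON u.id = p.university_id
    """
).fetchall()
print(result)
conn.close()
```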
See requirements.txt for all dependencies:
- customtkinter
- requests
- beautifulsoup4
- plyer
Mohammad Abaeiani (MagusDev)
This project is for educational purposes. Please respect robots.txt files and website terms of service when scraping.
