If you want to explore web scraping, Python is the best place to start. Thanks to its simple syntax and great library support, Python makes it easy to extract data from websites.
In this tutorial, you’ll learn how to use Requests and Beautiful Soup to scrape web pages and analyze them. As an example, the project will collect post titles from the r/programming subreddit and determine the most mentioned programming languages.
Web scraping is the automated collection of data from websites.
Scrapers fetch a page’s HTML and extract the data they need. More advanced tools may even drive headless browsers to simulate real user actions.
⚠️ Web scraping can break easily when a website’s structure changes. Always check for available APIs before scraping.
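Reddit is a good example: appending `.json` to most listing URLs returns structured data with no HTML parsing required. Here is a minimal sketch using the Requests library (installed in the next section); the field layout follows Reddit’s standard listing format:

```python
import requests

# Most old Reddit listings are also served as JSON
response = requests.get(
    "https://old.reddit.com/r/programming/.json",
    headers={"User-agent": "Sorry, learning Python!"},
)
data = response.json()
# Each post sits under data -> children -> data
titles = [post["data"]["title"] for post in data["data"]["children"]]
```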
Python offers unmatched simplicity and a strong ecosystem:
- Requests for handling HTTP requests
- BeautifulSoup for HTML parsing
- Scrapy and Playwright for advanced use cases
These tools are well-documented, reliable, and widely used by developers.
You’ll need Python installed. Then, install the libraries:
```
pip install requests
pip install beautifulsoup4
```

Create a file named scraper.py for your code.
Fetching the page data is the first step. The example below loads the front page of r/programming through the old Reddit interface, whose server-rendered HTML is much simpler to parse than the modern site.
```python
import requests

# Set a custom User-agent string: Reddit throttles requests
# that arrive with the default Requests User-agent
page = requests.get(
    "https://old.reddit.com/r/programming/",
    headers={'User-agent': 'Sorry, learning Python!'},
)
html = page.content
```
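Before parsing, it’s worth confirming that the request actually succeeded. A minimal check on the `page` object from above:

```python
# Raise an exception on 4xx/5xx responses (e.g. 429 when rate-limited)
page.raise_for_status()
```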
To extract the titles from the HTML, use Beautiful Soup.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
# On old Reddit, every post title is a link inside a <p class="title"> element
p_tags = soup.find_all("p", "title")
titles = [p.find("a").get_text() for p in p_tags]
print(titles)
```

This prints the titles of the posts on the first page.
You can extend the script to scrape multiple pages by following the link behind each page’s next button.
```python
import requests
from bs4 import BeautifulSoup
import time

post_titles = []
next_page = "https://old.reddit.com/r/programming/"

# Collect titles from 20 consecutive pages
for _ in range(20):
    page = requests.get(next_page, headers={'User-agent': 'Sorry, learning Python!'})
    html = page.content
    soup = BeautifulSoup(html, "html.parser")
    p_tags = soup.find_all("p", "title")
    titles = [p.find("a").get_text() for p in p_tags]
    post_titles += titles
    # The next button's link points to the following page of the listing
    next_page = soup.find("span", "next-button").find("a")['href']
    # Wait between requests to stay polite and reduce the risk of rate-limiting
    time.sleep(3)

print(post_titles)
```
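One fragility worth noting: on the last available page there is no next button, so `soup.find("span", "next-button")` returns `None` and the script crashes. A defensive variant of that step, as a sketch:

```python
# Stop cleanly if the listing runs out of pages
next_button = soup.find("span", "next-button")
if next_button is None:
    break
next_page = next_button.find("a")["href"]
```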
After scraping, you can analyze which programming languages appear most often in the post titles.

```python
# Languages to look for, in lowercase to match the normalized words
language_counter = {
    "javascript": 0, "html": 0, "css": 0, "sql": 0,
    "python": 0, "typescript": 0, "java": 0, "c#": 0,
    "c++": 0, "php": 0, "c": 0, "powershell": 0,
    "go": 0, "rust": 0, "kotlin": 0, "dart": 0, "ruby": 0,
}

# Split every title into lowercase words
words = []
for title in post_titles:
    words += [word.lower() for word in title.split()]

# Count exact matches against the language names
for word in words:
    if word in language_counter:
        language_counter[word] += 1

print(language_counter)
```
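To read the results at a glance, you can print the counts ranked from most to least mentioned; a small optional addition:

```python
# Print the languages ranked by number of mentions
for language, count in sorted(language_counter.items(),
                              key=lambda item: item[1], reverse=True):
    print(f"{language}: {count}")
```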
Frequent scraping from a single IP address can get you blocked. A proxy server hides your IP and distributes requests across many addresses. Here’s an example with IPRoyal Residential Proxies:
```python
PROXIES = {
    "http": "http://yourusername:yourpassword@geo.iproyal.com:22323",
    "https": "http://yourusername:yourpassword@geo.iproyal.com:22323",
}

page = requests.get(
    next_page,
    headers={'User-agent': 'Just learning Python, sorry!'},
    proxies=PROXIES,
)
```

This routes every request through the proxy pool, which reduces the risk of rate-limiting and bans.
You’ve learned how to:
- Fetch and parse HTML with Requests + BeautifulSoup
- Scrape multiple pages of Reddit
- Count programming language mentions
- Route requests through a proxy for safer scraping
For more advanced scraping, explore frameworks like Scrapy or Playwright.
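As a taste of what those tools add, here is a minimal Playwright sketch that drives a headless browser to collect the same titles. The CSS selector is an assumption based on old Reddit’s markup, and it requires `pip install playwright` plus `playwright install chromium` first:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()  # headless by default
    page = browser.new_page()
    page.goto("https://old.reddit.com/r/programming/")
    # Assumed selector: title links inside <p class="title"> elements
    titles = page.locator("p.title a.title").all_inner_texts()
    print(titles)
    browser.close()
```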