This repository contains several ETL (Extract, Transform, Load) jobs for updating trading data and news for the OSRS Trading App. The pipeline ingests data from multiple sources (RuneScape Wiki APIs and an RSS feed), processes and transforms it into structured Parquet files, and then loads the results into Google Cloud Storage. The jobs are deployed on Cloud Run and scheduled via Cloud Scheduler.
The repository includes multiple job variants tailored for different data update frequencies and sources:
- **1-Hour Job:** Processes hourly trading data by fetching new timestamps, retrieving pricing data from the RuneScape Wiki API (1-hour interval), and saving the results as Parquet files in GCS (see the fetch sketch after this list).
- **5-Minute Job:** Processes data at 5-minute intervals. Like the hourly job, it detects new timestamps, fetches 5-minute pricing data, and saves it as Parquet files.
- **24-Hour Job:** Processes daily data using 24-hour timestamps. This job fetches and processes data over a longer historical period, then consolidates and saves it.
- **RSS Feed Job:** Processes news data by fetching the latest RSS feed from RuneScape News, transforming the XML feed into structured data, deduplicating entries, and updating a Parquet file in GCS.
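For context, the fetch step of a pricing job might look roughly like the sketch below. It targets the 1-hour endpoint of the RuneScape Wiki real-time prices API; the exact endpoint, parameters, and User-Agent string used by the jobs in this repository are assumptions for illustration.

```python
import requests
import pandas as pd

# Illustrative only: the endpoint, params, and User-Agent are assumptions,
# not necessarily what the 1-hour job in this repo actually uses.
API_URL = "https://prices.runescape.wiki/api/v1/osrs/1h"
HEADERS = {"User-Agent": "osrs-trading-app-etl (seer@runetick.com)"}

def fetch_1h_prices(timestamp=None):
    """Fetch one 1-hour price snapshot, optionally for a specific timestamp."""
    params = {"timestamp": timestamp} if timestamp is not None else {}
    resp = requests.get(API_URL, headers=HEADERS, params=params, timeout=30)
    resp.raise_for_status()
    payload = resp.json()
    # The response maps item IDs to price/volume fields; flatten them to rows.
    df = pd.DataFrame.from_dict(payload["data"], orient="index")
    df.index.name = "item_id"
    return df.reset_index()
```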
Each job variant follows a similar pattern:
- **Extract:** Determine the relevant time intervals (timestamps) for which data is needed.
- **Transform:** Fetch data from the API or RSS feed, clean and standardize it (e.g., filling missing values, zeroing nulls), and combine multiple data sources if needed.
- **Load:** Save the processed data as Parquet files to a designated GCS bucket (e.g., `osrs-trading-app.appspot.com`); see the sketch below.
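As a sketch of the Transform and Load steps, the snippet below zeroes out nulls and writes a snapshot straight to a `gs://` path. It assumes the `gcsfs` package is installed so pandas/pyarrow can write to GCS directly, and the object layout inside the bucket is a guess.

```python
import pandas as pd

def save_snapshot(df: pd.DataFrame, timestamp: int) -> str:
    """Clean a snapshot and write it to GCS as Parquet (illustrative layout)."""
    df = df.fillna(0)  # zero out nulls, as described above
    # Assumed object naming; the real jobs may organise files differently.
    path = f"gs://osrs-trading-app.appspot.com/1h/{timestamp}.parquet"
    df.to_parquet(path, index=False)  # pyarrow engine, written via gcsfs
    return path
```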
- **Python & Flask:** All ETL jobs are written in Python and use Flask to expose a `/run-job` endpoint for triggering the pipeline (a minimal sketch follows this list).
- **Pandas & PyArrow:** Used for data manipulation and conversion to Parquet format.
- **Google Cloud Storage:** Stores the processed Parquet files.
- **Google Cloud Run & Cloud Scheduler:** The jobs are containerized and deployed on Cloud Run, and Cloud Scheduler triggers them at the desired intervals.
- **Cloud Build:** Deployment is automated via a `cloudbuild.yaml` file that builds, pushes, and deploys the container image.
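A minimal version of the Flask entry point might look like the following; the handler body here is a stub where the real jobs run their extract/transform/load steps.

```python
import os
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/run-job", methods=["POST"])
def run_job():
    # Placeholder for the actual extract/transform/load logic.
    rows_written = 0
    return jsonify({"status": "ok", "rows_written": rows_written}), 200

if __name__ == "__main__":
    # Cloud Run supplies the listening port via the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```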
- **Cloud Build:** Use the following command to build and deploy the 1-hour job (similar steps apply to the other job variants):

  ```bash
  gcloud builds submit --config=cloudbuild.yaml
  ```

  The build process will:
  - Build the container image.
  - Push the image to Google Container Registry.
  - Deploy the image to Cloud Run as `update-1h-data` in the `us-central1` region.

  A sketch of a matching `cloudbuild.yaml` appears after this list.
- **Cloud Scheduler:** Set up Cloud Scheduler jobs to trigger the `/run-job` endpoint of the respective Cloud Run services at the desired intervals (e.g., every hour, every 5 minutes, or daily); an example command also follows this list.
- **IAM Policy:** The provided `policy.yaml` file configures public invocation of the Cloud Run service (`roles/run.invoker`) so that Cloud Scheduler can trigger it.
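For reference, a `cloudbuild.yaml` matching the build/push/deploy steps above might look roughly like this; the image name and exact step configuration in the repository's file may differ:

```yaml
steps:
  # Build and push the container image, then deploy it to Cloud Run.
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/update-1h-data', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/update-1h-data']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - run
      - deploy
      - update-1h-data
      - --image=gcr.io/$PROJECT_ID/update-1h-data
      - --region=us-central1
images:
  - 'gcr.io/$PROJECT_ID/update-1h-data'
```

Similarly, a Cloud Scheduler job for the hourly variant could be created along these lines (the service URL is a placeholder for your deployed Cloud Run URL):

```bash
gcloud scheduler jobs create http update-1h-data-trigger \
  --schedule="0 * * * *" \
  --uri="https://<your-cloud-run-service-url>/run-job" \
  --http-method=POST \
  --location=us-central1
```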
- **Install Dependencies:**

  ```bash
  pip install -r requirements.txt
  ```

- **Run Locally:**
  Start the Flask application locally to test the job endpoints:

  ```bash
  python job.py  # or any of the job variant scripts
  ```

  Then trigger the job by sending a POST request to `http://localhost:8080/run-job`.
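For example, assuming the default port of 8080 used above:

```bash
curl -X POST http://localhost:8080/run-job
```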
Contributions are welcome! If you have suggestions or improvements, please open an issue or submit a pull request.
This project is open source and available under the MIT License. Please note that the ETL pipeline is provided for educational purposes and should not be used to violate any data usage policies.
📧 Email: seer@runetick.com