lkv971/fabric-realtime-flight-ops

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 

Repository files navigation

Real-Time Flight Operations Analytics

End-to-end streaming flight data pipeline on Microsoft Fabric: real-time ingestion with Eventstream, dual sinks to Lakehouse & Eventhouse, star-schema transforms, incremental loads into Warehouse, semantic modeling, and both live & historical dashboards.

Description

This repository implements a real-time streaming analytics solution for flight operations using Microsoft Fabric. It demonstrates:

  • Real-time ingestion of flight data from an external API into an Eventstream.
  • Field mapping and routing to simultaneously sink events into a Lakehouse (LH_Flights) and an Eventhouse (EH_Flights) backed by a KQL database.
  • Batch transformations to build a star schema in the Lakehouse via PySpark.
  • Incremental, watermark-driven loads from Lakehouse to Warehouse (WH_Flights) orchestrated by a Fabric pipeline.
  • Custom semantic model (SM_Flights) with DAX measures and hierarchies.
  • Delivery of insights through a live KQL-based dashboard and a Power BI report.
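To make the ingestion step concrete, here is a minimal sketch of how a raw API record might be flattened into the JSON event streamed to ES_Flights. The field names (`FlightNumber`, `IngestedAt`, etc.) and the nested shape of the raw record are assumptions for illustration; the real names come from the flight API and the Eventstream's manageFields configuration.

```python
import json
from datetime import datetime, timezone

def to_event(raw: dict) -> str:
    """Flatten a raw API record into the JSON payload streamed to the
    Eventstream. All field names here are illustrative placeholders."""
    event = {
        "FlightNumber": raw.get("flight", {}).get("number"),
        "Airline": raw.get("airline", {}).get("name"),
        "DepartureAirport": raw.get("departure", {}).get("iata"),
        "ArrivalAirport": raw.get("arrival", {}).get("iata"),
        "Status": raw.get("flight_status"),
        # Ingestion timestamp, later used as the incremental-load watermark.
        "IngestedAt": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(event)

raw = {
    "flight": {"number": "BA117"},
    "airline": {"name": "British Airways"},
    "departure": {"iata": "LHR"},
    "arrival": {"iata": "JFK"},
    "flight_status": "active",
}
payload = json.loads(to_event(raw))
```

In the repository this shaping happens inside NB_Flights before the event reaches the Eventstream; the sketch only shows the payload contract, not the transport.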

Key Features

  • Real-Time Ingestion: Notebook NB_Flights fetches data from the flight API and streams JSON to ES_Flights.
  • Eventstream (ES_Flights): Applies manageFields transformations and routes to:
    • Lakehouse: Raw Delta table flights in LH_Flights (created via createLakehouseTable).
    • Eventhouse: Table in KQL database EH_Flights powering real-time dashboards.
  • Automated Pipeline (PL_Flights):
    • Star Schema Transform: notebook Transform_Flights reads the Delta flights table and generates:
      • Dimensions: Airlines, Airports, Dates.
      • Fact table: fact_flights.
    • Incremental Load: notebook GetLakehouseWatermark and pipeline PL_Flights perform:
      • Lookup of the max IngestedAt watermark.
      • Copy & Script activities to upsert new records into WH_Flights.
  • Semantic Modeling: SM_Flights defines measures, display folders, and hierarchies over Warehouse tables.
  • Visualization:
    • Live Dashboard: FlightOpsCenter in Fabric using pinned KQL querysets on EH_Flights.
    • Power BI Report: FlightsOpsCenter.pbix built on SM_Flights.
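The star-schema transform in the repository is done in PySpark (Transform_Flights). The pure-Python sketch below only illustrates the shape of that output, with assumed column names, by splitting flat flight rows into surrogate-keyed dimensions and a fact table:

```python
def build_star_schema(flights: list):
    """Split flat flight rows into dimension and fact tables.
    Column names are assumptions, not the repo's exact schema."""
    airlines = sorted({f["Airline"] for f in flights})
    airports = sorted({f["DepartureAirport"] for f in flights} |
                      {f["ArrivalAirport"] for f in flights})
    # Simple surrogate keys; a real load would persist and reuse them.
    airline_keys = {name: i + 1 for i, name in enumerate(airlines)}
    airport_keys = {code: i + 1 for i, code in enumerate(airports)}

    dim_airlines = [{"AirlineKey": k, "Airline": n}
                    for n, k in airline_keys.items()]
    dim_airports = [{"AirportKey": k, "Airport": c}
                    for c, k in airport_keys.items()]
    fact_flights = [
        {
            "FlightNumber": f["FlightNumber"],
            "AirlineKey": airline_keys[f["Airline"]],
            "DepartureAirportKey": airport_keys[f["DepartureAirport"]],
            "ArrivalAirportKey": airport_keys[f["ArrivalAirport"]],
            "Status": f["Status"],
        }
        for f in flights
    ]
    return dim_airlines, dim_airports, fact_flights

rows = [
    {"FlightNumber": "BA117", "Airline": "British Airways",
     "DepartureAirport": "LHR", "ArrivalAirport": "JFK", "Status": "active"},
    {"FlightNumber": "AF006", "Airline": "Air France",
     "DepartureAirport": "CDG", "ArrivalAirport": "JFK", "Status": "landed"},
]
dim_airlines, dim_airports, facts = build_star_schema(rows)
```

Dimension keys replace the repeated text columns in the fact table, which is what makes the Warehouse joins and the SM_Flights hierarchies cheap downstream.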

Folder Structure

fabric-realtime-flight-ops/
├── chore/                     # Fabric artifact provisioning
│   ├── EH_Flights/            # Eventhouse & KQL DB definitions
│   ├── KQL_Querysets/         # Saved KQL queries for dashboards
│   ├── LH_Flights/            # Lakehouse Delta table definitions
│   ├── WH_Flights/            # Warehouse table & view scripts
│   ├── SM_Flights/            # Semantic model metadata
│   └── createLakehouseTable.ipynb
├── orchestration/             # Data ingestion & processing
│   ├── NB_Flights.ipynb       # API ingestion to Eventstream
│   ├── ES_Flights/            # Eventstream manageFields config
│   ├── Transform_Flights.ipynb      # PySpark star-schema transforms
│   ├── GetLakehouseWatermark.ipynb  # Watermark lookup for incremental loads
│   └── PL_Flights/            # Pipeline JSON definitions
└── delivery/                  # Reporting artifacts
    ├── FlightOpsCenter_Dashboard/
    │   └── FlightOpsCenter    # KQL-based real-time dashboard
    └── FlightsOpsCenter.pbix  # Power BI report

Assets: screenshots and GIFs live in docs/assets/.

Prerequisites

  • Microsoft Fabric workspace with:
    • Lakehouses, Warehouses, Pipelines, Notebooks, Eventstream & Eventhouse enabled.
    • KQL database support.
  • API endpoint & credentials for flight data.
  • Power BI Desktop for editing & publishing .pbix files.

Getting Started

  1. Clone Repository
    git clone https://github.com/your-org/fabric-realtime-flight-ops.git
    cd fabric-realtime-flight-ops
  2. Git Integration
    • Connect your Fabric workspace to this repo under Settings → Git integration.
  3. Configure Secrets
    • In Fabric Studio → Manage → Secrets, add your flight API credentials.
  4. Publish Artifacts
    • In the Develop tab, publish all folders: chore, orchestration, delivery.
  5. Set Up Pipeline Trigger
    • Edit the PL_Flights trigger: schedule at your desired interval (e.g., every minute).
  6. Monitor & Visualize
    • In Orchestrate → Pipeline runs, watch PL_Flights.
    • View the live dashboard under Monitor → Reports → FlightOpsCenter.
    • Open the Power BI report in delivery or in Power BI Service.
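In the repository, the incremental load is a pipeline Lookup of the max IngestedAt followed by Copy & Script activities. The sketch below illustrates the same watermark logic in plain Python; the upsert key (FlightNumber) and column names are assumptions for the example:

```python
def incremental_upsert(warehouse: dict, lakehouse_rows: list, watermark: str) -> str:
    """Upsert only rows newer than the watermark into the warehouse,
    then return the advanced watermark. ISO-8601 timestamps compare
    correctly as strings, so no parsing is needed here."""
    new_rows = [r for r in lakehouse_rows if r["IngestedAt"] > watermark]
    for r in new_rows:
        warehouse[r["FlightNumber"]] = r  # insert or overwrite existing row
    # Advance the watermark to the newest ingested row, if any arrived.
    return max((r["IngestedAt"] for r in new_rows), default=watermark)

warehouse = {
    "BA117": {"FlightNumber": "BA117", "IngestedAt": "2024-01-01T10:00:00Z"},
}
rows = [
    {"FlightNumber": "BA117", "IngestedAt": "2024-01-01T10:05:00Z"},  # newer: upserted
    {"FlightNumber": "AF006", "IngestedAt": "2024-01-01T09:00:00Z"},  # older: skipped
]
wm = incremental_upsert(warehouse, rows, "2024-01-01T10:00:00Z")
```

Persisting the returned watermark between runs is what keeps each pipeline execution touching only new records instead of rescanning the whole Lakehouse table.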

Contributing

Contributions welcome! Please:

  1. Fork & branch: feature/<name> or fix/<issue>.
  2. Add or update assets in the appropriate folder.
  3. Commit & push, then open a Pull Request against main.

License

MIT License.
