This repository contains two programs for working with the M100 dataset:
- Signal Processing — Reads raw sensor data, detects significant state changes, and outputs new parquet files.
- Graph Processing — Reads processed state files and prepares them for graph-based analysis.
- Navigate to the GraphProcessing directory.
- Place your `.parquet` file into the `StateFiles` directory.
  - If the directory does not exist, create it.
- You can use either:
  - Parquet files modified by the Signal Processing step (described below), or
  - The original dataset files.
- By default, the program reads the file named `state.parquet`.
  - To change this, modify the `state_file` variable at the top of the `main` function in run_pipeline.py (see the sketch below).
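For illustration only, the top of `main` might look roughly like this; the surrounding code and the alternative file name are hypothetical, only the `state_file` variable name comes from the repository:

```python
from pathlib import Path

def main():
    # File read from the StateFiles directory; change this to point at a
    # different processed state file (the commented name is a made-up example).
    state_file = "state.parquet"
    # state_file = "my_rack_states.parquet"

    input_path = Path("StateFiles") / state_file
    print(f"Reading states from {input_path}")

if __name__ == "__main__":
    main()
```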
```bash
# (Optional) Create a virtual environment
pip install -r requirements.txt
python -m run_pipeline
```
Running the pipeline pushes rows from the specified file into the buffer in the format:
```
{node: {node, timestamp, rack_id, metric1, metric2, ...}, ...}
```
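For example, one buffer entry for a single node might look like the following; the node ID and metric names are placeholders, not actual M100 column names:

```python
{
    "node_0042": {
        "node": "node_0042",
        "timestamp": "2020-03-09T12:00:00",
        "rack_id": "12",
        "ambient_temp": 24.5,
        "cpu_power": 113.0,
    },
    # ... one entry per node
}
```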
- Navigate to the SignalProcessing directory.
- Create a directory named `TarFiles`.
  - Place your `.tar` files inside.
  - Extract them so they become regular folders containing the `.parquet` files (a setup sketch follows this list).
  - Dataset link: Zenodo Record
- Create two more directories:
  - `outputs` — will store the processed parquet files.
  - `logs` — will store logs.
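If you prefer to script this setup, a minimal Python sketch is shown below; it assumes you run it from the SignalProcessing directory and that the archives are plain `.tar` files:

```python
import tarfile
from pathlib import Path

# Create the directories the pipeline expects.
for name in ("TarFiles", "outputs", "logs"):
    Path(name).mkdir(exist_ok=True)

# Extract every .tar archive in TarFiles into a folder of the same name,
# so the .parquet files end up inside regular directories.
for archive in Path("TarFiles").glob("*.tar"):
    target = archive.with_suffix("")  # TarFiles/<rack>.tar -> TarFiles/<rack>/
    with tarfile.open(archive) as tar:
        tar.extractall(path=target)
```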
At the top of the `run` function in run_pipeline.py, you can adjust:
- `limit_racks` (int | None) — Process only the specified rack by ID; `None` means process all racks as one.
- `limit_nodes` (int | None) — Limit the number of nodes for faster testing.
- `delta` (float, 0–1) — Sensitivity of change detection (ADWIN parameter; see the illustration below).
- `clock` (int) — Frequency of checks (ADWIN parameter).
- `rows_in_mem` (int) — Number of rows loaded into memory at once.
- `bq_max_size` — Number of items each queue can hold; defaults to `2 * rows_in_mem`.
At the bottom of the file, there is also an option to change which rack is processed (`limit_racks`) and an option to process all racks in parallel.
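The `delta` and `clock` values above are the standard ADWIN knobs. The sketch below uses the `river` library purely to illustrate their effect; it is an assumption that this matches the detector implementation used in the pipeline:

```python
from river.drift import ADWIN

# Smaller delta -> detections need stronger evidence (fewer false alarms);
# clock controls how often ADWIN checks whether its window should be cut.
detector = ADWIN(delta=0.002, clock=32)

stream = [1.0] * 500 + [5.0] * 500  # synthetic signal with one abrupt change
for i, value in enumerate(stream):
    detector.update(value)
    if detector.drift_detected:
        print(f"Change detected around index {i}")
```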
```bash
# (Optional) Create a virtual environment
pip install -r requirements.txt
python -m run_pipeline
```
- `node_manager` reads parquet files from the `TarFiles` directory for the specified rack.
- It processes data in batches and pushes each batch into a queue.
- Each queue element contains one row of synchronized sensor readings for all nodes at a given timestamp.
- If a node has no data for a timestamp:
  - Readings are set to `None`, or
  - If available, replaced with the last known reading (see the sketch below).
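A minimal sketch of that fallback rule (illustrative only, not the pipeline's actual code; names are made up):

```python
last_known = {}  # node_id -> most recent reading seen for that node

def reading_for(node_id, new_value):
    """Return the value to store for a node at the current timestamp."""
    if new_value is not None:
        last_known[node_id] = new_value
    # Falls back to the last known reading, or None if none exists yet.
    return last_known.get(node_id)
```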
Format of a queue element:
```
{
    node_id: {
        'timestamp': to_json_serializable_timestamp(self.current_time),
        'rack_id': str,
        'sensor_data': {'<metric name>': float}
    }
}
```
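As a concrete illustration, a populated element for one node might look like this (the node ID and metric names are hypothetical):

```python
{
    "node_0042": {
        "timestamp": "2020-03-09T12:00:00",
        "rack_id": "12",
        "sensor_data": {"ambient_temp": 24.5, "cpu_power": 113.0}
    }
}
```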
- The program outputs one parquet file for the rack into the `outputs` directory.
- Row format: `node, timestamp, rack_id, metric1, metric2, ..., metric_k`
- Each row contains a significant state for one of the nodes; the rows should be sorted by timestamp.
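To sanity-check an output file, it can be loaded with pandas; the file name here is hypothetical:

```python
import pandas as pd

df = pd.read_parquet("outputs/rack_12.parquet")  # hypothetical file name
print(df.columns.tolist())                       # node, timestamp, rack_id, metric1, ...
assert df["timestamp"].is_monotonic_increasing   # rows sorted by timestamp
```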