
Conversation


@thlorenz thlorenz commented Nov 6, 2025

Summary by CodeRabbit

  • New Features

    • Subscription metrics and new account-fetch counters (success/failed/found/not-found)
    • Automatic pub/sub reconnection with health checks and subscription resync
    • Test helpers and an integration helper to fetch monitored-account metrics remotely
  • Bug Fixes

    • More robust eviction/subscription flow with rollback on upstream failures
    • Improved undelegation detection, counting, and clearer logs
  • Improvements

    • Config option to enable/disable subscription metrics and refined LRU subscription management
    • Exposed inspection helpers for subscription cache and metrics; expanded tests for large-scale subscriptions

Enhances subscription management and reliability by adding metrics and robust reconnection logic with automatic resubscription.

Details

Subscription Management Improvements

  • Atomic Subscribe/Unsubscribe: Improved subscribe logic to handle evictions and rollbacks atomically, preventing race conditions between LRU cache and pubsub client
  • Enhanced Error Handling: Better error propagation and logging for subscription failures
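The eviction-with-rollback flow can be sketched as follows. This is a minimal, synchronous model with illustrative names (`MockPubsub`, `SubManager`) — the actual code uses an async LRU cache and pubsub client, so treat this as a sketch of the invariant, not the real API:

```rust
use std::collections::{HashSet, VecDeque};

// Hypothetical stand-in for the upstream pubsub client.
struct MockPubsub {
    subscribed: HashSet<u64>,
    fail_unsubscribe: bool,
}

impl MockPubsub {
    fn unsubscribe(&mut self, key: u64) -> Result<(), String> {
        if self.fail_unsubscribe {
            return Err(format!("upstream unsubscribe failed for {key}"));
        }
        self.subscribed.remove(&key);
        Ok(())
    }
    fn subscribe(&mut self, key: u64) {
        self.subscribed.insert(key);
    }
}

struct SubManager {
    lru: VecDeque<u64>, // front = least recently used
    capacity: usize,
    client: MockPubsub,
}

impl SubManager {
    // Insert the new key, evict the LRU entry if over capacity, and roll
    // back both the insert and the eviction if the upstream unsubscribe
    // fails, so the cache and the client never disagree.
    fn subscribe(&mut self, key: u64) -> Result<(), String> {
        self.lru.push_back(key);
        if self.lru.len() > self.capacity {
            let evicted = self.lru.pop_front().expect("over capacity");
            if let Err(err) = self.client.unsubscribe(evicted) {
                self.lru.pop_back(); // undo the insert of `key`
                self.lru.push_front(evicted); // restore the evicted entry
                return Err(err);
            }
        }
        self.client.subscribe(key);
        Ok(())
    }
}

fn main() {
    let mut mgr = SubManager {
        lru: VecDeque::from([1, 2]),
        capacity: 2,
        client: MockPubsub {
            subscribed: HashSet::from([1, 2]),
            fail_unsubscribe: true,
        },
    };
    // Upstream failure: nothing changes, no subscription leak.
    assert!(mgr.subscribe(3).is_err());
    assert_eq!(mgr.lru, VecDeque::from([1, 2]));
    // Once upstream recovers, eviction proceeds normally.
    mgr.client.fail_unsubscribe = false;
    assert!(mgr.subscribe(3).is_ok());
    assert_eq!(mgr.lru, VecDeque::from([2, 3]));
    assert!(!mgr.client.subscribed.contains(&1));
}
```

The key point is that a failed upstream unsubscribe leaves both the LRU and the pubsub client exactly as they were before the call, which is the race the review thread below discusses.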

Metrics and Monitoring

  • Subscription Metrics: Added subscription_count method to ChainPubsubClient trait for tracking active subscriptions across all clients
  • Monitored Accounts Gauge: Updated metrics to accurately report monitored accounts count
  • Configurable Metrics: Added metrics enable/disable flag to RemoteAccountProviderConfig

Reconnection and Reliability

  • Reconnection Logic: Replaced periodic connection recycling with event-driven reconnection using the new ReconnectableClient trait
  • Fibonacci Backoff: Implemented reconnection backoff with Fibonacci-sequence delays, capped at 10 minutes
  • Automatic Resubscription: Clients automatically resubscribe to existing subscriptions after successful reconnection
  • Improved SubMuxClient: Simplified constructor and removed deprecated recycling logic
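The Fibonacci delay schedule can be sketched as follows. The function name and where the cap is applied are assumptions for illustration; the actual implementation lives in the reconnection logic:

```rust
use std::time::Duration;

// Hypothetical sketch: delays grow 1, 1, 2, 3, 5, ... seconds along the
// Fibonacci sequence and saturate at a 10 minute cap, matching the
// behavior described above.
fn fibonacci_backoff_secs(attempt: u32) -> Duration {
    const MAX_SECS: u64 = 600; // 10 minute cap
    let (mut a, mut b) = (1u64, 1u64);
    for _ in 0..attempt {
        // Capping inside the loop also prevents u64 overflow for
        // very large attempt counts.
        let next = (a + b).min(MAX_SECS);
        a = b;
        b = next;
    }
    Duration::from_secs(a.min(MAX_SECS))
}

fn main() {
    // First attempts: 1s, 1s, 2s, 3s, 5s, 8s ...
    let delays: Vec<u64> =
        (0..6).map(|i| fibonacci_backoff_secs(i).as_secs()).collect();
    assert_eq!(delays, vec![1, 1, 2, 3, 5, 8]);
    // Large attempt counts saturate at the 10 minute cap.
    assert_eq!(fibonacci_backoff_secs(100).as_secs(), 600);
}
```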

Testing and Integration

  • Integration Test: Added 07_subscription_limits.rs to test large-scale subscription scenarios (400 accounts)
  • Reconnection Tests: Comprehensive tests for reconnection logic with failed resubscription recovery
  • Updated Test Configurations: All tests now use metrics-enabled configurations

Code Quality

  • Trait Refactoring: Introduced ReconnectableClient trait for better abstraction
  • Logging Improvements: Enhanced debug/trace logging throughout subscription and reconnection flows
  • Dependency Updates: Added ureq and url for metrics fetching in integration tests
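As a rough sketch of the metrics-fetching helper: the integration test fetches the validator's Prometheus text endpoint (via ureq) and extracts a single value. The HTTP call itself is elided here and the metric names are illustrative, so only the parsing step is shown:

```rust
// Extract the value of one metric from a Prometheus text-format body.
// Note: this minimal parser skips `# HELP`/`# TYPE` lines and does not
// handle metrics with labels (`name{...} value`).
fn parse_metric(body: &str, name: &str) -> Option<u64> {
    for line in body.lines() {
        if line.starts_with('#') {
            continue;
        }
        if let Some((metric, value)) = line.rsplit_once(' ') {
            if metric == name {
                return value.trim().parse().ok();
            }
        }
    }
    None
}

fn main() {
    // In the integration test the body would come from an HTTP GET against
    // the validator's metrics port; here we inline a sample body instead.
    let body = "\
# HELP monitored_accounts Number of monitored accounts
# TYPE monitored_accounts gauge
monitored_accounts 400
account_fetches_success 1234
";
    assert_eq!(parse_metric(body, "monitored_accounts"), Some(400));
    assert_eq!(parse_metric(body, "account_fetches_success"), Some(1234));
    assert_eq!(parse_metric(body, "missing_metric"), None);
}
```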


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77c33e1 and 2cea109.

📒 Files selected for processing (1)
  • magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run_make_ci_test
  • GitHub Check: run_make_ci_lint
  • GitHub Check: Build Project
🔇 Additional comments (1)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (1)

69-69: Verify the intent and scope of blacklisting the SPL Token program.

The implementation correctly adds the SPL Token program to the blacklist. However, please confirm:

  1. This blacklists only the Token program account itself (TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA), not individual token accounts created by the program.
  2. The Token-2022/Token Extensions program (TokenzQdBNbLqP5VEhdkAS6EPFLC1PHnBqCXEpPxuEb) remains unblacklisted—is this intentional?
  3. What subscription management issue prompted this addition?

Comment on lines +51 to +52
const NATIVE_TOKEN_PROGRAM_ID: Pubkey =
solana_sdk::pubkey!("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");

🧹 Nitpick | 🔵 Trivial

Consider adding a comment explaining the blacklisting rationale.

Similar to NATIVE_SOL_ID (lines 10-11), consider adding a comment explaining why the SPL Token program account is being blacklisted. This helps future maintainers understand the intent.

Example:

+    // The SPL Token program account should not be monitored directly
+    // (individual token accounts are fine, but the program account itself is blacklisted)
     const NATIVE_TOKEN_PROGRAM_ID: Pubkey =
         solana_sdk::pubkey!("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const NATIVE_TOKEN_PROGRAM_ID: Pubkey =
solana_sdk::pubkey!("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");
// The SPL Token program account should not be monitored directly
// (individual token accounts are fine, but the program account itself is blacklisted)
const NATIVE_TOKEN_PROGRAM_ID: Pubkey =
solana_sdk::pubkey!("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");
🤖 Prompt for AI Agents
In magicblock-chainlink/src/chainlink/blacklisted_accounts.rs around lines 51 to
52, add a concise comment above NATIVE_TOKEN_PROGRAM_ID explaining why the SPL
Token program account is blacklisted (for example: to prevent
token-program-driven account creation/interaction or to avoid treating the token
program as a user-owned account in checks), mirroring the style and level of
detail used for NATIVE_SOL_ID on lines 10–11; keep the comment short,
action-oriented, and focused on intent for future maintainers.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cea109 and 917f177.

📒 Files selected for processing (2)
  • magicblock-api/src/tickers.rs (1 hunks)
  • magicblock-metrics/src/service.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • magicblock-metrics/src/service.rs
🧬 Code graph analysis (1)
magicblock-api/src/tickers.rs (2)
magicblock-api/src/magic_validator.rs (1)
  • ledger (735-737)
magicblock-metrics/src/metrics/mod.rs (1)
  • set_ledger_size (297-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Project
  • GitHub Check: run_make_ci_test
  • GitHub Check: run_make_ci_lint
🔇 Additional comments (1)
magicblock-api/src/tickers.rs (1)

111-130: LGTM! Clean implementation with proper cancellation handling.

The simplified implementation correctly:

  • Handles ledger storage size retrieval errors with appropriate logging
  • Uses tokio::select! for graceful cancellation
  • Clones the Arc<Ledger> before moving into the async block

Comment on lines 104 to 110
#[allow(unused_variables)]
pub fn init_system_metrics_ticker(
tick_duration: Duration,
ledger: &Arc<Ledger>,
accountsdb: &Arc<AccountsDb>,
token: CancellationToken,
) -> tokio::task::JoinHandle<()> {

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the unused accountsdb parameter is intentional.

The accountsdb parameter is no longer used in the simplified implementation, yet it remains in the function signature. While the #[allow(unused_variables)] attribute acknowledges this, keeping unused parameters can mislead callers about the function's behavior.

Please confirm:

  1. Is this parameter kept for API compatibility during a transition period?
  2. Should it be removed in a future breaking change?
  3. Should the function documentation explicitly note that accountsdb is currently unused but kept for compatibility?

To verify the impact of removing this parameter, run:


🏁 Script executed:

#!/bin/bash
# Search for all call sites of init_system_metrics_ticker
rg -n -A5 -B2 'init_system_metrics_ticker\s*\(' --type rust

Length of output: 1143



---

🏁 Script executed:

```shell
#!/bin/bash
# Get the complete function implementation to see if accountsdb is used
sed -n '104,130p' magicblock-api/src/tickers.rs

Length of output: 838


🏁 Script executed:

#!/bin/bash
# Search comprehensively for all references to init_system_metrics_ticker (function calls only, not definition)
rg -n 'init_system_metrics_ticker' --type rust | grep -v '^\s*pub fn init_system_metrics_ticker'

Length of output: 353


🏁 Script executed:

#!/bin/bash
# Check the call site in magic_validator.rs to see the full context
sed -n '220,235p' magicblock-api/src/magic_validator.rs

Length of output: 466


Remove the unused accountsdb parameter from function signature and call site.

The accountsdb parameter is declared but never used in the function implementation. The #[allow(unused_variables)] attribute is suppressing a compiler warning that indicates incomplete refactoring.

Required fixes:

  1. Remove accountsdb: &Arc<AccountsDb>, from function signature at magicblock-api/src/tickers.rs:108
  2. Remove &accountsdb, from the function call at magicblock-api/src/magic_validator.rs:227
🤖 Prompt for AI Agents
In magicblock-api/src/tickers.rs around lines 104–110, the function
init_system_metrics_ticker declares an unused parameter accountsdb and silences
the warning with #[allow(unused_variables)]; remove the accountsdb:
&Arc<AccountsDb>, parameter from the function signature and delete the
#[allow(unused_variables)] attribute, then update all call sites (specifically
remove the &accountsdb, argument at magicblock-api/src/magic_validator.rs:227)
so the function is invoked with the remaining parameters; also remove any
now-unused AccountsDb imports or bindings to keep the code clean.

Comment on lines 132 to 204
/*
#[allow(unused_variables)]
pub fn init_system_metrics_ticker_old(
tick_duration: Duration,
ledger: &Arc<Ledger>,
accountsdb: &Arc<AccountsDb>,
token: CancellationToken,
) -> tokio::task::JoinHandle<()> {
fn try_set_ledger_counts(ledger: &Ledger) {
macro_rules! try_set_ledger_count {
($name:ident) => {
paste::paste! {
match ledger.[< count_ $name >]() {
Ok(count) => {
metrics::[< set_ledger_ $name _count >](count);
}
Err(err) => warn!(
"Failed to get ledger {} count: {:?}",
stringify!($name),
err
),
}
}
};
}
try_set_ledger_count!(block_times);
try_set_ledger_count!(blockhashes);
try_set_ledger_count!(slot_signatures);
try_set_ledger_count!(address_signatures);
try_set_ledger_count!(transaction_status);
try_set_ledger_count!(transaction_successful_status);
try_set_ledger_count!(transaction_failed_status);
try_set_ledger_count!(transactions);
try_set_ledger_count!(transaction_memos);
try_set_ledger_count!(perf_samples);
try_set_ledger_count!(account_mod_data);
}
fn try_set_ledger_storage_size(ledger: &Ledger) {
match ledger.storage_size() {
Ok(byte_size) => metrics::set_ledger_size(byte_size),
Err(err) => warn!("Failed to get ledger storage size: {:?}", err),
}
}
fn set_accounts_storage_size(bank: &Bank) {
let byte_size = bank.accounts_db_storage_size();
metrics::set_accounts_size(byte_size);
}
fn set_accounts_count(bank: &Bank) {
metrics::set_accounts_count(bank.accounts_db.get_accounts_count());
}
let ledger = ledger.clone();
let bank = bank.clone();
tokio::task::spawn(async move {
loop {
tokio::select! {
_ = tokio::time::sleep(tick_duration) => {
try_set_ledger_storage_size(&ledger);
set_accounts_storage_size(&bank);
try_set_ledger_counts(&ledger);
set_accounts_count(&bank);
},
_ = token.cancelled() => {
break;
}
}
}
})
tokio::task::spawn(async move {})
}
*/

🧹 Nitpick | 🔵 Trivial

Consider removing the commented-out old implementation.

The 72-line commented block preserves the previous multi-metric implementation. While this may serve as temporary reference during the transition, large commented sections can clutter the codebase and create maintenance overhead.

Recommendations:

  1. If this is a temporary reference during a transition period, add a TODO comment with a timeline for removal (e.g., "TODO: Remove after v2.0 release").
  2. If the old implementation may be needed for rollback, consider preserving it in a separate branch or commit rather than in comments.
  3. If permanently removed, delete the commented code—Git history will preserve it if needed for future reference.
🤖 Prompt for AI Agents
In magicblock-api/src/tickers.rs around lines 132 to 204, there is a 72-line
commented-out old implementation (init_system_metrics_ticker_old) cluttering the
file; remove the large commented block or replace it with a short TODO note. If
you need a temporary reference during the transition, replace the commented code
with a single-line TODO including a removal timeline (e.g., "TODO: remove after
vX.Y") or move the old implementation to a separate commit/branch, otherwise
delete the commented block entirely so Git history retains the prior code.

Comment on lines +131 to +137
// We must consume the body fully to keep the connection alive. We
// iterate over all chunks and simply drop them. This prevents garbage
// data of previous requests from being stuck in connection buffer.
let mut body = req.into_body();
while (body.frame().await).is_some() {}

result

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Add timeout and error handling to body draining loop.

The body draining logic has the following concerns:

  1. DoS vulnerability: The loop has no timeout, allowing a malicious or slow client to tie up the server indefinitely by sending a large body slowly. Consider adding a timeout (e.g., using tokio::time::timeout) or a maximum frame count limit.

  2. Silent error handling: Errors from body.frame().await are silently ignored. At minimum, log errors for debugging purposes.

  3. Performance: The body is drained for every request, including GET requests that typically have no body. While the overhead is minimal for empty bodies, consider checking req.headers().get("content-length") first to skip draining when there's no body.

Apply this diff to add timeout and error logging:

-    // We must consume the body fully to keep the connection alive. We
-    // iterate over all chunks and simply drop them. This prevents garbage
-    // data of previous requests from being stuck in connection buffer.
-    let mut body = req.into_body();
-    while (body.frame().await).is_some() {}
+    // We must consume the body fully to keep the connection alive. We
+    // iterate over all chunks and simply drop them. This prevents garbage
+    // data of previous requests from being stuck in connection buffer.
+    let mut body = req.into_body();
+    let drain_timeout = tokio::time::Duration::from_secs(30);
+    match tokio::time::timeout(drain_timeout, async {
+        while let Some(frame_result) = body.frame().await {
+            if let Err(e) = frame_result {
+                warn!("Error draining request body: {:?}", e);
+            }
+        }
+    })
+    .await
+    {
+        Err(_) => warn!("Timeout draining request body"),
+        Ok(_) => {}
+    }

Additionally, verify whether this body draining is necessary with the current version of Hyper. Modern versions typically handle request body consumption automatically for keep-alive connections:


🌐 Web query:

Does Hyper automatically consume request bodies for HTTP/1.1 keep-alive connections?

💡 Result:

Short answer: No. Hyper does not implicitly drain/consume an unread request body for you — if you want the TCP connection to be reused for HTTP/1.1 keep‑alive you must read (or explicitly discard) the request body before returning a response, or configure your server/connection behavior accordingly. See hyper's docs on request bodies and collecting them, and the HTTP/1.1 messaging guidance that servers should read or discard a body if they intend to keep a connection open. [1][2][3]

Sources:

  • hyper guide: request/response bodies and collecting the request body. [1]
  • hyper server/conn::Http options (http1_keep_alive behavior). [2]
  • HTTP messaging spec: servers SHOULD read/discard request bodies if they will keep the transport open (HTTP/1.1 guidance). [3]


Add timeout and error handling to body draining loop to prevent DoS and improve observability.

The body draining logic correctly implements HTTP/1.1 keep-alive support, as Hyper requires explicit body consumption before reusing the connection. However, the implementation needs hardening:

  1. DoS vulnerability: The loop has no timeout, allowing a malicious or slow client to tie up the server indefinitely by sending a large body slowly. Add a timeout (e.g., tokio::time::timeout) or a maximum frame count limit.

  2. Silent error handling: Errors from body.frame().await are silently ignored, making debugging difficult. At minimum, log errors.

  3. Performance optimization (optional): Skip body draining for requests without a body by checking req.headers().get("content-length") first.

Apply this diff to add timeout and error logging:

-    // We must consume the body fully to keep the connection alive. We
-    // iterate over all chunks and simply drop them. This prevents garbage
-    // data of previous requests from being stuck in connection buffer.
-    let mut body = req.into_body();
-    while (body.frame().await).is_some() {}
+    // We must consume the body fully to keep the connection alive. We
+    // iterate over all chunks and simply drop them. This prevents garbage
+    // data of previous requests from being stuck in connection buffer.
+    let mut body = req.into_body();
+    let drain_timeout = tokio::time::Duration::from_secs(30);
+    match tokio::time::timeout(drain_timeout, async {
+        while let Some(frame_result) = body.frame().await {
+            if let Err(e) = frame_result {
+                warn!("Error draining request body: {:?}", e);
+            }
+        }
+    })
+    .await
+    {
+        Err(_) => warn!("Timeout draining request body"),
+        Ok(_) => {}
+    }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In magicblock-metrics/src/service.rs around lines 131 to 137, the body-draining
loop needs hardening: wrap the await in a tokio::time::timeout (e.g., a few
seconds) to avoid slow-client DoS, track a maximum frame count (e.g., a
reasonable upper bound) to prevent infinite loops, and check whether the request
actually has a body first (inspect Content-Length or absence of
transfer-encoding) to skip draining when unnecessary. On timeout or any Err from
body.frame().await, log the error with context and break the loop; on Ok(None)
stop normally. Ensure the timeout and frame count are configurable constants and
that logging includes request id/context for observability.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
test-integration/test-schedule-intent/tests/test_schedule_intents.rs (1)

128-141: Add verification after second undelegation at line 140.

The test verifies undelegation after the first schedule_intent (line 128), but the second schedule_intent at line 132 with Some(vec![102]) also triggers undelegation since any Some(...) value in counter_diffs sets is_undelegate = true. For consistency and completeness, add verify_undelegation_in_ephem_via_owner(&[payer.pubkey()], &ctx); after the assertion block ends at line 140.

magicblock-chainlink/src/remote_account_provider/mod.rs (2)

911-970: Retry scope for ForUser errors could be narrower.

The code retries on all RpcError::ForUser errors (line 930), but the comment (lines 915-919) suggests only the "AccountNotFound: Minimum context slot has not been reached" case requires retries. Other ForUser errors (authentication failures, rate limits, etc.) won't resolve with retries and waste retry budget.

This is acceptable for a background task with limited retries, but consider narrowing the retry condition:

 RpcError::ForUser(ref rpc_user_err) => {
-    // When an account is not present for the desired
-    // min-context slot then we normally get the below
-    // handled `RpcResponseError`, but may also get the
-    // following error from the RPC.
-    // See test::ixtest_existing_account_for_future_slot
-    // ```
-    // RpcError(
-    //   ForUser(
-    //       "AccountNotFound: \
-    // pubkey=DaeruQ4SukTQaJA5muyv51MQZok7oaCAF8fAW19mbJv5: \
-    //        RPC response error -32016: \
-    //        Minimum context slot has not been reached; ",
-    //   ),
-    // )
-    // ```
-    retry!("Fetching accounts failed: {rpc_user_err:?}");
+    // Retry only for context slot issues
+    if rpc_user_err.contains("Minimum context slot") 
+        || rpc_user_err.contains("AccountNotFound") {
+        retry!("Fetching accounts failed: {rpc_user_err:?}");
+    } else {
+        let err_msg = format!(
+            "RpcError fetching accounts {}: ForUser({rpc_user_err})", 
+            pubkeys_str(&pubkeys)
+        );
+        notify_error(&err_msg);
+        return;
+    }
 }

728-758: Subscription leak on eviction failure—reinsertion of evicted account required for rollback consistency.

When pubsub_client.unsubscribe(evicted) fails at line 738, the code correctly removes pubkey from the LRU cache to prevent subscription of the new account. However, evicted was already removed from the LRU by add(*pubkey) on line 734 and is not reinserted during rollback.

This creates a mismatch: evicted remains subscribed in the pubsub client but is no longer tracked in the LRU cache, causing a subscription leak. The diagnostic code at lines 223–254 will detect and warn about this inconsistency.

Restore the pre-add state by reinserting evicted:

 if let Err(err) = self.pubsub_client.unsubscribe(evicted).await {
     warn!(
         "Failed to unsubscribe from pubsub for evicted account {evicted}: {err:?}");
     // Rollback the LRU add since eviction failed
     self.lrucache_subscribed_accounts.remove(pubkey);
+    // Restore evicted to maintain consistency between LRU and pubsub
+    self.lrucache_subscribed_accounts.add(evicted);
     return Err(err);
 }

Since pubkey was just removed, capacity is available and reinserting evicted will not trigger another eviction.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 917f177 and 0b1b245.

📒 Files selected for processing (3)
  • magicblock-chainlink/src/chainlink/mod.rs (5 hunks)
  • magicblock-chainlink/src/remote_account_provider/mod.rs (28 hunks)
  • test-integration/test-schedule-intent/tests/test_schedule_intents.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-14T09:56:14.047Z
Learnt from: taco-paco
Repo: magicblock-labs/magicblock-validator PR: 564
File: test-integration/programs/flexi-counter/src/processor/call_handler.rs:122-125
Timestamp: 2025-10-14T09:56:14.047Z
Learning: The file test-integration/programs/flexi-counter/src/processor/call_handler.rs contains a test smart contract used for integration testing, not production code.

Applied to files:

  • test-integration/test-schedule-intent/tests/test_schedule_intents.rs
📚 Learning: 2025-11-07T13:09:52.253Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: test-kit/src/lib.rs:275-0
Timestamp: 2025-11-07T13:09:52.253Z
Learning: In test-kit, the transaction scheduler in ExecutionTestEnv is not expected to shut down during tests. Therefore, using `.unwrap()` in test helper methods like `schedule_transaction` is acceptable and will not cause issues in the test environment.

Applied to files:

  • test-integration/test-schedule-intent/tests/test_schedule_intents.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/mod.rs
🧬 Code graph analysis (3)
magicblock-chainlink/src/chainlink/mod.rs (2)
magicblock-chainlink/src/chainlink/fetch_cloner.rs (1)
  • new (134-158)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (1)
  • blacklisted_accounts (6-30)
test-integration/test-schedule-intent/tests/test_schedule_intents.rs (1)
test-integration/programs/flexi-counter/src/state.rs (1)
  • pda (32-35)
magicblock-chainlink/src/remote_account_provider/mod.rs (6)
magicblock-metrics/src/metrics/mod.rs (5)
  • inc_account_fetches_failed (399-401)
  • inc_account_fetches_found (403-405)
  • inc_account_fetches_not_found (407-409)
  • inc_account_fetches_success (395-397)
  • set_monitored_accounts_count (357-359)
magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs (1)
  • new (126-158)
magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs (3)
  • new (46-54)
  • new (311-322)
  • try_new_from_url (165-180)
magicblock-chainlink/src/submux/mod.rs (1)
  • new (147-158)
magicblock-chainlink/src/remote_account_provider/lru_cache.rs (4)
  • new (25-34)
  • pubkeys (38-41)
  • pubkeys (129-135)
  • len (117-123)
magicblock-chainlink/src/remote_account_provider/config.rs (2)
  • default (65-72)
  • try_new_with_metrics (27-42)
🔇 Additional comments (16)
test-integration/test-schedule-intent/tests/test_schedule_intents.rs (3)

66-69: LGTM! Verification helper appropriately placed.

The undelegation verification is correctly positioned after the schedule_intent operation. Minor observation: line 247 includes a debug log after verification, but this call site doesn't—consider adding one for consistency with other tests.


193-196: LGTM! Multi-account verification correctly implemented.

The verification helper is properly invoked with multiple pubkeys, correctly mapping from the payers vector. This ensures all undelegated accounts are verified in the ephemeral layer.


243-247: LGTM! Good example of verification with logging.

This call site demonstrates the complete pattern with a debug log after verification. This pattern could be applied to other call sites (lines 69, 128) for consistency.

magicblock-chainlink/src/chainlink/mod.rs (4)

1-4: LGTM!

The atomic imports are appropriate for the new counter-based account removal tracking in reset_accounts_bank.


139-191: LGTM!

The refactored account removal logic with atomic counters provides clear telemetry. The corrected non_empty calculation using saturating_sub (from previous fixes) properly computes non-empty accounts without the fetch_sub pitfall.
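A minimal illustration of the pitfall being avoided (counter names are illustrative, not the actual fields in `reset_accounts_bank`):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    // Illustrative counters; the real field names differ.
    let removed_total = AtomicUsize::new(5);
    let removed_empty = AtomicUsize::new(2);

    // Deriving non_empty from loaded values with saturating_sub clamps at
    // zero if the counts ever cross.
    let non_empty = removed_total
        .load(Ordering::Relaxed)
        .saturating_sub(removed_empty.load(Ordering::Relaxed));
    assert_eq!(non_empty, 3);

    // fetch_sub, by contrast, mutates the counter and returns the *previous*
    // value, which is easy to misread and wraps on underflow.
    let prev = removed_total.fetch_sub(2, Ordering::Relaxed);
    assert_eq!(prev, 5);
    assert_eq!(removed_total.load(Ordering::Relaxed), 3);
}
```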


317-338: LGTM!

The enhanced logging correctly gates on Trace level and uses the trace!() macro consistently. The mark_empty_if_not_found details provide valuable debugging context for account fetching.


361-377: LGTM!

Upgrading undelegation logging from trace!() to debug!() is appropriate for these important operational events, making them visible without requiring trace-level verbosity.

magicblock-chainlink/src/remote_account_provider/mod.rs (9)

1-74: LGTM!

The new imports support the enhanced functionality: Entry API for atomic fetch operations, metrics integration, and background task management. The FetchResult type alias improves code clarity, and the Helius-specific error code constant documents a provider-specific variation.


75-109: LGTM!

The field rename to lrucache_subscribed_accounts is more descriptive, and the _active_subscriptions_task_handle properly maintains the background task lifetime through Drop semantics.


272-330: LGTM!

The constructor properly initializes the LRU cache and conditionally starts the metrics updater based on configuration. The NonZeroUsize construction is safe due to the config validation invariant documented in the comment.


332-377: LGTM!

The per-endpoint abort channels enable independent lifecycle management for each pubsub client, supporting the event-driven reconnection strategy mentioned in the PR objectives.


610-706: LGTM!

The Entry API usage (lines 634-643) correctly appends to existing waiter lists instead of replacing them, fixing the concurrent fetch issue from previous reviews. The nested Result handling for FetchResult is appropriate for the channel-based architecture.
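A toy sketch of why the Entry-based append matters (stand-in key and waiter types; the real map stores response waiters keyed by `Pubkey`):

```rust
use std::collections::HashMap;

// Illustrative stand-in; the real code stores oneshot senders.
type Waiter = u32;

fn register_waiter(
    pending: &mut HashMap<&'static str, Vec<Waiter>>,
    key: &'static str,
    w: Waiter,
) {
    // entry().or_default() appends to an existing waiter list instead of
    // replacing it, so a second concurrent fetch never drops the first waiter.
    pending.entry(key).or_default().push(w);
}

fn main() {
    let mut pending = HashMap::new();
    register_waiter(&mut pending, "pubkey-A", 1);
    register_waiter(&mut pending, "pubkey-A", 2);
    assert_eq!(pending["pubkey-A"], vec![1, 2]);

    // A plain insert would have overwritten the first list:
    let mut naive: HashMap<&str, Vec<Waiter>> = HashMap::new();
    naive.insert("pubkey-A", vec![1]);
    naive.insert("pubkey-A", vec![2]);
    assert_eq!(naive["pubkey-A"], vec![2]);
}
```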


708-725: LGTM!

The subscription setup cleanly delegates to the subscribe method for each pubkey, which properly handles deduplication via is_watching checks.


797-825: LGTM!

The unsubscribe logic correctly maintains consistency between the LRU cache and pubsub client by only removing from the LRU after successful pubsub unsubscription (lines 809-813). The early return for non-tracked pubkeys (lines 801-807) prevents spurious errors.


849-886: LGTM!

The notify_error helper and retry! macro cleanly encapsulate error handling and retry logic, properly integrating metrics and notifying all waiting requests on failure.


976-1055: LGTM!

The metrics integration properly tracks found/not-found counts and updates gauges after successful fetches. The handling of mark_empty_if_not_found accounts (lines 991-1004) correctly distinguishes between truly missing accounts and those explicitly marked as empty.

Comment on lines +200 to +270
/// Creates a background task that periodically updates the active subscriptions gauge
fn start_active_subscriptions_updater<PubsubClient: ChainPubsubClient>(
subscribed_accounts: Arc<AccountsLruCache>,
pubsub_client: Arc<PubsubClient>,
) -> task::JoinHandle<()> {
task::spawn(async move {
let mut interval = time::interval(Duration::from_millis(
ACTIVE_SUBSCRIPTIONS_UPDATE_INTERVAL_MS,
));
let never_evicted = subscribed_accounts.never_evicted_accounts();

loop {
interval.tick().await;
let lru_count = subscribed_accounts.len();
let (pubsub_total, pubsub_without_never_evict) = pubsub_client
.subscription_count(Some(&never_evicted))
.await;

let all_pubsub_subs = if log::log_enabled!(log::Level::Debug) {
pubsub_client.subscriptions()
} else {
vec![]
};
if lru_count != pubsub_without_never_evict {
warn!(
"User account subscription counts LRU cache={} pubsub client={} don't match",
lru_count, pubsub_without_never_evict
);
if log::log_enabled!(log::Level::Debug) {
// Log all pubsub subscriptions for debugging
trace!(
"All pubsub subscriptions: {:?}",
all_pubsub_subs
);

// Find extra keys in pubsub that are not in LRU cache
let lru_pubkeys = subscribed_accounts.pubkeys();
let pubsub_subs_without_never_evict: HashSet<_> =
all_pubsub_subs
.iter()
.filter(|pk| !never_evicted.contains(pk))
.copied()
.collect();
let lru_pubkeys_set: HashSet<_> =
lru_pubkeys.into_iter().collect();

let extra_in_pubsub: Vec<_> =
pubsub_subs_without_never_evict
.difference(&lru_pubkeys_set)
.cloned()
.collect();
let extra_in_lru: Vec<_> = lru_pubkeys_set
.difference(&pubsub_subs_without_never_evict)
.cloned()
.collect();

if !extra_in_pubsub.is_empty() {
debug!("Extra pubkeys in pubsub client not in LRU cache: {:?}", extra_in_pubsub);
}
if !extra_in_lru.is_empty() {
debug!("Extra pubkeys in LRU cache not in pubsub client: {:?}", extra_in_lru);
}
}
}

debug!("Updating active subscriptions: count={}", pubsub_total);
trace!("All subscriptions: {}", pubkeys_str(&all_pubsub_subs));
set_monitored_accounts_count(pubsub_total);
}
})
}
🧹 Nitpick | 🔵 Trivial

LGTM with optional improvement suggestion.

The background updater provides valuable subscription-count reconciliation. The diagnostic logging (lines 228-262) is properly gated behind debug-level checks to avoid overhead in production.

Optional: Consider graceful shutdown for the metrics task.

The background task runs indefinitely and will be aborted when the provider is dropped. For a metrics task this is acceptable, but you could add a CancellationToken parameter to allow clean shutdown if needed in the future.

 fn start_active_subscriptions_updater<PubsubClient: ChainPubsubClient>(
     subscribed_accounts: Arc<AccountsLruCache>,
     pubsub_client: Arc<PubsubClient>,
+    shutdown: tokio_util::sync::CancellationToken,
 ) -> task::JoinHandle<()> {
     task::spawn(async move {
         let mut interval = time::interval(Duration::from_millis(
             ACTIVE_SUBSCRIPTIONS_UPDATE_INTERVAL_MS,
         ));
         let never_evicted = subscribed_accounts.never_evicted_accounts();

         loop {
-            interval.tick().await;
+            tokio::select! {
+                _ = interval.tick() => {},
+                _ = shutdown.cancelled() => {
+                    debug!("Active subscriptions updater shutting down");
+                    break;
+                }
+            }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
magicblock-chainlink/src/remote_account_provider/mod.rs around lines 200 to 270:
the background updater currently runs forever and is only aborted when the
provider is dropped; add an explicit CancellationToken parameter to
start_active_subscriptions_updater and use it inside the spawned task to stop
the loop cleanly (e.g., use tokio::select! to await either interval.tick().await
or token.cancelled().await and break/return on cancellation), propagate the
token from the caller when spawning the task, and ensure the task exits promptly
so the JoinHandle completes gracefully.

Comment on lines +460 to +492
fn verify_undelegation_in_ephem_via_owner(
pubkeys: &[Pubkey],
ctx: &IntegrationTestContext,
) {
const RETRY_LIMIT: usize = 20;
let mut retries = 0;

loop {
ctx.wait_for_next_slot_ephem().unwrap();
let mut not_verified = vec![];
for pk in pubkeys.iter() {
let counter_pda = FlexiCounter::pda(pk).0;
let owner = ctx.fetch_ephem_account_owner(counter_pda).unwrap();
if owner == delegation_program_id() {
not_verified.push(*pk);
}
}
if not_verified.is_empty() {
break;
}
retries += 1;
if retries >= RETRY_LIMIT {
panic!(
"Failed to verify undelegation for pubkeys: {}",
not_verified
.iter()
.map(|k| k.to_string())
.collect::<Vec<_>>()
.join(", ")
);
}
}
}
🧹 Nitpick | 🔵 Trivial

Consider adding progress logging to aid debugging.

The verification logic is correct: it polls until all accounts' owners are no longer delegation_program_id() or the retry limit is exceeded. The retry limit of 20 and the polling approach are reasonable for test scenarios.

Consider adding debug logging to track verification progress, which would help diagnose issues if the retry limit is reached:

 fn verify_undelegation_in_ephem_via_owner(
     pubkeys: &[Pubkey],
     ctx: &IntegrationTestContext,
 ) {
     const RETRY_LIMIT: usize = 20;
     let mut retries = 0;
 
     loop {
         ctx.wait_for_next_slot_ephem().unwrap();
         let mut not_verified = vec![];
         for pk in pubkeys.iter() {
             let counter_pda = FlexiCounter::pda(pk).0;
             let owner = ctx.fetch_ephem_account_owner(counter_pda).unwrap();
             if owner == delegation_program_id() {
                 not_verified.push(*pk);
             }
         }
         if not_verified.is_empty() {
+            debug!("✅ Verified undelegation for {} accounts", pubkeys.len());
             break;
         }
         retries += 1;
+        debug!(
+            "⏳ Undelegation verification retry {}/{}: {} accounts remaining",
+            retries,
+            RETRY_LIMIT,
+            not_verified.len()
+        );
         if retries >= RETRY_LIMIT {
             panic!(
                 "Failed to verify undelegation for pubkeys: {}",
                 not_verified
                     .iter()
                     .map(|k| k.to_string())
                     .collect::<Vec<_>>()
                     .join(", ")
             );
         }
     }
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In test-integration/test-schedule-intent/tests/test_schedule_intents.rs around
lines 460 to 492, add progress logging inside the polling loop to emit the
current retry count and the list of pubkeys still not verified each iteration
and a success log when verification completes; specifically, log (e.g., via
tracing::info! or println!) the retries and not_verified pubkeys before
incrementing retries so failures are easier to diagnose, and optionally log a
final confirmation once not_verified.is_empty() before breaking.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b1b245 and 7880a7d.

📒 Files selected for processing (3)
  • magicblock-chainlink/src/chainlink/fetch_cloner.rs (5 hunks)
  • magicblock-chainlink/src/chainlink/mod.rs (5 hunks)
  • magicblock-metrics/src/metrics/mod.rs (9 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-10-14T09:56:14.047Z
Learnt from: taco-paco
Repo: magicblock-labs/magicblock-validator PR: 564
File: test-integration/programs/flexi-counter/src/processor/call_handler.rs:122-125
Timestamp: 2025-10-14T09:56:14.047Z
Learning: The file test-integration/programs/flexi-counter/src/processor/call_handler.rs contains a test smart contract used for integration testing, not production code.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/chainlink/fetch_cloner.rs
  • magicblock-metrics/src/metrics/mod.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/chainlink/fetch_cloner.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.

Applied to files:

  • magicblock-chainlink/src/chainlink/fetch_cloner.rs
🧬 Code graph analysis (2)
magicblock-chainlink/src/chainlink/mod.rs (2)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (1)
  • blacklisted_accounts (6-30)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_undelegation_requested (427-429)
magicblock-chainlink/src/chainlink/fetch_cloner.rs (2)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_undelegation_completed (431-433)
magicblock-chainlink/src/remote_account_provider/config.rs (1)
  • try_new_with_metrics (27-42)

Comment on lines 224 to 235
if let Some(in_bank) =
self.accounts_bank.get_account(&pubkey)
{
if !in_bank.delegated()
&& in_bank.owner().eq(&dlp::id())
&& !account.owner().eq(&dlp::id())
{
debug!(
"Undelegation completed for account: {pubkey}"
);
magicblock_metrics::metrics::inc_undelegation_completed();
}
⚠️ Potential issue | 🟠 Major

Fix undelegation completion detection logic.

During cloning we explicitly set delegated accounts to delegated = true and overwrite their owner with the delegation record’s owner (see the set_owner(...).set_delegated(is_delegated_to_us) path in this file). That means a still-delegated account lives in the bank with delegated() == true and owner() != dlp::id(). The new guard checks the exact opposite (!in_bank.delegated() and in_bank.owner() == dlp::id()), so it never trips for the happy path. As a result inc_undelegation_completed() will never run, making the new metric useless. Flip the predicate to look for an entry we previously considered delegated and an update whose owner is no longer dlp::id(), e.g.:

-        if !in_bank.delegated()
-            && in_bank.owner().eq(&dlp::id())
-            && !account.owner().eq(&dlp::id())
-        {
+        if in_bank.delegated()
+            && !account.delegated()
+            && !account.owner().eq(&dlp::id())
+        {

This fires exactly once on the first post-undelegation update and keeps the metric aligned with the new observability story.

🤖 Prompt for AI Agents
In magicblock-chainlink/src/chainlink/fetch_cloner.rs around lines 224 to 235,
the undelegation completion check is inverted and never matches the actual
cloned state; change the guard to detect an entry that was previously marked
delegated in the bank (in_bank.delegated() == true and in_bank.owner() ==
dlp::id()) and which is now updated so its owner is no longer dlp::id() (i.e.
!account.owner().eq(&dlp::id())), then run the debug log and
magicblock_metrics::metrics::inc_undelegation_completed() in that branch so the
metric fires once on the first post-undelegation update.
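The suggested predicate can be exercised in isolation. This sketch uses an illustrative struct and owner strings, not the real account types:

```rust
// Illustrative account shape; the real types come from the accounts bank.
struct Acc {
    delegated: bool,
    owner: &'static str,
}

const DLP: &str = "dlp-program-id";

// The corrected predicate from the suggestion above, over the toy types.
fn undelegation_completed(in_bank: &Acc, update: &Acc) -> bool {
    in_bank.delegated && !update.delegated && update.owner != DLP
}

fn main() {
    let delegated = Acc { delegated: true, owner: "user-program" };
    let undelegated = Acc { delegated: false, owner: "user-program" };

    // Fires on the first update after undelegation...
    assert!(undelegation_completed(&delegated, &undelegated));
    // ...not while the account is still delegated...
    assert!(!undelegation_completed(&delegated, &delegated));
    // ...and not again once the bank entry is no longer delegated.
    assert!(!undelegation_completed(&undelegated, &undelegated));
}
```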

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7880a7d and 9f8a0b2.

📒 Files selected for processing (1)
  • magicblock-ledger/src/ledger_truncator.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-10-28T13:15:42.706Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 596
File: magicblock-processor/src/scheduler.rs:1-1
Timestamp: 2025-10-28T13:15:42.706Z
Learning: In magicblock-processor, transaction indexes were always set to 0 even before the changes in PR #596. The proper transaction indexing within slots will be addressed during the planned ledger rewrite.

Applied to files:

  • magicblock-ledger/src/ledger_truncator.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-ledger/src/ledger_truncator.rs
🔇 Additional comments (1)
magicblock-ledger/src/ledger_truncator.rs (1)

206-207: Good use of spawn_blocking to offload heavy deletion work.

Moving RocksDB deletions to a blocking task prevents blocking the Tokio async runtime, which improves responsiveness during large truncation operations.

Comment on lines 233 to 237
if let Err(err) = delete_handle.await {
error!("Ledger delete task cancelled: {err}");
}

Self::compact_slot_range(ledger, from_slot, to_slot).await;

⚠️ Potential issue | 🟠 Major

Consider skipping compaction if the deletion task fails.

If the blocking deletion task is cancelled or panics (line 233), the function logs the error but still proceeds to compact (line 237). This means compaction runs even when slot deletions may have failed or only partially completed, which could be wasteful or lead to unexpected ledger state.

Consider returning early if delete_handle.await fails, or at minimum document why compaction should proceed despite deletion failures.

Apply this diff to skip compaction when deletion fails:

-        if let Err(err) = delete_handle.await {
-            error!("Ledger delete task cancelled: {err}");
+        match delete_handle.await {
+            Ok(_) => {
+                Self::compact_slot_range(ledger, from_slot, to_slot).await;
+            }
+            Err(err) => {
+                error!("Ledger delete task cancelled, skipping compaction: {err}");
+                return;
+            }
         }
-
-        Self::compact_slot_range(ledger, from_slot, to_slot).await;
🤖 Prompt for AI Agents
In magicblock-ledger/src/ledger_truncator.rs around lines 233 to 237, the code
logs an error when delete_handle.await fails but still proceeds to call
Self::compact_slot_range(...); modify the flow to skip compaction when the
deletion task failed by returning early (or otherwise short-circuiting) after
logging the error so compaction only runs on successful deletion; ensure the
function returns or exits appropriately in the error branch and keep the
existing log message (or augment it) to preserve diagnostics.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 021ef0d and a8900e1.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • Cargo.toml (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-10-26T16:54:39.084Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: test-manual/Cargo.toml:0-0
Timestamp: 2025-10-26T16:54:39.084Z
Learning: In the magicblock-validator repository, use git branch references (not commit hashes or tags) for the helius-laserstream dependency to allow automatic updates when the branch is pushed to.

Applied to files:

  • Cargo.toml
🔇 Additional comments (2)
Cargo.toml (2)

94-94: Confirm the log feature removal is intentional.

Removing features = ["release_max_level_info"] means debug and trace logs will now compile into release builds. This increases binary size and runtime logging overhead in production. Verify this aligns with the PR's metrics/logging objectives and the trade-off is acceptable.


94-232: Verification complete—no issues found.

The ureq = "2.9.6" dependency is correctly added in test-integration/Cargo.toml, with test-integration/test-tools/Cargo.toml referencing it as a workspace dependency. The url = "2.5.0" dependency is present in the root workspace Cargo.toml (line 218 in the provided snippet) and also declared in integration test modules. Both dependencies are properly positioned for integration test metrics fetching as described in the PR objectives.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37d6839 and c6f75fd.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • Cargo.toml (3 hunks)
  • magicblock-processor/src/executor/processing.rs (4 hunks)
  • test-integration/Cargo.toml (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-11-13T09:38:43.804Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/locks.rs:64-102
Timestamp: 2025-11-13T09:38:43.804Z
Learning: In magicblock-processor's TransactionScheduler (scheduler/mod.rs line 59), the executor count is clamped to MAX_SVM_EXECUTORS (63) at initialization time, and executor IDs are assigned sequentially from 0 to count-1. This architectural guarantee ensures that executor IDs used in the bitmask-based AccountLock (scheduler/locks.rs) will always be within valid bounds for bit shifting operations, making runtime bounds checks unnecessary.

Applied to files:

  • magicblock-processor/src/executor/processing.rs
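A sketch of the invariant that learning relies on (the constant name matches; the lock helpers and mask layout are illustrative):

```rust
// Constant matches the learning; lock helpers are illustrative.
const MAX_SVM_EXECUTORS: u64 = 63;

fn lock(mask: u64, executor_id: u64) -> u64 {
    // Ids are assigned sequentially 0..count-1 with count clamped to
    // MAX_SVM_EXECUTORS, so the shift is always in range for a u64.
    debug_assert!(executor_id < MAX_SVM_EXECUTORS);
    mask | (1u64 << executor_id)
}

fn is_locked_by(mask: u64, executor_id: u64) -> bool {
    mask & (1u64 << executor_id) != 0
}

fn main() {
    let mask = lock(0, 5);
    assert!(is_locked_by(mask, 5));
    assert!(!is_locked_by(mask, 6));

    // The largest valid id (62) still shifts within u64 bounds.
    let mask = lock(mask, MAX_SVM_EXECUTORS - 1);
    assert!(is_locked_by(mask, 62));
}
```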
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-10-26T16:54:39.084Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: test-manual/Cargo.toml:0-0
Timestamp: 2025-10-26T16:54:39.084Z
Learning: In the magicblock-validator repository, use git branch references (not commit hashes or tags) for the helius-laserstream dependency to allow automatic updates when the branch is pushed to.

Applied to files:

  • Cargo.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run_make_ci_lint
  • GitHub Check: run_make_ci_test
  • GitHub Check: Build Project
🔇 Additional comments (6)
test-integration/Cargo.toml (2)

100-101: Approve: ureq and url dependencies for test metrics collection.

The addition of ureq 2.9.6 and url 2.5.0 as workspace dependencies is appropriate for the new metrics collection capability in test integration (as referenced in the PR summary for fetching monitored account counts). ureq 2.9.6 upgrades rustls to 0.22 and ring to 0.17, without introducing breaking changes. While ureq 2.9.6 is not the latest version (3.x or 2.12.x exist), it is suitable for test-only usage and maintains backward compatibility.


77-77: Use a stable released version or main branch of solana-account, not a feature branch.

The revision 8f7050a is on the feature branch bmuddha/fix/pre-cow-check rather than a stable release or main branch. The commit was made today, allowing no time for testing or validation. Feature branch dependencies introduce significant risk and should be reserved for development only—not production or CI/CD test suites.

Replace this with either:

  • A tagged release from the solana-account repository, or
  • The main branch revision if the fix is already merged

Lines affected: test-integration/Cargo.toml 77, 109

Cargo.toml (3)

94-94: Confirm intent: log feature removal enables runtime logging in release builds.

Removing the release_max_level_info feature from the log dependency (line 94) changes compile-time behavior. Libraries should avoid using the max level features because they're global and can't be changed once they're set. This change allows all log levels (debug, trace) to be available at runtime in release builds, rather than being stripped at compile time.

Confirm:

  • Is this intentional to support enhanced observability for the new reconnection and metrics features described in the PR?
  • Are there any performance or binary-size implications in production that should be documented?
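For reference, the compile-time cap is controlled by the log crate's Cargo features, so restoring the old behavior later is a one-line manifest change (illustrative fragment showing the two alternatives, not the repository's exact entry):

```toml
# Before: debug!/trace! call sites are stripped from release builds at compile time.
log = { version = "0.4", features = ["release_max_level_info"] }

# After this change: all levels are compiled in and filtered only at runtime
# via the configured logger, at a small cost in binary size and call overhead.
log = { version = "0.4" }
```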

218-218: Confirm url dependency is shared across workspaces.

The url = "2.5.0" dependency is already declared in the main workspace dependencies (line 218), so test-integration will correctly reuse it. This is consistent with the dependency management pattern and avoids duplication.


154-154: Solana-account revision 8f7050a is consistently referenced across all manifests and exists in the repository.

Verification confirms that revision 8f7050a is uniformly specified in:

  • Main Cargo.toml workspace dependencies (line 154)
  • Main Cargo.toml patch.crates-io (line 230)
  • test-integration/Cargo.toml (lines 77 and 109)

The revision exists in the solana-account repository on branch bmuddha/fix/pre-cow-check and is associated with PR #20. All references are consistent across manifests.

magicblock-processor/src/executor/processing.rs (1)

10-21: New imports align with downstream usage

Importing ReadableAccount for lamports() and TransactionError for the new gasless guard is consistent with later code; no issues here.

Comment on lines +315 to 333
// The first loaded account is always a feepayer, check
// whether we are running in privileged execution mode
let privileged = accounts
    .first()
    .map(|feepayer| feepayer.1.privileged())
    .unwrap_or_default();

for (pubkey, account) in accounts {
    // only persist account's update if it was actually modified, ignore
    // the rest, even if an account was writeable in the transaction.
    //
    // We also don't persist accounts that are empty, with an exception
    // for special cases, when those are inserted forcefully as placeholders
    // (for example by the chainlink), those cases can be distinguished from
    // others by the fact that such a transaction is always running in a
    // privileged mode.
    if !account.is_dirty() || (account.lamports() == 0 && !privileged) {
        continue;
    }

🧹 Nitpick | 🔵 Trivial

Account persistence filter depends on is_dirty and zero‑lamport semantics — please confirm invariants

The new gating logic:

  • Uses the first loaded account to infer a per‑transaction privileged mode, and
  • Skips persistence when !account.is_dirty() or when account.lamports() == 0 && !privileged.

This is a good direction to avoid unnecessary writes and to special‑case chainlink/placeholder flows, but there are a couple of subtle invariants to verify:

  1. Fee‑only / rollback path: In the ProcessedTransaction::FeesOnly branch, the single (feepayer, fee_payer_account) pair is now subject to !account.is_dirty(). You need fee_payer_account.is_dirty() to be reliably true whenever fees should be charged, otherwise the fee payer state will not be persisted (and the validator may under‑charge). Please confirm the implementation of RollbackAccounts::FeePayerOnly ensures this.

  2. Zero‑lamport account deletions: For non‑privileged transactions that intentionally close accounts by setting lamports to zero, the condition account.lamports() == 0 && !privileged means we will not call insert_account at all for those keys. That’s fine if AccountsDb treats “no insert” as the correct deletion semantics (e.g., it will already have removed the account or infer deletion from elsewhere), but if insert_account expects to see a zero‑lamport write to purge an existing account, this could leave stale state in the DB and omit “account closed” notifications on accounts_tx.

  3. Notifications vs. storage: Because the same if gates both insert_account and the accounts_tx notification, observers on accounts_tx will no longer see updates for non‑privileged zero‑lamport accounts. If external consumers rely on seeing a final update when an account is closed, you may need to differentiate “store or purge in AccountsDb” from “notify observers”.

If the above invariants all hold, the optimization looks good. Otherwise, you may want to adjust the conditions (e.g., always ensuring fee‑payer writes occur when fees are due, and explicitly modeling deletions in AccountsDb rather than skipping inserts).
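The gate itself can be factored into a small predicate, which makes the invariants above easy to exercise in isolation (a sketch only; `should_persist` and its flattened parameters are hypothetical, not the actual executor code, which reads these values off the loaded account):

```rust
/// Decide whether an account's post-execution state should be written back,
/// mirroring the gate in processing.rs: skip untouched accounts, and skip
/// empty (zero-lamport) accounts unless the transaction ran in privileged
/// mode (e.g. chainlink placeholder inserts).
fn should_persist(is_dirty: bool, lamports: u64, privileged: bool) -> bool {
    // Equivalent to negating the skip condition:
    // !is_dirty || (lamports == 0 && !privileged)
    is_dirty && (lamports > 0 || privileged)
}

fn main() {
    // Untouched account: never persisted.
    assert!(!should_persist(false, 100, false));
    // Dirty, funded account: persisted.
    assert!(should_persist(true, 100, false));
    // Dirty zero-lamport account in a normal tx: skipped —
    // this is exactly the account-closure case flagged above.
    assert!(!should_persist(true, 0, false));
    // Dirty zero-lamport placeholder in privileged mode: persisted.
    assert!(should_persist(true, 0, true));
    println!("ok");
}
```

Unit tests over such a predicate would pin down the fee-payer and zero-lamport cases without needing a full transaction pipeline.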

🤖 Prompt for AI Agents
In magicblock-processor/src/executor/processing.rs around lines 315 to 333, the
new persistence gate (skipping when !account.is_dirty() or account.lamports()==0
&& !privileged) can suppress necessary fee-payer writes and account-closure
semantics; ensure invariants by (1) guaranteeing fee-payer writes are always
persisted in the FeesOnly / RollbackAccounts::FeePayerOnly path — either by
ensuring fee_payer_account.is_dirty() is set when fees are deducted or by
unconditionally persisting the fee-payer when fees were charged, (2) handling
zero-lamport deletions explicitly rather than skipping insert_account when a
non-privileged tx sets lamports to zero — either call the AccountsDb
deletion/purge API or still call insert_account with the zero-lamports
representation if AccountsDb expects that to trigger removal, and (3) separating
storage vs notification semantics so observers still receive an accounts_tx
update for account closures even if physical storage is handled differently
(i.e., always emit the notification for closed accounts or add an explicit
notify-on-close step). Ensure these changes are covered by unit tests for
FeesOnly and zero-lamport close cases.
