
Conversation

@hpatro (Collaborator) commented Oct 17, 2025

I would prefer us to keep the node count in the gossip section bounded rather than unbounded. In a 2,000-node cluster, the worst case is 1,998 nodes in the gossip section, which seems quite expensive on both the sender and the receiver end.

Also, I wanted others to explore other node counts and see whether we should update the default. In #2291, Viktor's suggestion was to try out sqrt(n) rather than 10% of the total node count.

Related to #2291

  • Bound the node count in gossip section
  • Prioritize PFAIL nodes in the gossip section
  • Introduce a config to control the percentage of nodes in the gossip section and thereby cap the overhead
  • Default to 10% of the total node count (rough sketch below)
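
Roughly, the selection described in the list above could look like the sketch below; the config name and the exact clamping are placeholders, not the final code.

```c
/* Sketch only: bound the gossip section to a percentage of the cluster and
 * fill PFAIL nodes first. cluster_ping_gossip_percent is a hypothetical name. */
int total = dictSize(server.cluster->nodes);
int wanted = (total * server.cluster_ping_gossip_percent) / 100; /* default: 10% */
if (wanted < 3) wanted = 3;                   /* keep the existing floor of 3 entries */
if (wanted > freshnodes) wanted = freshnodes; /* never more than the nodes we can gossip about */

/* PFAIL nodes go in first so failure information spreads quickly; the
 * remaining slots are filled with randomly picked healthy nodes. */
int pfail_wanted = server.cluster->stats_pfail_nodes;
if (pfail_wanted >= wanted) pfail_wanted = wanted - 1; /* leave room for one healthy node */
int healthy_wanted = wanted - pfail_wanted;
```

For comparison with the sqrt(n) idea from #2291: in a 2,000-node cluster, 10% yields 200 gossip entries per ping, while sqrt(n) yields about 45.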

@hpatro hpatro requested a review from madolson October 17, 2025 18:29
@hpatro hpatro force-pushed the cluster_ping_gossip_count branch from 908e10e to 7b29e59 on October 17, 2025 18:35
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@hpatro hpatro force-pushed the cluster_ping_gossip_count branch from 7b29e59 to c5642e8 on October 17, 2025 18:40
Comment on lines 4600 to 4602
* Since we have non-voting replicas that lower the probability of an entry
* to feature our node, we set the number of entries per packet as
* 10% of the total nodes we have. */
Member

Do we need to update the comment?

wanted = floor(dictSize(server.cluster->nodes) / 10);
if (wanted < 3) wanted = 3;
if (wanted > freshnodes) wanted = freshnodes;
int overall = server.cluster_ping_message_gossip_max_count;
Member

Can we have it as a percentage? e.g. cluster_ping_message_gossip_max_perc

Also, we are naming it max, but will the number of nodes ever be less than that?

Contributor

Yeah, I was going to suggest the same. The default is a percentage so it seems appropriate to configure it as a percentage.

Collaborator Author

Yeah, I like this. It will be easier for folks to deal with scale-in/scale-out situations.

Contributor

How about supporting both options?
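
For what it's worth, a minimal sketch of how both knobs could combine; both config field names here are assumptions.

```c
/* Sketch: derive the target from a percentage of the cluster size, then
 * apply an absolute cap on top. Both config fields are hypothetical. */
int total = dictSize(server.cluster->nodes);
int wanted = (total * server.cluster_ping_message_gossip_max_perc) / 100;
if (server.cluster_ping_message_gossip_max_count > 0 &&
    wanted > server.cluster_ping_message_gossip_max_count)
    wanted = server.cluster_ping_message_gossip_max_count;
if (wanted < 3) wanted = 3; /* keep the existing lower bound */
```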

* information would be broadcasted. */
int pfail_wanted = server.cluster->stats_pfail_nodes;
if (pfail_wanted >= overall) {
pfail_wanted = overall - 1;
Member

Can we set pfail_wanted = overall?

Why are we reserving one spot in overall for wanted?

Collaborator Author

Yeah, I suggested that. Will update it.

Contributor

Do we foresee any regression if we don't gossip healthy nodes at all? I am wondering about scenarios where PFAIL nodes are never actually marked as FAIL or healthy.
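
For reference, the trade-off in question as a minimal sketch (same variable names as the snippet above): reserving one slot guarantees that every ping still carries at least one healthy node.

```c
/* Sketch: if PFAIL nodes alone would fill the section, keep one slot free
 * so information about healthy nodes keeps circulating as well. */
int pfail_wanted = server.cluster->stats_pfail_nodes;
if (pfail_wanted >= overall) pfail_wanted = overall - 1; /* reserve one healthy slot */
int healthy_wanted = overall - pfail_wanted;             /* >= 1 as long as overall >= 1 */
```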

@zuiderkwast (Contributor)

What are the theoretical implications of lowering the number?

Sending 10 pings with n/10 gossips achieves the same information-spreading effect as sending 20 pings with n/20 gossips? So failure detection and convergence of any changes slow down linearly with this config?

I'm fine with a config like this, but I (and others, you included?) have a feeling we can gossip smarter without sacrificing anything.

I really liked the idea of prioritizing gossips about nodes for which there was a recent change, this idea: #1897 (comment)

Can we add a last-modified timestamp to each node and do a weighted random selection?

Another idea is to increment a score for each node we gossip about and then prioritize the ones with lower score next time.
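
For illustration, a standalone sketch of the score idea; the struct and names are made up, not a proposal for the actual data layout.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: remember how often each node has been gossiped about and
 * prefer the least-gossiped nodes when filling the next gossip section. */
typedef struct gossipCandidate {
    int node_index;        /* stand-in for a reference to the cluster node */
    uint64_t gossip_score; /* bumped every time this node is included */
} gossipCandidate;

static int byLowestScore(const void *a, const void *b) {
    const gossipCandidate *ca = a, *cb = b;
    if (ca->gossip_score < cb->gossip_score) return -1;
    return ca->gossip_score > cb->gossip_score;
}

/* Pick the `wanted` lowest-score candidates and bump their scores so the
 * selection rotates over time instead of repeating the same nodes. */
static void pickGossipEntries(gossipCandidate *cands, size_t n, size_t wanted) {
    qsort(cands, n, sizeof(*cands), byLowestScore);
    for (size_t i = 0; i < n && i < wanted; i++) cands[i].gossip_score++;
}
```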

@hpatro (Collaborator, Author) commented Oct 17, 2025

> I really liked the idea of prioritizing gossips about nodes for which there was a recent change, this idea: #1897 (comment)

I came across this idea in HashiCorp's Serf. It requires a bit of work to index nodes by type and to add logic around which ones to prioritise more, but smarter gossip is quite achievable.

This PR is a guardrail for the current system to avoid CPU/network spikes.

@hpatro (Collaborator, Author) commented Oct 18, 2025

> What are the theoretical implications of lowering the number?

We need to guarantee that a message is received, directly or indirectly, from another node within the node-timeout/2 period. If that is met, we don't send out another message.

So, this might lead to more direct pings, which have higher overhead. Gossip node information is 106 bytes, while the entire payload is around 2200 B.
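
Put concretely (rough arithmetic from the numbers above): piggybacking one more node costs about 106 B inside a ping we are sending anyway, while delivering the same information via an extra direct ping costs a full ~2200 B message, so roughly twenty gossip entries fit in the wire budget of one additional ping.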

createIntConfig("rdma-port", NULL, MODIFIABLE_CONFIG, 0, 65535, server.rdma_ctx_config.port, 0, INTEGER_CONFIG, NULL, updateRdmaPort),
createIntConfig("rdma-rx-size", NULL, IMMUTABLE_CONFIG, 64 * 1024, 16 * 1024 * 1024, server.rdma_ctx_config.rx_size, 1024 * 1024, INTEGER_CONFIG, NULL, NULL),
createIntConfig("rdma-completion-vector", NULL, IMMUTABLE_CONFIG, -1, 1024, server.rdma_ctx_config.completion_vector, -1, INTEGER_CONFIG, NULL, NULL),
createIntConfig("cluster-ping-message-gossip-max-count", NULL, MODIFIABLE_CONFIG, 0, 2000, server.cluster_ping_message_gossip_max_count, 0, INTEGER_CONFIG, NULL, NULL),
Contributor

Can the max be a function of dictSize(server.cluster->nodes)? I mean, it would be good to validate that we don't gossip more nodes than the cluster actually has.
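
One possible shape for that validation, as a sketch under assumed names (not necessarily how the PR resolves it): since createIntConfig() bounds are static, the clamp against the live node count would have to happen where the ping is assembled.

```c
/* Sketch: clamp the configured cap against the live cluster size at ping
 * time; the config table itself can't reference dictSize(). */
int overall = server.cluster_ping_message_gossip_max_count; /* assumed: 0 means no explicit cap */
int freshnodes = dictSize(server.cluster->nodes) - 2;       /* exclude myself and the receiver */
if (overall <= 0 || overall > freshnodes) overall = freshnodes;
```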
