Conversation

@rustonbsd
Owner

@rustonbsd rustonbsd commented Oct 30, 2025

Refactoring distributed-topic-tracker to use the new mainline primitives announce_signed_peer and get_signed_peer discussed in @Nuhvi's draft proposal:


  • host a supported pkarr relay node
  • refactor with announce_signed_peer and get_signed_peer
  • CI/CD test
  • bench and findings
  • add as a direct iroh-gossip integration

@rustonbsd
Owner Author

Pkarr relay server at: http://pkarr.rustonbsd.com:6881

@rustonbsd
Owner Author

Thoughts:

  • iroh-gossip feature: if adopted, this should be a feature in iroh-gossip for automatically discovering EndpointIds for a specified topic, since iroh-gossip doesn't have a native access control model anyway. I will see if I can find a shape for that feature after the distributed-topic-tracker refactor experiment and might shoot a PR to iroh-gossip (if the iroh team is interested). I think it would fit better there than in the distributed-topic-tracker, since the distributed-topic-tracker includes an encryption-based access and validation schema, and projects like iroh-lan and rustpatcher depend on that. Also, almost all of the complexity of the distributed-topic-tracker comes from working around the limitations of native ed25519 secret key derivation as the basis for topic discovery. (I will still refactor this and maybe make the access control part an optional component, but first I will refactor it to be an iroh-gossip bootstrap integration with the new announce_signed_peer and get_signed_peer, without any access controls.)

@Nuhvi

Nuhvi commented Oct 30, 2025

@rustonbsd happy to see you take this seriously. I will try to contact the libtorrent maintainer to ask for his support, at least after we validate this implementation a bit further.

Pkarr relay server at: http://pkarr.rustonbsd.com:6881/

I want to note that: 1) the pkarr-relay crate doesn't use the dht crate, and for now I am using the nuhvi/pkarr repo directly on my server relay.pkarr.org to run a relay that uses the dht crate; but more importantly, 2) I noticed that the DHT server used for bootstrapping is better off in a separate process than the relay, if only because the relay might use the node too much and interfere with its capacity to act as a passive router, so I am using this ad hoc script instead:

fn main() {
    tracing_subscriber::fmt()
        .with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
        .init();

    // Keep a handle to the node so we can poll its stats below.
    let dht = dht::Dht::builder()
        .server_mode()
        .port(6881)
        .build()
        .unwrap();

    loop {
        std::thread::sleep(std::time::Duration::from_secs(10));
        let info = dht.info();
        tracing::info!(?info);
    }
}

That being said, I am pleased to see that my laptop is seeing your node (as an extra node plus mine):

RUST_LOG=debug cargo run --example get_signed_peers FFBFBF52B8B91B946C688028AE1D45C8D4A3048D
...
Populated the routing table self_id=Id(....) table_size=87 signed_peers_table_size=2 << Hooray
...
 Done query id=Id(ffbfbf52b8b91b946c688028ae1d45c8d4a3048d) closest=5 visited=6 responders=2 << Extra Hooray

Edit: double-checked, and yes, indeed the extra node is your pkarr.rustonbsd.com (IP 116.203.180.147) and not another node I ran and forgot about on my server.

Edit2: Note: I didn't manually add your node to my bootstrapping list; I used the default, but your node was registered in the signed_peers_routing_table at the relay.pkarr.org:6881 node, as intended, which, as you can tell from my tone, I don't take for granted yet. All my previous work was a client-side DHT relying on the excellent routers by libtorrent and uTorrent out there; this is the first time my implementation has to prove itself as a router too.

@rustonbsd
Owner Author

Hi @Nuhvi,

I want to note that: 1) the pkarr-relay crate doesn't use the dht crate, and for now I am using the nuhvi/pkarr repo directly on my server relay.pkarr.org to run a relay that uses the dht crate; but more importantly, 2) I noticed that the DHT server used for bootstrapping is better off in a separate process than the relay, if only because the relay might use the node too much and interfere with its capacity to act as a passive router, so I am using this ad hoc script instead:

My node was set up as follows:

git clone https://github.com/Nuhvi/pkarr
cd pkarr/relay
cargo build --release

# run ../target/release/pkarr-relay in a tmux session for the moment

Let me know if I should make any adjustments!

Edit2: Note: I didn't manually add your node to my bootstrapping list; I used the default, but your node was registered in the signed_peers_routing_table at the relay.pkarr.org:6881 node, as intended, which, as you can tell from my tone, I don't take for granted yet. All my previous work was a client-side DHT relying on the excellent routers by libtorrent and uTorrent out there; this is the first time my implementation has to prove itself as a router too.

That is really cool 🎉
I haven't looked through your code in detail yet, just skimmed the impl details, but I will take a closer look, also to understand this better:

  1. I noticed that the DHT server used for bootstrapping is better off in a separate process than the relay, if only because the relay might use the node too much and interfere with its capacity to act as a passive router, so I am using this ad hoc script instead

I have done a quick refactor taking the feature route: I added a new feature, experimental, and modified two tests to use the new announce_signed_peer and get_signed_peer route, switched on with the experimental feature.

You can test if you want to with the following two tests:

cargo run --example e2e_test_experimental --features="iroh-gossip experimental"
cargo run --example chat_experimental --features="iroh-gossip experimental"

I am fighting GitHub Actions at the moment, but we should be able to look at the e2e test step in the Actions workflows with and without the experimental feature enabled, and we should see the difference in bootstrapping speed.
Next I will take a look at your code and do some benchmarking and maybe some larger node count tests.
Will continue tomorrow or the day after.

@rustonbsd
Owner Author

Execution times from GitHub Actions, e2e_test vs e2e_test_experimental:

e2e_test = 58s
e2e_test_experimental = 23s

@Nuhvi

Nuhvi commented Oct 31, 2025

Let me know if I should make any adjustments!

Not necessarily, this setup works. But, for example, if you run a binary with Dht::builder() only, as I mentioned above, you get access to useful options like DhtBuilder::server_settings(), which in turn allows you to set max_info_hashes and max_peers_per_info_hash beyond the defaults... which might be useful while the number of nodes is so small.

Another thing you might want to do is explicitly use your node as a DhtBuilder::extra_bootstrap() in your clients, just in case my node goes down. Also see the cache_bootstrap example, which might help with caching any other nodes; again, in case the couple of nodes that we are running and using as bootstrap go down, people can still leverage other nodes that we don't know about, which get cached between sessions.
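
For example, a minimal sketch: the option names (server_settings, max_info_hashes, max_peers_per_info_hash, extra_bootstrap) are the ones mentioned above, but the exact types and signatures in the dht crate are assumptions.

fn main() {
    // Hypothetical sketch only: the dht crate's real builder API may differ.
    let dht = dht::Dht::builder()
        .server_mode()
        .port(6881)
        // Raise storage limits beyond the defaults while the network is tiny.
        .server_settings(dht::ServerSettings {
            max_info_hashes: 10_000,
            max_peers_per_info_hash: 200,
            ..Default::default()
        })
        // Fall back to a known node in case the default bootstrap nodes go down.
        .extra_bootstrap(&["pkarr.rustonbsd.com:6881".to_string()])
        .build()
        .unwrap();

    // Keep the process alive, as in the script above.
    loop {
        std::thread::sleep(std::time::Duration::from_secs(60));
        let info = dht.info();
        tracing::info!(?info);
    }
}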

self.reset()?;
}
let mut hasher = sha2::Sha512::new();
hasher.update(topic_bytes);
@Nuhvi

It might be useful here to add a namespace like b'Iroh', just to make sure that anyone using the same topic and sha512, but for another overlay network than Iroh, won't get the same info_hash... I described this in the BEP, but maybe I should have added it to the function signature? I didn't want to force it on people.

@rustonbsd
Owner Author

Yes, absolutely! 1) it should be sha512, and 2) I added "/iroh/distributed-topic-tracker" as extra bytes to the hash for namespacing.

Did some more minimal refactoring to move the experimental features into their own submodule, for a cleaner API cut.
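
A minimal sketch of that derivation, assuming the sha2 crate and truncation of the sha512 digest to a 20-byte info_hash (the exact byte layout in the real code may differ):

use sha2::{Digest, Sha512};

// Namespace prefix so another overlay network hashing the same topic
// bytes lands on a different info_hash.
const NAMESPACE: &[u8] = b"/iroh/distributed-topic-tracker";

fn topic_info_hash(topic_bytes: &[u8]) -> [u8; 20] {
    let mut hasher = Sha512::new();
    hasher.update(NAMESPACE);
    hasher.update(topic_bytes);
    let digest = hasher.finalize();
    // DHT info hashes are 20 bytes; keep the first 20 bytes of the digest.
    let mut info_hash = [0u8; 20];
    info_hash.copy_from_slice(&digest[..20]);
    info_hash
}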

@rustonbsd
Owner Author

@Nuhvi my current status is:

  • tests are still a bit flaky; some more tuning is needed to avoid rate limits when testing with too many nodes behind the same IP in a row
  • Two options:
    • Reintroduce the time-based schema for situations where the ~300 TTL (I might remember that wrong here) is longer than the expected active period of nodes. With the cutoff of 20 nodes returned per infohash, we run into a sea of dead nodes plus rate limits very quickly if we spawn many short-lived nodes behind a single IP for benchmarking, or in a big-network scenario where nodes join many topics for short periods of time (problem: rate limits plus the 20-node return limit). To help with overcrowding of the single DHT entry, we could reintroduce the rolling unix-minute schema, which would reduce the problem (as seen with the normal distributed-topic-tracker); see the sketch after this list.
    • Add a custom new_then parameter to the get_signed_peer function that cuts off entries older than specified and returns up to 20 nodes that fit that filter (no sea of dead nodes, and even with rate limits every call should return active nodes).
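
A minimal sketch of the rolling unix-minute schema from the first option (hypothetical; it reuses the namespaced hashing from above, and the real schema may differ):

use sha2::{Digest, Sha512};
use std::time::{SystemTime, UNIX_EPOCH};

// The announce/lookup info_hash rotates every minute, so stale entries
// age out of lookups naturally. minute_offset lets a reader also query
// the previous minute(s) for peers that announced just before rollover.
fn rolling_info_hash(topic_bytes: &[u8], minute_offset: u64) -> [u8; 20] {
    let unix_minute =
        SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs() / 60 - minute_offset;
    let mut hasher = Sha512::new();
    hasher.update(b"/iroh/distributed-topic-tracker");
    hasher.update(topic_bytes);
    hasher.update(unix_minute.to_be_bytes());
    let mut info_hash = [0u8; 20];
    info_hash.copy_from_slice(&hasher.finalize()[..20]);
    info_hash
}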

What do you think? I will play with the timestamps of SignedAnnounce a bit more.

@Nuhvi

Nuhvi commented Nov 3, 2025

@rustonbsd I am not fully aware of what you are doing or testing, so I will have to spend some time going through it and let you know if there are any obvious footguns you can avoid. Will get back to you later tonight or early tomorrow.

@Nuhvi

Nuhvi commented Nov 3, 2025

Until then... since you are testing live and not with a local testnet, and since you are using pkarr-relay to run the dht node: that relay uses a rate limiter with brutal defaults (for a good reason in that case). But since this is one of only two nodes supporting the new extension, you really need to relax these limits drastically, if not remove them altogether by running the dht node separately from the relay, as I mentioned before.

But I will also test myself later and see how my own node, which is running without any rate limits, behaves.

But also, note that in practice, nodes behind the same IP address typically aren't expected to make too many announcements/lookups, no?

@rustonbsd
Owner Author

@Nuhvi thank you for taking a look!

But also, note that in practice, nodes behind the same IP address typically aren't expected to make too many announcements/lookups, no?

Yes, I fully agree; I am just thinking of edge cases like university networks or other proxied connections where many users share the same perceived external IP. But that's a different problem and has nothing necessarily to do with the new signed peer discovery.

I agree that we can be much more chill about the timeouts and retry timings; I just want to know where the limits are and figure out why this was flakier than the more complex distributed-topic-tracker's native behaviour, which is also not very chill with its timeouts.

I forked mainline and added a more_recent_than param to the get_signed_peers function, and that seems to work perfectly for nodes that come and go quickly (regardless of rate limits).

IDK if that is interesting to you, but this would make it possible to avoid namespacing by time, e.g. by unix minute or something like that. What do you think?

see: Nuhvi/mainline@aad2154
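
Roughly, the idea on the storage-node side (illustrative types only, not the actual mainline/dht API; see the commit above for the real change):

use std::time::{SystemTime, UNIX_EPOCH};

// Stand-in for a stored signed announcement; key/signature/addr elided.
struct SignedAnnounce {
    timestamp_secs: u64,
}

// Drop announcements at or below the cutoff before the usual random-20
// sampling, so every response contains recently active nodes.
fn filter_recent(all: &[SignedAnnounce], more_recent_than: u64) -> Vec<&SignedAnnounce> {
    all.iter()
        .filter(|a| a.timestamp_secs > more_recent_than)
        .collect()
}

fn main() {
    let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
    let announces = vec![
        SignedAnnounce { timestamp_secs: now - 30 },  // fresh
        SignedAnnounce { timestamp_secs: now - 600 }, // stale
    ];
    // Only announcements from the last two minutes survive the filter.
    assert_eq!(filter_recent(&announces, now - 120).len(), 1);
}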

@Nuhvi

Nuhvi commented Nov 3, 2025

I just want to know where the limits are and figure out why this was flakier than the more complex distributed-topic-tracker's native behaviour, which is also not very chill with its timeouts.

Well, likely because these requests were sharded over a massive DHT, and any rate limits encountered in the bootstrapping nodes are irrelevant, since those are only used to populate the routing table... while now there are only two nodes; if they have any rate limits, things will fail.

I forked mainline and added a more_recent_than param to the get_signed_peers function, and that seems to work perfectly for nodes that come and go quickly (regardless of rate limits).

I noticed that, and it is interesting... but I am not sure how it fixes the rate limits? Is there a way to simulate the situation with and without this option using a local dht::Testnet?

My best guess here is that this indicates that rate limits aren't relevant at all (which makes sense, since at least one node has none), but that you are announcing too many nodes on the same topic: storage nodes return a random 20, and these random 20 might all be old and dead by the time you make a lookup.

It would really help if you separate (in your tests) the notion of failure in two: failure to get responses, and failure to get data. The former is a failure of the DHT; the latter is just a thing that any BitTorrent-like system needs to deal with... no matter how "recent" the announcement is, a malicious peer (or just a flaky client) can announce 100s of peers, none of which is actually listening to incoming requests or has the data.

So please try again and focus on assuming that peers are hit and miss: get peers from the DHT, try them, and if they fail, ask for another 20 random ones.

Apologies if you are already doing that, I am trying to use my intuition without reviewing the code, because this way I can answer earlier than I could otherwise.

@rustonbsd
Owner Author

rustonbsd commented Nov 3, 2025

Well, likely because these requests were sharded over a massive DHT, and any rate limits encountered in the bootstrapping nodes are irrelevant, since those are only used to populate the routing table... while now there are only two nodes; if they have any rate limits, things will fail.

I agree.

I noticed that, and it is interesting... but I am not sure how it fixes the rate limits? Is there a way to simulate the situation with and without this option using a local dht::Testnet?

I will look deeper into querying the 20 nodes at a time sequentially rather than randomly, for this highly specific scenario (rapidly running tests in succession, for example on every PR sync event, or locally without a testnet). I usually run my tests like 10 times locally in quick succession, just to see if or where we break with too many stale records/pubkeys still on the DHT.

It would really help if you separate (in your tests) the notion of failure in two: failure to get responses, and failure to get data. The former is a failure of the DHT; the latter is just a thing that any BitTorrent-like system needs to deal with...

I will rewrite the e2e test to reflect more nuanced failure cases, but if you run it locally and enable debug tracing you should get more insight.

no matter how "recent" the announcement is, a malicious peer (or just a flaky client) can announce 100s of peers, none of which is actually listening to incoming requests or has the data.

Yes, the scenario you describe isn't helped by time-based filtering. I just came across the issue during testing on the live network; I go into my reasons a bit more below.

So please try again and focus on assuming that peers are hit and miss: get peers from the DHT, try them, and if they fail, ask for another 20 random ones.

Implemented are two loops: a publisher that sleeps for a while after each success, and a bootstrap loop that decides, based on num_peers > 0, what timing to use for the next get_signed_peers call. Each call returns 0 to 20 pubkeys for the topic (infohash); I check them against an already-tried hashset, try to join the unknown peers, and add them to the hashset. Repeat.
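
In sketch form (the DHT and join calls are stubs standing in for the real dht/iroh APIs, so only the loop structure is meaningful here):

use std::collections::HashSet;
use std::time::Duration;

type PubKey = [u8; 32];

fn get_signed_peers(_info_hash: [u8; 20]) -> Vec<PubKey> {
    Vec::new() // real code: up to 20 random signed pubkeys per call
}

fn try_join(_peer: PubKey) -> bool {
    false // real code: attempt an iroh connection to the peer
}

fn bootstrap_loop(info_hash: [u8; 20]) {
    let mut tried: HashSet<PubKey> = HashSet::new();
    let mut num_peers = 0usize;
    loop {
        // Poll aggressively until the first peer joins, then back off.
        let delay = if num_peers > 0 {
            Duration::from_secs(30)
        } else {
            Duration::from_secs(2)
        };
        std::thread::sleep(delay);

        for peer in get_signed_peers(info_hash) {
            // insert() is true only for peers we haven't tried before.
            if tried.insert(peer) && try_join(peer) {
                num_peers += 1;
            }
        }
    }
}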

Maybe I am just blind or haven't seen a way to get non-random lists of 20 peers to try? Like pagination? Or maybe a "seen before" filter that gets applied before selecting 20 random peers? So far I just use the get_signed_peers function, but better than more_recent_than would be a general filter ability, either by a passed "filter list" or something. I will take another look at the code in case I missed something and could, just through sequential calls, get the right peers with no overlap between the random samples from call to call (could very well be 🙈).

I am also noticing I wasn't really clear about the state of this right now. It's all overly aggressive because I want to find the best params for the shortest gossip network bootstrap that can be run over and over again on the same system without breaking. Whenever I use the distributed-topic-tracker in a project, repeated cold bootstraps are just part of the testing, and if the lib feels flaky there (even if the scenario is not realistic) I would not use the library, since I would be afraid it would do that on users' machines if they do a couple of rapid restarts. But let me dial it in with max aggression before I add time-based namespacing, and then benchmark the tradeoffs of slower and fewer retries etc.; then we have some numbers and can quantify the tradeoffs.

The failure behavior:
Tests run over and over again, for example:
cargo run --example e2e_test_experimental --features="iroh-gossip experimental" -- 2 topic-postfix

I can run the above in three terminals about six to ten times; when I run it again after that, I read 0 peers found in the logs, while the first six to ten runs worked fine and read more and more known peers for the topic. Maybe some longer-term block based on IP? But when I change the topic-postfix and rerun, it works again for 6-10 iterations. If I wait a few minutes I can reuse an old topic-postfix again. Probably a two-tiered rate limit? I will look through mainline in more detail and figure out what this is exactly.

Still working on the details before benchmarking. I just want to make the test work reliably first and get the lowest repeatable bootstrap times on the live network (our two nodes, for now ^^).

FYI: my main objectives are 1) fast cold bootstrap times, i.e. the time until we connect to the first live iroh peer, and 2) (rapid) repeatability without impacting the speed of bootstrapping a completely new network with some stale entries still around.

More reasonable request limits and more nodes would probably solve all the rate limit issues in a "balanced" version. I am just not quite at the balanced version yet.

I will test the "normal" distributed-topic-tracker with only our two custom nodes and see how it behaves in that environment.

Might be a couple of days. If you have something specific I should test, let me know; I am happy to run some experiments. So far I have just been playing with the new functions and working towards benchmarking and optimizing.

@Nuhvi

Nuhvi commented Nov 3, 2025

The reason BEP_0005 suggests that nodes return a random subset of peers (which is what I am doing in the new signed-peers extension) is that it is the only secure way to let honest peers pass through as fast as possible: pagination favors the earliest peers (who might be stale), or, if reversed, it favors spammy attackers who keep announcing on a loop.

Please notice that you only need one honest peer; once you find them, you can ask them for more peers they trust.

The mental model using DHTs should be:

  1. You will never get reliable service.
  2. The more you abuse it with requests trying to achieve more reliability, the more your packets will get dropped if not your IP blocked.

You should always assume that the DHT rarely works, design your system so that it leverages the DHT for the censorship resistance that occasionally works, and then go super hard on caching and p2p gossip after you find the first peers, etc...

Whenever you find your tests flaky, please read the first sentence in BEP_0005:

BitTorrent uses a "distributed sloppy hash table" (DHT)

If you need reliable and high quality service, you have no other option but to use centralized trackers.

@Nuhvi

Nuhvi commented Nov 3, 2025

If I were designing a content discovery system that needs to be reliable and censorship resistant, I would:

  1. Hardcode trackers (or at least reputable peers) in the equivalent of the "magnet" link.
  2. Query the DHT in a very, very relaxed background process to discover more peers, to enhance the download process and be more resilient to censorship or current peers going offline.

This isn't a novel idea either; it is how BitTorrent works in the first place: it started with trackers, then added the DHT to enhance censorship resistance, not to have reliable peer discovery. To this day that's how magnet links work. For example, here is a random one; notice how many hardcoded trackers there are. Yes, the infohash is enough to find peers on the DHT, but they still hardcode as many trackers as they can... because that is how things work reliably:

magnet:?xt=urn:btih:DBB3FEC49D40EE29CC18B65236FCBDB7DEF443E5&dn=The%20Witcher%20S04E01%201080p%20WEB%20h264-ETHEL&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Ftracker.bittor.pw%3A1337%2Fannounce&tr=udp%3A%2F%2Fpublic.popcorn-tracker.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.dler.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fexodus.desync.com%3A6969&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftorrent.gresille.org%3A80%2Fannounce&tr=udp%3A%2F%2Fp4p.arenabg.com%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337

@Nuhvi

Nuhvi commented Nov 3, 2025

Oh, I forgot to mention another important point: getting 20 random peers from one node is not representative of what a DHT should be like. Usually you should be getting a random 20 peers from each node, and you should have so many nodes that you more or less get 90%, if not all, of the announced peers in one round trip.

I think if you are going to judge the performance of the new extension on the public network and not a testnet, then you are forced to run 10s of nodes to get correct empirical results representative of the state after enough adoption.

@rustonbsd
Owner Author

Yes, I agree; I wouldn't build any production system with the distributed-topic-tracker as its primary peer discovery mechanism, not if any scale is expected.

I am sorry, I misunderstood. The original distributed-topic-tracker was an exercise in "can I use mainline to get reliable peer discovery by topic working, and can I make it reasonably fast" (it was always intended for very small projects, see my github ^^), and I did the same thing again with the new signed get/announce peer functions.

If you are still interested, I will build a balanced version intended for long-running systems with eventual consistency and no continuous bootstrapping after the first peer has been found. I can refactor and create some benchmark results?

It of course makes more sense to build a reasonable PoC rather than a very aggressive engineering exercise, since this is a protocol proposal. I apologize for the misunderstanding; I got carried away.

@Nuhvi

Nuhvi commented Nov 4, 2025

No need to apologize, I am just trying to manage expectations. Happy to help in any way you need going forward.

@rustonbsd
Owner Author

Hi @Nuhvi, sorry for the long delay.

So what are we thinking would be the best way to benchmark this and compare it to the current distributed-topic-tracker?

Maybe we can go at this backwards: what do you think would be convincing evidence that this repo could show to make a strong case for your RFC to pass?

Ideas:

  • spin up ~200 VPS for an hour (~50€) with multiple nodes and clients per VPS, and create an old-vs-new comparison on this setup (a mid-size cluster with identical conditions: run old and new distributed-topic-tracker benchmarks, then compare).
  • run the old version of dtt on only our two nodes; same as above, but for a tiny cluster of two nodes.
  • maybe simulation? but I wouldn't know how.
  • ...

How do you test and compare different versions of mainline and pkarr against each other? What are the metrics you track? Any ideas?

Or maybe something completely different that shows the use case for this new BitTorrent feature?
Let me know what you think would help the case to get this approved.

If adopted, it would make this whole project fold nicely into iroh-gossip, or even work more generally as a discovery mechanism for more than gossip topics, i.e. as a generalized discovery mechanism at the iroh endpoint level.

Let me know your thoughts.

@Nuhvi

Nuhvi commented Nov 17, 2025

@rustonbsd I am biased of course, but I don't think we need to do benchmarking for the following reasons:

  1. We know that this solution is cleaner than any other (short of hardcoding trackers in magnet URLs).
  2. We know that get_peers() works, at least once you have enough nodes to fight attacks.
  3. We don't actually face any attacks, and most probably won't for a long time, if ever.

Yes running 200 nodes is going to be more resilient than running 2, but if you are going to shut them down later, it doesn't matter much.

I think the most promising ways forward are:

  1. Building apps that users are willing to run on servers with open ports...
  2. Convince more Iroh users to spin up these nodes
  3. Convince the libtorrent maintainer to implement this RFC and hope that it eventually lands in popular BitTorrent clients.

For (1) and (2), we need to convince ourselves first that the implementation is solid and stable, then go ask iroh devs and users to switch from mainline to the dht crate. For that, I appreciate your role stress-testing my implementation, and I am trying to find out if I can break it too.

For (3), I think the RFC is enough; the maintainer is perfectly capable of judging the logic himself. The only issue is that I sent him an email and he hasn't responded yet, and from what I see in the commit history I assume he is currently busy. Eventually we may open an issue on https://github.com/arvidn/libtorrent to get his attention, and having multiple people already testing (thank you) would hopefully lend more credibility to my claims about both relevance and efficacy.

So, to summarize: let's get as many people as we can to run the node in earnest, especially in the iroh community, and maybe let's open an issue in libtorrent.

@rustonbsd
Owner Author

That sounds like a solid plan. I will go over this implementation one more time and make it cleaner, then publish the new discovery mechanism as an experimental feature and merge this PR. I have a channel on the iroh Discord server for the development of the distributed-topic-tracker, by the same name; I will write something about this in there after I merge the PR with the experimental feature.

Question: can we/should we do the implementation work in libtorrent ourselves, not only the proposal, so it is easier for the maintainer to judge and merge? I haven't done this before so I'm just curious ^^

@Nuhvi

Nuhvi commented Nov 17, 2025

@rustonbsd I am not familiar enough with the libtorrent code base, nor with writing C++, so I wouldn't dare, to be honest. And even if I would, the first step would be to ask the maintainer regardless; if he asked us to open a PR, maybe we can try then.
