Dear Cell-ranger team,
It seems scan-rs still uses the rather old vantage-point tree data structure for nearest neighbor search (NNS), a key step of UMAP and t-SNE. However, recent breakthroughs in nearest neighbor search, in particular proximity-graph-based algorithms such as HNSW and NSG, can be much faster while remaining accurate in terms of recall. More importantly, they can be efficiently parallelized. In addition to the NNS step, the other UMAP steps, including cross-entropy optimization and embedding-space initialization, are all single-threaded and therefore slow for large datasets with millions or billions of samples (datasets of that scale will soon be easy to obtain). I think the non-linear dimension reduction step can be further improved and accelerated.
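To illustrate the idea, here is a minimal sketch of the greedy walk at the core of proximity-graph search, plus a batch version that splits queries across threads. It makes several simplifying assumptions: the graph is built by brute force (real HNSW/NSG construction is incremental and layered), the search returns only a single approximate nearest neighbor rather than a beam of k candidates, and all function names (`dist2`, `build_graph`, `greedy_search`, `batch_search`) are hypothetical and not part of scan-rs or any existing crate.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical helper: build a toy proximity graph by brute force.
/// Each point is linked to its `m` nearest neighbors, and reverse links are
/// added so every edge is bidirectional. This is O(n^2 log n) and only meant
/// for illustration; it is NOT how HNSW or NSG build their graphs.
fn build_graph(points: &[Vec<f32>], m: usize) -> Vec<Vec<usize>> {
    let n = points.len();
    let mut graph: Vec<Vec<usize>> = vec![Vec::new(); n];
    for i in 0..n {
        let mut others: Vec<usize> = (0..n).filter(|&j| j != i).collect();
        others.sort_by(|&x, &y| {
            dist2(&points[i], &points[x]).total_cmp(&dist2(&points[i], &points[y]))
        });
        for &j in others.iter().take(m) {
            if !graph[i].contains(&j) {
                graph[i].push(j);
            }
            if !graph[j].contains(&i) {
                graph[j].push(i);
            }
        }
    }
    graph
}

/// Greedy walk: starting at `entry`, move to the neighbor closest to `query`
/// and stop once no neighbor is closer than the current node. The result is
/// an approximate nearest neighbor (a local minimum on the graph).
fn greedy_search(
    points: &[Vec<f32>],
    graph: &[Vec<usize>],
    query: &[f32],
    entry: usize,
) -> usize {
    let mut current = entry;
    let mut best = dist2(&points[current], query);
    loop {
        let mut next = current;
        for &nb in &graph[current] {
            let d = dist2(&points[nb], query);
            if d < best {
                best = d;
                next = nb;
            }
        }
        if next == current {
            return current;
        }
        current = next;
    }
}

/// Queries are independent of each other, so a batch can be split into
/// chunks and searched on separate threads (entry point fixed at node 0).
fn batch_search(
    points: &[Vec<f32>],
    graph: &[Vec<usize>],
    queries: &[Vec<f32>],
    threads: usize,
) -> Vec<usize> {
    let threads = threads.max(1);
    let chunk = ((queries.len() + threads - 1) / threads).max(1);
    std::thread::scope(|s| {
        let handles: Vec<_> = queries
            .chunks(chunk)
            .map(|qs| {
                s.spawn(move || {
                    qs.iter()
                        .map(|q| greedy_search(points, graph, q, 0))
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect::<Vec<usize>>()
    })
}
```

Calling `build_graph(&points, m)` once and then `batch_search(&points, &graph, &queries, threads)` returns one index per query, in query order. Because the graph is read-only during search, no locking is needed, which is the property that makes this family of methods easy to parallelize.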
Thanks,
Jianshu