- 
                Notifications
    You must be signed in to change notification settings 
- Fork 141
Spectral Clustering #1425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Spectral Clustering #1425
Conversation
| Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. | 
| Update: Update: | 
| Update: Update: | 
| @@ -0,0 +1,38 @@ | |||
| /* | |||
| * Copyright (c) 2025, NVIDIA CORPORATION. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose we name this file spectral.hpp instead of spectral_clustering. The clustering is already in the namespace, so it's redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in f1ab56a
| * @} | ||
| */ | ||
|  | ||
| void create_connectivity_graph(raft::resources const& handle, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not part of the standard "fit()" "predict()" APIs that one would normally use day to day, so please add it to the cuvs::preprocessing::spectral_embedding::helpers namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in f1ab56a
…into spectral-clustering
2bbd1e5    to
    2d90bb5      
    Compare
  
    f364560    to
    f377524      
    Compare
  
    | Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the  You can disable this status message by setting the  ✨ Finishing touches🧪 Generate unit tests (beta)
 Comment  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, LGTM! It looks like the connectivity graph uses a int as its nnz type. I guess this is intentional and temporary?
        
          
                cpp/src/cluster/spectral.cu
              
                Outdated
          
        
      | raft::linalg::transpose(handle, | ||
| embedding_col_major.data_handle(), | ||
| embedding_row_major.data_handle(), | ||
| n_samples, | ||
| config.n_components, | ||
| raft::resource::get_cuda_stream(handle)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that it is necessary to double VRAM usage and perform a transposition here, but would be great if we could find a workaround in the long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! Yes, we have to switch from col to row major since spectral embedding needs col major and kmeans needs row major. Yes, we are using int as nnz type currently until we resolve the scale up work in spectral embedding.
| int n_components; | ||
| int n_init; | ||
| int n_neighbors; | ||
| uint64_t seed; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we expose the rng_state instead of the seed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in d9d4bc8
|  | ||
| void fit_predict(raft::resources const& handle, | ||
| params config, | ||
| raft::device_coo_matrix_view<float, int, int, int> connectivity_graph, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raft::device_coo_matrix_view<double, int, int, int> is not supported while cuGraph has templates for both float and double. Can we add support for the latter as well. Otherwise, i get the below error
error: no suitable user-defined conversion from "raft::device_coo_matrix_view<double, int, int, int>" to "raft::device_coo_matrix_view<float, int, int, int>" exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added double support in 5c608f3
Resolves #1201
This PR introduces the Spectral Clustering algorithm which builds on top of Spectral Embedding.
This PR will enable us to remove the legacy and inaccurate spectral clustering algorithm called
partitionThis PR also adds
doubleprecision support for spectral embedding and spectral clustering. The new templates require a refactor of the source files to.cuhand the template instantiations in.cuGtests have been added to this PR with synthetic data in a similar fashion as kmeans gtests.
I have tested this PR through python bindings and by comparing the plots with sklearn's SpectralClustering.
Here is a plot of sklearn vs ours.
