Trainable, fast, and memory-efficient sparse attention
transformers pytorch english transformer triton chinese cuda-kernels cutlass attention-mechanism attention-is-all-you-need self-attention pytorch-implementation flash-attention triton-kernels dynamic-mask-attention
- Updated Oct 30, 2025
- C++
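
For orientation only, the topics above describe a dynamic-mask flavor of sparse attention. Below is a minimal, hypothetical PyTorch sketch of that general idea (per-query top-k masking of attention scores). The function name `dynamic_mask_attention` and the `keep_ratio` parameter are illustrative inventions, not this repository's API, and this naive version materializes the full score matrix, so it shows the math without the memory savings that fused CUDA/Triton kernels would provide.

```python
import torch
import torch.nn.functional as F

def dynamic_mask_attention(q, k, v, keep_ratio=0.25):
    # q, k, v: (batch, heads, seq_len, head_dim); names are illustrative.
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # (B, H, L, L)

    # Dynamic sparsity: for each query, keep only the top-k highest-scoring
    # keys and push the rest to -inf, so softmax sees a sparse score matrix.
    k_keep = max(1, int(scores.shape[-1] * keep_ratio))
    topk_idx = scores.topk(k_keep, dim=-1).indices
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, topk_idx, 0.0)

    attn = F.softmax(scores + mask, dim=-1)
    return torch.matmul(attn, v)

# Toy usage with small shapes.
q = torch.randn(2, 4, 128, 64)
k = torch.randn(2, 4, 128, 64)
v = torch.randn(2, 4, 128, 64)
out = dynamic_mask_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 128, 64])
```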