A transformer model based on a sliding-kernel self-attention mechanism. The model is built on an implementation of Swin Transformer; see the Swin Transformer repository for the original implementation.

| Model | Params | Epochs | Val. acc. |
|---|---:|---:|---:|
| Swin Transformer (tiny) | 26,598,166 | 200 | 82.19% |
| Swin Transformer (tiny) | 26,598,166 | 300 | 83.34% |
| Kernel Transformer (tiny) | 26,600,362 | 300 | 85.83% |

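To put the 300-epoch rows in perspective, the short sketch below computes the parameter overhead and the accuracy gap from the figures in the table. The dictionary keys and variable names are only illustrative and do not come from the repository.

```python
# Figures copied from the results table: (parameter count, validation accuracy in %).
# The labels used as keys are illustrative, not identifiers from the code base.
results = {
    "swin_tiny_300": (26_598_166, 83.34),
    "kernel_tiny_300": (26_600_362, 85.83),
}

swin_params, swin_acc = results["swin_tiny_300"]
kernel_params, kernel_acc = results["kernel_tiny_300"]

# Absolute and relative parameter overhead of the Kernel Transformer.
extra_params = kernel_params - swin_params
relative_increase = 100 * extra_params / swin_params

# Accuracy gap in percentage points, rounded to the table's precision.
acc_gain = round(kernel_acc - swin_acc, 2)

print(f"Extra parameters: {extra_params:,} ({relative_increase:.4f}% more)")
print(f"Accuracy gain at 300 epochs: {acc_gain} percentage points")
```

On these numbers, the Kernel Transformer adds 2,196 parameters (under 0.01% of the Swin baseline) and scores 2.49 percentage points higher at the same epoch count.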