Metadata-Version: 2.1
Name: fast_norm_cuda
Version: 0.2.0
Summary: A fast, yet specialized, RMSNorm/LayerNorm implementation
Home-page: https://github.com/yuantailing/fast-norm-cuda
Author: Tailing Yuan
Author-email: yuantailing@gmail.com
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: GPU :: NVIDIA CUDA
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.3
License-File: LICENSE

# fast-norm-cuda

A fast, yet specialized, RMSNorm/LayerNorm implementation

This library is under development. Only some special cases are currently supported, and performance is not yet fully optimized.

- [x] RMSNorm
- [ ] LayerNorm
- [x] Float16 and BFloat16
- [ ] More data types
- [x] More shapes
- [ ] Accelerate if no wgrad
- [ ] Performance tuning
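
For reference, the two operations named in the checklist are conventionally defined as in the sketch below. This is a plain-Python illustration of the math only, not this library's API: the function names `rms_norm` and `layer_norm`, the per-vector list interface, and the default `eps` values are all assumptions made for the example.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over one vector (hypothetical helper, not the library API).

    Computes x / sqrt(mean(x^2) + eps) * weight, element-wise.
    The eps default is an assumption chosen for illustration.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def layer_norm(x, weight, bias, eps=1e-5):
    """Reference LayerNorm over one vector (hypothetical helper, not the library API).

    Computes (x - mean) / sqrt(var + eps) * weight + bias, element-wise,
    using the biased (population) variance.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * w + b for v, w, b in zip(x, weight, bias)]
```

RMSNorm skips the mean subtraction and the bias term, which is one reason it tends to be the cheaper of the two to implement in a fused kernel.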


## Statement

This work was independently completed by me at home using my personal RTX 3080.
