Metadata-Version: 2.1
Name: quanto
Version: 0.0.1
Summary: A quantization toolkit for pytorch.
Author-email: David Corvoysier <david@huggingface.co>
License: Apache-2.0
Keywords: torch,quantization
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.1.0

# PyTorch quantization toolkit

The toolkit uses a torch.Tensor subclass, QuantizedTensor, to dispatch base aten operations to implementations that use integer ops.

All operations accept QuantizedTensor with int8 data.

Most arithmetic operations return a QuantizedTensor with int32 data.
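
The need for a wider result type can be illustrated with a minimal sketch in plain Python (no torch, to keep it dependency-free). It assumes symmetric per-tensor quantization with an absmax scale, which is an illustrative choice and not necessarily what QuantizedTensor does; the helper names `absmax_scale`, `quantize_int8` and `int_dot` are hypothetical and not part of the toolkit.

```python
# Sketch only: plain-Python stand-ins for int8 data and an int32 accumulator.
# The quantization scheme (symmetric, per-tensor, absmax) is an assumption.

def absmax_scale(values):
    """Return a scale that maps the largest magnitude to 127."""
    largest = max(abs(v) for v in values)
    return largest / 127 if largest > 0 else 1.0

def quantize_int8(values, scale):
    """Round each value to the nearest multiple of scale, clamped to int8."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def int_dot(a_q, b_q):
    """Integer dot product; products of int8 values overflow 8 bits."""
    return sum(x * y for x, y in zip(a_q, b_q))

a = [0.5, -1.0, 0.25, 1.0]
b = [2.0, 1.5, -2.0, 0.5]

a_scale, b_scale = absmax_scale(a), absmax_scale(b)
a_q, b_q = quantize_int8(a, a_scale), quantize_int8(b, b_scale)

# The accumulator is expressed in units of a_scale * b_scale.
acc = int_dot(a_q, b_q)
approx = acc * a_scale * b_scale
exact = sum(x * y for x, y in zip(a, b))

print(acc, approx, exact)
```

The accumulator falls well outside the int8 range even for four elements, which is why arithmetic results are kept as int32 until they are rescaled.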

It also uses quantized modules to:

- store quantized weights,
- gather input and output scales to rescale QuantizedTensor int32 data to int8.
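
The rescaling step in the list above can be sketched as follows, again in plain Python. This is an illustration under stated assumptions, not the toolkit's implementation: the accumulator is assumed to carry a scale equal to the product of the input and weight scales, the output scale is assumed to be known in advance (for example gathered during calibration), and `rescale_to_int8` is a hypothetical helper name.

```python
# Sketch only: convert int32 accumulator values to int8 at a given output scale.
# All names and the choice of scales are assumptions made for illustration.

def rescale_to_int8(acc_values, acc_scale, out_scale):
    """Re-express integers given in acc_scale units as int8 in out_scale units."""
    result = []
    for acc in acc_values:
        q = round(acc * acc_scale / out_scale)
        result.append(max(-128, min(127, q)))  # saturate to the int8 range
    return result

input_scale = 1.0 / 127   # assumed scale of the int8 inputs
weight_scale = 2.0 / 127  # assumed scale of the int8 weights
output_scale = 1.0 / 127  # assumed scale of the int8 outputs

# Integer results of a Linear-like operation, held in a wide accumulator.
acc_values = [-3937, 1200, 8000, 20000]

out_q = rescale_to_int8(acc_values, input_scale * weight_scale, output_scale)
print(out_q)
```

Values that exceed the representable output range saturate at 127 or -128, so the choice of output scale trades range against resolution.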

For now, only Linear modules are quantizable.

Eventually, the resulting quantized graph should be passed to a dedicated inductor backend that fuses each rescale into the preceding operation.

Examples of fused operations can be found in https://github.com/Guangxuan-Xiao/torch-int.
