Metadata-Version: 2.1
Name: pykernsformer
Version: 0.0.4
Summary: Kernel attention implementation of Pytorch TransformerEncoderLayer
Home-page: https://github.com/egrigokhan/pykernsformer
Author: Gokhan Egri
Author-email: <gegri@g.harvard.edu>
License: MIT
Keywords: pytorch,transformer,attention
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Description-Content-Type: text/markdown
Requires-Dist: torch
Requires-Dist: numpy

# pykernsformer

![alt text](https://img.shields.io/pypi/v/pykernsformer)
![alt text](https://img.shields.io/pypi/dd/pykernsformer?color=green&logo=red&logoColor=red)
![alt text](https://img.shields.io/pypi/pyversions/pykernsformer)

The pykernsformer module extends the `torch.nn.TransformerEncoderLayer` class to include custom attention formulas.

# Installation

You can install the pykernsformer package using pip as

`pip install pykernsformer`

# Usage

pykernsformer comes with the following built in attention kernels.

pykernsformer | Attention          | Formula | Citation       |
|-------------|--------------------|---------|----------------|
| `attention` | Regular            | $softmax(\frac{QK^T}{\sqrt{d_k}})V$   | Vaswani et al. |
| `attention_linear` | Linear             | $\frac{QK^T}{\sum_k QK^T}V$  |                |
| `attention_periodic` | Periodic           | $softmax(-\frac{2\sin^2(\pi\frac{\sqrt{2 - 2q_ik_j^T}}{p})}{\sqrt{d_k}})V$ |                |
| `attention_LP` | Locally Periodic     | $softmax(-\frac{2\sin^2(\pi\frac{\sqrt{2 - 2\hat{q}_i\hat{k}_j^T}}{p})}{\sqrt{d_k}} + \frac{{q_i}{k_j^T}}{\sqrt{d_k}})V$ |                |
| `attention_RQ` | Rational Quadratic | $\frac{\left( 1 + \frac{1}{\alpha \sqrt{d_k}} - \frac{2QK^T}{2 \alpha \sqrt{d_k}} \right)^{-\alpha}}{\sum_k \left( 1 + \frac{1}{\alpha \sqrt{d_k}} - \frac{2QK^T}{2 \alpha \sqrt{d_k}} \right)^{-\alpha}}V$ |                |

You can also implement your own attention function with the following signature:

```python
def attention_custom(query, key, value, mask=None, dropout=None):

    [...]

    p_attn = [...] # the attention matrix

    [...]

    return torch.matmul(p_attn, value), p_attn
```


