Metadata-Version: 2.1
Name: opt-einsum-torch
Version: 0.1.0
Summary: Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.
Home-page: http://github.com/hhaoyan/opt-einsum-torch
Author: Haoyan Huo
Author-email: hhaoyann@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Environment :: GPU
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

# opt-einsum-torch

Einstein summation has many implementations. NumPy's `numpy.einsum` is
typically the slowest of them, because it runs only on the CPU and is largely
single-threaded. PyTorch's `torch.einsum` works for both CPU and CUDA tensors.
However, since CUDA tensors have no virtual memory to spill into,
`torch.einsum` runs out of CUDA memory when the tensors are too large to fit
on the device.
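
To get a feel for how quickly device memory runs out, the sketch below estimates the bytes needed to hold the operands and output of an einsum expression. The helper `einsum_bytes`, the index sizes, and the choice of 4-byte elements are illustrative assumptions and not part of this package. It also shows that slicing a contracted index (here `j`) and accumulating partial results shrinks the per-step footprint, which is one possible strategy and not necessarily the one used here.

```python
from math import prod


def einsum_bytes(subscripts, sizes, itemsize=4):
    """Bytes needed to hold every operand and the output at once.

    Hypothetical helper for illustration only. It ignores any temporary
    workspace that a real kernel may allocate.
    """
    inputs, output = subscripts.split("->")
    terms = inputs.split(",") + [output]
    return sum(prod(sizes[i] for i in term) * itemsize for term in terms)


# Assumed index sizes for 'ijk,jkl->il' with float32 elements.
sizes = {"i": 4096, "j": 2048, "k": 2048, "l": 4096}
total = einsum_bytes("ijk,jkl->il", sizes)

# Split the contracted index j into 64 slices. Each slice produces a partial
# 'il' result, and the partial results are summed.
per_chunk = einsum_bytes("ijk,jkl->il", dict(sizes, j=sizes["j"] // 64))

print(f"all at once: {total / 2**30:.2f} GiB")
print(f"per j-slice: {per_chunk / 2**30:.2f} GiB")
```

With these assumed sizes the full operands need far more memory than a single GPU typically offers, while one slice needs only a few GiB.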

This package aims to provide a memory-efficient `einsum` function with
PyTorch as the backend. It also uses the `opt_einsum` package to optimize the
contraction path and minimize the FLOP count.
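
Why the contraction path matters can be seen with a rough cost model: contracting two operands costs about the product of the sizes of all indices involved. The sketch below compares two orders for `ij,jk,kl->il`. The helper `pairwise_cost` and the index sizes are assumptions made for illustration. In practice `opt_einsum` searches for a good order automatically, for example through `opt_einsum.contract_path`.

```python
from math import prod


def pairwise_cost(subs_a, subs_b, sizes):
    """Approximate multiply-add count for contracting two operands.

    Hypothetical cost model: one multiply-add per combination of all
    distinct indices appearing in either operand.
    """
    return prod(sizes[i] for i in set(subs_a) | set(subs_b))


# Assumed index sizes for 'ij,jk,kl->il'.
sizes = {"i": 10, "j": 1000, "k": 10, "l": 1000}

# Order 1: (ij,jk -> ik), then (ik,kl -> il).
cost_left = pairwise_cost("ij", "jk", sizes) + pairwise_cost("ik", "kl", sizes)

# Order 2: (jk,kl -> jl), then (ij,jl -> il).
cost_right = pairwise_cost("jk", "kl", sizes) + pairwise_cost("ij", "jl", sizes)

print(cost_left, cost_right)  # 200000 20000000
```

Both orders give the same result, but under this model the second costs 100 times more than the first.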

### Usage

```python
from opt_einsum_torch import EinsumPlanner
import torch

# Some huge tensors
arr1, arr2 = ..., ...
ee = EinsumPlanner(torch.device('cuda:0'), cuda_mem_limit=0.9)
result = ee.einsum('ijk,jkl->il', arr1, arr2)

```

The resulting tensor `result` is a PyTorch CPU tensor. You can convert it to
a NumPy array by calling `result.numpy()`.

### Future work

- Support multiple GPUs.
- Memory-efficient einsum kernels.
- CUDA data transfer profilers.

