torch
local-attention>=1.0.3
product-key-memory
mixture-of-experts>=0.2.0
axial-positional-embedding>=0.1.0
