Metadata-Version: 2.1
Name: hat-cl
Version: 0.1.0
Summary: HAT (Hard Attention to the Task) Modules for Continual Learning
License: MIT
Author: Xiaotian Duan
Author-email: xduan7@gmail.com
Requires-Python: >=3.9,<3.11
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Dist: numpy (>=1.23,<2.0)
Requires-Dist: timm (>=0.6.12,<0.7.0)
Requires-Dist: torch (>=1.9.0,<2.0.0)
Description-Content-Type: text/markdown

# HAT-CL

Re-design of the hard-attention-to-the-task (HAT) mechanism for continual learning (CL).







## limitations so far:


### Refreshing optimizer state

It is highly advised to refresh the optimizer state after every task, to avoid carrying over the momentum from the previous task.
This can be done by simply re-initializing the optimizer.

[feature_importance.ipynb](examples%2Ffeature_importance.ipynb)
### Weight decay (L2 regularization)

Cannot use weight decay (L2 regularization) due to how weight decay works.
The way weight decay works is that, it adds the weight parameter values to the gradients:

```python3

for i, param in enumerate(params):
    
    # Maximize the params based on the objective if necessary
    # ...

    # noinspection PyUnresolvedReferences
    d_p = d_p.add(param, alpha=weight_decay)
    
    # Add momentum if necessary
    # ...
    
    # Update the params
    param.add_(d_p, alpha=-lr)

```

The code segment above is what happens in the step function. 
At this point, the gradient is already modified (compensated or conditioned) by the HAT mechanism.
However, the weight decay changes the gradient on top of the mechanism.
This might change the parameters that are supposed to be locked out by the HAT mechanism, and therefore causing forgetting.



