Metadata-Version: 2.4
Name: mombai
Version: 3.0.0
Summary: A deep learning library for advanced neural network layers.
Home-page: https://github.com/joaquinsc999/mombai
Author: Joaquín Francisco Solórzano Corea
Author-email: joaquinscorea@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tensorflow>=2.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Mombai

A deep learning library for multi-activation neural network layers built on TensorFlow/Keras.

Mombai implements two families of layers that combine multiple activation functions in a single transformation:

- **WAF (Weighted Activation Fusion)**: Applies multiple activations with learned per-branch affine transforms, then compresses via sum or average.
- **MoA (Mixture of Activations)**: Input-dependent activation selection using attention mechanisms.

## Installation

```bash
pip install mombai
```

## Layers

### WAFLayer
Weighted Activation Fusion. Each activation branch has its own learned scale and bias.

```python
import tensorflow as tf
from mombai import WAFLayer

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(10,)),
    WAFLayer(units=32, activations=['relu', 'swish', 'gelu'], compressor="sum"),
    tf.keras.layers.Dense(1)
])
```
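
The fusion step can be illustrated without TensorFlow. The sketch below handles a single scalar input and assumes each branch computes `scale * activation(x) + bias` before the compressor sums or averages the branches. The helper name `waf_fuse`, the parameter layout, and the placement of the affine transform after the activation are assumptions made for illustration, not Mombai's actual implementation.

```python
import math

# Plain-Python stand-ins for three activations (illustrative only).
def relu(x):
    return max(0.0, x)

def swish(x):
    return x / (1.0 + math.exp(-x))

def gelu(x):
    # Exact GELU via the Gaussian CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def waf_fuse(x, branches, compressor="sum"):
    """Fuse activation branches for one scalar input.

    `branches` is a list of (activation, scale, bias) tuples. Applying
    the affine transform after the activation is an assumption of this
    sketch.
    """
    outputs = [scale * act(x) + bias for act, scale, bias in branches]
    total = sum(outputs)
    if compressor == "sum":
        return total
    if compressor == "avg":
        return total / len(outputs)
    raise ValueError(f"unknown compressor: {compressor!r}")

# Hypothetical "learned" parameters: scale 1 and bias 0 for every branch.
branches = [(relu, 1.0, 0.0), (swish, 1.0, 0.0), (gelu, 1.0, 0.0)]
fused_sum = waf_fuse(2.0, branches, compressor="sum")
fused_avg = waf_fuse(2.0, branches, compressor="avg")
```

With identical branches, `"avg"` is simply `"sum"` divided by the number of activations, so the choice mainly affects the output scale seen by the next layer.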

### WAFLayerNorm / WAFLayerNormV2
WAF with normalization. `WAFLayerNorm` normalizes before activations (pre-norm), `WAFLayerNormV2` normalizes after each activation branch (post-norm).

```python
from mombai import WAFLayerNormV2

layer = WAFLayerNormV2(units=64, activations=['relu', 'gelu', 'swish'], compressor="avg")
```
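
The difference between the two orderings can be sketched in plain Python. The example assumes a simple zero-mean, unit-variance normalization over the feature vector and leaves out the per-branch scale and bias for brevity. The helper names `pre_norm_avg` and `post_norm_avg` are hypothetical, and the normalization Mombai actually applies may differ.

```python
import math

def normalize(values, eps=1e-6):
    """Zero-mean, unit-variance normalization over a feature vector."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def relu(x):
    return max(0.0, x)

def pre_norm_avg(features, activations):
    # Normalize once, then run every activation branch on the result.
    normed = normalize(features)
    branches = [[act(v) for v in normed] for act in activations]
    return [sum(col) / len(col) for col in zip(*branches)]

def post_norm_avg(features, activations):
    # Run each activation branch first, then normalize that branch's output.
    branches = [normalize([act(v) for v in features]) for act in activations]
    return [sum(col) / len(col) for col in zip(*branches)]

features = [-2.0, 0.0, 1.0, 3.0]
activations = [relu, math.tanh]
pre = pre_norm_avg(features, activations)    # pre-norm ordering
post = post_norm_avg(features, activations)  # post-norm ordering
```

In this sketch the post-norm output stays zero-mean because every branch is normalized before averaging, while the pre-norm output does not.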

### MAXLayerWithAttention (MoA-USM)
Input-dependent gating: a Dense layer followed by a softmax produces the mixture weights, so each input sample gets its own weighting over the activations.

```python
from mombai import MAXLayerWithAttention

layer = MAXLayerWithAttention(units=64, activations=['relu', 'swish', 'gelu', 'tanh', 'sigmoid'])
```
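
As a rough illustration of per-sample gating, the sketch below computes one gate logit per activation from the input, turns the logits into weights with a softmax, and mixes the activation outputs with those weights. It applies the activations directly to the input features for simplicity. The function `gated_mixture` and the gate parameters are made up for this example and do not reflect the layer's internals.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, bias):
    """Tiny dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

ACTIVATIONS = [relu, math.tanh, sigmoid]

def gated_mixture(x, gate_weights, gate_bias):
    # One gate logit per activation, computed from this sample alone.
    gates = softmax(dense(x, gate_weights, gate_bias))
    # Mix the activation outputs feature by feature with those gates.
    mixed = [sum(g * act(v) for g, act in zip(gates, ACTIVATIONS)) for v in x]
    return mixed, gates

# Hypothetical gate parameters: 3 activations x 2 input features.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
gate_bias = [0.0, 0.0, 0.0]

# Two different samples receive two different activation mixtures.
mixed_a, gates_a = gated_mixture([3.0, -1.0], gate_weights, gate_bias)
mixed_b, gates_b = gated_mixture([-1.0, 3.0], gate_weights, gate_bias)
```

Here the first sample puts most of its weight on `relu` and the second on `tanh`, which is the sense in which the mixture is input-dependent.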

### MAXLayerWithSelfAttention (MoA-GSM V1)
Self-attention over activation branches. Best suited to sequential data with more than one time step (T > 1).

```python
from mombai import MAXLayerWithSelfAttention

layer = MAXLayerWithSelfAttention(units=32, activations=['relu', 'swish', 'gelu'])
```

### MAXLayerWithSelfAttentionV2 (MoA-GSM V2)
Self-attention with two modes:
- `attention_mode="channel"`: Features attend to each other to produce gates. **Best for non-sequential (tabular/image) data.**
- `attention_mode="temporal"`: Standard temporal self-attention. Best for sequential data.

```python
from mombai import MAXLayerWithSelfAttentionV2

# For tabular/image data (most common)
layer = MAXLayerWithSelfAttentionV2(
    units=64,
    activations=['relu', 'swish', 'gelu', 'tanh', 'sigmoid'],
    attention_mode="channel"
)

# For sequential data
layer = MAXLayerWithSelfAttentionV2(
    units=64,
    activations=['relu', 'swish', 'gelu'],
    attention_mode="temporal"
)
```
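
One way to picture the two modes is by the axis along which attention weights are computed. The sketch below uses generic scaled dot-product attention without learned query/key projections on a single sample of shape (T, F). It is an assumption-laden illustration of attending over time steps versus features, not a description of how `MAXLayerWithSelfAttentionV2` is implemented.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def attention_weights(tokens):
    """Scaled dot-product attention weights between the rows of `tokens`."""
    dim = len(tokens[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
               for k in tokens] for q in tokens]
    return [softmax(row) for row in scores]

# One sample with T=3 time steps and F=2 features (made-up values).
sample = [[1.0, 0.0],
          [0.5, 0.5],
          [0.0, 1.0]]

# T x T matrix: each time step attends to every time step.
temporal = attention_weights(sample)
# F x F matrix: each feature attends to every feature.
channel = attention_weights(transpose(sample))
```

When T is 1, the temporal matrix collapses to a single weight of 1.0, which is one way to see why attending over features is the more useful choice for non-sequential inputs.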

## Supported Activations

`relu`, `sigmoid`, `tanh`, `softmax`, `softplus`, `softsign`, `elu`, `selu`, `swish`, `gelu`, `leaky_relu`, `relu6`, `hard_sigmoid`, `exponential`, `linear`, `log_softmax`

## Requirements

- Python >= 3.8
- TensorFlow >= 2.0

## License

MIT License. See LICENSE for details.
