Metadata-Version: 2.1
Name: tabular-tf
Version: 0.1.0
Summary: TF Tabular simplifies the experimentation and preprocessing of tabular datasets for TensorFlow models.
Author-email: Mathias Claassen <mathias@xmartlabs.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: numpy >= 1.26
Requires-Dist: pandas >= 2.2.1
Requires-Dist: tensorflow <= 2.15.1
Requires-Dist: jupyter ; extra == "examples"
Requires-Dist: tensorflow_datasets >= 4.9.4 ; extra == "examples"
Requires-Dist: tensorflow-recommenders >= 0.7.3 ; extra == "examples"
Requires-Dist: matplotlib >= 3.8.4 ; extra == "examples"
Requires-Dist: ruff>=0.2.0 ; extra == "test"
Requires-Dist: mypy==1.10.0 ; extra == "test"
Requires-Dist: pre-commit==3.7.1 ; extra == "test"
Requires-Dist: pytest-cov==5.0.0 ; extra == "test"
Requires-Dist: pytest-mock<3.14.1 ; extra == "test"
Requires-Dist: pytest-runner ; extra == "test"
Requires-Dist: pytest==8.2.2 ; extra == "test"
Requires-Dist: pytest-github-actions-annotate-failures ; extra == "test"
Project-URL: Documentation, https://github.com/mats-claassen/tf-tabular/tree/main#readme
Project-URL: Source, https://github.com/mats-claassen/tf-tabular
Project-URL: Tracker, https://github.com/mats-claassen/tf-tabular/issues
Provides-Extra: examples
Provides-Extra: test

# TF Tabular

TF Tabular is a project aimed at simplifying the process of handling tabular data in TensorFlow. It provides utilities for building models on top of numeric, categorical, multihot, and sequential data types.

## Features

- **Create input layers based on lists of columns**
- **Support custom embeddings**: Useful for including external embeddings, for example ones obtained from an LLM.
- **Support sequence layers**: Useful for time series or when building recommenders on top of user interaction data.
- **Support multi-hot categorical columns**
- **No model building or training**: TF Tabular only builds the input layers, so you can build whatever model you want on top.
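Multi-hot and sequence columns need a fixed-size numeric form before they reach a model. As an illustration of the general techniques only (not TF Tabular's code), the pure-Python sketch below encodes a set of categories as a multi-hot vector and left-pads or truncates a sequence of IDs to a fixed length. The function names `multi_hot` and `pad_sequence`, the choice to ignore out-of-vocabulary values, and the use of `0` as a reserved padding ID are assumptions made for this example.

```python
from typing import Sequence


def multi_hot(values: Sequence[str], vocab: Sequence[str]) -> list[int]:
    """Encode a set of categories as a 0/1 vector over `vocab`.

    Assumption: values missing from the vocabulary are silently ignored.
    """
    index = {token: i for i, token in enumerate(vocab)}
    vector = [0] * len(vocab)
    for value in values:
        if value in index:
            vector[index[value]] = 1
    return vector


def pad_sequence(ids: Sequence[int], max_len: int, pad_id: int = 0) -> list[int]:
    """Keep the `max_len` most recent ids and left-pad with `pad_id`.

    Assumption: id 0 is reserved for padding.
    """
    recent = list(ids)[max(0, len(ids) - max_len):]
    return [pad_id] * (max_len - len(recent)) + recent


genres_vocab = ["Action", "Comedy", "Drama", "Thriller"]
print(multi_hot(["Comedy", "Drama"], genres_vocab))  # [0, 1, 1, 0]
print(pad_sequence([12, 7, 42], max_len=5))          # [0, 0, 12, 7, 42]
```

Left-padding keeps the most recent items at the end of the sequence, which is a common convention for interaction histories; right-padding works just as well if the downstream layer masks padding.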


## Installation

To get started with TF Tabular, you will need to install it using pip:

```sh
pip install tabular-tf
```

## Usage

Here is a basic example of how to use TF Tabular:

```python
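from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

from tf_tabular.builder import InputBuilder

# Define columns to use and specify additional parameters:
categoricals = ['Pclass', 'Embarked']
numericals = ['Age', 'Fare']
# .... (norm_params, vocabs and embedding_dims must also be defined here;
#       see the Titanic example notebook for a complete setup)

# Build the input layers:
input_builder = InputBuilder()
input_builder.add_inputs_list(categoricals=categoricals,
                              numericals=numericals,
                              normalization_params=norm_params,
                              vocabs=vocabs,
                              embedding_dims=embedding_dims)
inputs, output = input_builder.build_input_layers()

# Add a task-specific head and wrap everything in a Keras model:
output = Dense(1, activation='sigmoid')(output)
model = Model(inputs=inputs, outputs=output)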
```

This produces a model like the following:
![Netron Model View](/media/images/example_netron.png)


## Examples
The **examples** folder contains more complete examples:
* [Titanic](examples/titanic/titanic.ipynb): A simple binary classification example using the Titanic dataset.
* [MovieLens](examples/movielens/movielens.ipynb): A two-tower retrieval model using the MovieLens dataset.
* [MovieLens Sequential](examples/sequential/movielens_sequential.ipynb): Another two-tower retrieval model built on the MovieLens dataset, preprocessed so that the model's input is the list of movies the user has interacted with.
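The preprocessing described for MovieLens Sequential can be sketched roughly as follows: group interactions by user, order them by time, and pair each movie with the movies watched before it. This is a hypothetical, pure-Python illustration rather than the notebook's code; the `(user_id, movie_id, timestamp)` tuple layout, the helper names `user_sequences` and `next_item_examples`, and using the next movie as the training target are all assumptions.

```python
from collections import defaultdict


def user_sequences(interactions):
    """Group (user_id, movie_id, timestamp) tuples into time-ordered movie lists."""
    by_user = defaultdict(list)
    for user_id, movie_id, timestamp in interactions:
        by_user[user_id].append((timestamp, movie_id))
    # Sort each user's events by timestamp (ties broken by movie_id).
    return {user: [movie for _, movie in sorted(events)]
            for user, events in by_user.items()}


def next_item_examples(sequence, max_len=10):
    """Turn one ordered sequence into (context, target) pairs.

    Assumption: the target is the next movie the user watched, and the
    context is at most `max_len` movies watched right before it.
    """
    examples = []
    for i in range(1, len(sequence)):
        context = sequence[max(0, i - max_len):i]
        examples.append((context, sequence[i]))
    return examples


log = [
    ("u1", "m3", 300),
    ("u1", "m1", 100),
    ("u1", "m2", 200),
    ("u2", "m9", 50),
]
seqs = user_sequences(log)
print(seqs)  # {'u1': ['m1', 'm2', 'm3'], 'u2': ['m9']}
print(next_item_examples(seqs["u1"], max_len=2))
# [(['m1'], 'm2'), (['m1', 'm2'], 'm3')]
```

Capping the context at `max_len` keeps input lengths bounded; users with a single interaction (like `u2`) produce no training pairs under this setup.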


## Contributing
Contributions to TF Tabular are welcome. Check out the [contributing](https://github.com/xmartlabs/tf_tabular/blob/main/CONTRIBUTING.md) guidelines for more details.

### Setting up local development environment
To set up a local development environment, first clone the repository and then install the required dependencies:
1. Install Poetry by following the instructions on the [official Poetry website](https://python-poetry.org/docs/#installation).
2. Run `poetry install`.
3. Run `poetry run pre-commit install` to install the git pre-commit hooks.

## Roadmap
The following features may be added in the future, depending on need and on the interest expressed by the community.

- [ ] Parse a dataset to separate numerical, categorical, multi-hot, and sequential columns
- [ ] Implement other types of normalization
- [ ] Support computing vocab and normalization params instead of receiving them as parameters
- [ ] Improve documentation and provide more usage examples
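Until vocabularies and normalization parameters can be computed by the library, they have to be computed outside it. The sketch below does this in pure Python, assuming a vocabulary is a list of distinct values ordered by frequency and normalization parameters are a mean and population variance per column. These dictionary layouts and the helper names `compute_vocab` and `compute_normalization_params` are hypothetical and may not match what `add_inputs_list` expects, so check the Titanic example before passing them in.

```python
import math
from collections import Counter
from statistics import fmean, pvariance


def _is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def compute_vocab(values, min_count=1):
    """Distinct non-missing values seen at least `min_count` times, most frequent first."""
    counts = Counter(v for v in values if not _is_missing(v))
    return [value for value, count in counts.most_common() if count >= min_count]


def compute_normalization_params(values):
    """Mean and population variance of the non-missing numeric values.

    Assumption: the {"mean": ..., "var": ...} layout is illustrative only.
    """
    clean = [float(v) for v in values if not _is_missing(v)]
    if not clean:
        raise ValueError("column has no non-missing values")
    return {"mean": fmean(clean), "var": pvariance(clean)}


# Toy columns named after the Usage example (values are made up).
embarked = ["S", "C", "S", None, "Q", "S"]
age = [22.0, 38.0, 26.0, float("nan"), 35.0]

computed_vocabs = {"Embarked": compute_vocab(embarked)}
computed_norm_params = {"Age": compute_normalization_params(age)}
print(computed_vocabs)       # {'Embarked': ['S', 'C', 'Q']}
print(computed_norm_params)  # {'Age': {'mean': 30.25, 'var': 42.1875}}
```

Computing these statistics on the training split only, and reusing them for validation and test data, avoids leaking information from evaluation data into preprocessing.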

## License
TF Tabular is licensed under the MIT License. See the LICENSE file for more details.

