Metadata-Version: 2.1
Name: transtab
Version: 0.0.1
Summary: A flexible tabular prediction model that handles variable-column input tables.
Home-page: https://github.com/RyanWangZf/transtab
Author: Zifeng Wang
Author-email: zifengw2@illinois.edu
License: UNKNOWN
Keywords: tabular data,machine learning,data mining,data science
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: loguru
Requires-Dist: numpy
Requires-Dist: scikit-learn
Requires-Dist: setuptools
Requires-Dist: transformers
Requires-Dist: tqdm
Requires-Dist: pandas (>=1.3.*)
Requires-Dist: openml (>=0.10.0)
Requires-Dist: torch (~=1.0)

# TransTab: A flexible tabular prediction model

Document is available at https://transtab.readthedocs.io/en/latest/index.html.

This repository provides the python package `transtab` for flexible tabular prediction model. The basic usage of `transtab` can be done in a couple of lines!

```python
import transtab

# load dataset by specifying dataset name
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
     = transtab.load_data('credit-g')

# build classifier
model = transtab.build_classifier(cat_cols, num_cols, bin_cols)

# start training
transtab.train(model, trainset, valset, **training_arguments)

# make predictions, df_x is a pd.DataFrame with shape (n, d)
# return the predictions ypred with shape (n, 1) if binary classification;
# (n, n_class) if multiclass classification.
ypred = transtab.predict(model, df_x)
```

It's easy, isn't it?



## Transfer learning across tables

A novel feature of `transtab` is its ability to learn from multiple distinct tables. It is easy to trigger the training like

```python
# load the pretrained transtab model
model = transtab.build_classifier(checkpoint='./ckpt')

# load a new tabular dataset
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
     = transtab.load_data('credit-approval')

# update categorical/numerical/binary column map of the loaded model
model.update({'cat':cat_cols,'num':num_cols,'bin':bin_cols})

# then we just trigger the training on the new data
transtab.train(model, trainset, valset, **training_arguments)
```



## Contrastive pretraining on multiple tables

We can also conduct contrastive pretraining on multiple distinct tables like

```python
# load from multiple tabular datasets
dataname_list = ['credit-g', 'credit-approval']
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
     = transtab.load_data(dataname_list)

# build contrastive learner, set supervised=True for supervised VPCL
model, collate_fn = transtab.build_contrastive_learner(
    cat_cols, num_cols, bin_cols, supervised=True)

# start contrastive pretraining training
transtab.train(model, trainset, valset, collate_fn=collate_fn, **training_arguments)
```

## Citation

If you find this package useful, please consider citing the following paper:

```latex
@article{wang2022transtab,
	title = {TransTab: Learning Transferable Tabular Transformers Across Tables},
	author = {Wang, Zifeng and Sun, Jimeng},
     journal={arXiv preprint arXiv:2205.09328},
	year = {2022},
}
```


