Metadata-Version: 2.4
Name: tipeft
Version: 0.0.5
Summary: Tabular-Infused Parameter Efficient Finetuning (tipeft)
Author: Charles Alba
Author-email: alba@wustl.edu
Keywords: Parameter Efficient Finetuning,PEFT,AI in Medicine,AI in Healthcare,Postoperative Risk Prediction,IA3,LORA
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: license.txt
Requires-Dist: numpy>=2.0.2
Requires-Dist: pandas>=2.2.2
Requires-Dist: scikit-learn>=1.5
Requires-Dist: tqdm>=4.67
Requires-Dist: torch==2.8.0
Requires-Dist: transformers==4.57.0
Requires-Dist: peft==0.17.1
Requires-Dist: accelerate==1.10.1
Requires-Dist: evaluate==0.4.2
Requires-Dist: datasets==2.21.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: keywords
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

tipeft
======



Tabular-infused Parameter-Efficient Finetuning (tipeft) is a parameter-efficient finetuning (PEFT) method that infuses tabular features into the initialization of reparameterization-based PEFT methods.



.. image:: https://raw.githubusercontent.com/cja5553/peft_postoperative_risk_prediction/main/Figure_1.jpg





It is specifically designed for postoperative predictions in clinical care, where predictive and valuable pre-operative tabular features are often under-utilized in language model finetuning. For now, it supports both ``LoRA`` and ``IA3``.



Requirements
============

Dependencies
------------



The following Python packages are required for ``tipeft``:

- torch
- transformers
- peft
- accelerate
- evaluate
- datasets
- numpy
- pandas
- scikit-learn
- tqdm

Install dependencies with:

.. code-block:: bash

   pip install torch transformers peft accelerate evaluate datasets numpy pandas scikit-learn tqdm



Note on PyTorch installation
----------------------------

Because PyTorch wheels vary by CUDA version and hardware, install PyTorch manually by following the instructions at https://pytorch.org/. Note that ``tipeft`` pins ``torch==2.8.0``, so choose the 2.8.0 build that matches your CUDA version.



System Requirements
-------------------

``tipeft`` has been tested and verified on the following configuration:

+-----------+----------------+
| Component | Tested Version |
+===========+================+
| OS        | Windows 10     |
+-----------+----------------+
| Python    | 3.9.19         |
+-----------+----------------+
| CUDA      | 12.6           |
+-----------+----------------+



Important Notes
---------------



- **Environment**: Must be run in a Jupyter notebook. Running as a standalone Python script may cause multiprocessing issues.

- **CPU cores**: At least 10 CPU cores recommended (uses ``Pool(processes=10)`` internally).

- **GPU**: CUDA-compatible GPU required.

- **OS**: Tested on Windows. Linux/Mac compatibility not yet verified.



Known Compatibility Limitations
-------------------------------

1. **Jupyter only** - Progress bars use ``tqdm.notebook``, which may not display correctly outside Jupyter.

2. **Multiprocessing** - Behavior may differ on Linux/Mac because the default multiprocessing start method varies by platform (for example, ``spawn`` on Windows versus ``fork`` on Linux under Python 3.9).
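
The platform-dependent settings mentioned above can be inspected before training. The following is a minimal sketch that uses only the Python standard library. The helper name ``describe_runtime`` is hypothetical and not part of ``tipeft``, and the default threshold of 10 cores simply mirrors the ``Pool(processes=10)`` note in this README.

.. code-block:: python

   import multiprocessing as mp
   import os
   import platform


   def describe_runtime(required_cores=10):
       """Summarize runtime settings relevant to tipeft (hypothetical helper)."""
       cores = os.cpu_count() or 1  # os.cpu_count() may return None
       # Read the start method without fixing it; if none has been set yet,
       # fall back to the platform default, which is listed first.
       start_method = mp.get_start_method(allow_none=True) or mp.get_all_start_methods()[0]
       return {
           "os": platform.system(),
           "python": platform.python_version(),
           "start_method": start_method,
           "cpu_cores": cores,
           "enough_cores": cores >= required_cores,
       }


   print(describe_runtime())

If ``enough_cores`` is ``False`` or ``os`` is not ``Windows``, the configuration differs from the tested one, so multiprocessing behavior is worth checking on a small sample first.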



GPU Requirements
----------------

``tipeft`` is designed for GPU acceleration.

- At least one CUDA-compatible GPU is required
- Suggested minimum: 16 GB VRAM

- Memory usage depends on:

  - sequence length

  - model size

  - batch size

  - peft configuration
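
To compare a machine against these suggestions, the sketch below queries PyTorch for CUDA availability and the memory of the first visible GPU. The helper ``describe_gpu`` is hypothetical and not part of ``tipeft``. It reports decimal gigabytes, so treat the comparison against 16 GB as a rough check only.

.. code-block:: python

   def describe_gpu(min_vram_gb=16.0):
       """Report CUDA availability and VRAM of GPU 0 (hypothetical helper)."""
       try:
           import torch
       except ImportError:
           return {"torch_installed": False, "cuda_available": False}

       if not torch.cuda.is_available():
           return {"torch_installed": True, "cuda_available": False}

       props = torch.cuda.get_device_properties(0)
       vram_gb = props.total_memory / 1e9  # bytes to decimal gigabytes
       return {
           "torch_installed": True,
           "cuda_available": True,
           "device_name": props.name,
           "vram_gb": round(vram_gb, 1),
           "meets_suggested_minimum": vram_gb >= min_vram_gb,
       }


   print(describe_gpu())

Even when the suggested minimum is met, actual memory use still depends on the factors listed above, so a smaller batch size or shorter sequence length may be needed.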



Installation
============

Install ``tipeft`` from PyPI with pip:



.. code-block:: bash



   pip install tipeft



Usage
=====

``train_tabular_infused_IA3``
-----------------------------



Trains a tabular-infused IA3 model for binary classification.



.. code-block:: python

   from tipeft import train_tabular_infused_IA3

   # train_df and val_df are pandas DataFrames that contain the text column,
   # the boolean label column, and every tabular feature listed below.
   model, tokenizer = train_tabular_infused_IA3(
       train=train_df,
       val=val_df,
       pretrained_model_name="emilyalsentzer/Bio_ClinicalBERT",
       label_col="in_hospital_mortality",
       text_col="clinical_notes",
       # Number of unique values per feature: 1 = continuous, >1 = categorical
       columns_unique_labels_of_tabular_features={
           "gender": 2,
           "insurance": 3,
           "marital_status": 4,
           "anchor_age": 1,
           "anchor_year": 1
       },
       lr=0.001,
       num_epochs=5,
       lr_of_tabular_infused_features=0.0001
   )



Parameters
----------

+-----------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| Parameter                                     | Type             | Description                                                                         |
+===============================================+==================+=====================================================================================+
| ``train``                                     | pandas.DataFrame | Training dataframe containing the text, label, and tabular feature columns          |
+-----------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| ``val``                                       | pandas.DataFrame | Validation dataframe with the same structure as ``train``                           |
+-----------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| ``pretrained_model_name``                     | str              | Base model to fine-tune. Supports Bio_ClinicalBERT or BioGPT                        |
+-----------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| ``label_col``                                 | str              | Name of the binary outcome column (must contain True/False values)                  |
+-----------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| ``text_col``                                  | str              | Name of the column containing the clinical text                                     |
+-----------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| ``columns_unique_labels_of_tabular_features`` | dict             | Maps each feature to its number of unique values (1 = continuous, >1 = categorical) |
+-----------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| ``lr``                                        | float            | Learning rate (default: 0.001)                                                      |
+-----------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| ``num_epochs``                                | int              | Number of training epochs (default: 5)                                              |
+-----------------------------------------------+------------------+-------------------------------------------------------------------------------------+
| ``lr_of_tabular_infused_features``            | float            | Learning rate for tabular pre-training (default: 0.0001)                            |
+-----------------------------------------------+------------------+-------------------------------------------------------------------------------------+
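
The ``columns_unique_labels_of_tabular_features`` mapping can be written by hand, as in the usage example, or derived from the training dataframe. The sketch below assumes you already know which columns are categorical and which are continuous. The helper ``build_feature_mapping`` and the toy data are hypothetical and not part of ``tipeft``.

.. code-block:: python

   import pandas as pd


   def build_feature_mapping(df, categorical_cols, continuous_cols):
       """Map each feature to its number of unique values (1 marks continuous)."""
       mapping = {}
       for col in categorical_cols:
           n_unique = int(df[col].nunique())
           if n_unique < 2:
               # A single category would be indistinguishable from "continuous".
               raise ValueError(f"categorical column '{col}' has fewer than 2 unique values")
           mapping[col] = n_unique
       for col in continuous_cols:
           mapping[col] = 1
       return mapping


   # Toy data for illustration only
   toy_train = pd.DataFrame({
       "gender": ["F", "M", "F", "M"],
       "insurance": ["Medicare", "Medicaid", "Other", "Medicare"],
       "anchor_age": [54, 71, 63, 48],
   })

   mapping = build_feature_mapping(
       toy_train,
       categorical_cols=["gender", "insurance"],
       continuous_cols=["anchor_age"],
   )
   print(mapping)  # {'gender': 2, 'insurance': 3, 'anchor_age': 1}

Counting unique values on the training dataframe only is a design choice of this sketch. It relies on the validation set containing the same categories.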



Returns
-------

+---------------+---------------+-----------------------------+
| Return        | Type          | Description                 |
+===============+===============+=============================+
| ``model``     | PeftModel     | The trained IA3 model       |
+---------------+---------------+-----------------------------+
| ``tokenizer`` | AutoTokenizer | The tokenizer for the model |
+---------------+---------------+-----------------------------+



Notes
-----

- The ``label_col`` must contain boolean values (True/False).
- Categorical features should have ``>1`` unique labels.
- Continuous features should use ``1`` in the mapping dictionary.
- Ensure that every value of each categorical feature appears in both the train and val sets.
- The trained model is saved to ``trained_models/IA3_{pretrained_model_name}_{label_col}``.
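
Several of the notes above can be checked before a training run starts. The following is a minimal sketch, assuming the inputs are pandas DataFrames and that the mapping follows the convention described here. ``check_tipeft_inputs`` is a hypothetical helper and not part of ``tipeft``.

.. code-block:: python

   import pandas as pd


   def check_tipeft_inputs(train, val, label_col, feature_mapping):
       """Return a list of problems found in the inputs (hypothetical helper)."""
       problems = []
       # The label column must be boolean in both dataframes.
       for name, df in (("train", train), ("val", val)):
           if not pd.api.types.is_bool_dtype(df[label_col]):
               problems.append(f"{name}: '{label_col}' is {df[label_col].dtype}, expected bool")
       # Categorical features (mapping value > 1) need the same values in both sets.
       for col, n_unique in feature_mapping.items():
           if n_unique > 1:
               mismatch = set(train[col].unique()) ^ set(val[col].unique())
               if mismatch:
                   problems.append(f"'{col}': values not in both sets: {sorted(mismatch, key=str)}")
       return problems


   # Toy data for illustration only
   toy_train = pd.DataFrame({
       "died": [True, False, False],
       "insurance": ["Medicare", "Medicaid", "Other"],
       "anchor_age": [54, 71, 63],
   })
   toy_val = pd.DataFrame({
       "died": [0, 1],  # integers, not booleans
       "insurance": ["Medicare", "Medicaid"],  # "Other" is missing
       "anchor_age": [48, 80],
   })

   for problem in check_tipeft_inputs(toy_train, toy_val, "died", {"insurance": 3, "anchor_age": 1}):
       print(problem)

Here the check reports the integer label column in the validation set and the ``Other`` category that appears only in the training set.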



Questions?
==========



Contact me at alba@wustl.edu

