Metadata-Version: 2.4
Name: lazyqsar
Version: 1.0
Summary: A library to quickly build QSAR models
Author-email: Ersilia Open Source Initiative <hello@ersilia.io>, Miquel Duran Frigola <miquel@ersilia.io>, Gemma Turon Rodrigo <gemma@ersilia.io>, Abel Legese Shibiru <abel@ersilia.io>
License: GPLv3
Project-URL: Source Code, https://github.com/ersilia-os/lazy-qsar
Keywords: qsar,machine-learning,chemistry,computer-aided-drug-design
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests==2.32.3
Requires-Dist: xgboost==3.0.2
Requires-Dist: joblib==1.5.1
Requires-Dist: pandas==2.3.0
Requires-Dist: scikit-learn==1.6.1
Requires-Dist: optuna==4.4.0
Requires-Dist: h5py==3.14.0
Requires-Dist: psutil==7.0.0
Requires-Dist: flaml==2.3.5
Provides-Extra: tune-tables
Provides-Extra: descriptors
Requires-Dist: numpy==1.26.4; extra == "descriptors"
Requires-Dist: rdkit==2023.9.5; extra == "descriptors"
Requires-Dist: chemprop<=2.2.0; extra == "descriptors"
Dynamic: license-file

# Ersilia's LazyQSAR

A library to quickly build supervised models for chemistry.

## Installation

Install LazyQSAR from source:

```bash
git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
python -m pip install -e .
```

To use the default LazyQSAR descriptors, install the `descriptors` extra:
```bash
python -m pip install -e ".[descriptors]"
```
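After installing, a quick check can confirm which pieces are importable. This is only a sketch: the import names below (`lazyqsar`, `numpy`, `rdkit`, `chemprop`) are assumed from the package names in the metadata, and `missing_modules` is a hypothetical helper, not part of LazyQSAR.

```python
from importlib.util import find_spec

# Assumed top-level import names for the core package and the "descriptors" extra.
CORE_MODULE = "lazyqsar"
DESCRIPTOR_MODULES = ["numpy", "rdkit", "chemprop"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if find_spec(CORE_MODULE) is None:
    print("lazyqsar is not installed in this environment")

missing = missing_modules(DESCRIPTOR_MODULES)
if missing:
    print("Missing descriptor dependencies:", ", ".join(missing))
else:
    print("All descriptor dependencies are importable")
```

An empty list from `missing_modules` only means the modules can be located; it does not verify that the pinned versions are installed.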

To use a light version of [TuneTables](https://github.com/ersilia-os/TuneTablesLight/tree/main) as an estimator, install it as well:
```bash
pip install "lazyqsar[tune-tables]"
pip install "git+https://github.com/ersilia-os/TuneTablesLight.git@main"
```

## Binary Classification

LazyQSAR's binary classifier can run either with default descriptors or with custom descriptors passed by the user.

### Built-in descriptors

Instantiate the `LazyBinaryQSAR` class with any of the available descriptors (Chemeleon or Morgan fingerprints) and estimators (Logistic Regression, Random Forest or Tune Tables), then fit the model and predict:

```python
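import lazyqsar

# smiles_train / smiles_test: SMILES strings; y_train: binary labels (0/1)
model = lazyqsar.LazyBinaryQSAR(
    descriptor_type="chemeleon", model_type="logistic_regression"
)
model.fit(smiles_train, y_train)
model.save_model(model_path)  # model_path: where the fitted model is stored
y_hat = model.predict_proba(smiles_test)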
```
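The snippet above assumes that `smiles_train`, `y_train` and `smiles_test` already exist. As a minimal sketch, assuming the model accepts plain Python lists of SMILES strings and 0/1 labels (an assumption, not confirmed here), they could be prepared as follows. The toy molecules, the `active` column name and the `stratified_split` helper are hypothetical; scikit-learn's `train_test_split` with `stratify=` would do the same job.

```python
import csv
import io
import random

# Hypothetical toy data; in practice, read your own CSV file instead.
CSV_TEXT = """smiles,active
CCO,0
CCN,0
CCC,0
CCCl,0
c1ccccc1,1
c1ccccc1O,1
c1ccccc1N,1
c1ccncc1,1
"""


def stratified_split(smiles, labels, test_fraction=0.25, seed=42):
    """Split SMILES and labels so each class keeps roughly the same test share."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        # Keep at least one sample of each class in the test set.
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return (
        [smiles[i] for i in train_idx],
        [smiles[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
smiles = [row["smiles"] for row in rows]
labels = [int(row["active"]) for row in rows]

smiles_train, smiles_test, y_train, y_test = stratified_split(smiles, labels)
print(len(smiles_train), "training molecules,", len(smiles_test), "test molecules")
```

Stratifying by label keeps both classes present in the test set, which matters for the small, imbalanced datasets common in QSAR.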

### Custom-made descriptors
Pre-calculate your descriptors using your preferred method. We recommend the [Ersilia Model Hub](https://github.com/ersilia-os/ersilia) for this. The `.h5` file generated by Ersilia can be passed directly to the LazyQSAR pipeline; alternatively, you can pass an array of descriptors.

```python
import lazyqsar

X_train = "my_descriptors"  # path to the training descriptors
X_test = "my_descriptors"  # path to the test descriptors

model = lazyqsar.LazyBinaryClassifier(
    model_type="logistic_regression"
)
model.fit(X_train, y_train)
model.save_model(model_path)
y_hat = model.predict_proba(X_test)
```
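To get a first sense of model quality, the predicted probabilities can be compared with held-out labels. The sketch below is dependency-free and assumes `predict_proba` returns either one score per sample or a scikit-learn-style table whose last column is the positive class. That output shape, the `y_test` labels and both helper names are assumptions for illustration; `sklearn.metrics.roc_auc_score` computes the same metric.

```python
def positive_scores(y_hat):
    """Return one positive-class score per sample.

    Accepts a flat sequence of scores or a table of per-class probabilities
    (assumption: the last column is the positive class).
    """
    scores = []
    for row in y_hat:
        try:
            scores.append(float(row[-1]))  # row of per-class probabilities
        except (TypeError, IndexError):
            scores.append(float(row))  # already a single score
    return scores


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one active and one inactive sample")
    # Ties between an active and an inactive sample count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and predictions, for illustration only.
y_test = [0, 0, 1, 1]
y_hat = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]

auc = roc_auc(y_test, positive_scores(y_hat))
print(f"ROC AUC: {auc:.2f}")
```

The pairwise formulation is quadratic in the number of samples, so it suits a quick check rather than large benchmark runs.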

## Tests and benchmarks

In the `/benchmark` folder you will find the performance of the default estimators and descriptors on the TDCommons ADMET dataset. The `/tests` folder contains a quick implementation of the methods described above, so that any change in the code can be checked easily. The Bioavailability dataset is used as an example.

## Disclaimer

This library is intended only for quick-and-dirty QSAR modeling. For more complete automated QSAR modeling, please refer to [ZairaChem](https://github.com/ersilia-os/zaira-chem).

## About us

Learn about the [Ersilia Open Source Initiative](https://ersilia.io)!
