Metadata-Version: 2.4
Name: lazyqsar
Version: 2.0.2
Summary: A library to quickly build QSAR models
License: GPLv3
License-File: LICENSE
Keywords: qsar,machine-learning,chemistry,computer-aided-drug-design
Author: Ersilia Open Source Initiative
Author-email: hello@ersilia.io
Requires-Python: >=3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Provides-Extra: descriptors
Requires-Dist: chemprop (<=2.2.0) ; extra == "descriptors"
Requires-Dist: h5py (==3.14.0)
Requires-Dist: joblib (==1.5.1)
Requires-Dist: loguru (==0.7.3)
Requires-Dist: numpy (==2.1.3)
Requires-Dist: onnxruntime (==1.20.1)
Requires-Dist: optuna (==4.4.0)
Requires-Dist: pandas (==2.3.0)
Requires-Dist: psutil (==7.0.0)
Requires-Dist: rdkit (==2025.9.1) ; extra == "descriptors"
Requires-Dist: rich (==14.1.0)
Requires-Dist: scikit-learn (==1.6.1)
Requires-Dist: skl2onnx (==1.19.1)
Requires-Dist: torch (==2.8.0)
Project-URL: Source Code, https://github.com/ersilia-os/lazy-qsar
Description-Content-Type: text/markdown

# Ersilia's LazyQSAR

A library to quickly build supervised models for chemistry.

## Installation

Install LazyQSAR from source:

```bash
git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
python -m pip install -e .
```

To use the default LazyQSAR descriptors, install the `descriptors` extra as well:

```bash
# Quotes keep shells such as zsh from expanding the square brackets
python -m pip install -e ".[descriptors]"
```

## Binary Classification

LazyQSAR's binary classifier can run either with default descriptors or with custom descriptors passed by the user.

### Built-in descriptors

Instantiate the `LazyBinaryQSAR` class with one of the available descriptors (`chemeleon` or `morgan`) and a mode (`fast`, `default`, or `slow`):

```python
from lazyqsar.qsar import LazyBinaryQSAR

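# Train on SMILES strings and binary labels, then persist the model
model = LazyBinaryQSAR(descriptor_type="chemeleon")
model.fit(smiles_list=smiles_train, y=y_train)
model.save(model_dir)
# Keep column 1 of the output: the probability of the positive class
y_hat = model.predict_proba(smiles_list=smiles_test)[:, 1]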
```
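
Since `y_hat` holds positive-class probabilities, a ranking metric such as ROC AUC is a natural first check. The sketch below is a minimal, dependency-free illustration with made-up labels and scores. `roc_auc` is a hypothetical helper, not part of LazyQSAR; in practice `sklearn.metrics.roc_auc_score` from scikit-learn (already a dependency) computes the same quantity.

```python
def roc_auc(y_true, y_score):
    """Rank-based ROC AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only; replace with your own test labels and predictions
y_test = [0, 0, 1, 1]
y_hat = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_test, y_hat)
print(f"ROC AUC: {auc:.2f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for a quick check but not for large test sets.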

### Custom-made descriptors
Pre-calculate your descriptors with your preferred method. We recommend the [Ersilia Model Hub](https://github.com/ersilia-os/ersilia) for this purpose. You can pass either the `.h5` file generated by Ersilia or an array of descriptors directly to the LazyQSAR pipeline.

```python
from lazyqsar.agnostic import LazyBinaryClassifier

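# X_train and X_test are pre-calculated descriptors; y_train holds binary labels
model = LazyBinaryClassifier()
model.fit(X=X_train, y=y_train)
model.save(model_dir)
# Keep column 1 of the output: the probability of the positive class
y_hat = model.predict_proba(X=X_test)[:, 1]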
```
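
When descriptors come from an external tool, a quick sanity check before calling `fit` can save a failed training run. The following is a minimal sketch that assumes the classifier expects a rectangular, finite numeric matrix and 0/1 labels. That expectation is an assumption, and `check_descriptor_matrix` is a hypothetical helper, not part of LazyQSAR.

```python
import math


def check_descriptor_matrix(X, y):
    """Return (n_samples, n_features) after basic checks; raise ValueError otherwise."""
    if len(X) == 0:
        raise ValueError("X is empty")
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)} labels")
    n_features = len(X[0])
    for i, row in enumerate(X):
        # Every molecule must have the same number of descriptor values
        if len(row) != n_features:
            raise ValueError(f"row {i} has {len(row)} values, expected {n_features}")
        # Reject NaN and infinite values, which descriptor calculators can emit
        if not all(math.isfinite(float(v)) for v in row):
            raise ValueError(f"row {i} contains NaN or infinite values")
    if set(y) - {0, 1}:
        raise ValueError("y must contain only 0/1 labels")
    return len(X), n_features


# Illustrative values only: three molecules with three descriptors each
X_train = [[0.1, 1.2, 3.0], [0.4, 0.0, 2.5], [0.9, 1.1, 0.3]]
y_train = [0, 1, 1]
shape = check_descriptor_matrix(X_train, y_train)
print(shape)
```

The helper only relies on `len` and iteration, so it accepts nested lists as well as 2D NumPy arrays.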

### Using saved models at inference time
By default, models are saved as ONNX files. Once a model has been trained and saved, you can load it as an artifact. In this case, the only essential dependency is `onnxruntime`.

```python
from lazyqsar.artifacts import LazyBinaryClassifierArtifact

# Load the saved model through the artifact class imported above
model = LazyBinaryClassifierArtifact.load(model_dir)
y_hat = model.predict_proba(X=X)[:, 1]
```
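
The values in `y_hat` are probabilities, so turning them into hard 0/1 calls requires a decision threshold. The sketch below uses made-up probabilities and a hypothetical helper, `to_labels`, which is not part of LazyQSAR. The default of 0.5 is only a conventional starting point, and for imbalanced data a different cutoff is often more appropriate.

```python
def to_labels(y_hat, threshold=0.5):
    """Map positive-class probabilities to 0/1 labels (1 when p >= threshold)."""
    return [1 if p >= threshold else 0 for p in y_hat]


# Illustrative values only; replace with the output of predict_proba(...)[:, 1]
y_hat = [0.05, 0.50, 0.72, 0.31]
labels = to_labels(y_hat)
lenient_labels = to_labels(y_hat, threshold=0.3)
print(labels, lenient_labels)
```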

## Tests and benchmarks

In the benchmark repository you will find the performance of the default estimators and descriptors on the TDCommons ADMET dataset. The `/tests` folder contains a quick implementation of the methods described above, which makes it easy to check the effect of any change in the code. It uses the Bioavailability dataset and `chemeleon` descriptors as an example.

## Disclaimer

This library is intended only for quick-and-dirty QSAR modeling. For more complete automated QSAR modeling, please refer to [ZairaChem](https://github.com/ersilia-os/zaira-chem).

## About us

Learn about the [Ersilia Open Source Initiative](https://ersilia.io)!

