Metadata-Version: 2.4
Name: lazyqsar
Version: 2.1.2
Summary: A library to quickly build QSAR models
License: GPLv3
License-File: LICENSE
Keywords: qsar,machine-learning,chemistry,computer-aided-drug-design
Author: Ersilia Open Source Initiative
Author-email: hello@ersilia.io
Requires-Python: >=3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Provides-Extra: descriptors
Requires-Dist: chemprop (<=2.2.0) ; extra == "descriptors"
Requires-Dist: h5py (==3.14.0)
Requires-Dist: joblib (==1.5.1)
Requires-Dist: loguru (==0.7.3)
Requires-Dist: numpy (==2.1.3)
Requires-Dist: onnxconverter-common (==1.16.0)
Requires-Dist: onnxruntime (==1.20.1)
Requires-Dist: optuna (==4.4.0)
Requires-Dist: pandas (==2.3.0)
Requires-Dist: psutil (==7.0.0)
Requires-Dist: rdkit (==2025.9.1) ; extra == "descriptors"
Requires-Dist: rich (==14.1.0)
Requires-Dist: scikit-learn (==1.6.1)
Requires-Dist: skl2onnx (==1.19.1)
Requires-Dist: torch (==2.8.0)
Project-URL: Source Code, https://github.com/ersilia-os/lazy-qsar
Description-Content-Type: text/markdown

# Ersilia's LazyQSAR

A library to quickly build supervised models for chemistry.

## Installation

Install LazyQSAR from source:

```bash
git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
python -m pip install -e .
```

To use the default LazyQSAR descriptors, install the `descriptors` extra as well (the quotes keep shells such as zsh from expanding the brackets):

```bash
python -m pip install -e ".[descriptors]"
```

## Binary Classification

LazyQSAR's binary classifier can run either with default descriptors or with custom descriptors passed by the user.

### Built-in descriptors

Instantiate the `LazyBinaryQSAR` class with a mode of choice (`fast`, `default`, `slow`):

```python
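from lazyqsar.qsar import LazyBinaryQSAR

# smiles_train / smiles_test are lists of SMILES strings; y_train holds binary labels (0/1)
model = LazyBinaryQSAR(mode="fast")
model.fit(smiles_list=smiles_train, y=y_train)
# Keep only the second column, i.e. the probability of the positive class
y_hat = model.predict_proba(smiles_list=smiles_test)[:, 1]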
```
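
The snippet above expects `smiles_train`, `y_train` and `smiles_test` to exist already. As a minimal sketch of one way to prepare them, the following splits paired SMILES strings and labels into train and test lists using only the standard library. The helper name `split_smiles`, the toy molecules and their labels are illustrative assumptions (the labels are arbitrary placeholders, not measurements), and nothing here is part of the LazyQSAR API.

```python
import random


def split_smiles(smiles, labels, test_fraction=0.2, seed=42):
    """Shuffle paired SMILES/labels and split them into train and test lists.

    Hypothetical helper for illustration only; not provided by LazyQSAR.
    """
    if len(smiles) != len(labels):
        raise ValueError("smiles and labels must have the same length")
    indices = list(range(len(smiles)))
    # A seeded generator makes the split reproducible
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(len(indices) * test_fraction)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    smiles_train = [smiles[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    smiles_test = [smiles[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return smiles_train, y_train, smiles_test, y_test


# Toy placeholder data: the labels are arbitrary, not real activity values
smiles = ["CCO", "CC(=O)O", "c1ccccc1", "CCN", "CCCC",
          "c1ccccc1O", "CC(C)O", "CCOC", "CCCl", "OCCO"]
labels = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

smiles_train, y_train, smiles_test, y_test = split_smiles(smiles, labels)
```

For real datasets, a stratified split (for instance scikit-learn's `train_test_split` with the `stratify` argument) is usually preferable, since it keeps the class balance similar in both subsets.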

### Custom-made descriptors

Pre-calculate your descriptors with the method of your choice. We recommend the [Ersilia Model Hub](https://github.com/ersilia-os/ersilia) for this purpose. The `.h5` format generated by Ersilia can be passed directly to the LazyQSAR pipeline. Alternatively, pass the descriptors as an in-memory array.

```python
from lazyqsar.agnostic import LazyBinaryClassifier

model = LazyBinaryClassifier()
model.fit(X=X_train, y=y_train)
y_hat = model.predict_proba(X=X_test)[:,1]
```
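
When passing descriptors in memory, it can help to sanity-check the arrays before fitting. The sketch below assumes that `X` should be a 2D numeric array with one row per molecule and that `y` holds binary labels encoded as 0/1. The helper `check_descriptor_inputs`, the `float32` cast and the random stand-in descriptors are assumptions made for illustration, not requirements documented by LazyQSAR.

```python
import numpy as np


def check_descriptor_inputs(X, y):
    """Validate a descriptor matrix and binary labels before model fitting.

    Hypothetical helper for illustration only; not provided by LazyQSAR.
    """
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError("X must be 2D: (n_molecules, n_descriptors)")
    if y.ndim != 1 or y.shape[0] != X.shape[0]:
        raise ValueError("y must be 1D with one label per row of X")
    if not np.isfinite(X).all():
        raise ValueError("X contains NaN or infinite values")
    if not np.isin(y, [0, 1]).all():
        raise ValueError("y must contain only 0/1 labels")
    return X, y.astype(np.int64)


# Random stand-in descriptors: 6 molecules, 4 descriptors each
rng = np.random.default_rng(0)
X_train, y_train = check_descriptor_inputs(
    rng.normal(size=(6, 4)), [0, 1, 0, 1, 1, 0]
)
```

The checked `X_train` and `y_train` can then be passed to `model.fit(X=X_train, y=y_train)` as shown above.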

### Using saved models at inference time

By default, models are saved as ONNX files. Once a model has been trained and saved, you can load it as an artifact for inference. In this case, the only crucial dependency is the ONNX runtime.

To save a model, simply run:

```python
model.save(model_dir)
```

This creates a folder containing the ONNX files, which you can then load with the artifact class:

```python
from lazyqsar.artifacts import LazyBinaryClassifierArtifact

# Load the saved model through the artifact class imported above
model = LazyBinaryClassifierArtifact.load(model_dir)
y_hat = model.predict_proba(X=X)[:, 1]
```
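
The `[:, 1]` slice used throughout this section keeps the second column of the `predict_proba` output. The sketch below illustrates what that selection does and how the resulting scores can be turned into class labels. It assumes the scikit-learn convention of one row per molecule with columns ordered as `[P(class 0), P(class 1)]`; the array values, the 0.5 threshold and the reference labels are made-up placeholders, not LazyQSAR output.

```python
import numpy as np

# Stand-in for a predict_proba result (assumed layout: [P(class 0), P(class 1)])
proba = np.array([
    [0.90, 0.10],
    [0.30, 0.70],
    [0.45, 0.55],
    [0.80, 0.20],
])

# Probability of the positive class for each molecule
y_hat = proba[:, 1]

# Convert probabilities into hard 0/1 predictions with an illustrative threshold
threshold = 0.5
y_pred = (y_hat >= threshold).astype(int)

# Compare against placeholder reference labels
y_true = np.array([0, 1, 0, 0])
accuracy = float((y_pred == y_true).mean())
```

The threshold is a design choice: for imbalanced datasets, a value other than 0.5 or a ranking metric such as ROC AUC may describe model quality better than accuracy.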

## Tests and benchmarks

### Quick testing

The `/tests` folder contains a quick implementation of the methods described above, which you can use to check that the code is working. The Bioavailability dataset and Chemeleon descriptors are used as an example.

```bash
python test/test_binary_classification.py
python test/test_binary_classification.py --agnostic
```

### Benchmarking

The [benchmark repository](https://github.com/ersilia-os/zaira-chem-tdc-benchmark) reports the performance of the default estimators and descriptors on the Therapeutics Data Commons (TDC) ADMET datasets. This benchmark is provisional; the team is working on a more exhaustive one.

## Disclaimer

This library is intended only for quick-and-dirty QSAR modeling. For more complete automated QSAR modeling, please refer to [Zaira Chem](https://github.com/ersilia-os/zaira-chem).

## About us

Learn about the [Ersilia Open Source Initiative](https://ersilia.io)!

