Metadata-Version: 2.1
Name: biofilter-loki
Version: 3.0.0
Summary: Biofilter v3 – Legacy Edition, modernized with Poetry and APSW
Home-page: https://ritchielab.org
License: MIT
Author: Ritchie Lab
Author-email: Software_RitchieLab@pennmedicine.upenn.edu
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: apsw (==3.46.1.0)
Requires-Dist: click (==8.1.7)
Requires-Dist: iniconfig (==2.0.0)
Requires-Dist: packaging (==24.1)
Requires-Dist: platformdirs (==4.3.6)
Requires-Dist: pluggy (==1.5.0)
Requires-Dist: wget (>=3.2,<4.0)
Description-Content-Type: text/markdown

# 🧬 Biofilter-LOKI 3.0.0

**Biofilter-LOKI 3.0.0** is a lightweight, command-line–driven knowledge base builder designed to support **BioBin** and other legacy Biofilter workflows.  
This version preserves the traditional **LOKI architecture**, while modernizing the codebase and deployment for current HPC environments.

---

## 🎯 Purpose & Design Goals

Biofilter-LOKI 3.0.0 was built to:

- Maintain **full compatibility with BioBin**
- Preserve the **classic LOKI data model**
- Provide a **simple CLI-based workflow**
- Support **HPC module deployments**
- Enable **rapid database builds** for analysis pipelines


---

## 🧠 What Is LOKI?

LOKI (Library Of Knowledge Integration) is the **knowledge ingestion engine** behind Biofilter.  
It builds a **SQLite knowledge database** by integrating multiple biological data sources, such as:

- SNP ↔ Gene
- Gene ↔ Pathway
- Gene ↔ Ontology
- Identifier mappings across databases
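
To make the relationship types above concrete, here is a minimal sketch of how SNP ↔ Gene and Gene ↔ Pathway links could be stored in SQLite and joined. The table names, column names, and identifiers are hypothetical placeholders chosen for illustration; they are not the actual LOKI schema.

```python
import sqlite3


def build_toy_knowledge_db():
    """Create an in-memory database with two hypothetical link tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snp_gene (rsid TEXT, gene TEXT);         -- hypothetical table
        CREATE TABLE gene_pathway (gene TEXT, pathway TEXT);  -- hypothetical table
        """
    )
    # Placeholder identifiers only; these are not real biological records.
    conn.executemany(
        "INSERT INTO snp_gene VALUES (?, ?)",
        [("rs0000001", "GENE_A"), ("rs0000002", "GENE_B")],
    )
    conn.executemany(
        "INSERT INTO gene_pathway VALUES (?, ?)",
        [("GENE_A", "PATHWAY_X"), ("GENE_A", "PATHWAY_Y"), ("GENE_B", "PATHWAY_Y")],
    )
    conn.commit()
    return conn


def pathways_for_snp(conn, rsid):
    """Follow SNP -> Gene -> Pathway links and return sorted pathway names."""
    rows = conn.execute(
        """
        SELECT DISTINCT gp.pathway
        FROM snp_gene AS sg
        JOIN gene_pathway AS gp ON gp.gene = sg.gene
        WHERE sg.rsid = ?
        ORDER BY gp.pathway
        """,
        (rsid,),
    )
    return [row[0] for row in rows]
```

Integrating the sources into one database is what allows a single join like this to move from a variant to the pathways it may touch.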


---

## 🏗️ Architecture Overview

```

External Sources
        │
        ▼
┌──────────────┐
│  loki-build  │  ← CLI entry point
└──────────────┘
        │
        ▼
┌─────────────────────┐
│ SQLite Knowledge DB │
│    (LOKI schema)    │
└─────────────────────┘
        │
        ▼
┌──────────────┐
│   Biofilter  │  ← CLI entry point
└──────────────┘

```

Key characteristics:

- **SQLite backend**
- **Immutable batch loads**
- **No entity-level curation**
- **Optimized for downstream queries**
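
The batch-load and query-only characteristics can be illustrated with the standard-library `sqlite3` module. This is a rough sketch under stated assumptions, not how `loki-build` is implemented: the `gene_pathway` table and both function names are hypothetical.

```python
import sqlite3
from pathlib import Path


def load_batch(db_path, rows):
    """Write one batch of (gene, pathway) rows in a single transaction."""
    conn = sqlite3.connect(db_path)
    try:
        # The connection context manager commits on success
        # and rolls back if any statement raises.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS gene_pathway (gene TEXT, pathway TEXT)"
            )
            conn.executemany("INSERT INTO gene_pathway VALUES (?, ?)", rows)
    finally:
        conn.close()


def open_read_only(db_path):
    """Open the database via a file: URI so downstream code cannot modify it."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Opening the finished database with `mode=ro` means an accidental write from an analysis script fails with `sqlite3.OperationalError` instead of changing the loaded data.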

---

## 📦 Included Data Sources

Depending on build options, Biofilter-LOKI can ingest:

- **dbSNP**
- **Entrez Gene**
- **Gene Ontology (GO)**
- **Pathways** (KEGG / Reactome, if enabled)
- **Chain files** (genome build liftover)
- **Identifier mappings**

The available sources depend on how the package was built and deployed.

---

## 🚀 Installation

### Option 1 — HPC Module (Recommended)

```bash
module load biofilter/3.0.0
```

Verify installation:

```bash
loki-build --version
```

---

### Option 2 — Python Environment

```bash
pip install biofilter-loki==3.0.0
```

or using Conda:

```bash
conda install -c conda-forge biofilter
```
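
Before installing into a Python environment, it can help to confirm that the interpreter satisfies the `Requires-Python: >=3.10,<4.0` constraint declared in the package metadata. A small sketch of that check (the helper name `python_is_supported` is hypothetical):

```python
import sys


def python_is_supported(version_info=None):
    """Return True if the (major, minor) version is >=3.10 and <4.0."""
    major, minor = (version_info or sys.version_info)[:2]
    return (3, 10) <= (major, minor) < (4, 0)


if __name__ == "__main__":
    status = "supported" if python_is_supported() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```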


---

## 🛠️ Building a Knowledge Database

Basic example:

```bash
loki-build \
  --knowledge loki.db \
  --load dbsnp entrez go
```

Update an existing database:

```bash
loki-build \
  --knowledge loki.db \
  --update
```

Build from an archive:

```bash
loki-build \
  --from-archive loki_sources.tar.gz \
  --knowledge loki.db
```
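
When builds are driven from a pipeline script, the same invocations can be assembled as an argument list. The sketch below uses only the flags shown above and mirrors their layout; the helper name `loki_build_command` is hypothetical, and the function only constructs the command without running it.

```python
import shlex


def loki_build_command(knowledge, sources=None, update=False):
    """Assemble a loki-build argument list from the flags shown above."""
    cmd = ["loki-build", "--knowledge", str(knowledge)]
    if sources:
        # Mirrors `--load dbsnp entrez go` from the basic example.
        cmd += ["--load", *sources]
    if update:
        cmd.append("--update")
    return cmd


if __name__ == "__main__":
    cmd = loki_build_command("loki.db", sources=["dbsnp", "entrez", "go"])
    # Print the shell-quoted form for logging or review.
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues in database paths.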

---

## 🔍 Common CLI Options

| Option           | Description              |
| ---------------- | ------------------------ |
| `--knowledge`    | Output SQLite database   |
| `--load`         | Load specific sources    |
| `--update`       | Update existing DB       |
| `--from-archive` | Load from source archive |
| `--to-archive`   | Save source archive      |
| `--no-optimize`  | Skip DB optimization     |
| `--verbose`      | Verbose logging          |

Run `loki-build --help` for full details.

---

## 🧪 Integration with BioBin

Biofilter-LOKI 3.0.0 is **the reference backend** for:

* **BioBin 2.x**
* Existing LOKI-based analysis pipelines
* Legacy workflows used in ADSP, ECHO, and related projects

Example:

```bash
biobin \
  --settings biobin.conf \
  --knowledge loki.db
```
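
Before handing a knowledge file to a downstream tool, a quick sanity check can catch a missing or truncated build. The sketch below relies only on the documented SQLite file header, which begins with the 16-byte string `SQLite format 3\0`. The helper name `looks_like_sqlite` is hypothetical, and the check does not validate the LOKI schema itself.

```python
from pathlib import Path

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path):
    """Return True if the file exists and starts with the SQLite 3 header."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(16) == SQLITE_MAGIC
```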

---

## ❌ Known Limitations

* No entity-level conflict resolution
* No incremental curation
* No Parquet or OLAP support
* No variant-level functional annotations (e.g. VEP)
* Schema is not extensible without breaking compatibility

These limitations are **intentional**, to preserve stability.

---

## 🔮 Future Direction

All future innovation is happening in:

**👉 Biofilter3R**

Key differences:

| Biofilter-LOKI    | Biofilter3R         |
| ----------------- | ------------------- |
| SQLite            | PostgreSQL / hybrid |
| Immutable batches | Master entities     |
| CLI only          | CLI + Python        |
| BioBin-focused    | Multi-domain        |
| Legacy schema     | Modern relational   |

---

## 📚 Documentation

* BioBin documentation
* Internal Ritchie Lab Confluence pages
* Historical Biofilter publications

---

## 🧑‍🔬 Maintainers

Developed and maintained by the **Ritchie Lab**, University of Pennsylvania.

---

## 📜 License

Distributed under the MIT License, as declared in the package metadata.
See the `LICENSE` file for details.



---

## Development Documentation

https://ritchielab.github.io/biofilter/
