Metadata-Version: 2.4
Name: giql
Version: 0.1.1
Summary: Genomic Interval Query Language - SQL dialect for genomic range queries
Author-email: Nezar Abdennur <nabdennur@gmail.com>, Conrad Bzura <conradbzura@gmail.com>
Maintainer-email: Nezar Abdennur <nabdennur@gmail.com>, Conrad Bzura <conradbzura@gmail.com>
Requires-Python: >=3.11
Requires-Dist: click>=8.3.0
Requires-Dist: duckdb>=1.4.0
Requires-Dist: oxbow>=0.4.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: psycopg2-binary>=2.9.10
Requires-Dist: sqlglot>=20.0.0
Requires-Dist: sqlparse>=0.4.0
Provides-Extra: all
Requires-Dist: duckdb>=0.9.0; extra == 'all'
Requires-Dist: mysql-connector-python>=8.0.0; extra == 'all'
Requires-Dist: psycopg2-binary>=2.9.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: hypothesis; extra == 'dev'
Requires-Dist: pybedtools; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: duckdb
Requires-Dist: duckdb>=0.9.0; extra == 'duckdb'
Provides-Extra: mysql
Requires-Dist: mysql-connector-python>=8.0.0; extra == 'mysql'
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9.0; extra == 'postgres'
Provides-Extra: sqlite
Description-Content-Type: text/markdown

# GIQL - Genomic Interval Query Language

A SQL dialect for genomic range queries with multi-database support.


## Overview

GIQL extends SQL with spatial operators for genomic interval queries. It transpiles to standard SQL that works across multiple database backends including DuckDB and SQLite.

GIQL provides a familiar SQL syntax for bioinformatics workflows, allowing you to express complex genomic range operations without writing intricate SQL expressions. Whether you're filtering variants by genomic region, finding overlapping features, or calculating distances between intervals, GIQL makes these operations intuitive and portable across databases.

## Features

- **SQL-based**: Familiar SQL syntax with genomic extensions
- **Multi-backend**: Works with DuckDB, SQLite, and more
- **Spatial operators**: INTERSECTS, CONTAINS, WITHIN for range relationships
- **Distance operators**: DISTANCE, NEAREST for proximity queries
- **Aggregation operators**: CLUSTER, MERGE for combining intervals
- **Set quantifiers**: ANY, ALL for multi-range queries
- **Transpilation**: Convert GIQL to standard SQL for debugging or external use

## Installation

### From PyPI

Install the latest stable release:

```bash
pip install giql
```

Or the latest release candidate:

```bash
pip install --pre giql
```

### From Source

Clone the repository and install locally:

```bash
# Clone the repository
git clone https://github.com/abdenlab/giql.git
cd giql

# Install in development mode
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"
```

### Building Documentation

To build the documentation locally:

```bash
cd docs

# Install documentation dependencies
pip install -r requirements.txt

# Build HTML documentation
make html

# View the documentation
# The built docs will be in docs/_build/html/
# Open docs/_build/html/index.html in your browser
```

## Quick Start

```python
from giql import GIQLEngine

# Create engine with DuckDB backend
with GIQLEngine(target_dialect="duckdb") as engine:
    # Load genomic data
    engine.load_csv("variants", "variants.csv")
    engine.register_table_schema(
        "variants",
        {
            "id": "INTEGER",
            "chromosome": "VARCHAR",
            "start_pos": "BIGINT",
            "end_pos": "BIGINT",
        },
        genomic_column="interval",
    )

    # Query with genomic operators (returns cursor for streaming)
    cursor = engine.execute("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

    # Process results lazily
    for row in cursor:
        print(row)

    # Or just transpile to SQL without executing
    sql = engine.transpile("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)
    print(sql)  # See the generated SQL
```

## Operators at a Glance

### Spatial Relationships

| Operator | Description |
|----------|-------------|
| `INTERSECTS` | Returns true when ranges overlap by at least one base pair |
| `CONTAINS` | Returns true when one range fully contains another |
| `WITHIN` | Returns true when one range is fully within another |

### Distance and Proximity

| Operator | Description |
|----------|-------------|
| `DISTANCE` | Calculate genomic distance between two intervals |
| `NEAREST` | Find k-nearest genomic features |

### Aggregation

| Operator | Description |
|----------|-------------|
| `CLUSTER` | Assign cluster IDs to overlapping intervals |
| `MERGE` | Combine overlapping intervals into unified regions |

### Set Quantifiers

| Quantifier | Description |
|------------|-------------|
| `ANY` | Match if condition holds for any of the specified ranges |
| `ALL` | Match if condition holds for all of the specified ranges |

## Documentation

For complete documentation, build the docs locally (see above) or visit the hosted documentation.

The documentation includes:

- **Operator Reference**: Detailed documentation for each operator with examples
- **Recipes**: Common query patterns for intersections, distance calculations, and clustering
- **Bedtools Migration Guide**: How to replicate bedtools operations with GIQL
- **Guides**: Performance optimization, multi-backend configuration, and schema mapping

## Development

This project is in active development.
