Metadata-Version: 2.3
Name: ee-metadata
Version: 0.2.2
Summary: A high-performance CLI tool from eDNA-Explorer to generate metadata from a directory of fastq files.
License: MIT
Keywords: bioinformatics,fastq,metadata,genomics,cli,edna
Author: Jim Jeffers
Author-email: jim@ednaexplorer.org
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Dist: polars (>=1.3.0,<2.0.0)
Requires-Dist: python-dateutil (>=2.8.2,<3.0.0)
Requires-Dist: rapidfuzz (>=3.6.0,<4.0.0)
Requires-Dist: rich (>=14.1.0,<15.0.0)
Requires-Dist: typer (>=0.16.0,<0.17.0)
Project-URL: Homepage, https://github.com/eDNA-Explorer/ee-metadata
Project-URL: Repository, https://github.com/eDNA-Explorer/ee-metadata
Description-Content-Type: text/markdown

# ee-metadata

A high-performance, command-line tool from eDNA-Explorer to scan a directory of .fastq.gz files, identify primer sequences, pair forward and reverse reads, and generate a metadata CSV file. Built with Polars for maximum speed and efficiency.

## Features

- **Fast Analysis**: Leverages the Polars DataFrame library for rapid processing of primer and metadata files.
- **Primer Detection**: Scans the beginning of each FASTQ file to detect known forward and reverse primer sequences.
- **Intelligent Pairing**: Pairs forward (R1) and reverse (R2) read files based on common naming conventions.
- **Flexible & Robust**: Handles ambiguous IUPAC codes in primer sequences and provides options for forcing file pairs.
- **User-Friendly CLI**: Clean, documented command-line interface powered by Typer, with rich terminal output.

## Installation

### Quick Install with uv or pipx (Recommended)

The easiest way to install `ee-metadata` is using `uv` or `pipx`, which will install it as a standalone CLI tool:

**Using uv (fastest):**
```bash
uv tool install ee-metadata
```

**Using pipx:**
```bash
pipx install ee-metadata
```

After installation, you can run the tool directly:
```bash
ee-metadata --help
```

### Install from PyPI with pip

```bash
pip install ee-metadata
```

### Development Installation

For development or if you prefer Poetry:

#### 1. Install Poetry
Follow the official instructions: [Poetry Installation Guide](https://python-poetry.org/docs/#installation).

#### 2. Clone the Repository
```bash
git clone https://github.com/eDNA-Explorer/ee-metadata.git
cd ee-metadata
```

#### 3. Install Dependencies
```bash
poetry install
```

#### 4. Run with Poetry
```bash
poetry run ee-metadata
```

### 4. Enable Tab Completion (Optional)
To enable tab completion for file paths and options, install shell completion:

**For Bash:**
```bash
poetry run ee-metadata --install-completion bash
```

**For Zsh:**
```bash
poetry run ee-metadata --install-completion zsh
```

**For Fish:**
```bash
poetry run ee-metadata --install-completion fish
```

## Usage

Once installed, you can run the tool using poetry run.

### Command
```bash
poetry run ee-metadata [OPTIONS] [INPUT_DIR]
```

### Arguments
- **INPUT_DIR**: (Optional) The path to the directory containing your .fastq.gz files. If not provided, you'll be prompted to enter it interactively.

### Options

| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| `--primers` | `-p` | (Required) Path to the primers CSV file. | - |
| `--output` | `-o` | Path for the output metadata CSV file. | `metadata.csv` |
| `--num-records` | `-n` | Number of records to scan in each FASTQ file for primer detection. | `100` |
| `--force-pairing` | | Force pairing of files based on common base names if primer analysis fails. | `False` |
| `--help` | | Show the help message and exit. | - |

### Examples

**Interactive mode (prompts for input directory):**
```bash
poetry run ee-metadata
```

**With directory specified (supports tab completion):**
```bash
# Use tab completion to autocomplete paths
poetry run ee-metadata <TAB>  # Press tab to see available directories

# Analyze FASTQ files in the 'data/raw_reads' directory using default primers
poetry run ee-metadata ./data/raw_reads

# Analyze FASTQ files with custom primers and output file (use tab completion)
poetry run ee-metadata ./data/raw_reads --primers <TAB> --output <TAB>
```

## Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue for bugs, feature requests, or questions.
