Metadata-Version: 2.1
Name: varstrat
Version: 0.1.1
Summary: A tool to annotate VCF files using genome stratification files, targeting difficult regions.
Author: Thinh Quyen
Author-email: thinhquyen9461@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE

# VarStrat

A tool for annotating VCF files using genome stratification files, targeting difficult regions.

## Description
VarStrat annotates VCF (Variant Call Format) files using genome stratification files. It focuses on difficult regions of the genome, and the resulting annotations are intended to aid variant analysis.
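
To make the idea of annotating a variant against a stratification region concrete, the core check can be pictured as an interval lookup. The snippet below is an illustrative sketch only, not VarStrat's actual implementation: the function name `position_in_intervals` is hypothetical, and it assumes the BED intervals are sorted and non-overlapping. It relies on the standard conventions that VCF positions are 1-based while BED intervals use a 0-based start and an exclusive end.

```python
import bisect


def position_in_intervals(pos_1based, intervals):
    """Return True if a 1-based VCF position lies inside any BED interval.

    `intervals` is a sorted list of non-overlapping (start, end) pairs in
    BED convention: 0-based start, exclusive end.
    """
    pos0 = pos_1based - 1  # convert VCF (1-based) to BED (0-based) coordinates
    starts = [start for start, _ in intervals]
    # Find the last interval whose start is <= pos0.
    i = bisect.bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < intervals[i][1]


# Hypothetical intervals, as they might appear in one sub-region BED file.
difficult = [(99, 200), (500, 600)]
print(position_in_intervals(100, difficult))
print(position_in_intervals(201, difficult))
```

With these intervals, VCF position 100 falls inside the first interval, while position 201 falls just past its end.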


## Installation

Use the package manager [pip](https://pip.pypa.io/en/stable/) to install VarStrat.

```bash
pip install varstrat
```

## Usage
To use the tool from the command line:
```bash
varstrat --input_vcf input.vcf --output_vcf output.vcf --data_source data_source
```
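
The same command can also be launched from a Python script. The following is a minimal sketch that assumes the `varstrat` executable is on your `PATH` after installation; the helper names `build_varstrat_command` and `run_varstrat` are hypothetical and are not part of the package.

```python
import shutil
import subprocess


def build_varstrat_command(input_vcf, output_vcf, data_source):
    """Return the argument list for the varstrat command shown above."""
    return [
        "varstrat",
        "--input_vcf", str(input_vcf),
        "--output_vcf", str(output_vcf),
        "--data_source", str(data_source),
    ]


def run_varstrat(input_vcf, output_vcf, data_source):
    """Run varstrat as a subprocess; raises CalledProcessError on failure."""
    if shutil.which("varstrat") is None:
        raise FileNotFoundError("varstrat executable not found on PATH")
    command = build_varstrat_command(input_vcf, output_vcf, data_source)
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.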

## Command-line Options
```
usage: varstrat [-h] --input_vcf INPUT_VCF --output_vcf OUTPUT_VCF --data_source DATA_SOURCE

A tool for stratifying genetic variants.

optional arguments:
  -h, --help            show this help message and exit
  --input_vcf INPUT_VCF
                        Path to the input VCF file. The file can be in VCF or VCF.GZ format.
  --output_vcf OUTPUT_VCF
                        Path to the output VCF file. The output will be a .vcf file.
  --data_source DATA_SOURCE
                        Path to the data source directory containing stratification regions.
```
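
Because the input may be either plain or gzip-compressed, a script that inspects the same file before or after annotation has to handle both cases. The sketch below shows one common way to do that by checking the file suffix; it is not necessarily how VarStrat reads its input, and the helper names `open_vcf` and `count_variant_records` are hypothetical.

```python
import gzip
from pathlib import Path


def open_vcf(path):
    """Open a plain (.vcf) or gzip-compressed (.vcf.gz) file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_variant_records(path):
    """Count data lines, skipping blank lines and '#' header lines."""
    with open_vcf(path) as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )
```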

## Parameters Explanation
- `--input_vcf`: Path to the input VCF file. The file can be in `.vcf` or `.vcf.gz` format.

- `--output_vcf`: Path to the output VCF file. The output will be a `.vcf` file.

- `--data_source`: Path to the data source directory containing stratification regions. The directory should have the following structure:
    ```
    data_source/
    ├── Region_name_1
    │   ├── sub_region_1.bed.gz
    │   └── sub_region_2.bed.gz
    ├── Region_name_2
    │   ├── sub_region_1.bed.gz
    │   └── sub_region_2.bed.gz
    ...
    ```
    Each region is represented by a folder containing one or more .bed.gz files that specify sub-regions.

    See the [genome-stratifications](https://github.com/genome-in-a-bottle/genome-stratifications) repository for more information about the data source.
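
Before running the tool, it can be useful to confirm that a directory follows the layout described above. The snippet below is a small sketch that lists each region folder together with its `.bed.gz` files; the function name `list_stratifications` is hypothetical and not part of VarStrat.

```python
from pathlib import Path


def list_stratifications(data_source):
    """Map each region folder name to the sorted names of its .bed.gz files."""
    regions = {}
    for region_dir in sorted(Path(data_source).iterdir()):
        if not region_dir.is_dir():
            continue  # ignore stray files at the top level
        bed_files = sorted(p.name for p in region_dir.glob("*.bed.gz"))
        if bed_files:
            regions[region_dir.name] = bed_files
    return regions
```

A region folder that contains no `.bed.gz` files is left out of the result, which makes an incomplete download easy to spot.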
