Metadata-Version: 2.1
Name: gff2bed
Version: 0.2.0
Summary: Convert GFF3-formatted data to BED format
Project-URL: Homepage, https://gitlab.com/salk-tm/gff2bed
Project-URL: Documentation, https://salk-tm.gitlab.io/gff2bed
Author-email: Anthony Aylward <aaylward@salk.edu>
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Description-Content-Type: text/markdown

# gff2bed

## Overview

[GFF3](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md) and [BED](https://bedtools.readthedocs.io/en/latest/content/general-usage.html) are common formats for storing the coordinates of genomic features such as genes. GFF3 format is more versatile, but BED format is simpler and enjoys a rich ecosystem of utilities such as [bedtools](https://bedtools.readthedocs.io/en/latest/index.html). For this reason, it is often convenient to store genomic features in GFF3 format and convert them to BED format for genome arithmetic.

This package provides two convenience functions that streamline converting data from GFF3 to BED format for bioinformatics analysis: `parse()`, which reads data from a GFF3 file, and `convert()`, which converts GFF3-formatted data to BED-formatted data that can be passed on to other tools such as [pybedtools](https://daler.github.io/pybedtools/).
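
To make the conversion concrete, the sketch below shows the coordinate change that any GFF3-to-BED conversion has to perform: GFF3 intervals are 1-based and inclusive, while BED intervals are 0-based and half-open, so the start is decremented by one and the end is left unchanged. This is an illustration only, not the implementation used by `gff2bed`. The helper `gff3_line_to_bed()`, its `name_attribute` parameter, and the example feature line are hypothetical.

```python
from urllib.parse import unquote


def gff3_line_to_bed(line, name_attribute="ID"):
    """Convert one GFF3 feature line to a BED6-style tuple.

    Hypothetical helper for illustration; not part of the gff2bed API.
    """
    # A GFF3 feature line has nine tab-separated columns
    seqid, _source, _type, start, end, score, strand, _phase, attributes = (
        line.rstrip("\n").split("\t")
    )
    # Column 9 holds semicolon-separated key=value pairs
    attrs = dict(
        field.split("=", 1) for field in attributes.split(";") if "=" in field
    )
    # Attribute values may be percent-encoded in GFF3
    name = unquote(attrs.get(name_attribute, "."))
    # GFF3 is 1-based inclusive, BED is 0-based half-open:
    # subtract 1 from the start, keep the end as-is.
    # Score and strand are passed through unchanged in this sketch.
    return (seqid, int(start) - 1, int(end), name, score, strand)


# Made-up feature line, used only to show the coordinate shift
example = "chr1\texample\tgene\t101\t200\t.\t+\t.\tID=gene0001;Name=demo"
print(gff3_line_to_bed(example))
```

A feature spanning bases 101 to 200 in GFF3 therefore becomes the BED interval 100 to 200, and its length is simply end minus start.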

## Installation

Install `gff2bed` with `pip`:

```sh
pip install gff2bed
```

## Example

```python
import urllib3
import shutil
import pandas as pd
import pybedtools
import gff2bed

GFF3_URL = 'https://gitlab.com/salk-tm/gff2bed/-/raw/main/test/data/ColCEN_AT1G01010-20_TAIR10.gff3.gz'
GFF3_FILE = 'ColCEN_AT1G01010-20_TAIR10.gff3.gz'
BED_FILE = 'ColCEN_AT1G01010-20_TAIR10.bed'

# Download the example GFF3 file, streaming the response body to disk
http = urllib3.PoolManager()
with http.request('GET', GFF3_URL, preload_content=False) as r, \
        open(GFF3_FILE, 'wb') as dest_file:
    shutil.copyfileobj(r, dest_file)

# Parse the GFF3 data into a pandas data frame
genes_df = pd.DataFrame(gff2bed.parse(GFF3_FILE))
genes_df.head()

# Convert the parsed GFF3 data to BED format, load it into a
# pybedtools BedTool, and save it as a BED file
genes_bt = pybedtools.BedTool(
    gff2bed.convert(gff2bed.parse(GFF3_FILE))
).saveas(BED_FILE)
genes_bt.head()
```

## API