Metadata-Version: 2.4
Name: brisket
Version: 0.1.2
Summary: Fast Cython-powered one-hot encoding for DNA sequences
License: MIT
License-File: LICENSE
Keywords: bioinformatics,dna,one-hot-encoding,cython,genomics
Author: Natalie Gill
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Dist: numpy (>=1.21,<3.0)
Project-URL: Homepage, https://github.com/gladstone-institutes/brisket
Project-URL: Repository, https://github.com/gladstone-institutes/brisket
Description-Content-Type: text/markdown

# brisket

Fast Cython-powered one-hot encoding for DNA sequences.

## Installation

```bash
$ pip install brisket
```

## Usage

```python
import numpy as np
from brisket import encode_seq

# Encode a DNA sequence to one-hot format
dna_sequence = "ATCG"
encoded = encode_seq(dna_sequence)

print(encoded)
# Output: 2D NumPy array of shape (4, seq_length), channels-first (PyTorch convention)
# [[1 0 0 0]  # A channel: positions 0, 1, 2, 3
#  [0 0 1 0]  # C channel: positions 0, 1, 2, 3
#  [0 0 0 1]  # G channel: positions 0, 1, 2, 3
#  [0 1 0 0]] # T channel: positions 0, 1, 2, 3

# The encoding uses channels-first format:
# - Row 0 = A channel, Row 1 = C channel, Row 2 = G channel, Row 3 = T channel
# - Each row represents one nucleotide type across all positions
# - Each column represents one position in the sequence

# Invalid characters (not A, T, C, G) result in all-zero columns
encoded_with_n = encode_seq("ATCGN")  # Last column will be [0 0 0 0]
```
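To make the documented layout concrete, here is a minimal pure-NumPy sketch that reproduces the behaviour described above: rows ordered A, C, G, T, one column per position, and an all-zero column for any other character. It is not brisket's Cython implementation. The function name `one_hot_reference`, the `uint8` output dtype, and the choice to treat only uppercase `A`/`C`/`G`/`T` as valid are assumptions made for this illustration. The README does not say how `encode_seq` handles lowercase bases or which dtype it returns. The last lines show one way to stack equal-length sequences into an `(N, 4, L)` batch, the input layout expected by channels-first 1D convolutions such as `torch.nn.Conv1d`.

```python
import numpy as np

# Assumed row order, matching the README: A, C, G, T (channels-first).
ALPHABET = "ACGT"

# Lookup table from byte value to channel index; -1 marks characters that
# should yield an all-zero column (e.g. "N"). Assumption: only uppercase
# A/C/G/T count as valid here.
LOOKUP = np.full(256, -1, dtype=np.int64)
for channel, base in enumerate(ALPHABET):
    LOOKUP[ord(base)] = channel


def one_hot_reference(seq: str) -> np.ndarray:
    """Hypothetical pure-NumPy sketch of a (4, len(seq)) one-hot encoding."""
    codes = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)  # one byte per base
    idx = LOOKUP[codes]                          # channel index per position, or -1
    out = np.zeros((len(ALPHABET), len(seq)), dtype=np.uint8)
    valid = idx >= 0                             # positions holding A, C, G or T
    positions = np.nonzero(valid)[0]
    out[idx[valid], positions] = 1               # set one cell per valid column
    return out


print(one_hot_reference("ATCG"))

# Stack equal-length sequences into an (N, 4, L) batch for channels-first models.
batch = np.stack([one_hot_reference(s) for s in ["ATCG", "GGTA"]])
print(batch.shape)
```

A byte lookup table keeps the per-character work to a single vectorized indexing step, which is also the general idea behind fast one-hot encoders. Comparing this sketch's output with `encode_seq` on your own data is a quick way to confirm the layout before feeding arrays into a model.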

## Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

## License

`brisket` was created by Natalie Gill and Sean Whalen. It is licensed under the terms of the MIT license.

## Credits

`brisket` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).

