Metadata-Version: 2.1
Name: partake
Version: 0.5.3
Summary: Shard input to persistent jobs at raw pipe speed.
License: MIT
Author: Ben Skubi
Author-email: skubi@ohsu.edu
Requires-Python: >=3.10
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: click
Requires-Dist: loguru
Requires-Dist: subfeed
Description-Content-Type: text/markdown

# partake
**Shard input to persistent parallel jobs at the rate of a pipe.**

# Example

```bash
seq 0 10000000 | partake -n 10 'cat'
```

# Install

CLI version:
```bash
pip install partake
```
With development dependencies:
```bash
pip install 'partake[dev]'
```

# Recipes

Split a SAM/BAM/CRAM file into two BAM chunks:

```bash
samtools view input.bam | partake -n 2 -s 1 -o {id}.bam "bash -c 'cat <(samtools view -H input.bam) - | samtools view -b'"
```

Here, we convert the input to SAM (plaintext) and pipe the records to stdin. We use the `-s 1` option to ensure that lines are not truncated. In the command, each worker prepends the header to its input record stream and converts the result back to BAM. The output files are `0.bam` and `1.bam`.
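The header-prepending step (`cat <(samtools view -H input.bam) -`) can be mirrored in Python for workers that are scripts rather than shell pipelines. This is a minimal sketch, not part of partake: `prepend_header` is a hypothetical helper, and the SAM lines are made-up placeholder data.

```python
import io
import itertools


def prepend_header(header_lines, record_lines):
    """Return an iterator over all header lines followed by all record lines."""
    return itertools.chain(header_lines, record_lines)


# Placeholder SAM text, made up for illustration only.
header = io.StringIO("@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n")
# Stand-in for the record stream a worker would receive on stdin.
records = io.StringIO("r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\n")

combined = list(prepend_header(header, records))
print("".join(combined), end="")
```

In a real worker, `records` would be `sys.stdin` and the header would come from `samtools view -H`.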

Read global line indexes within a custom Python script:

`report.py`
```python
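import os
import sys

# PARTAKE_LINE_NUMBERS holds a file descriptor number; reading that
# descriptor yields one line number per data line arriving on stdin.
read_fd = os.environ.get("PARTAKE_LINE_NUMBERS")
if read_fd is None:
    # Without the descriptor there is nothing to pair with stdin.
    sys.exit("PARTAKE_LINE_NUMBERS is not set; run this script under partake")
line_numbers = os.fdopen(int(read_fd), buffering=1)

for line_number, data_line in zip(line_numbers, sys.stdin):
    print("Line number: ", line_number.strip(), "Data line: ", data_line.strip())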
```

```
seq 1000 10000000 | partake -n 2 -s 1 "python report.py"
head 0.out
Line number:  0 Data line:  1000
Line number:  1 Data line:  1001
Line number:  2 Data line:  1002
Line number:  3 Data line:  1003
```

