Metadata-Version: 2.4
Name: talk-tag
Version: 0.1.0
Summary: GPU-only transcript annotator for speaker-scoped CHAT corpus correction
Requires-Python: >=3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: huggingface_hub>=0.29.3
Requires-Dist: openpyxl>=3.1.5
Requires-Dist: orjson>=3.10.15
Requires-Dist: pandas>=2.2.3
Requires-Dist: torch>=2.7.0
Requires-Dist: transformers>=4.52.0
Requires-Dist: tqdm>=4.67.1
Provides-Extra: dev
Requires-Dist: pytest>=8.3.5; extra == "dev"
Dynamic: license-file

# talk-tag

GPU-only Hugging Face transcript annotator for speaker-scoped CHAT correction tasks.

## CLI

```bash
talk-tag annotate \
  --input-dir ./input \
  --output-dir ./output \
  --target-speaker "*CHI" \
  --investigator-speaker "*INV"
```

Optional Hugging Face overrides (model repository, file, access token, and cache directory):

```bash
talk-tag annotate \
  --input-dir ./input \
  --output-dir ./output \
  --target-speaker "*CHI" \
  --hf-repo-id your-org/your-model \
  --hf-filename config.json \
  --hf-token "$HF_TOKEN" \
  --hf-cache-dir ./hf-cache
```
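The following is a minimal Python sketch of how these override flags might be forwarded. It assumes, without confirmation from this README, that talk-tag fetches model files with `huggingface_hub.hf_hub_download`. The parser, `hf_download_kwargs`, and its `default_repo_id`/`default_filename` parameters are hypothetical. talk-tag's real defaults are not documented here.

```python
import argparse
from typing import Any


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser covering only the override flags shown above.
    parser = argparse.ArgumentParser(prog="talk-tag-sketch")
    parser.add_argument("--hf-repo-id")
    parser.add_argument("--hf-filename")
    parser.add_argument("--hf-token")
    parser.add_argument("--hf-cache-dir")
    return parser


def hf_download_kwargs(
    args: argparse.Namespace, default_repo_id: str, default_filename: str
) -> dict[str, Any]:
    # Hypothetical defaults stand in for whatever talk-tag uses internally.
    kwargs: dict[str, Any] = {
        "repo_id": args.hf_repo_id or default_repo_id,
        "filename": args.hf_filename or default_filename,
    }
    # Forward token/cache_dir only when supplied, so hf_hub_download keeps
    # its own defaults (saved login token, default cache directory).
    if args.hf_token is not None:
        kwargs["token"] = args.hf_token
    if args.hf_cache_dir is not None:
        kwargs["cache_dir"] = args.hf_cache_dir
    return kwargs


def fetch(args: argparse.Namespace, default_repo_id: str, default_filename: str) -> str:
    # Imported lazily so the flag mapping can be exercised offline.
    from huggingface_hub import hf_hub_download

    # Returns the local path of the downloaded (or cached) file.
    return hf_hub_download(**hf_download_kwargs(args, default_repo_id, default_filename))
```

Forwarding only the values that were actually passed keeps the override flags truly optional: omitting `--hf-token` or `--hf-cache-dir` leaves authentication and caching to `huggingface_hub`'s usual behavior.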

## Notes

- Input files are never modified in place. Outputs are mirrored into `--output-dir`.
- Supported inputs: `.cha`, `.txt`, `.csv`, `.json`, `.jsonl`, `.xlsx`.
- For CHAT-like files (`.cha`, `.txt`, `.csv`), only utterances from the selected target speaker are annotated.
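The notes above can be illustrated with a small Python sketch. It assumes that "mirrored" means each output keeps its path relative to `--input-dir`, and that speaker scoping follows the CHAT convention of main tiers written as `*CODE:` (for example `*CHI:`). All function names here are hypothetical and are not talk-tag's API.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".cha", ".txt", ".csv", ".json", ".jsonl", ".xlsx"}
CHAT_LIKE_SUFFIXES = {".cha", ".txt", ".csv"}


def is_supported(path: Path) -> bool:
    # Case-insensitive extension check against the supported input types.
    return path.suffix.lower() in SUPPORTED_SUFFIXES


def is_chat_like(path: Path) -> bool:
    # CHAT-like inputs are the ones where speaker scoping applies.
    return path.suffix.lower() in CHAT_LIKE_SUFFIXES


def mirrored_output_path(input_dir: Path, output_dir: Path, source: Path) -> Path:
    # Re-root the file's path relative to --input-dir under --output-dir,
    # so the source file itself is never overwritten.
    return output_dir / source.relative_to(input_dir)


def target_speaker_indices(lines: list[str], target_speaker: str) -> list[int]:
    # A CHAT main tier starts with "*CODE:"; headers (@...) and dependent
    # tiers (%mor, ...) are skipped. Tab-indented continuation lines are
    # also ignored in this simplified sketch.
    prefix = f"{target_speaker}:"
    return [i for i, line in enumerate(lines) if line.startswith(prefix)]
```

With `--target-speaker "*CHI"`, only the lines returned by `target_speaker_indices(lines, "*CHI")` would be candidates for annotation; `*INV` turns stay untouched.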
