Metadata-Version: 2.4
Name: signlangtk
Version: 0.1.11
Summary: Sign Language Toolkit for sign language research
Author: Sign Language Research Team
License-Expression: CC-BY-NC-ND-4.0
Project-URL: Repository, https://github.com/ed-fish/Sign-Language-Toolkit
Project-URL: Documentation, https://sign-language-toolkit.readthedocs.io/
Project-URL: Issues, https://github.com/ed-fish/Sign-Language-Toolkit/issues
Project-URL: Changelog, https://github.com/ed-fish/Sign-Language-Toolkit/releases
Keywords: sign language,computer vision,machine learning,linguistics,ELAN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.10.0
Requires-Dist: h5py>=3.8.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: click>=8.1.0
Requires-Dist: defusedxml>=0.7.0
Requires-Dist: nltk>=3.8.0
Requires-Dist: huggingface_hub>=0.20.0
Provides-Extra: mediapipe
Requires-Dist: mediapipe>=0.10.20; extra == "mediapipe"
Provides-Extra: wilor
Requires-Dist: torch>=2.0.0; extra == "wilor"
Requires-Dist: smplx>=0.1.28; extra == "wilor"
Requires-Dist: pytorch-lightning>=2.0.0; extra == "wilor"
Requires-Dist: yacs>=0.1.8; extra == "wilor"
Requires-Dist: ultralytics>=8.0.0; extra == "wilor"
Requires-Dist: timm>=0.9.0; extra == "wilor"
Requires-Dist: dill>=0.3.0; extra == "wilor"
Provides-Extra: nlf
Requires-Dist: torch>=2.0.0; extra == "nlf"
Provides-Extra: teaser
Requires-Dist: torch>=2.0.0; extra == "teaser"
Requires-Dist: mediapipe>=0.10.20; extra == "teaser"
Requires-Dist: timm>=0.9.0; extra == "teaser"
Provides-Extra: rtmpose
Requires-Dist: torch>=2.0.0; extra == "rtmpose"
Requires-Dist: mmpose>=1.1.0; extra == "rtmpose"
Requires-Dist: mmdet>=3.0.0; extra == "rtmpose"
Requires-Dist: mmengine>=0.7.0; extra == "rtmpose"
Requires-Dist: mmcv>=2.0.0; extra == "rtmpose"
Requires-Dist: openmim>=0.3.0; extra == "rtmpose"
Requires-Dist: decord>=0.6.0; extra == "rtmpose"
Provides-Extra: smplfx
Requires-Dist: torch>=2.0.0; extra == "smplfx"
Requires-Dist: smplx>=0.1.28; extra == "smplfx"
Requires-Dist: h5py>=3.10.0; extra == "smplfx"
Requires-Dist: hdf5plugin>=4.0.0; extra == "smplfx"
Requires-Dist: decord>=0.6.0; extra == "smplfx"
Provides-Extra: torch
Requires-Dist: torch>=2.0.0; extra == "torch"
Requires-Dist: torchvision>=0.15.0; extra == "torch"
Provides-Extra: data
Requires-Dist: lmdb>=1.4.0; extra == "data"
Requires-Dist: msgpack>=1.0.0; extra == "data"
Provides-Extra: metrics
Requires-Dist: sacrebleu>=2.3.0; extra == "metrics"
Requires-Dist: rouge-score>=0.1.2; extra == "metrics"
Provides-Extra: metrics-neural
Requires-Dist: signlangtk[metrics]; extra == "metrics-neural"
Requires-Dist: bleurt-pytorch>=0.0.1; extra == "metrics-neural"
Requires-Dist: bert-score>=0.3.13; extra == "metrics-neural"
Provides-Extra: metrics-video
Requires-Dist: scikit-image>=0.21.0; extra == "metrics-video"
Requires-Dist: pytorch-fid>=0.3.0; extra == "metrics-video"
Provides-Extra: analysis
Requires-Dist: scikit-learn>=1.3.0; extra == "analysis"
Requires-Dist: umap-learn>=0.5.0; extra == "analysis"
Requires-Dist: hdbscan>=0.8.0; extra == "analysis"
Requires-Dist: albumentations>=1.3.0; extra == "analysis"
Provides-Extra: vis
Requires-Dist: matplotlib>=3.7.0; extra == "vis"
Requires-Dist: opencv-python>=4.8.0; extra == "vis"
Provides-Extra: mesh-viz
Requires-Dist: pyrender>=0.1.45; extra == "mesh-viz"
Requires-Dist: trimesh>=3.21.0; extra == "mesh-viz"
Provides-Extra: api
Requires-Dist: fastapi>=0.109.0; extra == "api"
Requires-Dist: uvicorn[standard]>=0.25.0; extra == "api"
Requires-Dist: pydantic>=2.5.0; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Requires-Dist: slowapi>=0.1.9; extra == "api"
Requires-Dist: openai>=1.12.0; extra == "api"
Requires-Dist: anthropic>=0.39.0; extra == "api"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5.0; extra == "docs"
Requires-Dist: mkdocs-material>=9.5.0; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == "docs"
Requires-Dist: pymdown-extensions>=10.0; extra == "docs"
Provides-Extra: all
Requires-Dist: signlangtk[analysis,api,data,mediapipe,mesh-viz,metrics,metrics-neural,metrics-video,nlf,teaser,torch,vis,wilor]; extra == "all"
Dynamic: license-file

<p align="center">
  <img src="https://img.shields.io/pypi/v/signlangtk?color=blue&label=PyPI" alt="PyPI">
  <img src="https://img.shields.io/pypi/pyversions/signlangtk" alt="Python">
  <img src="https://img.shields.io/github/license/ed-fish/Sign-Language-Toolkit" alt="License">
  <img src="https://img.shields.io/badge/tests-1697%20passed-brightgreen" alt="Tests">
  <a href="https://sign-language-toolkit.readthedocs.io/"><img src="https://readthedocs.org/projects/sign-language-toolkit/badge/?version=latest" alt="Docs"></a>
</p>

# Sign Language Toolkit (SLTK)

| [Tutorials](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/full_pipeline/) | [Documentation](https://sign-language-toolkit.readthedocs.io/) | [Notebooks](notebooks/) | [Contributing](https://sign-language-toolkit.readthedocs.io/en/latest/contributing/) | [PyPI](https://pypi.org/project/signlangtk/) |

#

## What SLTK Offers

- SLTK is an **open-source** Python toolkit that accelerates **sign language research** and makes state-of-the-art (SOTA) computer vision tools and resources easily available to linguists and others working on sign language and AI.

- It provides a complete pipeline for **pose extraction**, **sign segmentation**, **gloss spotting**, **non-manual signal detection**, **corpus analysis**, and **evaluation**, all available from a single CLI or Python API.

- SLTK brings together state-of-the-art models (WiLoR, NLF, TEASER, SignRep) behind a unified interface, so researchers can focus on linguistics rather than engineering.

## Vision

- We believe the field needs a **holistic toolkit** that jointly supports the full research workflow: video processing, pose extraction, temporal analysis, corpus management, and standardized evaluation.

- SLTK is designed for **reproducibility**: every metric, extraction step, and analysis tool uses the same data structures and can be run from a single CLI command or Python script.

#

## Quick Start

### Installation

```bash
# Core library (ELAN I/O, CLI, corpus tools, analysis) — no GPU needed
pip install signlangtk

# With GPU extraction backends
pip install "signlangtk[wilor,nlf,teaser]"

# Everything
pip install "signlangtk[all]"
```

> **Package name:** `signlangtk` on PyPI, `sltk` for Python imports.

<details>
<summary><b>Development install</b></summary>

```bash
git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,api,wilor,nlf,teaser]"
pytest  # 1697 tests
```
</details>

### One-Command Pipeline

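An example video (`examples/example.mp4`) is included in the repository for testing. In the command below, replace `video.mp4` with the path to the example video or to your own recording.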

```bash
sltk pipeline video.mp4 -o output/
# → output/video.eaf  (10+ tiers: segmentation, blinks, head nods, mouth, gaze, ...)
```

### Python API

```python
from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.teaser import TeaserExtractor
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
from sltk.nms.runner import detect_nms
from sltk.io.elan_roundtrip import ElanDocument

VIDEO, FPS = "recording.mp4", 25.0

# ── Step 1: Extract 3D hand poses ─────────────────────────────────
with WiLoRExtractor() as ext:
    ext.load_model()
    result = ext.extract_from_video(VIDEO, "recording_wilor.h5")
    print(f"{result.num_detections} hand detections across {result.num_frames} frames")

# ── Step 2: Extract face parameters ───────────────────────────────
with TeaserExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_teaser.h5")

# ── Step 3: Segment signs ─────────────────────────────────────────
features = h5_to_features("recording_wilor.h5")  # (T, 192) MANO features
labels = get_runner().predict(features)            # 0=OUT, 1=IN, 2=BEGIN
segments = extract_segments(labels)                # [(start, end), ...]

# ── Step 4: Detect non-manual signals ─────────────────────────────
blinks, nms_events, quality = detect_nms(
    "recording_teaser.h5", detectors={"all"}
)

# ── Step 5: Assemble multi-tier ELAN file ─────────────────────────
doc = ElanDocument.new(video_path=VIDEO)

doc.add_tier("Segmentation")
for s, e in segments:
    doc.add_segment("Segmentation", s / FPS, e / FPS, "SIGN")

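# Create every tier that detectors={"all"} can write to (see the NMS detector table)
for tier in ["BLINK", "HEAD-NOD", "HEAD-SHAKE", "HEAD-TILT", "MOUTH-MOVEMENT",
             "EYEBROW-RAISE", "EYE-GAZE", "EYE-SQUINT"]: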
    doc.add_tier(tier)
for b in blinks:
    doc.add_segment("BLINK", b.start_frame / FPS, b.end_frame / FPS, "blink")
for ev in nms_events:
    doc.add_segment(ev.tier, ev.start_frame / FPS, ev.end_frame / FPS, ev.label)

doc.save("recording.eaf")
```

---

## Extraction

Three GPU extractors share the same interface: `load_model()` → `extract_from_video()`. Weights auto-download from HuggingFace Hub on first use (~3.4 GB total).

### WiLoR — 3D Hand Reconstruction

21 keypoints per hand with MANO rotation matrices. Primary input for sign segmentation.

```python
from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig

config = WiLoRConfig(
    device="cuda:0",
    img_batch_size=128,       # reduce for <8GB VRAM
    detection_confidence=0.3,
    use_amp=True,
)
with WiLoRExtractor(config) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_wilor.h5")
```

**Output H5:**
```
video_wilor.h5
├── attrs: fps, num_frames, resolution
├── img_idx        (M,)              # frame index per detection
├── kpts_3d        (M, 21, 3)       # 3D hand keypoints
├── kpts_2d        (M, 21, 2)       # 2D projections
├── right          (M,)             # True = right hand
├── confidence     (M,)             # detection score
├── bboxes         (M, 4)           # hand bounding boxes
└── mano/
    ├── hand_pose      (M, 15, 3, 3)   # joint rotations
    ├── global_orient  (M, 1, 3, 3)    # wrist rotation
    └── betas          (M, 10)          # shape parameters
```
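
Every dataset above is indexed per detection (`M`), not per frame, so a frame may contribute zero, one, or several rows. A minimal sketch of regrouping detections by frame follows. It assumes the layout shown above and uses plain lists as stand-ins for the arrays you would read with `h5py`; `group_by_frame` is a hypothetical helper, not part of SLTK.

```python
from collections import defaultdict


def group_by_frame(img_idx, right, confidence):
    """Map frame index -> {"left": det_id, "right": det_id}.

    When a frame holds several detections for the same side, the one
    with the highest confidence wins.
    """
    frames = defaultdict(dict)
    for det_id, (frame, is_right, score) in enumerate(zip(img_idx, right, confidence)):
        side = "right" if is_right else "left"
        best = frames[frame].get(side)
        if best is None or score > confidence[best]:
            frames[frame][side] = det_id
    return dict(frames)


# Synthetic stand-ins for the img_idx, right, and confidence datasets
img_idx = [0, 0, 1, 3, 3, 3]
right = [True, False, True, False, True, True]
confidence = [0.9, 0.8, 0.7, 0.6, 0.4, 0.95]

by_frame = group_by_frame(img_idx, right, confidence)
print(by_frame)
```

Frame 2 is absent from the result because no detection references it, and the returned detection ids can be used to index `kpts_3d` or the `mano/` arrays.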


### NLF — Full-Body SMPL-X

55 SMPL-X joints (body + hands + face) with full pose parameters.

```python
from sltk.extraction.nlf import NLFExtractor, NLFConfig

with NLFExtractor(NLFConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_nlf.h5")
```

### TEASER — FLAME Face Parameters

FLAME 3D face parameters: jaw pose, expression, eyelid, shape, and head pose. Uses MediaPipe for face detection. This is the input for NMS detection.

```python
from sltk.extraction.teaser import TeaserExtractor, TeaserConfig

with TeaserExtractor(TeaserConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_teaser.h5")
```


### Batch Processing

```python
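from pathlib import Path
from sltk.extraction.wilor import WiLoRExtractor

videos = sorted(Path("videos").glob("*.mp4"))  # example input directory

with WiLoRExtractor() as ext:
    ext.load_model()  # load weights once, reuse for every video
    for video in videos:
        out_path = video.with_name(f"{video.stem}_wilor.h5")
        result = ext.extract_from_video(str(video), str(out_path))
        print(f"{video.name}: {result.num_detections} hands across {result.num_frames} frames")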
```

---

## Sign Segmentation

A 4-layer Transformer reads WiLoR hand features and predicts per-frame BIO labels (`BEGIN`, `IN`, `OUT`), which are then merged into sign segments.

```bash
# CLI
sltk segment video_wilor.h5 -o segments.eaf -f elan --video video.mp4

# Or directly from video (auto-extracts WiLoR first)
sltk segment video.mp4 -o segments.eaf -f elan
```

```python
# Python
from sltk.segmentation.runner import segment_h5
from sltk.segmentation.output import OutputFormat

segment_h5("video_wilor.h5", output_path="segments.eaf",
           output_format=OutputFormat.ELAN, fps=25.0, media_path="video.mp4")
```
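
Per-frame labels still have to be merged into segments. Here is a minimal pure-Python sketch of that step. It assumes the label ids from the Quick Start comment (0=OUT, 1=IN, 2=BEGIN) and an exclusive end frame; `labels_to_segments` is a hypothetical helper, and the post-processing in `sltk.segmentation.postprocess` may behave differently.

```python
OUT, IN, BEGIN = 0, 1, 2  # label ids as listed in the Quick Start comment


def labels_to_segments(labels):
    """Convert per-frame labels into (start, end) frame spans, end exclusive."""
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == BEGIN or (label == IN and start is None):
            # BEGIN always opens a new sign; an IN with no open sign opens one too
            if start is not None:
                segments.append((start, i))
            start = i
        elif label == OUT and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        # Close a sign that runs to the end of the sequence
        segments.append((start, len(labels)))
    return segments


labels = [0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1]
segments = labels_to_segments(labels)
print(segments)
```

The second `BEGIN` at frame 8 splits two back-to-back signs that have no `OUT` frame between them, which is the reason a three-class scheme is used instead of a binary in/out label.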


---

## Gloss Spotting

Match detected segments to a dictionary of known signs using SignRep embeddings (768-dim ViT features).

```python
from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
dictionary = pipeline.load_dictionary(["/data/dictionaries/bsldict/signrep/"])

result = pipeline.spot(
    features=continuous,
    segments=[{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
    dictionary=dictionary,
    top_k=5,
)

for seg in result.segments:
    for gl in seg.top_glosses[:3]:
        print(f"  {gl['gloss']} ({gl['similarity']:.3f})")
```
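
Dictionary matching over embeddings is commonly done with cosine similarity. The sketch below shows one such approach, which is not necessarily what `SignRepPipeline.spot` does internally: it mean-pools the frame features inside a segment and ranks dictionary entries by cosine similarity. The helper names and the 3-dim toy vectors are illustrative assumptions; real SignRep features are 768-dim.

```python
import math


def mean_pool(frames):
    """Average a list of equal-length feature vectors into one vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_glosses(segment_frames, dictionary, k=5):
    """Rank dictionary glosses by similarity to the pooled segment feature."""
    query = mean_pool(segment_frames)
    scored = [(gloss, cosine(query, emb)) for gloss, emb in dictionary.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy data: one embedding per dictionary gloss, two frames in the segment
dictionary = {
    "HELLO": [1.0, 0.0, 0.0],
    "THANKS": [0.0, 1.0, 0.0],
    "NAME": [0.7, 0.7, 0.0],
}
segment = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]

ranked = top_k_glosses(segment, dictionary, k=2)
for gloss, similarity in ranked:
    print(f"{gloss} ({similarity:.3f})")
```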


---

## Non-Manual Signal Detection

Detect blinks, head nods/shakes/tilts, mouth movements, eyebrow raises, gaze direction, and eye squints from TEASER face data.

```bash
# CLI
sltk nms video_teaser.h5 -o nms_output/
```

```python
from sltk.nms.runner import detect_nms

blinks, nms_events, quality = detect_nms(
    "video_teaser.h5",
    detectors={"all"},
    smpl_path="video_nlf.h5",  # optional: enables gaze detection
)
print(f"{len(blinks)} blinks, {len(nms_events)} NMS events")
print(f"Tracking quality: {quality.detection_rate:.0%}")
```

| Detector | ELAN Tier | Signal | Source Data |
|----------|-----------|--------|-------------|
| **[Blink](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/nms/)** | `BLINK` | Eye closures | TEASER eyelid |
| **Nod** | `HEAD-NOD` | Vertical head oscillation | TEASER head pitch |
| **Shake** | `HEAD-SHAKE` | Horizontal head oscillation | TEASER head yaw |
| **Tilt** | `HEAD-TILT` | Side-to-side tilt | TEASER head roll |
| **Mouth** | `MOUTH-MOVEMENT` | Lip and mouth movement | FLAME expression |
| **Eyebrow** | `EYEBROW-RAISE` | Eyebrow raise or furrow | FLAME expression |
| **Gaze** | `EYE-GAZE` | Gaze direction | NLF eye pose |
| **Squint** | `EYE-SQUINT` | Partial eye closure | TEASER eyelid |
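
As an illustration of how a detector of this kind can work, the sketch below thresholds a per-frame eyelid-closure signal and keeps only runs of blink-like duration. The signal scale (0 = open, 1 = closed), the threshold, and the duration limits are assumptions chosen for the example, not SLTK's actual parameters.

```python
def detect_closures(eyelid, fps, threshold=0.5, min_ms=50, max_ms=500):
    """Return (start_frame, end_frame) runs where the eyelid signal stays at or
    above `threshold` for a blink-like duration. end_frame is exclusive."""
    events = []
    start = None
    # A trailing 0.0 sentinel closes a run that reaches the last frame
    for i, value in enumerate(list(eyelid) + [0.0]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            duration_ms = (i - start) * 1000.0 / fps
            if min_ms <= duration_ms <= max_ms:
                events.append((start, i))
            start = None
    return events


# One short closure (3 frames) followed by a long one (20 frames) at 25 fps
signal = [0.1, 0.1, 0.8, 0.9, 0.7, 0.1, 0.1] + [0.9] * 20 + [0.1]
blinks = detect_closures(signal, fps=25.0)
print(blinks)
```

The 20-frame closure lasts 800 ms at 25 fps and is rejected by the upper duration limit, which is how a sketch like this separates blinks from sustained eye closure.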

### Evaluation Metrics

Standardized metrics for translation, production (pose & video), and segmentation evaluation. See the [Metrics tutorial](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/metrics/) and [notebook](notebooks/04_evaluation_metrics.ipynb).

| Task | Metrics | Dependencies |
|------|---------|--------------|
| **[Translation](https://sign-language-toolkit.readthedocs.io/en/latest/api/metrics/#translation-metrics)** | BLEU-1/2/3/4, ROUGE-L, chrF/chrF++, TER, METEOR, WER | `signlangtk[metrics]` |
| **[Translation (neural)](https://sign-language-toolkit.readthedocs.io/en/latest/api/metrics/#bertscore)** | BLEURT, BERTScore | `signlangtk[metrics-neural]` |
| **[Production — Pose](https://sign-language-toolkit.readthedocs.io/en/latest/api/metrics/#pose-metrics)** | MPJPE, PA-MPJPE, PCK, DTW-MJE, APE, FGD | core (numpy/scipy) |
| **[Production — Video](https://sign-language-toolkit.readthedocs.io/en/latest/api/metrics/#video-quality-metrics)** | SSIM, PSNR, FID | `signlangtk[metrics-video]` |
| **[Segmentation](https://sign-language-toolkit.readthedocs.io/en/latest/api/metrics/#segmentation-metrics)** | Boundary F1, IoU, frame accuracy, label P/R/F1, confusion matrix | core |
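
For reference, the two simplest pose metrics in the table can be sketched in a few lines. MPJPE is the mean Euclidean distance between predicted and reference joints, and PCK is the fraction of joints that fall within a distance threshold. This is a plain-Python sketch on nested lists shaped `(T, J, 3)`; the function names are hypothetical, and SLTK's own implementations may differ, for example in normalisation or alignment.

```python
import math


def joint_errors(pred, ref):
    """Per-joint Euclidean distances, flattened over frames and joints."""
    return [
        math.dist(p, r)
        for pred_frame, ref_frame in zip(pred, ref)
        for p, r in zip(pred_frame, ref_frame)
    ]


def mpjpe(pred, ref):
    """Mean per-joint position error, in the units of the input."""
    errors = joint_errors(pred, ref)
    return sum(errors) / len(errors)


def pck(pred, ref, threshold):
    """Fraction of joints whose error is at most `threshold`."""
    errors = joint_errors(pred, ref)
    return sum(e <= threshold for e in errors) / len(errors)


# One frame, two joints
ref = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
pred = [[[0.0, 0.0, 0.1], [1.0, 0.3, 0.4]]]

error = mpjpe(pred, ref)
hit_rate = pck(pred, ref, threshold=0.2)
print(f"MPJPE={error:.3f}  PCK@0.2={hit_rate:.2f}")
```

For the toy data the per-joint errors are 0.1 and 0.5, so the MPJPE is 0.3 and half of the joints fall within the 0.2 threshold.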

### ROI Cropping

GPU-accelerated region-of-interest cropping for lips, hands, and other body regions. See the [Cropping API](https://sign-language-toolkit.readthedocs.io/en/latest/api/cropping/).

```python
from sltk.cropping import crop_lips_from_video

crops, frame_indices, bboxes = crop_lips_from_video("video.mp4", output_size=96)
```

### Corpus & Linguistic Analysis

| Tool | Description | Docs |
|------|-------------|------|
| **[Corpus Database](https://sign-language-toolkit.readthedocs.io/en/latest/api/corpus/)** | SQLite per workspace, auto-ingests ELAN files, FTS5 search | [API](https://sign-language-toolkit.readthedocs.io/en/latest/api/corpus/) |
| **Vocabulary** | Frequency distributions, type/token ratios | [Analysis API](https://sign-language-toolkit.readthedocs.io/en/latest/api/analysis/) |
| **Concordance (KWIC)** | Keyword-in-context with configurable window | [Concordance](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/concordance/) |
| **N-grams** | Bigram/trigram extraction with frequency counts | [Analysis API](https://sign-language-toolkit.readthedocs.io/en/latest/api/analysis/) |
| **Collocations** | Co-occurrence analysis with PMI/log-likelihood | [Analysis API](https://sign-language-toolkit.readthedocs.io/en/latest/api/analysis/) |
| **Duration Analysis** | Sign duration histograms and statistics | [Analysis API](https://sign-language-toolkit.readthedocs.io/en/latest/api/analysis/) |
| **Cross-Workspace** | Compare vocabulary and patterns across corpora | [Analysis API](https://sign-language-toolkit.readthedocs.io/en/latest/api/analysis/) |
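
To show what a collocation score means, here is a minimal sketch of bigram pointwise mutual information (PMI) over gloss sequences, using PMI(a, b) = log2(P(a, b) / (P(a) P(b))). `bigram_pmi` and the toy corpus are illustrative assumptions, not the SLTK analysis API.

```python
import math
from collections import Counter


def bigram_pmi(sentences, min_count=1):
    """Score adjacent gloss pairs by pointwise mutual information (base 2)."""
    unigrams = Counter()
    bigrams = Counter()
    for glosses in sentences:
        unigrams.update(glosses)
        bigrams.update(zip(glosses, glosses[1:]))

    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unstable PMI estimates
        p_ab = count / n_bigrams
        p_a = unigrams[a] / n_unigrams
        p_b = unigrams[b] / n_unigrams
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


corpus = [
    ["IX-1", "LIKE", "COFFEE"],
    ["IX-1", "LIKE", "TEA"],
    ["IX-2", "WANT", "COFFEE"],
]
scores = bigram_pmi(corpus)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

PMI favours pairs whose members rarely occur apart, so it tends to overrate rare glosses; raising `min_count` is the usual guard on small corpora.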

### Linguistics

Specialized linguistic analysis tools. See the [Linguistics API](https://sign-language-toolkit.readthedocs.io/en/latest/api/linguistics/).

| Tool | Description |
|------|-------------|
| **Phonology** | HLMO parameter model — handshape, location, movement, orientation inventories. Minimal pair detection, phonological distance, inventory analysis. |
| **Non-Manual Analysis** | NMS scope, timing, co-occurrence analysis. Grammatical pattern detection (wh-questions, negation, topics). |
| **Inter-Rater Reliability** | Cohen's Kappa, Fleiss' Kappa, Krippendorff's Alpha. Boundary agreement and temporal label agreement for corpus annotation. |
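
Cohen's Kappa corrects the raw agreement between two annotators for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). Below is a minimal sketch on per-frame labels, with a hypothetical function name and made-up annotations; it is not the SLTK linguistics API.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)

    # Observed agreement: share of items with identical labels
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's own label distribution
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


rater_a = ["SIGN", "SIGN", "REST", "REST", "SIGN", "REST", "SIGN", "REST", "SIGN", "SIGN"]
rater_b = ["SIGN", "SIGN", "REST", "SIGN", "SIGN", "REST", "REST", "REST", "SIGN", "SIGN"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

Here the raters agree on 8 of 10 frames (p_o = 0.8) while chance agreement is 0.52, so kappa is noticeably lower than the raw agreement rate.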

### Glossing

Vocabulary management for model training. See the [Glossing API](https://sign-language-toolkit.readthedocs.io/en/latest/api/glossing/).

```python
from sltk.glossing import Vocabulary

vocab = Vocabulary.from_samples(dataset, min_count=2)
ids = vocab.encode(["HELLO", "WORLD"], add_bos=True, add_eos=True)
```
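
The sketch below illustrates what a gloss vocabulary of this kind typically does: drop rare glosses, reserve ids for special tokens, and map unknown glosses to `<unk>`. `ToyVocabulary`, its special-token names, and its id layout are assumptions made for illustration and do not describe the internals of `sltk.glossing.Vocabulary`.

```python
from collections import Counter


class ToyVocabulary:
    """Illustrative gloss <-> id mapping with special tokens."""

    SPECIALS = ["<pad>", "<unk>", "<bos>", "<eos>"]

    def __init__(self, gloss_sequences, min_count=1):
        counts = Counter(g for seq in gloss_sequences for g in seq)
        kept = sorted(g for g, c in counts.items() if c >= min_count)
        self.itos = self.SPECIALS + kept
        self.stoi = {token: i for i, token in enumerate(self.itos)}

    def encode(self, glosses, add_bos=False, add_eos=False):
        unk = self.stoi["<unk>"]
        ids = [self.stoi.get(g, unk) for g in glosses]
        if add_bos:
            ids.insert(0, self.stoi["<bos>"])
        if add_eos:
            ids.append(self.stoi["<eos>"])
        return ids

    def decode(self, ids):
        # Drop padding and sequence markers, keep <unk> visible
        skip = {"<pad>", "<bos>", "<eos>"}
        return [self.itos[i] for i in ids if self.itos[i] not in skip]


samples = [["HELLO", "WORLD"], ["HELLO", "NAME"], ["WORLD", "HELLO"]]
vocab = ToyVocabulary(samples, min_count=2)  # NAME occurs once and is dropped

ids = vocab.encode(["HELLO", "NAME", "WORLD"], add_bos=True, add_eos=True)
print(ids, vocab.decode(ids))
```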

### Visualization

Skeleton and 3D mesh overlay rendering on video frames. See the [Visualization API](https://sign-language-toolkit.readthedocs.io/en/latest/api/visualization/).

```python
from sltk.visualization import generate_overlay_video
generate_overlay_video("video.mp4", "video_wilor.h5", "overlay.mp4", viz_type="wilor")
```

### ELAN I/O

| Feature | Description |
|---------|-------------|
| **Read/Write** | Full ELAN (.eaf) round-trip preserving all XML structure |
| **Create** | Build multi-tier ELAN files programmatically |
| **Merge** | Combine tiers from multiple ELAN files |
| **Export** | Convert to/from JSON, CSV, and other formats |

See the [ELAN tutorial](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/elan/).
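
For orientation, an `.eaf` file is XML in which annotations point at shared time slots, with times in milliseconds. The following sketch reads tiers from a tiny inline document using only the standard library. It assumes every time slot carries a `TIME_VALUE` and handles only alignable annotations; `read_tiers` is a hypothetical helper, not SLTK's reader.

```python
import xml.etree.ElementTree as ET

EAF_TEXT = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="480"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1320"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="600"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="760"/>
  </TIME_ORDER>
  <TIER TIER_ID="Segmentation" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>SIGN</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="BLINK" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>blink</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_tiers(eaf_text):
    """Return {tier_id: [(start_ms, end_ms, value), ...]} for alignable annotations."""
    root = ET.fromstring(eaf_text)
    # Time slots are shared by all tiers, so resolve them once
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            rows.append((start, end, ann.findtext("ANNOTATION_VALUE", default="")))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


tiers = read_tiers(EAF_TEXT)
print(tiers)
```

For files from untrusted sources, `defusedxml.ElementTree` (already a core dependency of the package) offers the same `fromstring` call with protection against malicious XML.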

#

## Supported Datasets

Built-in loaders for major sign language datasets:

| Dataset | Language | Type | Size | Tutorial |
|---------|----------|------|------|----------|
| [WLASL](https://dxli94.github.io/WLASL/) | ASL | Isolated | 2,000 classes | [Datasets](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/datasets/) |
| [ASL-Citizen](https://www.microsoft.com/en-us/research/project/asl-citizen/) | ASL | Isolated | Community-sourced | [Datasets](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/datasets/) |
| [How2Sign](https://how2sign.github.io/) | ASL | Continuous | 35K sentences | [Datasets](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/datasets/) |
| [BSLCP](https://bslcorpusproject.org/) | BSL | Continuous | Multi-view corpus | [Datasets](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/datasets/) |
| [BOBSL](https://www.robots.ox.ac.uk/~vgg/data/bobsl/) | BSL | Continuous | BBC archive | [Datasets](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/datasets/) |
| [Phoenix-2014T](https://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX-2014-T/) | DGS | Continuous | Weather broadcasts | [Datasets](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/datasets/) |
| [CSL-Daily](https://ustc-slr.github.io/datasets/2021_csl_daily/) | CSL | Continuous | Daily conversations | [Datasets](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/datasets/) |

#

## Additional Features

- **Pipeline CLI:** Single command from raw video to multi-tier ELAN file (`sltk pipeline`)
- **Unified Pose Format:** `PoseSequence` class normalizes all backends to `(T, N, C)` arrays with format conversion
- **ROI Cropping:** GPU-accelerated lip, hand, and generic region cropping for data loading and preprocessing
- **Phonological Analysis:** HLMO parameter model with minimal pair detection and inventory analysis
- **Inter-Rater Reliability:** Cohen's/Fleiss' Kappa, Krippendorff's Alpha, boundary agreement
- **Gloss Vocabulary:** Encode/decode glosses for model training with special token support
- **Visualization:** Skeleton overlay and 3D mesh projection rendering
- **Web Interface:** React + FastAPI app for workspace management, corpus exploration, and video viewing
- **Corpus Database:** SQLite per workspace with auto-ingest of ELAN files and full-text search
- **Batch Processing:** All extractors support directory-level batch processing
- **Model Weight Management:** Auto-download from HuggingFace Hub with environment variable overrides
- **Modular Install:** Optional dependency groups (`[wilor]`, `[metrics]`, `[api]`, etc.) keep the base lightweight

#

## CLI Reference

```bash
# Full pipeline
sltk pipeline video.mp4 -o output/

# Pose extraction
sltk extract wilor video.mp4 -o hands.h5
sltk extract nlf video.mp4 -o body.h5
sltk extract teaser video.mp4 -o face.h5

# Sign segmentation
sltk segment hands.h5 -o segments.eaf -f elan --video video.mp4

# Non-manual signal detection
sltk nms face.h5 -o nms_output/

# Gloss spotting
sltk spot video.mp4 --segments segments.json --dictionary dict/

# Evaluation
sltk evaluate preds.txt refs.txt --task translation -m bleu4 -m chrf
sltk evaluate pred.h5 ref.h5 --task production -m mpjpe -m pck
sltk evaluate preds.eaf refs.eaf --task segmentation

# ELAN utilities
sltk to-elan segments.json --video video.mp4 -o annotations.eaf
sltk from-elan annotations.eaf -o segments.json --tier Gloss
sltk info video_wilor.h5

# Web interface
sltk serve --host 0.0.0.0 --port 8000
```

See the full [CLI documentation](https://sign-language-toolkit.readthedocs.io/en/latest/api/cli/).

#

## Tutorials & Notebooks

| # | Tutorial | Notebook | Description |
|---|----------|----------|-------------|
| 01 | [Pose Extraction](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/extraction/) | [Notebook](notebooks/01_pose_extraction.ipynb) | WiLoR hands, NLF body, TEASER face |
| 02 | [Segmentation & Spotting](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/segmentation/) | [Notebook](notebooks/02_segmentation_and_spotting.ipynb) | Sign boundaries, dictionary matching |
| 03 | [NMS & ELAN](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/nms/) | [Notebook](notebooks/03_nms_and_elan.ipynb) | Non-manual signals, ELAN file assembly |
| 04 | [Evaluation Metrics](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/metrics/) | [Notebook](notebooks/04_evaluation_metrics.ipynb) | Translation, production, segmentation metrics |
| 05 | [ELAN Files](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/elan/) | — | Reading, writing, merging annotation files |
| 06 | [Datasets](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/datasets/) | — | Loading and exploring sign language corpora |
| 07 | [Concordance](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/concordance/) | — | KWIC, n-grams, collocations |
| 08 | [Feature Processing](https://sign-language-toolkit.readthedocs.io/en/latest/tutorials/processing/) | — | Pose features and normalization |

#

## Model Weights

All weights auto-download from [HuggingFace Hub](https://huggingface.co/fiskenai/vltk) and are cached at `~/.cache/sltk/weights/`.

| Model | Size | Env Override |
|-------|------|-------------|
| WiLoR (hands) | ~2.5 GB | `SLTK_WILOR_CHECKPOINT` |
| NLF (body) | ~540 MB | `SLTK_NLF_MODEL` |
| TEASER (face) | ~350 MB | `SLTK_TEASER_CHECKPOINT` |
| SignRep (embedding) | ~350 MB | `SLTK_SIGNREP_CHECKPOINT` |
| Segmenter | ~180 MB | `SLTK_SEGMENTOR_V2_CHECKPOINT` |

Set `SLTK_AUTO_DOWNLOAD=1` to skip the confirmation prompt. Set `SLTK_WEIGHTS_DIR` to override the cache location.
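
The lookup order implied above can be sketched as follows. This is an assumption about precedence (per-model override first, then `SLTK_WEIGHTS_DIR`, then the default cache), and the file name `wilor.ckpt` is hypothetical; the installation docs describe the actual behaviour.

```python
import os
from pathlib import Path

DEFAULT_WEIGHTS_DIR = "~/.cache/sltk/weights"


def resolve_checkpoint(env_var, filename, environ=None):
    """Pick a checkpoint path from environment overrides.

    `environ` defaults to os.environ; passing a dict makes the function testable.
    """
    environ = os.environ if environ is None else environ

    # 1. A per-model override points directly at a checkpoint file
    override = environ.get(env_var)
    if override:
        return Path(override).expanduser()

    # 2. Otherwise look inside the (possibly relocated) cache directory
    weights_dir = environ.get("SLTK_WEIGHTS_DIR", DEFAULT_WEIGHTS_DIR)
    return Path(weights_dir).expanduser() / filename


# Hypothetical file name, explicit environments instead of the real os.environ
explicit = resolve_checkpoint(
    "SLTK_WILOR_CHECKPOINT", "wilor.ckpt",
    environ={"SLTK_WILOR_CHECKPOINT": "/models/wilor_finetuned.ckpt"},
)
relocated = resolve_checkpoint(
    "SLTK_WILOR_CHECKPOINT", "wilor.ckpt",
    environ={"SLTK_WEIGHTS_DIR": "/data/weights"},
)
print(explicit)
print(relocated)
```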

#

## Testing

```bash
pytest                       # Full suite (1697 tests)
pytest -m "not slow"         # Skip slow GPU/neural tests
pytest tests/test_metrics.py # Single module
```

#

## Documentation

Full documentation at **[sign-language-toolkit.readthedocs.io](https://sign-language-toolkit.readthedocs.io/)**

| Section | Content |
|---------|---------|
| [Installation](https://sign-language-toolkit.readthedocs.io/en/latest/getting-started/installation/) | Install options, GPU setup, weight management |
| [Quick Start](https://sign-language-toolkit.readthedocs.io/en/latest/getting-started/quickstart/) | First steps with SLTK |
| [API Reference](https://sign-language-toolkit.readthedocs.io/en/latest/api/overview/) | Full Python API docs |
| [Cropping](https://sign-language-toolkit.readthedocs.io/en/latest/api/cropping/) | ROI and lip cropping |
| [Corpus Database](https://sign-language-toolkit.readthedocs.io/en/latest/api/corpus/) | SQLite corpus database |
| [Linguistics](https://sign-language-toolkit.readthedocs.io/en/latest/api/linguistics/) | Phonology, NMS analysis, reliability |
| [Glossing](https://sign-language-toolkit.readthedocs.io/en/latest/api/glossing/) | Vocabulary management |
| [NMS Detection](https://sign-language-toolkit.readthedocs.io/en/latest/api/nms/) | Non-manual signal detection API |
| [Visualization](https://sign-language-toolkit.readthedocs.io/en/latest/api/visualization/) | Skeleton and mesh overlays |
| [CLI Reference](https://sign-language-toolkit.readthedocs.io/en/latest/api/cli/) | Command-line interface |
| [REST API](https://sign-language-toolkit.readthedocs.io/en/latest/api/rest-api/) | FastAPI endpoint reference |

#

## License

**CC-BY-NC-ND-4.0** (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International)

### Third-Party Licenses

SLTK bundles or depends on models with their own license terms:

| Component | License | Link |
|-----------|---------|------|
| **MANO** (hand model) | Non-commercial | [mano.is.tue.mpg.de](https://mano.is.tue.mpg.de/license.html) |
| **SMPL-X** (body model) | Non-commercial | [smpl-x.is.tue.mpg.de](https://smpl-x.is.tue.mpg.de/modellicense.html) |
| **FLAME** (face model) | Non-commercial / CC-BY-4.0 (2023+) | [flame.is.tue.mpg.de](https://flame.is.tue.mpg.de) |
| **WiLoR** | Apache 2.0 | [Potamias et al., CVPR 2025](https://github.com/rolpotamias/WiLoR) |
| **TEASER** | See repository | [Liu et al., ICLR 2025](https://github.com/Pixel-Talk/TEASER) |
| **NLF** | MIT (non-commercial) | [Sárándi & Pons-Moll, NeurIPS 2024](https://github.com/isarandi/nlf) |

#

## Citations & Acknowledgements

SLTK integrates several third-party models and methods. **If you use these components in your research, please cite the original papers.** SLTK does not claim authorship of these models — it provides a unified interface to run them together.

### Sign Segmentation — [He et al., FG 2025](https://arxiv.org/abs/2504.08593)

The sign segmenter uses the Hands-On model for temporal sign boundary detection from hand pose features.

```bibtex
@inproceedings{he2025hands,
  title={Hands-On: Segmenting Individual Signs from Continuous Sequences},
  author={He, Low Jian and Walsh, Harry and Sincan, Ozge Mercanoglu and Bowden, Richard},
  booktitle={2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition (FG)},
  pages={1--5},
  year={2025},
  organization={IEEE}
}
```

### Gloss Spotting — [Wong et al., ICCV 2025](https://arxiv.org/abs/2503.08529)

SignRep provides self-supervised sign language representations used for dictionary-based gloss matching.

```bibtex
@inproceedings{wong2025signrep,
  title={SignRep: Enhancing Self-Supervised Sign Representations},
  author={Wong, Ryan and Camgoz, Necati Cihan and Bowden, Richard},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages={22804--22814},
  year={2025}
}
```

### Face Reconstruction — [Liu et al., ICLR 2025](https://arxiv.org/abs/2502.10982)

TEASER provides FLAME-based face parameter extraction (jaw pose, expression, eyelid) used for non-manual signal detection.

```bibtex
@inproceedings{liu2025teaser,
  title={TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction},
  author={Liu, Yunfei and Zhu, Lei and Lin, Lijian and Zhu, Ye and Zhang, Ailing and Li, Yu},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}
```

### Hand Reconstruction — [Potamias et al., CVPR 2025](https://arxiv.org/abs/2409.12259)

WiLoR provides end-to-end 3D hand localization and MANO parameter estimation from in-the-wild images.

```bibtex
@inproceedings{potamias2025wilor,
  title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
  author={Potamias, Rolandos Alexandros and Zhang, Jinglei and Deng, Jiankang and Zafeiriou, Stefanos},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```

### Body Pose — [Sárándi & Pons-Moll, NeurIPS 2024](https://arxiv.org/abs/2407.07532)

NLF estimates continuous 3D human body pose and shape (SMPL-X parameters, including eye gaze).

```bibtex
@inproceedings{sarandi2024nlf,
  title={Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation},
  author={S{\'a}r{\'a}ndi, Istv{\'a}n and Pons-Moll, Gerard},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
```

### Pose Estimation — [Jiang et al., 2023](https://arxiv.org/abs/2303.07399)

RTMPose provides real-time multi-person whole-body keypoint detection (133 COCO-WholeBody landmarks).

```bibtex
@article{jiang2023rtmpose,
  title={RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose},
  author={Jiang, Tao and Lu, Peng and Zhang, Li and Ma, Ningsheng and Han, Rui and Lyu, Chengqi and Li, Yining and Chen, Kai},
  journal={arXiv preprint arXiv:2303.07399},
  year={2023}
}
```
