Metadata-Version: 2.4
Name: signlangtk
Version: 0.1.1
Summary: Sign Language Toolkit for sign language research
Author: Sign Language Research Team
License-Expression: CC-BY-NC-4.0
Project-URL: Repository, https://github.com/ed-fish/sign-language-toolkit
Project-URL: Documentation, https://github.com/ed-fish/sign-language-toolkit#readme
Project-URL: Issues, https://github.com/ed-fish/sign-language-toolkit/issues
Keywords: sign language,computer vision,machine learning,linguistics,ELAN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.10.0
Requires-Dist: h5py>=3.8.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: click>=8.1.0
Requires-Dist: defusedxml>=0.7.0
Requires-Dist: nltk>=3.8.0
Requires-Dist: huggingface_hub>=0.20.0
Provides-Extra: mediapipe
Requires-Dist: mediapipe>=0.10.0; extra == "mediapipe"
Provides-Extra: wilor
Requires-Dist: torch>=2.0.0; extra == "wilor"
Requires-Dist: smplx>=0.1.28; extra == "wilor"
Requires-Dist: pytorch-lightning>=2.0.0; extra == "wilor"
Requires-Dist: yacs>=0.1.8; extra == "wilor"
Requires-Dist: ultralytics>=8.0.0; extra == "wilor"
Requires-Dist: timm>=0.9.0; extra == "wilor"
Requires-Dist: dill>=0.3.0; extra == "wilor"
Provides-Extra: nlf
Requires-Dist: torch>=2.0.0; extra == "nlf"
Provides-Extra: teaser
Requires-Dist: torch>=2.0.0; extra == "teaser"
Requires-Dist: ultralytics>=8.0.0; extra == "teaser"
Requires-Dist: timm>=0.9.0; extra == "teaser"
Provides-Extra: rtmpose
Requires-Dist: torch>=2.0.0; extra == "rtmpose"
Requires-Dist: mmpose>=1.1.0; extra == "rtmpose"
Requires-Dist: mmdet>=3.0.0; extra == "rtmpose"
Requires-Dist: mmengine>=0.7.0; extra == "rtmpose"
Requires-Dist: mmcv>=2.0.0; extra == "rtmpose"
Requires-Dist: openmim>=0.3.0; extra == "rtmpose"
Requires-Dist: decord>=0.6.0; extra == "rtmpose"
Provides-Extra: smplfx
Requires-Dist: torch>=2.0.0; extra == "smplfx"
Requires-Dist: smplx>=0.1.28; extra == "smplfx"
Requires-Dist: h5py>=3.10.0; extra == "smplfx"
Requires-Dist: hdf5plugin>=4.0.0; extra == "smplfx"
Requires-Dist: decord>=0.6.0; extra == "smplfx"
Provides-Extra: torch
Requires-Dist: torch>=2.0.0; extra == "torch"
Requires-Dist: torchvision>=0.15.0; extra == "torch"
Provides-Extra: data
Requires-Dist: lmdb>=1.4.0; extra == "data"
Requires-Dist: msgpack>=1.0.0; extra == "data"
Provides-Extra: metrics
Requires-Dist: sacrebleu>=2.3.0; extra == "metrics"
Requires-Dist: rouge-score>=0.1.2; extra == "metrics"
Provides-Extra: analysis
Requires-Dist: scikit-learn>=1.3.0; extra == "analysis"
Requires-Dist: umap-learn>=0.5.0; extra == "analysis"
Requires-Dist: hdbscan>=0.8.0; extra == "analysis"
Requires-Dist: albumentations>=1.3.0; extra == "analysis"
Provides-Extra: vis
Requires-Dist: matplotlib>=3.7.0; extra == "vis"
Requires-Dist: opencv-python>=4.8.0; extra == "vis"
Provides-Extra: api
Requires-Dist: fastapi>=0.109.0; extra == "api"
Requires-Dist: uvicorn[standard]>=0.25.0; extra == "api"
Requires-Dist: pydantic>=2.5.0; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Requires-Dist: slowapi>=0.1.9; extra == "api"
Requires-Dist: openai>=1.12.0; extra == "api"
Requires-Dist: anthropic>=0.39.0; extra == "api"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5.0; extra == "docs"
Requires-Dist: mkdocs-material>=9.5.0; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == "docs"
Requires-Dist: pymdown-extensions>=10.0; extra == "docs"
Provides-Extra: all
Requires-Dist: signlangtk[analysis,api,data,mediapipe,metrics,nlf,rtmpose,smplfx,teaser,torch,vis,wilor]; extra == "all"
Dynamic: license-file

# Sign Language Toolkit (SLTK)

A research toolkit for sign language video analysis. SLTK provides a pipeline from raw video to linguistic annotations, covering **pose extraction**, **automatic segmentation**, **gloss spotting**, and **corpus analysis**. Every stage is accessible from Python, the CLI, or a web interface.

## Installation

```bash
# Core library (data loading, ELAN I/O, CLI)
pip install signlangtk

# With web interface
pip install "signlangtk[api]"

# With GPU-accelerated pose extraction (WiLoR hand model)
pip install "signlangtk[wilor]"

# Everything
pip install "signlangtk[all]"
```

The PyPI package is called `signlangtk`, but the Python import is `sltk`:

```python
import sltk
from sltk.data import PoseSequence, Segment
```

Requires Python 3.10+. For development from source:

```bash
git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,api]"
```

## The Processing Pipeline

SLTK's core workflow is a three-stage pipeline that turns raw sign language video into searchable, annotated ELAN files:

```
Video (.mp4)
    │
    ├── Stage 1: Pose Extraction ──► {stem}_wilor.h5
    │       WiLoR hand model: MANO rotation matrices + 3D keypoints
    │
    ├── Stage 2: Segmentation ──► {stem}_segments.eaf
    │       Transformer model predicts sign boundaries (BIO labels)
    │
    └── Stage 3: Spotting ──► {stem}_spotted.eaf
            SignRep model matches segments to a dictionary of known signs
```

Each stage can be run independently. If you already have H5 pose files, start at Stage 2. If you already have segment boundaries, start at Stage 3.

---

## Stage 1: Pose Extraction

Extract hand poses from video using the WiLoR hand model. This produces an HDF5 file containing MANO rotation matrices and 21 3D keypoints per detected hand, per frame.

### Python

```python
from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig

config = WiLoRConfig(
    checkpoint_path="path/to/wilor_final.ckpt",
    detector_path="path/to/detector.pt",
    rescale_factor=2.0,
    detection_confidence=0.3,
)
extractor = WiLoRExtractor(config)
extractor.load_model()
result = extractor.extract_from_video("video.mp4")
# Saves to video_wilor.h5
extractor.close()
```

### API

```bash
# Start extraction job (runs in background on GPU)
curl -X POST http://localhost:8000/api/extraction/start \
  -H "Content-Type: application/json" \
  -d '{
    "video_path": "/data/video.mp4",
    "output_root": "/data/output",
    "config": {"enable_wilor": true, "device": "cuda"}
  }'
# Returns: {"job_id": "abc123", ...}

# Poll progress
curl http://localhost:8000/api/extraction/status/abc123
```

### Output format

The WiLoR H5 file has this structure:

```
video_wilor.h5
├── attrs: fps, num_frames, resolution, extractor
├── frame_idx      (num_frames, 2)           # (start_idx, count) per frame
├── kpts_3d        (num_detections, 21, 3)   # 3D hand keypoints
├── right          (num_detections,)          # True = right hand
└── mano/
    ├── hand_pose      (num_detections, 15, 3, 3)   # joint rotations
    └── global_orient  (num_detections, 1, 3, 3)     # wrist rotation
```
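
Because the number of detected hands varies per frame, the per-detection arrays are indexed through `frame_idx`. The sketch below shows how such a lookup could work on toy data. It assumes `start_idx` is a row offset into the per-detection arrays (`kpts_3d`, `right`, `mano/*`); `detection_slice` is a hypothetical helper, not part of SLTK.

```python
from typing import Sequence, Tuple


def detection_slice(frame_idx: Sequence[Tuple[int, int]], frame: int) -> slice:
    """Return the slice of detection rows that belong to `frame`."""
    start, count = frame_idx[frame]
    return slice(int(start), int(start) + int(count))


# Toy index: frame 0 has two hands, frame 1 has none, frame 2 has one.
frame_idx = [(0, 2), (2, 0), (2, 1)]
right = [True, False, True]  # one entry per detection

for frame in range(len(frame_idx)):
    rows = detection_slice(frame_idx, frame)
    print(frame, right[rows])
```

The same slice can be applied to arrays read with `h5py`, for example `f["kpts_3d"][rows]`, under the layout assumption stated above.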

### Model weights

Checkpoints are resolved in this order:

1. Explicit path in `WiLoRConfig`
2. Environment variable: `SLTK_WILOR_CHECKPOINT`, `SLTK_WILOR_DETECTOR`
3. Bundled at `sltk/weights/wilor/`
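
The resolution order above can be sketched as a simple fallback chain. This is an illustration only: `resolve_checkpoint` is a hypothetical function, and the actual SLTK lookup may differ in detail (for example in the bundled file names it checks).

```python
import os
from pathlib import Path
from typing import Optional


def resolve_checkpoint(explicit: Optional[str], env_var: str, bundled: Path) -> Path:
    """Pick a checkpoint: explicit argument, then environment variable, then bundled file."""
    if explicit:
        return Path(explicit)
    from_env = os.environ.get(env_var)
    if from_env:
        return Path(from_env)
    if bundled.exists():
        return bundled
    raise FileNotFoundError(
        f"No checkpoint found: pass a path, set {env_var}, or place a file at {bundled}"
    )
```

Raising when all three sources fail surfaces a missing checkpoint before any model loading starts.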

Other extractors (MediaPipe, NLF/SMPL-X, TEASER, RTMPose) are also available — see `sltk/extraction/`.

---

## Stage 2: Segmentation

The segmenter is a 4-layer Transformer that reads WiLoR H5 files and predicts per-frame BIO labels (`0`=OUT, `1`=IN_SIGN, `2`=BEGIN), identifying where individual signs start and end.

### Python — high level

```python
from sltk.segmentation.runner import segment_h5
from sltk.segmentation.output import OutputFormat

# Segment a single file → ELAN output
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.eaf",
    output_format=OutputFormat.ELAN,
    fps=25.0,
    media_path="video.mp4",  # links the video in the EAF
)

# Segment a single file → JSON output
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.json",
    output_format=OutputFormat.JSON,
    fps=25.0,
)

# Segment an entire directory
segment_h5(
    "/data/poses/",
    output_path="/data/segments/output.json",
    output_format=OutputFormat.JSON,
    fps=25.0,
)
```

### Python — low level

```python
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments

# Load H5 → 192-dim feature vectors (MANO rotations as axis-angle)
features = h5_to_features("video_wilor.h5")  # shape: (num_frames, 192)

# Run the Transformer model
runner = get_runner()  # singleton, loads checkpoint once
labels = runner.predict(features)  # shape: (num_frames,) values 0/1/2

# Extract segment boundaries
segments = extract_segments(labels)
# [(12, 45), (50, 82), (90, 120), ...]
```
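
To make the label scheme concrete, here is a minimal sketch of how per-frame BIO labels can be decoded into segments. It is not the code behind `extract_segments`: `decode_bio` is a hypothetical function, and the inclusive end frame and the handling of a sign without a `BEGIN` label are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

OUT, IN_SIGN, BEGIN = 0, 1, 2


def decode_bio(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn per-frame BIO labels into (start_frame, end_frame) pairs, end inclusive."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start frame of the currently open sign, if any
    for i, label in enumerate(labels):
        if label == BEGIN:
            if start is not None:      # a new BEGIN closes the previous sign
                segments.append((start, i - 1))
            start = i
        elif label == IN_SIGN:
            if start is None:          # tolerate a sign that has no BEGIN label
                start = i
        else:                          # OUT (or any other value) closes an open sign
            if start is not None:
                segments.append((start, i - 1))
                start = None
    if start is not None:              # the sign runs to the last frame
        segments.append((start, len(labels) - 1))
    return segments


print(decode_bio([0, 2, 1, 1, 0, 2, 1, 2, 1, 1]))
# prints [(1, 3), (5, 6), (7, 9)]
```

The `BEGIN` label is what lets two back-to-back signs be split even when no `OUT` frame separates them.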

### API

```bash
# Segment a single file
curl -X POST http://localhost:8000/api/segmentation/segment \
  -H "Content-Type: application/json" \
  -d '{"h5_path": "/data/video_wilor.h5", "fps": 25.0}'

# Batch segment a directory
curl -X POST http://localhost:8000/api/segmentation/segment/batch \
  -H "Content-Type: application/json" \
  -d '{
    "directory": "/data/poses/",
    "fps": 25.0,
    "output_path": "/data/segments/",
    "output_format": "json"
  }'
```

### JSON output

```json
{
  "video_name": {
    "fps": 25.0,
    "num_frames": 3000,
    "segments": [
      {"start_frame": 12, "end_frame": 45, "start_sec": 0.48, "end_sec": 1.80},
      {"start_frame": 50, "end_frame": 82, "start_sec": 2.00, "end_sec": 3.28}
    ]
  }
}
```
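
In the sample above, the second-based fields equal the frame index divided by `fps`. The short sketch below checks that relationship with the standard `json` module; `frame_to_sec` is a hypothetical helper written for this illustration.

```python
import json


def frame_to_sec(frame: int, fps: float) -> float:
    """Convert a frame index to seconds."""
    return frame / fps


payload = json.loads("""
{
  "video_name": {
    "fps": 25.0,
    "num_frames": 3000,
    "segments": [
      {"start_frame": 12, "end_frame": 45, "start_sec": 0.48, "end_sec": 1.80},
      {"start_frame": 50, "end_frame": 82, "start_sec": 2.00, "end_sec": 3.28}
    ]
  }
}
""")

for name, video in payload.items():
    for seg in video["segments"]:
        # Each *_sec field should match its *_frame field divided by fps.
        assert abs(frame_to_sec(seg["start_frame"], video["fps"]) - seg["start_sec"]) < 1e-9
        assert abs(frame_to_sec(seg["end_frame"], video["fps"]) - seg["end_sec"]) < 1e-9
        print(name, seg["start_sec"], seg["end_sec"])
```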

### ELAN output

The EAF file contains a tier named `{video_name}_segmentation`, with each segment labelled `SIGN` and authored by `segmenter_v2`.

### Model checkpoint

Set `SLTK_SEGMENTOR_CHECKPOINT` or place `segmentor_v2.ckpt` in `sltk/weights/segmentor/`.

---

## Stage 3: Gloss Spotting

The spotter uses SignRep (a ViT-based model) to extract 768-dim visual features from 16-frame sliding windows, then matches each detected segment against a **dictionary** of known sign features using cosine similarity.
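
The matching step can be illustrated with plain cosine similarity over toy vectors. This is a sketch of the general technique, not SLTK's implementation: `rank_glosses` and the other helpers are hypothetical, and 3-dim vectors stand in for the 768-dim SignRep features.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: the dot product of the two unit-length vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def rank_glosses(
    segment_vec: Sequence[float],
    dictionary: Dict[str, Sequence[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Return the top_k (gloss, similarity) pairs, most similar first."""
    scored = [(gloss, cosine(segment_vec, vec)) for gloss, vec in dictionary.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy dictionary with made-up vectors, one entry per gloss.
dictionary = {
    "HELLO": [1.0, 0.0, 0.0],
    "WORLD": [0.0, 1.0, 0.0],
    "THANKS": [0.7, 0.7, 0.0],
}
print(rank_glosses([0.9, 0.1, 0.0], dictionary, top_k=2))
```

When features are already L2-normalized, the normalization step is redundant and the similarity reduces to a dot product.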

### Prerequisites

- Segment boundaries from Stage 2 (or your own)
- A **dictionary**: a folder of `.npz` files (one per sign), each containing a `best_latent` key with a 768-dim feature vector

### Python — full pipeline

```python
from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()

# 1. Extract dense features from the full video
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features shape: (num_windows, 768), L2-normalized

# 2. Load dictionary
dictionary = pipeline.load_dictionary(
    ["/data/dictionaries/bsldict/signrep/"],
    feature_key="best_latent",
)

# 3. Define segments (from Stage 2, or load from JSON/EAF)
segments = [
    {"segment_id": 0, "start_frame": 12, "end_frame": 45},
    {"segment_id": 1, "start_frame": 50, "end_frame": 82},
]

# 4. Match each segment against the dictionary
result = pipeline.spot(
    features=continuous,
    segments=segments,
    dictionary=dictionary,
    top_k=10,
    segment_pooling="max",  # or "mean", "softmax_weighted"
)

# 5. Inspect results
for seg in result.segments:
    print(f"Segment {seg.start_ms}ms–{seg.end_ms}ms:")
    for gl in seg.top_glosses:
        print(f"  Rank {gl['rank']}: {gl['gloss']} ({gl['similarity']:.3f})")

# 6. Save as ELAN (creates Rank-1..N and Score-1..N tiers)
result.save_eaf("video_spotted.eaf", media_path="video.mp4")
```

### Python — one-shot

```python
result = pipeline.spot_from_video(
    video_path="video.mp4",
    segments_json="video_segments.json",
    dictionary_dirs=["/data/dictionaries/bsldict/signrep/"],
    top_k=20,
    stride=4,
)
```

### API

```bash
# Extract continuous features (cached server-side)
curl -X POST http://localhost:8000/api/signrep/continuous/extract \
  -H "Content-Type: application/json" \
  -d '{"video_path": "/data/video.mp4", "stride": 4}'
# Returns: {"features_id": "abc123", "num_windows": 500, ...}

# Spot glosses
curl -X POST http://localhost:8000/api/signrep/spot \
  -H "Content-Type: application/json" \
  -d '{
    "features_id": "abc123",
    "segments": [{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
    "dictionary_dirs": ["/data/dictionaries/bsldict/signrep/"],
    "top_k": 10
  }'
```

### Building a dictionary

Before spotting, you need a dictionary. Extract one feature vector per isolated sign video and save it as a `.npz` file named after the gloss:

```python
from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()
result = pipeline.extract_dictionary("isolated_sign.mp4", method="middle")
result.save_npz("dictionary/HELLO.npz")
```

Batch extraction via the API:

```bash
curl -X POST http://localhost:8000/api/signrep/dictionary/batch/job \
  -H "Content-Type: application/json" \
  -d '{
    "video_dir": "/data/isolated_signs/",
    "output_dir": "/data/dictionary/",
    "method": "middle"
  }'
```

### Model checkpoint

Set `SLTK_SIGNREP_CHECKPOINT` or place `ckpt.pt` in `sltk/weights/signrep/`.

---

## End-to-End Processing

The processing API combines Stages 2 and 3 into a single background job. It does not run pose extraction: the WiLoR H5 file for each video must already exist alongside it as `{stem}_wilor.h5`.
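
Before submitting a job, it can help to check that every video has its pose file. The sketch below is a hypothetical pre-flight check, not an SLTK function; it assumes only the `{stem}_wilor.h5` naming convention described above.

```python
from pathlib import Path
from typing import Iterable, List


def videos_missing_poses(video_paths: Iterable[str]) -> List[str]:
    """Return the videos that have no `{stem}_wilor.h5` file next to them."""
    missing = []
    for video_path in video_paths:
        video = Path(video_path)
        if not video.with_name(f"{video.stem}_wilor.h5").exists():
            missing.append(video_path)
    return missing


print(videos_missing_poses(["/data/video1.mp4", "/data/video2.mp4"]))
```

Any path in the returned list needs Stage 1 run first.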

### API

```bash
# Segmentation only
curl -X POST http://localhost:8000/api/processing/submit \
  -H "Content-Type: application/json" \
  -d '{
    "video_paths": ["/data/video1.mp4", "/data/video2.mp4"],
    "type": "segments",
    "fps": 25.0
  }'

# Segmentation + spotting
curl -X POST http://localhost:8000/api/processing/submit \
  -H "Content-Type: application/json" \
  -d '{
    "video_paths": ["/data/video1.mp4"],
    "type": "spots",
    "dictionary_dirs": ["/data/dictionaries/bsldict/signrep/"],
    "top_k": 5,
    "fps": 25.0,
    "workspace": "my_workspace"
  }'

# Poll job
curl http://localhost:8000/api/processing/status/{job_id}

# Download result
curl -O http://localhost:8000/api/processing/output/{job_id}/video1_spotted.eaf
```

### Output files

| Job type | Output | Description |
|----------|--------|-------------|
| `segments` | `{stem}_segments.eaf` | Sign boundaries (SIGN labels) |
| `spots` | `{stem}_segments.eaf` + `{stem}_spotted.eaf` | Boundaries + ranked gloss labels |

When a `workspace` is specified, output EAF files are automatically ingested into the corpus database for search and analysis.

---

## Feature Extraction

SLTK computes several feature representations from H5 pose data and raw video. They feed the segmenter and the spotter, and are also available for your own research.

### WiLoR segmenter features (192-dim)

Used by the Transformer segmenter (Stage 2). Converts MANO rotation matrices to axis-angle, concatenates left and right hand:

```python
from sltk.segmentation.h5_loader import h5_to_features

features = h5_to_features("video_wilor.h5")
# shape: (num_frames, 192)
# = 2 hands × 96 dims (16 joints × 6 axis-angle params)
```

### Angle features (104-dim)

Body joint angles and hand Euler angles from MANO rotation matrices:

```python
from sltk.processing.features import compute_angle_features

angles = compute_angle_features(body_poses, left_hand_poses, right_hand_poses)
# shape: (num_frames, 104)
# = 22 body angles + 41 left hand + 41 right hand
```

### HaMeR features (288-dim)

Flattened MANO rotation matrices:

```python
from sltk.processing.features import load_features_from_h5

angles, hamer = load_features_from_h5("video_mediapipe.h5", "video_wilor.h5")
# angles: (T, 104)
# hamer:  (T, 288) = 2 × (135 hand_pose + 9 global_orient)
```

### SignRep embeddings (768-dim)

Dense visual features from the SignRep ViT model:

```python
from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features: (num_windows, 768), L2-normalized
# 16-frame windows at stride 4
```
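
To relate window indices back to video frames, the sketch below maps 16-frame windows at a given stride onto frame ranges. It assumes window 0 starts at frame 0 with no padding, and it treats a window as belonging to a segment when the two overlap by at least one frame. Both are assumptions made for illustration, and the function names are hypothetical.

```python
from typing import List


def window_frame_range(window_index: int, stride: int = 4, window_size: int = 16) -> range:
    """Frames covered by one sliding window, assuming window 0 starts at frame 0."""
    start = window_index * stride
    return range(start, start + window_size)


def windows_overlapping(
    start_frame: int,
    end_frame: int,
    num_windows: int,
    stride: int = 4,
    window_size: int = 16,
) -> List[int]:
    """Indices of windows sharing at least one frame with [start_frame, end_frame]."""
    hits = []
    for w in range(num_windows):
        frames = window_frame_range(w, stride, window_size)
        if frames.start <= end_frame and frames.stop - 1 >= start_frame:
            hits.append(w)
    return hits


print(windows_overlapping(50, 82, num_windows=500))
```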

---

## Running the Web Interface

SLTK includes a React frontend for browsing workspaces, running processing jobs, and exploring corpus data.

### Development mode

```bash
# Start FastAPI backend (port 8000) + Vite dev server (port 5173)
bash scripts/run_dev.sh
```

This launches both servers. The frontend is available at `http://localhost:5173` and proxies API requests to the backend. Interactive API docs are at `http://localhost:8000/docs`.

### Production mode

```bash
# Build the frontend
cd frontend && npm ci && npm run build && cd ..

# Serve everything from FastAPI
sltk serve --host 0.0.0.0 --port 8000
```

The built frontend is served as static files from FastAPI at `http://localhost:8000`.

### Backend only

```bash
uvicorn sltk.api.main:app --host 0.0.0.0 --port 8000
```

### Frontend pages

| Route | Page | Purpose |
|-------|------|---------|
| `/` | Workspaces | Create/switch workspaces, scan directories |
| `/process` | Process | Submit segmentation and spotting jobs |
| `/explore` | Explore | Search glosses, view video clips, corpus statistics |
| `/viewer` | Viewer | Video playback with annotation overlay |
| `/analysis/*` | Analysis | Vocabulary, concordance, n-grams, collocations, durations |

---

## NMS Detection (Non-Manual Signals)

Detect blinks, head nods, shakes, tilts, and other non-manual signals from TEASER/FLAME face-tracking H5 files.

### Python

```python
from sltk.nms.runner import detect_nms, export_results

# Detect all NMS events
blinks, nms_events, quality = detect_nms(
    "video_teaser.h5",
    detectors={"all"},  # or {"blink", "nod", "shake", "tilt", "mouth", "eyebrow"}
)

# Export to ELAN, JSON, and CSV
export_results(blinks, nms_events, "video_teaser.h5",
    output_dir="output/",
    formats=["elan", "json", "csv"],
    participant_id="P01",
)
```

### Available detectors

| Detector | Tier | Signal |
|----------|------|--------|
| `blink` | `BLINK` | Eye closures from eyelid parameters |
| `nod` | `HEAD-NOD` | Vertical head oscillation (pitch) |
| `shake` | `HEAD-SHAKE` | Horizontal head oscillation (yaw) |
| `tilt` | `HEAD-TILT` | Side-to-side head tilt (roll) |
| `mouth` | `MOUTH` | Lip/mouth movement from FLAME expression |
| `eyebrow` | `EYEBROW` | Eyebrow raise/furrow from FLAME expression |
| `gaze` | `EYE-GAZE` | Gaze direction (requires NLF/SMPL file) |
| `squint` | `EYE-SQUINT` | Partial eye closure |

### API

```bash
curl -X POST http://localhost:8000/api/nms/detect \
  -H "Content-Type: application/json" \
  -d '{
    "h5_path": "/data/video_teaser.h5",
    "detectors": ["all"],
    "format": ["elan"],
    "output_dir": "/data/output/"
  }'
```

---

## CLI Reference

```bash
sltk convert input.npy output.h5 --from mediapipe --to wilor --fps 25
sltk evaluate predictions.txt references.txt --task translation
sltk to-elan segments.json --video source.mp4 --output annotations.eaf
sltk from-elan annotations.eaf --output segments.json --tier Gloss
sltk info video_wilor.h5
sltk serve --host 0.0.0.0 --port 8000 --reload
sltk formats
```

## Data Types

```python
from sltk.data import PoseSequence, Segment, SegmentList

# Load poses
poses = PoseSequence.load("video_wilor.h5", format="wilor", fps=25)
poses.data     # (num_frames, num_keypoints, 3)
poses.fps      # 25.0
poses.format   # "wilor"

# Load ELAN annotations
from sltk.io import read_eaf, write_eaf
segments = read_eaf("annotations.eaf", tiers=["Gloss"])

# Create and export segments
new_segments = SegmentList([
    Segment(start=0.0, end=1.5, label="HELLO", tier="Gloss"),
    Segment(start=1.5, end=3.0, label="WORLD", tier="Gloss"),
])
write_eaf(new_segments, "output.eaf", video_path="source.mp4")
```

### Supported pose formats

| Format | Keypoints | Description |
|--------|-----------|-------------|
| MediaPipe | 33 body + 2 × 21 hands + 468 face | Holistic pose estimation |
| WiLoR | 21 per hand | MANO hand model with rotation matrices |
| NLF/SMPL-X | 55 joints | Full body with axis-angle rotations |

All formats are stored as HDF5 (`.h5`) files.

## Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `SLTK_CORS_ORIGINS` | Allowed CORS origins | `http://localhost:5173,http://localhost:3000` |
| `SLTK_ALLOWED_PATHS` | Filesystem whitelist for API | `/vol/research,/home` |
| `SLTK_WILOR_CHECKPOINT` | WiLoR model checkpoint | auto-resolved |
| `SLTK_WILOR_DETECTOR` | WiLoR hand detector | auto-resolved |
| `SLTK_SIGNREP_CHECKPOINT` | SignRep model checkpoint | auto-resolved |
| `SLTK_SEGMENTOR_CHECKPOINT` | Segmenter checkpoint | auto-resolved |
| `SLTK_NLF_MODEL_PATH` | NLF model path | — |
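
A comma-separated value such as `SLTK_ALLOWED_PATHS` is typically split into root directories, and each requested path is checked against them. The sketch below shows one way to do that with `pathlib`. It is an assumption about how such a whitelist can be enforced, not SLTK's actual check, and both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List


def parse_allowed_roots(value: str) -> List[Path]:
    """Split a comma-separated whitelist into resolved absolute directories."""
    return [Path(item.strip()).resolve() for item in value.split(",") if item.strip()]


def is_path_allowed(candidate: str, roots: List[Path]) -> bool:
    """True if `candidate` resolves to a location inside one of `roots`."""
    resolved = Path(candidate).resolve()  # collapses ".." and follows symlinks
    return any(resolved.is_relative_to(root) for root in roots)


roots = parse_allowed_roots(os.environ.get("SLTK_ALLOWED_PATHS", "/vol/research,/home"))
print(is_path_allowed("/home/user/video.mp4", roots))
print(is_path_allowed("/home/../etc/passwd", roots))
```

Resolving the candidate before comparing it means a request cannot escape the whitelist through `..` components.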

## Testing

```bash
pytest                       # Full suite (1500+ tests)
pytest -m "not slow"         # Skip slow tests
pytest -m api                # API tests only
pytest --cov=sltk            # With coverage report
```

## License

CC-BY-NC-4.0
