Metadata-Version: 2.4
Name: openglottal
Version: 0.2.2
Summary: Automated glottal area segmentation from high-speed videoendoscopy
License: MIT License
        
        Copyright (c) 2024 OpenGlottal Contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Keywords: glottis,segmentation,vocal-folds,laryngoscopy,medical-imaging
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Scientific/Engineering :: Image Processing
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: h5py>=3.8
Requires-Dist: torch>=2.0
Requires-Dist: torchvision>=0.15
Requires-Dist: ultralytics>=8.0
Requires-Dist: opencv-python>=4.8
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.10
Requires-Dist: matplotlib>=3.7
Requires-Dist: tqdm>=4.65
Requires-Dist: tensorboard>=2.14
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs; extra == "docs"
Requires-Dist: mkdocs-material; extra == "docs"
Provides-Extra: gui
Requires-Dist: PyQt5>=5.15; extra == "gui"
Dynamic: license-file

# OpenGlottal

![Patient 1 montage](https://raw.githubusercontent.com/hari-krishnan/openglottal/main/paper_cross/patient1_montage.png)  
*Frame montage: segmentation across a glottal cycle.*

![L/R asymmetry in vocal fold nodules](https://raw.githubusercontent.com/hari-krishnan/openglottal/main/paper_cross/qt_gui_lr_nodule.png)  
*Quantifying L/R asymmetry in a case of vocal fold nodules using the dynamic medial probe and independent displacement waveforms.*

Open-source toolkit for automated glottal area segmentation from high-speed videoendoscopy (HSV). Beyond per-frame masks, OpenGlottal extracts **glottal area waveforms (GAW)** and **left/right (L/R) displacement waveforms** along a configurable medial axis — enabling quantification of L/R asymmetry and kinematic features (open quotient, F0, periodicity) for clinical assessment.

Building on previous research into nodule-induced vibratory irregularities (Patel, Unnikrishnan, & Donohue, 2016), OpenGlottal provides the first open-source GUI to quantify these L/R asymmetries in real time.

**Author:** Harikrishnan Unnikrishnan (hari@orchard-robotics.com)

**Paper:** [**arXiv:2603.02087**](https://arxiv.org/abs/2603.02087) — *A Detection-Gated Pipeline for Robust Glottal Area Waveform Extraction and Clinical Pathology Assessment*

OpenGlottal combines a YOLOv8 glottis detector, a U-Net pixel-level segmenter, and a temporal vocal fold tracker into a single, reproducible inference and training pipeline — trained and evaluated on the [GIRAFE dataset](https://zenodo.org/records/13773163) ([dataset paper](https://doi.org/10.1016/j.dib.2025.111376)) and [BAGLS](https://zenodo.org/records/3762320) ([Scientific Data, 2020](https://doi.org/10.1038/s41597-020-0526-3)).

---

## Pipelines

Five pipelines are provided (four YOLO-gated and one U-Net-only):

| Pipeline | Flag | Description |
|----------|------|-------------|
| **1 — VFT** | `vft` | YOLO detects the glottis → crop (size locked to first detection) → motion-based VocalFoldTracker |
| **2 — Guided VFT** | `guided-vft` | YOLO bbox as ROI mask on the full frame → YOLOGuidedVFT (no cropping) |
| **3 — YOLO+UNet** | `unet` | YOLO + full-frame U-Net: detection-gated (output only on frames where YOLO fires) |
| **4 — YOLO-Crop+UNet** | `yolo-crop+unet` | YOLO crop → resize to 256×256 → crop-trained U-Net → project mask back. Weights are provided; use `--crop-weights` in `eval_girafe.py` / `eval_bagls.py`. Once tests validate, these weights will be committed to the repo. |
| **5 — U-Net only** | `unet-only` | Full-frame U-Net only, no YOLO gate (only `--unet-weights` required) |

All pipelines produce a per-frame **glottal area waveform**. The Qt GUI and CLI further support **L/R displacement waveform** extraction: independent left and right vocal-fold excursions relative to a medial axis, with a configurable sampling position (e.g. anterior–middle junction) to isolate asymmetry. Kinematic features (open quotient, fundamental frequency, periodicity, etc.) are computed from area or L/R waveforms for downstream clinical analysis.

---

## Installation

Use a virtual environment in the repo so all commands use the same Python and dependencies.

```bash
git clone https://github.com/hari-krishnan/openglottal.git
cd openglottal
python3 -m venv .venv
source .venv/bin/activate   # Linux/macOS; on Windows: .venv\Scripts\activate
pip install -e ".[dev]"
```

From a local clone (venv already exists):

```bash
cd /path/to/openglottal
source .venv/bin/activate
pip install -e ".[dev]"    # only if you need to reinstall
```

**Always use this venv** for training, evaluation, and CLI: activate it (`source .venv/bin/activate`) before running any `python`/`openglottal` commands, or use the `./run` script (e.g. `./run scripts/train_unet.py ...`) which uses `.venv/bin/python` automatically.

For the **Qt GUI**, install the optional GUI extra: `pip install -e ".[gui]"` (adds PyQt5). See [Qt GUI](#qt-gui) below.

Install from PyPI after a release is uploaded: `pip install openglottal` (optional extras: `pip install "openglottal[gui]"`). Publishing to PyPI is done **manually** (see [CONTRIBUTING.md](CONTRIBUTING.md#local-pypi-upload-optional)).

**Requirements:** Python ≥ 3.9, PyTorch ≥ 2.0, Ultralytics ≥ 8.0, OpenCV ≥ 4.8

**Weights:** Place pre-trained weights in `weights/` (or pass full paths to the scripts). Provided weights:

| Weights file | Description |
|--------------|-------------|
| `weights/og_girafe_unet_full.pt` | GIRAFE-trained U-Net (full-frame, no crop) |
| `weights/og_girafe_unet_crop.pt` | GIRAFE-trained U-Net (crop mode; use with YOLO-Crop+UNet pipeline) |
| `weights/og_girafe_yolo.pt` | GIRAFE-trained YOLO glottis detector |
| `weights/og_bagls_unet_full.pt` | BAGLS-trained U-Net (full-frame, no crop) |
| `weights/og_bagls_unet_crop.pt` | BAGLS-trained U-Net (crop mode; use with YOLO-Crop+UNet on BAGLS in-distribution) |
| `weights/og_bagls_yolo.pt` | BAGLS-trained YOLO glottis detector |

To train your own, see [Training](#training); trained models are saved under `outputs/`.

---

## Quick Start

### Python API

```python
import torch
from openglottal import TemporalDetector, UNet, extract_features_unet

device = torch.device("mps")   # or "cuda" / "cpu"

detector = TemporalDetector("weights/og_girafe_yolo.pt")

model = UNet(1, 1, (32, 64, 128, 256)).to(device)
model.load_state_dict(torch.load("weights/og_girafe_unet_full.pt", map_location=device))
model.eval()

features = extract_features_unet("video.avi", detector, model, device)
print(features)
# {'area_mean': 312.4, 'area_std': 98.1, 'open_quotient': 0.61, 'f0': 0.017, ...}
```

### CLI

```bash
# U-Net pipeline (recommended)
openglottal run video.avi \
    --yolo-weights weights/og_girafe_yolo.pt \
    --unet-weights weights/og_girafe_unet_full.pt \
    --pipeline unet \
    --output results/

# Motion-based pipeline (no U-Net weights needed)
openglottal run video.avi \
    --yolo-weights weights/og_girafe_yolo.pt \
    --pipeline guided-vft \
    --output results/
```

### Qt GUI

Desktop app for viewing HSV videos with segmentation overlay, midline/AC–PC axes, **L/R displacement waveforms**, and kinematic metrics (open quotient, F0, periodicity, etc.). Quantifying L/R asymmetry in cases such as vocal fold nodules is a key use case: the dynamic medial probe and independent left/right waveforms isolate nodule-induced irregularities.

**Pathology / L/R asymmetry** — For clinicians, the L/R displacement waveform is the main diagnostic view (see screenshot at top).

```bash
pip install -e ".[gui]"
openglottal-gui
```

Load a video, choose detector and U-Net weights from `weights/`, set frame range, and use Play/Pause or the timeline slider. Optional `metadata.json` next to the video supplies FPS and other keys. See [openglottal/qt_app/README.md](openglottal/qt_app/README.md) for details.

---

## Evaluation

### GIRAFE (4 test patients, 80 frames)

Compare against the GIRAFE paper baselines using GIRAFE-trained weights:

```bash
python scripts/eval_girafe.py \
    --images-dir       GIRAFE/Training/imagesTr \
    --labels-dir       GIRAFE/Training/labelsTr \
    --training-json    GIRAFE/Training/training.json \
    --unet-weights     weights/og_girafe_unet_full.pt \
    --yolo-weights     weights/og_girafe_yolo.pt \
    --device           mps
```

Results are printed alongside the published GIRAFE baselines for direct comparison.  Pass `--output-json results.json` to save raw per-frame scores.

### Results (GIRAFE test split, 4 patients, 80 frames)

| Method | Det.Recall | Dice | IoU | Dice≥0.5 |
|--------|-----------|------|-----|----------|
| InP (GIRAFE paper) | n/a | 0.71 | n/a | n/a |
| U-Net (GIRAFE paper) | n/a | 0.64 | n/a | n/a |
| SwinUNetV2 (paper) | n/a | 0.62 | n/a | n/a |
| **U-Net only** | n/a | **0.81** | **0.70** | **96.2%** |
| OTSU (baseline) | 0.95 | 0.22 | 0.13 | 2.5% |
| YOLO+UNet | 0.95 | 0.75 | 0.64 | 88.8% |
| YOLO-Crop+UNet† | 0.95 | 0.70 | 0.57 | 77.5% |
| Motion (baseline) | 0.95 | 0.27 | 0.17 | 9.7% |

- **Det.Recall** — fraction of frames where YOLO detected a glottis
- **Dice** — mean Dice coefficient across all test frames (higher is better)
- **Dice≥0.5** — fraction of frames meeting the clinical pass threshold

† YOLO-Crop+UNet uses the crop-trained U-Net: `weights/og_girafe_unet_crop.pt`. Full-frame and crop weight files are not interchangeable.

YOLO+Motion underperforms because GIRAFE test frames are the first 20 frames per patient, providing insufficient temporal context for the motion tracker to converge.

### BAGLS cross-dataset (3 500 test frames, GIRAFE-trained only)

No BAGLS data used in training. Use GIRAFE-trained weights; images are letterboxed to 256×256.

```bash
python scripts/eval_bagls.py \
    --bagls-dir    BAGLS/test \
    --unet-weights weights/og_girafe_unet_full.pt \
    --crop-weights weights/og_girafe_unet_crop.pt \
    --yolo-weights weights/og_girafe_yolo.pt \
    --device       mps
```

| Method | Det.Recall | Dice | IoU | Dice≥0.5 |
|--------|-----------|------|-----|----------|
| U-Net only | 1.00 | 0.59 | 0.50 | 67.1% |
| YOLO+UNet | 0.69 | 0.55 | 0.47 | 61.9% |
| **YOLO-Crop+UNet** | **0.69** | **0.61** | **0.53** | **70.3%** |

*(Table at default confidence $\tau=0.25$. With `--conf 0.02`, YOLO-Crop+UNet reaches Dice 0.64 and Dice≥0.5 76.4%; see paper and `sweep_bagls_conf.py`.)*

YOLO-Crop+UNet is the strongest pipeline on the unseen BAGLS data (+2 pp Dice, +3.2 pp Dice≥0.5 over U-Net alone at default $\tau$; at $\tau=0.02$, +5 pp Dice and +9.3 pp Dice≥0.5), despite YOLO only detecting on 68.8% of frames (domain shift from GIRAFE).  When YOLO does fire, cropping and re-scaling the region of interest gives U-Net higher effective resolution and cleaner context — benefits that generalise across datasets.

### BAGLS in-distribution (3 500 test frames, BAGLS-trained weights)

Use the provided BAGLS-trained U-Net (full-frame and crop) and YOLO weights:

```bash
python scripts/eval_bagls.py \
    --bagls-dir    BAGLS/test \
    --unet-weights weights/og_bagls_unet_full.pt \
    --crop-weights weights/og_bagls_unet_crop.pt \
    --yolo-weights weights/og_bagls_yolo.pt \
    --device       mps
```

On the 3 500-frame BAGLS test set this configuration achieves:

| Method             | Det.Recall | Dice  | IoU   | Dice≥0.5 |
|--------------------|-----------:|------:|------:|---------:|
| **U-Net only**     | 1.00       | 0.85  | 0.77  | 94.0%    |
| **YOLO+UNet**      | 0.87       | **0.85**  | **0.78**  | **94.6%** |
| YOLO-Crop+UNet     | 0.87       | 0.74  | 0.64  | 87.1%    |

So BAGLS-trained YOLO+UNet sets a strong in-distribution baseline (Dice 0.85), while the GIRAFE-trained YOLO-Crop+UNet remains the best cross-dataset configuration.

---

## Training

### 1. Build the YOLO dataset

```bash
openglottal build-dataset \
    --images-dir  GIRAFE/Training/imagesTr \
    --labels-dir  GIRAFE/Training/labelsTr \
    --training-json GIRAFE/Training/training.json \
    --output-dir  yolo_data
```

Or via script:

```bash
python scripts/train_yolo.py \
    --images-dir  GIRAFE/Training/imagesTr \
    --labels-dir  GIRAFE/Training/labelsTr \
    --training-json GIRAFE/Training/training.json \
    --epochs 100
```

YOLO saves best weights to `outputs/yolo/exp/weights/best.pt` (default run name `exp`). Use that path for `train_unet_crop.py` below, or copy to `weights/og_girafe_yolo.pt` for eval/CLI.

### 2. Train the U-Net (full-frame mode)

```bash
python scripts/train_unet.py \
    --images-dir  GIRAFE/Training/imagesTr \
    --labels-dir  GIRAFE/Training/labelsTr \
    --training-json GIRAFE/Training/training.json \
    --output outputs/og_girafe_unet_full.pt \
    --epochs 50
```

### 3. Train the U-Net (YOLO-crop mode — higher effective resolution)

Uses YOLO to crop each training image to the glottis region, resizes to 256×256, and trains U-Net on these patches.  At inference time the YOLO-Crop+UNet pipeline crops the input, runs U-Net, and projects the mask back to full-frame coordinates.

```bash
python scripts/train_unet_crop.py \
    --images-dir    GIRAFE/Training/imagesTr \
    --labels-dir    GIRAFE/Training/labelsTr \
    --training-json GIRAFE/Training/training.json \
    --yolo-weights  outputs/yolo/exp/weights/best.pt \
    --output        outputs/og_girafe_unet_crop.pt \
    --crop-size     256 \
    --epochs        50 \
    --device        cpu
```

**Device:** Use `--device cuda` on a Linux GPU machine or cloud instance. On Mac, use `--device cpu` for training (MPS is not used for training); use `--device mps` for evaluation (`eval_bagls.py`, `eval_girafe.py`, `analyze_gaw.py`) for faster inference.

Both training modes use a **50/50 BCE + Dice loss** with cosine annealing, saving the best validation checkpoint automatically.

---

## Repository Structure

```
openglottal/
├── weights/                 # pre-trained weights (see table above)
├── outputs/                 # trained models (gitignored) when training your own
├── openglottal/
│   ├── models/
│   │   ├── detector.py     # TemporalDetector — YOLOv8 + temporal box locking
│   │   ├── tracker.py      # VocalFoldTracker, YOLOGuidedVFT
│   │   └── unet.py         # UNet, DoubleConv, GlottisDataset (with augmentation)
│   ├── qt_app/             # Qt GUI — overlay, midline, waveform, openglottal-gui
│   │   ├── main.py         # Main window, inference worker, playback
│   │   ├── utils.py        # Overlay, axes/AC-PC drawing
│   │   └── README.md       # GUI usage and features
│   ├── metadata.py         # Video I/O, metadata.json, frame loading (used by GUI)
│   ├── features.py         # extract_features_{detector,yolo_guided_vft,unet}
│   ├── data.py             # mask_to_yolo, build_yolo_dataset
│   ├── utils.py            # I/O helpers, dice/IoU metrics, unet_segment_frame
│   └── cli.py              # `openglottal` command-line interface
├── scripts/
│   ├── train_yolo.py       # standalone YOLO training script
│   ├── train_unet.py       # standalone U-Net training script
│   ├── infer.py            # batch inference: AVI dir or image sequence → _out.avi
│   ├── eval_girafe.py      # per-patient test evaluation vs GIRAFE baselines
│   ├── eval_bagls.py       # cross-dataset evaluation on BAGLS (3 500 frames)
│   ├── analyze_gaw.py      # GAW feature analysis: Healthy vs Pathological (65 patients)
│   ├── download_datasets.py # download GIRAFE and BAGLS from Zenodo
│   ├── prepare_girafe_splits.py  # build GIRAFE training.json from images/labels
│   ├── prepare_bagls_splits.py   # build BAGLS train/test splits
│   ├── make_montage.py     # build frame montage PNGs for paper figures
│   ├── sweep_bagls_conf.py # YOLO confidence threshold sweep on BAGLS
│   └── train_unet_crop.py  # train U-Net on YOLO-cropped patches (higher res)
└── configs/
    └── default.yaml        # all hyperparameters documented in one place
```

---

## Dataset

OpenGlottal is developed and evaluated on **GIRAFE** and **BAGLS**. Download them (optional: use the script) then point the training/eval scripts at the extracted directories.

```bash
# Download GIRAFE and/or BAGLS to the current directory (or use --output-dir)
python scripts/download_datasets.py --girafe --bagls
```

- **GIRAFE** (Zenodo): [zenodo.org/records/13773163](https://zenodo.org/records/13773163) — 760 frames (256×256 px), expert-annotated glottal masks. Dataset paper: [Data in Brief (2025)](https://doi.org/10.1016/j.dib.2025.111376). After unpacking: `GIRAFE/Training/imagesTr/`, `GIRAFE/Training/labelsTr/`, `GIRAFE/Training/training.json`; raw videos: `GIRAFE/Raw_Data/`. If you have a copy in `./sdsc/glottal_area`, copy it over then run the prepare script:

  ```bash
  mkdir -p GIRAFE/Training && cp -r sdsc/glottal_area/imagesTr sdsc/glottal_area/labelsTr GIRAFE/Training/
  python scripts/prepare_girafe_splits.py --images-dir GIRAFE/Training/imagesTr --labels-dir GIRAFE/Training/labelsTr --output GIRAFE/Training/training.json
  ```
- **BAGLS** (Zenodo): [zenodo.org/records/3762320](https://zenodo.org/records/3762320) — benchmark for automatic glottis segmentation. Dataset paper: [Gómez et al., Scientific Data (2020)](https://doi.org/10.1038/s41597-020-0526-3). After downloading with `download_datasets.py --bagls`, use `BAGLS/test` as `--bagls-dir` for eval and `BAGLS/training` for training.

| Split (GIRAFE) | Frames |
|----------------|--------|
| Train | ~608 |
| Val | ~76 |
| Test | ~76 |

---

## Kinematic Features

The following scalar features are extracted from each glottal area waveform (or from the L/R-derived opening signal when using displacement mode):

| Feature | Description |
|---------|-------------|
| `area_mean` | Mean glottal area (px²) |
| `area_std` | Standard deviation of area |
| `area_range` | Max − min area |
| `open_quotient` | Fraction of cycle with area above 10 % of mean |
| `f0` | Dominant frequency from FFT (cycles/frame; multiply by capture fps for Hz) |
| `periodicity` | Peak autocorrelation at lags 1–50 |
| `cv` | Coefficient of variation (std / mean) |

**Spatial kinematics (L/R asymmetry)**  
- **L/R displacement:** Independent tracking of left and right vocal-fold excursions (px) relative to the medial axis. Exported as separate time series (L, R, L−R, area) via the GUI “Save analysis” or the CLI `openglottal displacement` command.  
- **Dynamic medial probe:** Configurable sampling position (0–1 along the anterior–posterior axis) so analysis can be run at a specific anatomical point (e.g. junction of anterior and middle thirds) to isolate nodule-induced irregularities.

---

## Glottal Area Waveform and L/R Displacement

Beyond frame-level segmentation, the pipeline produces a **Glottal Area Waveform (GAW)** — the per-frame glottal area over time — and optionally **L/R displacement waveforms** (left and right excursions relative to the medial axis). Kinematic features can be extracted from either GAW or L/R-derived signals and used for clinical classification and asymmetry assessment.

```bash
python scripts/analyze_gaw.py \
    --raw-data-dir  GIRAFE/Raw_Data \
    --yolo-weights  weights/og_girafe_yolo.pt \
    --unet-weights  weights/og_girafe_unet_full.pt \
    --device        mps \
    --output-dir    results/gaw
```

This script processes all 65 GIRAFE patients, extracts kinematic features from each area waveform, and compares **Healthy** vs **Pathological** groups using Mann-Whitney U tests.

### Key Findings (65 patients: 15 Healthy, 25 Pathological, 25 Unknown)

Because the cohort has a significant sex imbalance (see note below), results are reported **stratified by sex** rather than pooled.

**Female subgroup (12 H / 11 P):**

| Feature | Healthy (mean±std) | Pathological (mean±std) | p-value |
|---------|-------------------|------------------------|---------|
| area_mean | 125.2 ± 43.1 | 247.8 ± 204.6 | 0.230 |
| area_std | 112.9 ± 32.2 | 118.9 ± 96.0 | 0.406 |
| area_range | 336.7 ± 97.6 | 375.5 ± 272.2 | 0.559 |
| open_quotient | 0.760 ± 0.207 | 0.874 ± 0.131 | 0.192 |
| f0 | 241.7 ± 34.8 Hz | 203.5 ± 73.6 Hz | 0.156 |
| periodicity | 0.955 ± 0.008 | 0.946 ± 0.013 | 0.255 |
| cv | **0.95 ± 0.20** | **0.57 ± 0.29** | **0.006*** |

**Male subgroup (3 H / 14 P):**

| Feature | Healthy (mean±std) | Pathological (mean±std) | p-value |
|---------|-------------------|------------------------|---------|
| area_mean | 192.1 ± 18.3 | 172.7 ± 94.0 | 0.768 |
| area_std | 142.7 ± 35.0 | 92.0 ± 66.9 | 0.197 |
| area_range | 439.7 ± 86.7 | 343.1 ± 212.3 | 0.488 |
| open_quotient | 0.860 ± 0.145 | 0.843 ± 0.186 | 1.000 |
| f0 | 183.3 ± 75.0 Hz | 82.5 ± 79.3 Hz | 0.169 |
| periodicity | 0.962 ± 0.001 | 0.900 ± 0.116 | 0.068 |
| cv | 0.75 ± 0.19 | 0.63 ± 0.40 | 0.509 |

\* p < 0.05 (Mann-Whitney U, two-sided)

> **Sex imbalance note:** The Healthy group is 80% female (12F/3M) while the Pathological group is 56% male (14M/11F; Fisher's exact p=0.025). Because f0 is strongly sex-dependent (males ~100 Hz vs females ~224 Hz), pooling would confound f0. After stratifying, only **cv** (coefficient of variation) reaches significance in the female subgroup (p=0.006). In the male subgroup (n=3 Healthy), cv trends in the same direction (0.75 vs 0.63) but does not reach significance (p=0.509); periodicity approaches significance (p=0.068). The male subgroup is too small for reliable inference.

### Production Robustness

YOLO acts as a **detection gate**: when the endoscope moves away from the glottis (scope insertion, patient coughing, instrument in view), YOLO fires no detection and the area is set to zero — preventing spurious waveform spikes that would corrupt downstream feature extraction.

---

## Citation

If you use OpenGlottal in your research, please cite:

**H. Unnikrishnan.** *A Detection-Gated Pipeline for Robust Glottal Area Waveform Extraction and Clinical Pathology Assessment.* arXiv:2603.02087 [cs.CV], 2 Mar 2026.
[https://arxiv.org/abs/2603.02087](https://arxiv.org/abs/2603.02087)

```bibtex
@misc{unnikrishnan2026openglottal,
  title         = {A Detection-Gated Pipeline for Robust Glottal Area
                   Waveform Extraction and Clinical Pathology Assessment},
  author        = {Unnikrishnan, Harikrishnan},
  year          = {2026},
  eprint        = {2603.02087},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2603.02087}
}
```

Related prior work on glottal kinematics and high-speed imaging:

```bibtex
@article{patel2013invivo,
  title   = {In vivo measurement of pediatric vocal fold motion using
             structured light laser projection},
  author  = {Patel, Rita R and Donohue, Kevin D and Lau, Daniel and
             Unnikrishnan, Harikrishnan},
  journal = {The Laryngoscope},
  year    = {2013}
}

@article{patel2016effects,
  title   = {Effects of vocal fold nodules on glottal cycle measurements
             derived from high-speed digital imaging},
  author  = {Patel, Rita R and Unnikrishnan, Harikrishnan and Donohue, Kevin D},
  journal = {Journal of Speech, Language, and Hearing Research},
  year    = {2016}
}
```

---

## Research software notice

OpenGlottal is **research and educational software**. It is **not** a medical device and is **not** intended for diagnosis, treatment, or clinical decision-making. Outputs may be inaccurate; always rely on qualified clinical judgment and validated workflows where patient care is involved.

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for setup, tests, and linting.

## Citing

If you use this work in research, cite the [arXiv preprint](https://arxiv.org/abs/2603.02087). Repository metadata for citation managers is in [CITATION.cff](CITATION.cff).

---

## License

MIT License — see [LICENSE](LICENSE).
