Metadata-Version: 2.4
Name: phasenet
Version: 0.2.0
Summary: A PyTorch implementation of PhaseNet for seismic and DAS phase picking
Author: Weiqiang Zhu
License-Expression: MIT
Project-URL: Homepage, https://github.com/AI4EPS/phasenet-pytorch
Project-URL: Repository, https://github.com/AI4EPS/phasenet-pytorch
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: torch
Requires-Dist: torchvision
Requires-Dist: einops
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: h5py
Requires-Dist: matplotlib
Requires-Dist: pandas
Requires-Dist: tqdm
Requires-Dist: fsspec
Requires-Dist: obspy
Requires-Dist: gcsfs
Requires-Dist: datasets
Requires-Dist: pyarrow
Requires-Dist: wandb

# PhaseNet-PyTorch

PyTorch implementation of PhaseNet for seismic and DAS phase picking, event detection, and polarity classification.

## Models

| Model | Features | Data Type |
|-------|----------|-----------|
| `phasenet` | Phase (P/S) picking | Seismic 3-component |
| `phasenet_tf` | Phase picking + STFT spectrogram | Seismic 3-component |
| `phasenet_plus` | Phase + polarity + event detection | Seismic 3-component |
| `phasenet_tf_plus` | Phase + polarity + event detection + STFT | Seismic 3-component |
| `phasenet_das` | Phase picking | DAS single-channel |
| `phasenet_das_plus` | Phase + event detection | DAS single-channel |

The `_tf` variants add a Short-Time Fourier Transform (STFT) branch that extracts time-frequency features alongside the time-domain waveform, improving performance on noisy data.
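
To make the STFT idea concrete, here is a minimal standard-library sketch of a magnitude spectrogram. It is not the code used by the `_tf` models: the function name `stft_magnitude`, the window length `n_fft`, and the hop size `hop` are hypothetical, and a real pipeline would use an FFT-based routine such as `torch.stft` instead of the naive DFT shown here.

```python
import cmath
import math


def stft_magnitude(signal, n_fft=32, hop=16):
    """Return one list of n_fft // 2 + 1 magnitudes per overlapping frame.

    Illustrative only: n_fft and hop are assumed values, not the model's.
    """
    # A Hann window tapers each frame to limit spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        spectrum = []
        for k in range(n_fft // 2 + 1):
            # Naive DFT for readability; an FFT gives the same values faster.
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, x in enumerate(frame)
            )
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


# Toy trace: a sinusoid with exactly 4 cycles per 32-sample window.
trace = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = stft_magnitude(trace)
# Index of the strongest frequency bin in each frame.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

Each row of `spec` describes the frequency content of one short time window, which is the kind of representation a spectrogram branch can consume in addition to the raw waveform.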

## Prediction

### CEED (Seismic) Prediction

```bash
# Demo: process a few events with plots
python scripts/predict_ceed.py --n-events 5

# Process all days for a year (saves parquets to results/ceed/)
python scripts/predict_ceed.py --all --year 2025 --output-dir results/ceed
```

Output: one Parquet file per day file in `results/ceed/{region}/`, with the following columns:
`event_id, station_id, waveform_index, origin_id, origin_index, origin_time, phase_index, phase_time, phase_score, phase_type, phase_polarity`
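
As a sketch of how these columns might be used downstream, the snippet below keeps the highest-scoring pick per event, station, and phase type. The rows, the threshold `min_score`, and the helper `best_picks` are hypothetical; only a subset of the columns is shown, and the `phase_time` values are placeholders whose real format may differ. In practice the rows could be loaded with `pandas.read_parquet(path).to_dict("records")`.

```python
# Hypothetical rows using a subset of the documented column names.
picks = [
    {"event_id": "ev1", "station_id": "STA1", "phase_type": "P",
     "phase_time": "2025-01-01T00:00:10.00", "phase_score": 0.92},
    {"event_id": "ev1", "station_id": "STA1", "phase_type": "P",
     "phase_time": "2025-01-01T00:00:10.40", "phase_score": 0.61},
    {"event_id": "ev1", "station_id": "STA1", "phase_type": "S",
     "phase_time": "2025-01-01T00:00:14.00", "phase_score": 0.40},
    {"event_id": "ev1", "station_id": "STA2", "phase_type": "S",
     "phase_time": "2025-01-01T00:00:15.20", "phase_score": 0.75},
    {"event_id": "ev2", "station_id": "STA1", "phase_type": "P",
     "phase_time": "2025-01-01T01:00:03.00", "phase_score": 0.55},
]


def best_picks(rows, min_score=0.5):
    """Keep the highest-scoring pick per (event, station, phase type)."""
    best = {}
    for row in rows:
        if row["phase_score"] < min_score:
            continue  # drop low-confidence picks (threshold is an assumption)
        key = (row["event_id"], row["station_id"], row["phase_type"])
        if key not in best or row["phase_score"] > best[key]["phase_score"]:
            best[key] = row
    return best


selected = best_picks(picks)
```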

### DAS Prediction

```bash
# Predict with base PhaseNet on a Hugging Face subset
python scripts/predict_das.py --subset arcata --plot

# Predict with a trained DAS model on local data
python scripts/predict_das.py \
    --data-dir data/quakeflow_das/arcata/data \
    --model-type phasenet_das_plus \
    --checkpoint output/train_das_arcata/checkpoint.pth \
    --output-dir results/das/train_das_arcata/arcata \
    --no-ema --plot

# Predict from a file list
python scripts/predict_das.py \
    --file-list file_list.txt \
    --model-type phasenet_das_plus \
    --checkpoint output/model.pth \
    --plot

# Multi-GPU prediction
bash scripts/predict_das.sh phasenet_das_plus arcata 8 output/model.pth
```

Output: one Parquet file per event in `results/das/{model_name}/{subset}/`, with the following columns:
`event_id, channel_index, origin_id, origin_index, origin_time, phase_index, phase_time, phase_score, phase_type, dt_s, ps_center, ps_interval`

By default, picks are associated using P-S pairing. Use `--use-event-head` to associate via the model's event detection head instead.
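
The general idea of P-S pairing can be sketched as follows. This is an illustration under stated assumptions, not the rule implemented in `scripts/predict_das.py`: the helper `pair_p_s`, the `max_interval_s` cutoff, the use of float seconds for `phase_time`, and the reading of `ps_interval` as the S-minus-P time and `ps_center` as the midpoint are all guesses made for the example.

```python
def pair_p_s(picks, max_interval_s=30.0):
    """Pair each P pick with the next S pick on the same channel.

    Hypothetical rule: an S pick is matched to the most recent unmatched
    P pick if it arrives within max_interval_s seconds.
    """
    by_channel = {}
    for pick in picks:
        by_channel.setdefault(pick["channel_index"], []).append(pick)

    pairs = []
    for channel, channel_picks in sorted(by_channel.items()):
        pending_p = None
        for pick in sorted(channel_picks, key=lambda p: p["phase_time"]):
            if pick["phase_type"] == "P":
                pending_p = pick
            elif pick["phase_type"] == "S" and pending_p is not None:
                interval = pick["phase_time"] - pending_p["phase_time"]
                if interval <= max_interval_s:
                    pairs.append({
                        "channel_index": channel,
                        "ps_interval": interval,  # assumed: S time minus P time
                        "ps_center": pending_p["phase_time"] + interval / 2,
                    })
                pending_p = None
    return pairs


# Hypothetical picks; phase_time is in seconds from an arbitrary reference.
das_picks = [
    {"channel_index": 0, "phase_type": "P", "phase_time": 10.0},
    {"channel_index": 0, "phase_type": "S", "phase_time": 14.0},
    {"channel_index": 1, "phase_type": "P", "phase_time": 10.5},
    {"channel_index": 1, "phase_type": "S", "phase_time": 15.5},
    {"channel_index": 2, "phase_type": "S", "phase_time": 12.0},  # no P, stays unpaired
]
pairs = pair_p_s(das_picks)
```

Picks on neighboring channels with similar `ps_center` values plausibly belong to the same event, which is one way such pairs could be grouped.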

## Training

### CEED (Seismic) Training

```bash
python train.py \
    --model phasenet_plus \
    --dataset-type ceed \
    --label-path results/ceed \
    --nx 16 \
    --max-iters 100000 \
    --batch-size 8 \
    --workers 4 \
    --lr 3e-4 \
    --eval-interval 5000 \
    --output-dir output/train_ceed
```

### DAS Training

```bash
# Using the training script
bash scripts/train_das.sh 0 arcata v26

# Or directly
python train.py \
    --model phasenet_das_plus \
    --dataset-type das \
    --data-path data/quakeflow_das/arcata/data \
    --label-path results/das/phasenet/arcata/picks \
    --label-list results/das/phasenet/arcata/labels.txt \
    --nx 2048 --nt 4096 \
    --num-patch 16 \
    --max-iters 50000 \
    --batch-size 2 --workers 8 \
    --lr 1e-4 --weight-decay 0.01 \
    --model-ema --model-ema-decay 0.999 \
    --eval-interval 1000 --save-interval 1000 \
    --output-dir output/train_das_arcata_v26
```

### Key Training Options

| Option | Description | Default |
|--------|-------------|---------|
| `--num-patch N` | Random crops per DAS sample (amortizes I/O) | 2 |
| `--model-ema` | Enable an exponential moving average (EMA) of model weights | off |
| `--gradient-accumulation-steps N` | Accumulate gradients over N steps for a larger effective batch size | 1 |
| `--clip-grad-norm V` | Clip gradients to a maximum norm of V | 1.0 |
| `--compile` | Enable torch.compile | off |
| `--resume --checkpoint PATH` | Resume from checkpoint | - |
| `--reset-lr` | Reset LR schedule when resuming | off |
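
Two of these options are easier to reason about with a little arithmetic. The sketch below assumes the standard definitions (effective batch size as batch size times accumulation steps on a single process, and the usual EMA update rule); the function names are hypothetical and `train.py` may differ in detail.

```python
def effective_batch_size(batch_size, accumulation_steps):
    """Samples contributing to one optimizer step on a single process."""
    return batch_size * accumulation_steps


def ema_update(ema_weights, weights, decay=0.999):
    """Standard EMA rule: ema = decay * ema + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# --batch-size 2 with --gradient-accumulation-steps 4 behaves like a batch of 8.
batch = effective_batch_size(2, 4)

# With decay 0.999 the average moves slowly toward the current weights.
ema = [0.0]
first_step = ema_update(ema, [1.0])
for _ in range(1000):
    ema = ema_update(ema, [1.0])
```

A decay of 0.999 averages over roughly the last thousand steps, so the EMA weights lag the raw weights early in training and smooth them later on.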

## Semi-supervised Training

Iterative self-training pipeline for DAS: predict → train on predictions → predict with new model → repeat.

```bash
# Start from PhaseNet (train DAS model from scratch)
bash scripts/semisupervised_das.sh arcata 5 0

# Start from a pretrained DAS model
bash scripts/semisupervised_das.sh arcata 5 0 phasenet_das output/train_das_v26/checkpoint.pth
```

Arguments: `[subset] [num_iterations] [gpu] [start_from] [checkpoint]`. As the examples above show, `start_from` and `checkpoint` can be omitted to start from PhaseNet.

- **From `phasenet`**: iteration 0 predicts with PhaseNet, iteration 1 trains from scratch (10k steps with warmup), and iterations 2+ continue training (1k steps each, no warmup)
- **From `phasenet_das`**: iteration 0 predicts with the pretrained DAS model, and iterations 1+ continue training (1k steps each, no warmup)

Results are saved to `results/semisupervised_das/` and checkpoints to `output/semisupervised_das_v{N}/`.
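
The schedule described above can be written out as a small planning function. This is only a restatement of the two bullet points in Python, not an excerpt from `scripts/semisupervised_das.sh`; the function `plan_iterations` and its dictionary keys are hypothetical, and treating `num_iterations` as the number of training rounds after iteration 0 is an assumption.

```python
def plan_iterations(start_from, num_iterations):
    """List what each self-training iteration does, per the rules above."""
    if start_from not in ("phasenet", "phasenet_das"):
        raise ValueError(f"unknown start_from: {start_from!r}")

    # Iteration 0 only generates labels with the starting model.
    plan = [{"iteration": 0, "action": "predict", "model": start_from}]
    for i in range(1, num_iterations + 1):
        # Only the first round after PhaseNet trains from scratch.
        from_scratch = start_from == "phasenet" and i == 1
        plan.append({
            "iteration": i,
            "action": "train",
            "steps": 10_000 if from_scratch else 1_000,
            "warmup": from_scratch,
            "from_scratch": from_scratch,
        })
    return plan


schedule = plan_iterations("phasenet", 3)
steps_per_round = [step["steps"] for step in schedule[1:]]
```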
