Metadata-Version: 2.4
Name: cf-extractor
Version: 0.2.0
Summary: Python semantic data extractor for Context-Footprint (AST + pluggable resolvers)
Author: Leric Zhang
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/AiricDev/context-footprint
Project-URL: Repository, https://github.com/AiricDev/context-footprint
Project-URL: Documentation, https://github.com/AiricDev/context-footprint/tree/main/extractors/python
Keywords: context-footprint,semantic,extractor,ast,jedi,ty,python
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0
Requires-Dist: ty>=0.0.1
Provides-Extra: dev
Requires-Dist: jedi>=0.19.0; extra == "dev"
Requires-Dist: pyrefly>=0.57.1; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

# cf-extractor

Python semantic data extractor for [Context-Footprint](https://github.com/context-footprint/context-footprint). It outputs SemanticData JSON by combining the **Python AST** with a pluggable resolver backend. `ty` is the default backend; `jedi` remains available as a baseline resolver, and other LSP-backed backends can be plugged in for comparison.

## As a dependency

This package is a dependency of `cftool`. When you install cftool via uv or pip, cf-extractor is installed automatically:

```bash
uv tool install cftool   # includes cf-extractor
```

## Standalone usage

You can also install and use cf-extractor directly:

```bash
# From PyPI (when published)
pip install cf-extractor

# From Git
uv pip install "cf-extractor @ git+https://github.com/context-footprint/context-footprint#subdirectory=extractors/python"

# Development
cd extractors/python
uv sync
```

Run the extractor:

```bash
cf-extract /path/to/python/project
# or
uv run cf-extract /path/to/python/project
# or
python -m cf_extractor.main /path/to/project
```

Without arguments, `cf-extract` uses the current directory (`.`). Output is written to stdout as JSON that cftool can consume.
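Because the result goes to stdout, it can also be captured from a script. The following is a minimal sketch, not part of cf-extractor: `run_cf_extract` is a hypothetical helper name, it assumes `cf-extract` is on `PATH` and exits with a non-zero status on failure, and it makes no assumption about the JSON schema.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, Union


def run_cf_extract(
    project_dir: Union[str, Path] = ".",
    executable: str = "cf-extract",
) -> Any:
    """Run the extractor on project_dir and return its parsed JSON output.

    Hypothetical helper for illustration; not an API provided by cf-extractor.
    """
    completed = subprocess.run(
        [executable, str(project_dir)],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the captured streams to str
        check=True,           # assumption: failures are reported via a non-zero exit status
    )
    # stdout carries the JSON document; its structure is not inspected here.
    return json.loads(completed.stdout)
```

Calling `run_cf_extract("/path/to/python/project")` would return the decoded JSON value; a `subprocess.CalledProcessError` is raised if the command exits with a non-zero status.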

Resolver backend options (the first form uses the default `ty` backend):

```bash
cf-extract /path/to/python/project
cf-extract /path/to/python/project --resolver-backend jedi
cf-extract /path/to/python/project --resolver-backend ty --ty-path /path/to/ty
cf-extract /path/to/python/project --resolver-backend pyrefly --pyrefly-path /path/to/pyrefly
```
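When driving several backends from a script, the command lines above can be assembled programmatically. This is a sketch under stated assumptions: `build_cf_extract_argv` and `BINARY_PATH_FLAGS` are hypothetical names, only the flags shown above are used, and the helper conservatively requires an explicit backend whenever a binary path is given (whether `--ty-path` is accepted without `--resolver-backend ty` is not stated here).

```python
from typing import List, Optional

# Binary-path flags taken from the commands above, keyed by backend name.
BINARY_PATH_FLAGS = {"ty": "--ty-path", "pyrefly": "--pyrefly-path"}


def build_cf_extract_argv(
    project_dir: str,
    backend: Optional[str] = None,
    binary_path: Optional[str] = None,
) -> List[str]:
    """Build a cf-extract argument list (hypothetical helper, for illustration)."""
    argv = ["cf-extract", project_dir]
    if backend is not None:
        argv += ["--resolver-backend", backend]
    if binary_path is not None:
        # Only ty and pyrefly have a documented binary-path flag.
        if backend not in BINARY_PATH_FLAGS:
            raise ValueError(f"no binary-path flag known for backend {backend!r}")
        argv += [BINARY_PATH_FLAGS[backend], binary_path]
    return argv
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with project paths that contain spaces.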

Optional metrics output for benchmarking:

```bash
cf-extract /path/to/python/project --metrics-out metrics.json
cf-extract-benchmark \
  --dataset small=tests/fixtures \
  --dataset medium=/path/to/project \
  --dataset large=/path/to/project \
  --report-out benchmark.md
```

Backend diffing:

```bash
cf-extract-diff /path/to/python/project --left jedi --right ty --report-out diff.md
cf-extract-diff /path/to/python/project --left ty --right pyrefly --report-out diff.md
```

## Tests

```bash
uv run pytest tests/ -v
```

## Requirements

- Python >= 3.9
- `pydantic` >= 2.0 and `ty` (installed automatically)
- `jedi` is only needed for baseline comparison with `--resolver-backend jedi`
- `pyrefly` is only needed for comparison with `--resolver-backend pyrefly`
- `jedi`, `pyrefly`, and `pytest` are included in the `dev` extra
