Metadata-Version: 2.4
Name: parseimagenet
Version: 1.0.3
Summary: Extract ImageNet image paths by category keywords
Author-email: Reed Turgeon <turgeon.dev@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/MrT3313/Parse-ImageNet
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: jupyter; extra == "dev"
Requires-Dist: ipykernel; extra == "dev"
Dynamic: license-file

# ParseImageNet

Extract image file paths from ImageNet by matching category keywords. Useful for creating custom subsets of ImageNet for training or evaluation.

![Python Version](https://img.shields.io/pypi/pyversions/parseimagenet)
![License](https://img.shields.io/github/license/MrT3313/Parse-ImageNet)
![Build Status](https://github.com/MrT3313/Parse-ImageNet/actions/workflows/main.yml/badge.svg)

## [Kaggle Dataset](https://www.kaggle.com/competitions/imagenet-object-localization-challenge/data)

## Prerequisites

- Python 3.8+
- ImageNet dataset (or a subset) with the standard ILSVRC directory structure:
  ```
  ImageNet-Subset/
  ├── LOC_synset_mapping.txt
  └── ILSVRC/
      ├── ImageSets/
      │   └── CLS-LOC/
      │       └── train_cls.txt
      └── Data/
          └── CLS-LOC/
              └── train/
                  ├── n01440764/
                  │   ├── n01440764_10026.JPEG
                  │   └── ...
                  └── ...
  ```
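
Before running the parser, it can help to confirm that the dataset directory matches the layout above. The following is a minimal sketch, not part of the `parseimagenet` package: the helper name `find_missing_entries` and the `EXPECTED_ENTRIES` tuple are hypothetical, and the entries checked are simply the ones shown in the tree.

```python
from pathlib import Path

# Relative paths copied from the directory tree above (hypothetical helper data,
# not something exported by parseimagenet).
EXPECTED_ENTRIES = (
    "LOC_synset_mapping.txt",
    "ILSVRC/ImageSets/CLS-LOC/train_cls.txt",
    "ILSVRC/Data/CLS-LOC/train",
)


def find_missing_entries(base_path):
    """Return the expected entries that do not exist under base_path."""
    base_path = Path(base_path)
    return [entry for entry in EXPECTED_ENTRIES if not (base_path / entry).exists()]


# Usage (replace with your own dataset location):
# missing = find_missing_entries("/path/to/your/ImageNet-Subset")
# if missing:
#     print("Missing from dataset directory:", missing)
```

An empty list means every entry from the tree was found; anything else names the files or folders to fix first.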

## Installation

Clone the repository:

```bash
git clone https://github.com/MrT3313/Parse-ImageNet.git
```

Then install the package into the environment where you run Jupyter:

```bash
# Using pip
pip install -e /path/to/Parse-ImageNet
# ex: pip install -e /Users/mrt/Documents/MrT/code/computer-vision/ParseImageNet
```

The `-e` flag installs in "editable" mode, so code changes are immediately available without reinstalling. However, changes to package metadata (version, dependencies) in `pyproject.toml` still require running `pip install -e .` again.

## Usage

> [!NOTE]
> 
> [Example Notebook](/DOCS/ExampleNotebook.ipynb)

### In Jupyter Lab / Jupyter Notebook

```python
from pathlib import Path
from parseimagenet import get_image_paths_by_keywords

# Set the path to your ImageNet directory
base_path = Path('/path/to/your/ImageNet-Subset')
# ex: /Users/mrt/Documents/MrT/code/computer-vision/image-bank/ImageNet-Subset

# Use the default "birds" preset
image_paths = get_image_paths_by_keywords(base_path=base_path)

# image_paths is a list of Path objects
print(f"Found {len(image_paths)} images")
print(image_paths[:5])
```
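
Because the result is a plain list of `Path` objects, it can be inspected with standard-library tools. The sketch below counts how many paths fall under each category folder. It assumes the returned paths follow the `train/<WordNet ID>/<file>.JPEG` layout shown in Prerequisites; the helper name `count_images_per_synset` is hypothetical and the example paths are illustrative only.

```python
from collections import Counter
from pathlib import Path


def count_images_per_synset(image_paths):
    """Count paths per parent directory name (the WordNet ID in the ILSVRC layout)."""
    return Counter(Path(p).parent.name for p in image_paths)


# Illustrative paths only; in practice pass the list returned by
# get_image_paths_by_keywords.
example_paths = [
    Path("train/n01440764/n01440764_10026.JPEG"),
    Path("train/n01440764/n01440764_10027.JPEG"),
    Path("train/n01443537/n01443537_1.JPEG"),
]
print(count_images_per_synset(example_paths).most_common())
```

A count like this is a quick way to see whether a keyword list pulled in more categories than intended.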

#### Using Preset Keywords

Presets are predefined keyword lists for common categories:

```python
from parseimagenet import get_image_paths_by_keywords  # main function
from parseimagenet import get_available_presets, KEYWORD_PRESETS  # helpers

# See available presets
print(get_available_presets())  # ['birds']

# Use a specific preset
image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    preset="birds",
    num_images=200
)

# Access preset keywords directly
print(KEYWORD_PRESETS["birds"])
```

#### Using Custom Keywords

Custom keywords override the preset:

```python
image_paths = get_image_paths_by_keywords(
    base_path=base_path,
    keywords=['dog', 'puppy', 'hound'],
    num_images=100
)
```
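
To preview which categories a keyword list might hit before extracting paths, `LOC_synset_mapping.txt` can be scanned directly. This is a rough sketch under two assumptions: each line of the file holds a WordNet ID followed by a space and the category names, and matching is a case-insensitive substring test. The package's own matching rules may differ, and `find_matching_synsets` is a hypothetical helper, not part of `parseimagenet`.

```python
def find_matching_synsets(mapping_file, keywords):
    """Return {wordnet_id: description} for lines whose description contains any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    matches = {}
    with open(mapping_file, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            # Split "n01440764 tench, Tinca tinca" into ID and description.
            wnid, _, description = line.partition(" ")
            if any(keyword in description.lower() for keyword in lowered):
                matches[wnid] = description
    return matches


# Usage (replace with your own dataset location):
# hits = find_matching_synsets(
#     "/path/to/your/ImageNet-Subset/LOC_synset_mapping.txt",
#     ["dog", "puppy", "hound"],
# )
# for wnid, description in hits.items():
#     print(wnid, description)
```

Substring matching is deliberately loose, so short keywords can match unrelated names; reviewing the hits first makes it easier to refine the list.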

> [!NOTE]
> 
> You can find all applicable categories in the `LOC_synset_mapping.txt` file.

### Command Line

```bash
# Use default preset (birds)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset

# Use a specific preset
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --preset birds --num_images 100

# Use custom keywords (overrides preset)
python -m parseimagenet.ParseImageNetSubset --base_path /path/to/ImageNet-Subset --keywords dog puppy --num_images 100
```
