Metadata-Version: 2.1
Name: pygidl
Version: 0.0.2
Summary: Asynchronously download Google Images search results
Home-page: https://github.com/parameter-concern/pygidl
Author: Cameron Carpenter
Author-email: parameter.concern@gmail.com
License: MIT
Download-URL: https://github.com/parameter-concern/pygidl/releases/
Project-URL: Bug Tracker, https://github.com/parameter-concern/pygidl/issues/
Project-URL: Source Code, https://github.com/parameter-concern/pygidl/
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: aiohttp[speedups]
Requires-Dist: beautifulsoup4
Requires-Dist: python-slugify[unidecode]
Requires-Dist: tqdm

# Python Google Images Downloader

This tool downloads Google Images search results quickly by fetching them
asynchronously with [aiohttp](https://docs.aiohttp.org/en/stable/). It is
essentially a Python 3.7 rewrite of
[google-images-download](https://github.com/hardikvasa/google-images-download)
that drops that project's restriction against external dependencies.

## Installation

The following should work to install the utility and check that it will run:

```bash
pip install pygidl
pygidl -h
```

## Basic Usage

From the command line, basic usage looks like `pygidl "cats and dogs"`. This
creates an output directory in your current working directory, named for the
timestamp at which the command was run. Under that directory is another
directory named with the slugified version of your query string. That directory
holds the downloaded images, named by their SHA-256 hashes and file type
extensions:

```bash
pygidl "cats and dogs"
tree .
.
└── 2020-01-21-17-57-34
    └── cats-and-dogs
        ├── 01d2dde343a45e3a1fcc5e7cd3cace33398c9b06a97e494d4329f264e57d5f57.jpg
        ├── 026cef34db26cbd5fa246bc720c1234b39ffa07737e43523b160390c13d5d3e6.jpeg
        ├── 03a0f2ebeed5d91acaed73ad303bd724767e101688e33d1a5557cca9139972d7.webp
        ├── 0affcf3198b40063e9302c4515380d6796946098c7c1c3c043072815e29e2770.jpeg
        ...
```
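
The naming scheme above can be illustrated with a short sketch. It is not taken
from pygidl's source: `hashed_filename` is a hypothetical helper, and it assumes
the hash is computed over the raw bytes of each downloaded image.

```python
import hashlib


def hashed_filename(content: bytes, extension: str) -> str:
    """Build a file name from the SHA-256 hex digest of the content.

    Hypothetical helper for illustration; not part of the pygidl API.
    """
    digest = hashlib.sha256(content).hexdigest()
    # Accept the extension with or without a leading dot.
    return f"{digest}.{extension.lstrip('.')}"


name = hashed_filename(b"example image bytes", "jpg")
print(name)
```

Under that assumption, byte-identical images would map to the same file name
within a query directory.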


## Advanced Usage

The command can be configured to support several more complex query scenarios:

### Verbose Output

Use the `-v/--verbose` flag to change the log level to show more messages. The
log level is "WARNING" by default. Supplying `-v` once sets it to "INFO", and
two or more times sets it to "DEBUG".
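
A counted flag like this is commonly built with `argparse` and
`action="count"`. The following is a minimal sketch of the mapping described
above, not pygidl's actual implementation; `log_level_from_verbosity` is a
hypothetical name.

```python
import argparse
import logging


def log_level_from_verbosity(count: int) -> int:
    """Map the number of -v flags to a logging level (hypothetical helper)."""
    if count <= 0:
        return logging.WARNING
    if count == 1:
        return logging.INFO
    return logging.DEBUG


parser = argparse.ArgumentParser()
# action="count" increments the value each time the flag appears.
parser.add_argument("-v", "--verbose", action="count", default=0)

args = parser.parse_args(["-vv"])
logging.basicConfig(level=log_level_from_verbosity(args.verbose))
```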

### Prefixes and Suffixes

The `-p/--prefix` and `-s/--suffix` flags can each be given multiple times. The
base query is run once for every combination of prefix and suffix. For example:

```bash
pygidl -p Andorra -p Angola -s "on a ship" -s "on a plane" flag
tree -d .
.
└── 2020-01-21-18-15-59
    ├── andorra-flag-on-a-plane
    ├── andorra-flag-on-a-ship
    ├── angola-flag-on-a-plane
    └── angola-flag-on-a-ship

5 directories
```
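
The expansion in this example amounts to a Cartesian product of prefixes and
suffixes around the base query. The sketch below shows one way to compute it;
`expand_queries` is a hypothetical helper, and skipping empty parts is an
assumption made so that an empty prefix, suffix, or base query leaves no stray
spaces.

```python
from itertools import product
from typing import List, Sequence


def expand_queries(
    base_query: str,
    prefixes: Sequence[str] = ("",),
    suffixes: Sequence[str] = ("",),
) -> List[str]:
    """Return one query string per (prefix, suffix) combination.

    Hypothetical helper for illustration; not part of the pygidl API.
    """
    queries = []
    # Fall back to a single empty string so the base query still runs once.
    for prefix, suffix in product(prefixes or [""], suffixes or [""]):
        parts = [prefix, base_query, suffix]
        queries.append(" ".join(part for part in parts if part))
    return queries


queries = expand_queries(
    "flag", ["Andorra", "Angola"], ["on a ship", "on a plane"]
)
for query in queries:
    print(query)
```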

### Output Groups

By default, the results of each run are collected under a timestamped directory.
The `-g/--group` flag replaces that timestamp with the slugified group name. For
example:


```bash
pygidl -g "Cute Animals" -p fluffy -p adorable -s dog -s cat "" -v
tree . -d
.
└── cute-animals
    ├── adorable-cat
    ├── adorable-dog
    ├── fluffy-cat
    └── fluffy-dog

5 directories
```
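
The directory layout shown in these examples can be sketched as follows. This is
an illustration under assumptions, not pygidl's code: `query_directory` is a
hypothetical helper, and `simple_slug` is a rough ASCII-only stand-in for the
slugify function that the real tool gets from `python-slugify`.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def simple_slug(text: str) -> str:
    """Rough ASCII-only stand-in for a real slugify function."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def query_directory(
    output_dir: Path,
    query: str,
    group: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Path:
    """Return the directory for one query (hypothetical helper)."""
    if group:
        # A group name takes the place of the timestamp directory.
        top_level = simple_slug(group)
    else:
        top_level = (now or datetime.now()).strftime("%Y-%m-%d-%H-%M-%S")
    return output_dir / top_level / simple_slug(query)


print(query_directory(Path("."), "fluffy dog", group="Cute Animals"))
```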

### Face Search

Use the `-f/--face` flag to ask Google Images for results that contain faces.
For example:

```bash
pygidl -g "Tom Hanks" -p "" -p young "Tom Hanks" -s "" -s "Oscars" -f -v
tree -d .
.
└── tom-hanks
    ├── tom-hanks
    ├── tom-hanks-oscars
    ├── young-tom-hanks
    └── young-tom-hanks-oscars

5 directories
```

## Programmatic Usage

The `scrape_google_images` coroutine can also be called from your own code.
Something like the following should work, assuming `opencv-python` is installed
in your environment:

```python
import asyncio
import os

import cv2

from pygidl import scrape_google_images


downloaded_image_paths = asyncio.run(
    scrape_google_images(
        base_query="cats and dogs",
        prefixes=["cute", "adorable"],
        suffixes=["playing", "running"],
        group="cute-animals",
        output_dir=os.getcwd(),
        face=False,
    )
)
for path in downloaded_image_paths:
    image = cv2.imread(path)
    if image is None:
        print(f"could not load image {path}")
        continue
    height, width = image.shape[:2]
    print(f"image {path} has size {width}x{height}")
```

## Known Issues and Limitations

- Returns at most 100 results per query
- Doesn't support the full range of Google's advanced search options
- No tests
- No retries for failed downloads
- No option to output a report of results or metadata
- Sometimes Google returns results from a different page template without
  easily parseable metadata


## Contributing

I don't think anyone will ever get this far, but if you want to open a pull
request (or even better, take over ownership of the project for me!), go for it.
At a minimum, new code should have type hints and docstrings, and should be
auto-formatted with `black` using an 80-character max line length. Even better
would be some tests and Sphinx documentation!


