Metadata-Version: 2.4
Name: image-sitemap
Version: 2.1.0
Summary: Library to generate XML sitemaps for websites and images. Boost SEO by indexing image URLs for better visibility on search engines.
Author-email: AndreiDrang <python-captcha@pm.me>
License-Expression: MIT
Project-URL: Homepage, https://github.com/AndreiDrang/image_sitemap
Project-URL: Documentation, https://github.com/AndreiDrang/image_sitemap
Project-URL: Repository, https://github.com/AndreiDrang/image_sitemap
Project-URL: Issues, https://github.com/AndreiDrang/image_sitemap/issues
Project-URL: Changelog, https://github.com/AndreiDrang/image_sitemap/releases
Keywords: sitemap generator,image sitemap,XML sitemap,SEO optimization,Python SEO tools,website indexing,search engine visibility,image indexing,web crawler
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Framework :: AsyncIO
Classifier: Operating System :: Unix
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiohttp==3.*
Requires-Dist: beautifulsoup4==4.*
Provides-Extra: style
Requires-Dist: black==25.*; extra == "style"
Requires-Dist: isort<8,>=6; extra == "style"
Requires-Dist: autoflake==2.*; extra == "style"
Dynamic: license-file

# 🗺️ image_sitemap

<div align="center">
    <a href="https://vyjava.xyz/dashboard/image/e38067e4-7204-4e80-ad2b-d318b07320ac">
        <img alt="Logo_348x300" src="https://s.vyjava.xyz/files/2025/08-August/07/e38067e4/Logo_348x300_.png">
    </a>
</div>

<hr>

[![PyPI version](https://badge.fury.io/py/image-sitemap.svg)](https://badge.fury.io/py/image-sitemap)
[![Python versions](https://img.shields.io/pypi/pyversions/image-sitemap.svg?logo=python&logoColor=FBE072)](https://badge.fury.io/py/image-sitemap)
[![Downloads](https://static.pepy.tech/badge/image-sitemap/month)](https://pepy.tech/project/image-sitemap)

Image & Website Sitemap Generator - SEO Tool for Better Visibility

Sitemap Images is a Python tool that generates [a specialized XML sitemap file](./example_sitemap_images.xml),
allowing you to submit image URLs to search engines like **Google**, **Bing**, and **Yahoo**.
This tool helps improve image search visibility, driving more traffic to your website and increasing engagement.
To ensure search engines can discover your sitemap, simply add the following line to your **robots.txt** file:
```txt
Sitemap: https://example.com/sitemap-images.xml
```
By including image links in your sitemap and referencing it in your robots.txt file, you can enhance your website's SEO and make it easier for users to find your content.

Google image sitemaps standard description - [Click](https://developers.google.com/search/docs/crawling-indexing/sitemaps/image-sitemaps).

## 📦 Features

- Supports both website and image sitemap generation  
- Easy integration with existing Python projects  
- Helps improve visibility in search engine results  
- Boosts image search performance
- Subdomain filtering with exclusion support
- Configurable crawling depth and query parameters

## ✍️ Examples

1. Set website page and crawling depth, run script
    ```python
    import asyncio
    
    from image_sitemap import Sitemap
    from image_sitemap.instruments.config import Config
      
    images_config = Config(
        max_depth=3,
        accept_subdomains=True,
        excluded_subdomains={"blog", "api", "staging"},  # Exclude specific subdomains
        is_query_enabled=False,
        file_name="sitemap_images.xml",
        header={
           "User-Agent": "ImageSitemap Crawler",
           "Accept": "text/html",
        },
    )
    sitemap_config = Config(
        max_depth=3,
        accept_subdomains=True,
        excluded_subdomains={"blog", "api", "staging"},  # Exclude specific subdomains
        is_query_enabled=False,
        file_name="sitemap.xml",
        header={
           "User-Agent": "ImageSitemap Crawler",
           "Accept": "text/html",
        },
    )
    
    asyncio.run(Sitemap(config=images_config).run_images_sitemap(url="https://rucaptcha.com/"))
    asyncio.run(Sitemap(config=sitemap_config).run_sitemap(url="https://rucaptcha.com/"))
    ```
2. Get sitemap images data in file 
    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
        <url>
            <loc>https://rucaptcha.com/proxy/residential-proxies</loc>
            <image:image>
                <image:loc>https://rucaptcha.com/dist/web/assets/rotating-residential-proxies-NEVfEVLW.svg</image:loc>
            </image:image>
        </url>
    </urlset>
    ```
   Or just sitemap file
    ```xml
   <?xml version="1.0" encoding="UTF-8"?>
   <urlset
       xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
       <url>
           <loc>https://rucaptcha.com/</loc>
       </url>
       <url>
           <loc>https://rucaptcha.com/h</loc>
       </url>
    </urlset>
    ```

## 🔧 Configuration Options

The `Config` class provides various options to customize sitemap generation:

### Subdomain Control
- `accept_subdomains` (bool): Enable/disable subdomain crawling (default: `True`)
- `excluded_subdomains` (Set[str]): Set of subdomain names to exclude from parsing (default: `set()`)

```python
# Example: Include all subdomains except blog and api
config = Config(
    accept_subdomains=True,
    excluded_subdomains={"blog", "api", "staging", "dev"}
)

# This will include:
# - example.com
# - www.example.com  
# - shop.example.com
# But exclude:
# - blog.example.com
# - api.example.com
# - staging.example.com
# - dev.example.com
```

### Other Options
- `max_depth` (int): Maximum crawling depth (default: `1`)
- `is_query_enabled` (bool): Include URLs with query parameters (default: `True`)
- `file_name` (str): Output sitemap filename (default: `"sitemap_images.xml"`)
- `exclude_file_links` (bool): Filter out file links from sitemap (default: `True`)
- `header` (dict): Custom HTTP headers for requests

You can check examples file here - [Click](./example_sitemap_images.xml).
