Metadata-Version: 2.1
Name: rabbitmq-spider
Version: 0.0.2
Summary: rabbitmq-spider is an open-source tool that helps with web scraping by using RabbitMQ and Scrapy to distribute and scale scraping tasks across multiple instances.
Project-URL: Homepage, https://github.com/nobbbbby/rabbitmq-spider
Project-URL: Bug Tracker, https://github.com/nobbbbby/rabbitmq-spider/issues
Author-email: Nobby Tang <cnadytzar@gmail.com>
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Requires-Dist: pika
Requires-Dist: scrapy>2.0
Description-Content-Type: text/markdown

# rabbitmq-spider

rabbitmq-spider is an open-source tool that helps with web scraping by using RabbitMQ and Scrapy to distribute and scale
scraping tasks across multiple instances.

Inspired by [scrapy-redis](https://github.com/rmax/scrapy-redis).

## Features

1. It uses RabbitMQ only as the source of the messages from which requests are generated; it does not use RabbitMQ to implement Scrapy’s queue.
2. It can automatically acknowledge (ack) or negatively acknowledge (nack) messages based on the response results.
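
The exact rule the middleware uses to choose between ack and nack is not described here. As a rough illustration only, the sketch below assumes a hypothetical rule (ack on a 2xx status, otherwise nack and requeue) and a hypothetical helper name, `settle_message`. The `basic_ack` and `basic_nack` calls are the standard pika channel methods.

```python
def settle_message(channel, delivery_tag, status):
    """Ack or nack one delivery based on an HTTP status code.

    Hypothetical rule for illustration; not necessarily what
    rabbitmq-spider itself does.
    """
    if 200 <= status < 300:
        # Success: tell the broker the message has been handled.
        channel.basic_ack(delivery_tag=delivery_tag)
        return "ack"
    # Failure: hand the message back to the broker for redelivery.
    channel.basic_nack(delivery_tag=delivery_tag, requeue=True)
    return "nack"
```

Here `channel` is expected to be an open pika channel and `delivery_tag` the tag of the delivery being settled.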

## Installation

```shell
pip install rabbitmq_spider
```

## Usage

### 1. Add config values to your Scrapy settings:

```python
RABBITMQ_HOST = 'localhost'
RABBITMQ_PORT = '5672'
RABBITMQ_USERNAME = 'guest'
RABBITMQ_PASSWORD = 'guest'
RABBITMQ_VIRTUAL_HOST = '/'

SPIDER_MIDDLEWARES = {
    'rabbitmq_spider.middlewares.RabbitmqSpiderMiddleware': 49,
}
```

### 2. Add RabbitMQSpider to your spider

```python
import json

from rabbitmq_spider.spiders import RabbitMQSpider
from scrapy import Request


class YourSpider(RabbitMQSpider):
    """Demo"""
    name = 'demo'
    routing_key = 'demo.queue'

    def make_request_from_data(self, data):
        # Each message body is expected to be JSON with a 'url' field.
        msg_dict = json.loads(data)
        url = msg_dict['url']

        return Request(url)

    def parse(self, response, **kwargs):
        self.logger.debug(response.status)
```
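
To try the demo spider, something has to put messages on the queue. The following is a minimal producer sketch, not part of rabbitmq-spider. It assumes that `demo.queue` already exists and is reachable through the default exchange under that name, that the connection values match the settings above, and that the message body is the JSON shape the demo spider parses. The helper names `build_message`, `publish_urls`, and `main` are hypothetical.

```python
import json


def build_message(url):
    """Serialize one crawl task as the JSON body the demo spider reads."""
    return json.dumps({"url": url})


def publish_urls(channel, urls, routing_key="demo.queue", exchange=""):
    """Publish one message per URL on an already-open pika channel."""
    for url in urls:
        channel.basic_publish(
            exchange=exchange,
            routing_key=routing_key,
            body=build_message(url),
        )


def main():
    # Imported here so the helpers above can be used without pika installed.
    import pika

    credentials = pika.PlainCredentials("guest", "guest")
    params = pika.ConnectionParameters(
        host="localhost",
        port=5672,
        virtual_host="/",
        credentials=credentials,
    )
    connection = pika.BlockingConnection(params)
    try:
        channel = connection.channel()
        # Assumption: the queue already exists. Declaring it here with
        # arguments that differ from the consumer's would raise an error.
        publish_urls(channel, ["https://example.com"])
    finally:
        connection.close()
```

Calling `main()` against a running broker publishes a single task; the spider should then receive it through `make_request_from_data`.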

## License

This project is licensed under the MIT License - see the LICENSE file for details.
