Metadata-Version: 2.4
Name: hcf-backend
Version: 0.6.1
Summary: ScrapyCloud HubStorage frontier backend for Frontera
Home-page: https://github.com/scrapinghub/hcf-backend
Maintainer: Scrapinghub
License: BSD
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: frontera<0.8,>=0.7.2
Requires-Dist: humanize>=0.5.1
Requires-Dist: requests>=2.18.4
Requires-Dist: retrying>=1.3.3
Requires-Dist: scrapinghub>=2.3.1
Requires-Dist: shub-workflow>=1.10.20
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: maintainer
Dynamic: requires-dist
Dynamic: summary

# HCF (HubStorage Crawl Frontier) Backend for Frontera

When using this backend with Scrapy, pair it with the Scrapy scheduler provided by [scrapy-frontera](https://github.com/scrapinghub/scrapy-frontera). The Scrapy scheduler provided
by [Frontera](https://github.com/scrapinghub/frontera) *is not* supported. `scrapy-frontera` is a Scrapy scheduler that allows Frontera backends,
like this one, to be used in Scrapy projects.

See specific usage instructions in the module and class docstrings of [backend.py](https://github.com/scrapinghub/hcf-backend/blob/master/hcf_backend/backend.py).
Some usage examples can be found in the [scrapy-frontera README](https://github.com/scrapinghub/scrapy-frontera/blob/master/README.rst).
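
As a rough illustration of how these pieces fit together, the sketch below shows the kind of configuration a Scrapy project might carry. The scheduler path, the backend path and the `HCF_*` setting names are assumptions recalled from the `scrapy-frontera` README and the `backend.py` docstrings, and every value is a placeholder; verify them against those sources before use.

```python
# Minimal configuration sketch. Names are assumptions to be checked against
# the scrapy-frontera README and the hcf_backend/backend.py docstrings.

# In the Scrapy project's settings.py: use the scheduler from scrapy-frontera,
# not the one shipped with Frontera itself.
SCHEDULER = "scrapy_frontera.scheduler.FronteraScheduler"

# Frontera-level settings. With scrapy-frontera these are typically declared
# per spider (assumed here to be a `frontera_settings` attribute).
frontera_settings = {
    "BACKEND": "hcf_backend.HCFBackend",   # assumed import path of this backend
    "HCF_AUTH": "<scrapycloud-api-key>",   # placeholder credential
    "HCF_PROJECT_ID": "<project-id>",      # placeholder ScrapyCloud project id
    "HCF_PRODUCER_FRONTIER": "myfrontier",  # hypothetical frontier name
    "HCF_PRODUCER_NUMBER_OF_SLOTS": 8,     # hypothetical slot count
}
```

A consumer spider would presumably use the consumer counterparts of these settings (a frontier name and a slot to read from) instead of the producer ones; the docstrings in `backend.py` are the reference for the exact names.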

A complete tutorial for using `hcf-backend` with ScrapyCloud workflows is available at
[shub-workflow Tutorial: Managing Hubstorage Crawl Frontiers](https://github.com/scrapinghub/shub-workflow/wiki/Managing-Hubstorage-Crawl-Frontiers). `shub-workflow` is a framework for
defining workflows of spiders and scripts running over ScrapyCloud. The tutorial is strongly recommended reading, because it documents the integration of the different tools,
which provide the most benefit when used together.

The package also provides a convenient command-line tool for handling and manipulating hubstorage frontiers:
[hcfpal.py](https://github.com/scrapinghub/hcf-backend/blob/master/hcf_backend/utils/hcfpal.py). It supports dumping, counting, deleting, moving, listing, etc.
See the command-line help for usage.

Another provided tool is [crawlmanager.py](https://github.com/scrapinghub/hcf-backend/blob/master/hcf_backend/utils/crawlmanager.py). It facilitates the
scheduling of consumer spider jobs. Usage examples are also available in the `shub-workflow` tutorial mentioned above.

Installation
============

`pip install hcf-backend`


Development environment setup
=============================

For hcf-backend developers, Pipfile files are provided for a development environment.

Run:

```
$ pipenv install --dev
$ pipenv shell
$ cp .envtemplate .env
```

and edit `.env` accordingly.
