Metadata-Version: 2.1
Name: scrapyu
Version: 0.1.10
Summary: Scrapy utils
Home-page: https://github.com/lin-zone/scrapyu
License: MIT
Keywords: scrapy,pipelines,middlewares,utils,dupefilter
Author: lin-zone
Author-email: z_one10@163.com
Requires-Python: >=3.6,<4.0
Classifier: Framework :: Scrapy
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: Chinese (Simplified)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Utilities
Requires-Dist: fake-useragent (==0.1.11)
Requires-Dist: html2text (==2019.9.26)
Requires-Dist: path-py (==12.4.0)
Requires-Dist: pymongo (==3.10.0)
Requires-Dist: pytest-cov (>=2.8,<3.0)
Requires-Dist: redis (==3.3.11)
Requires-Dist: scrapy (>=1.8,<2.0)
Requires-Dist: selenium (==3.141.0)
Requires-Dist: testfixtures (==6.10.3)
Project-URL: Repository, https://github.com/lin-zone/scrapyu
Description-Content-Type: text/markdown

# scrapyu

[![Build Status](https://www.travis-ci.org/lin-zone/scrapyu.svg?branch=master)](https://www.travis-ci.org/lin-zone/scrapyu)
[![codecov](https://codecov.io/gh/lin-zone/scrapyu/branch/master/graph/badge.svg)](https://codecov.io/gh/lin-zone/scrapyu)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/scrapyu?logo=python&logoColor=FBE072)](https://pypi.org/project/scrapyu/)
[![GitHub](https://img.shields.io/github/license/lin-zone/scrapyu)](https://github.com/lin-zone/scrapyu/blob/master/LICENSE)
[![GitHub stars](https://img.shields.io/github/stars/lin-zone/scrapyu?logo=github)](https://github.com/lin-zone/scrapyu)
[![GitHub forks](https://img.shields.io/github/forks/lin-zone/scrapyu?logo=github)](https://github.com/lin-zone/scrapyu)

## UserAgentMiddleware

```python
# settings.py
USERAGENT_TYPE = 'firefox'
DOWNLOADER_MIDDLEWARES = {
    'scrapyu.UserAgentMiddleware': 543,
}
```

## MarkdownPipeline

```python
# settings.py
MARKDOWNS_STORE = 'news'
ITEM_PIPELINES = {
    'scrapyu.MarkdownPipeline': 300,
}
```

```python
# items.py
import scrapy

class MarkdownItem(scrapy.Item):
    html = scrapy.Field()
    filename = scrapy.Field()
```

## FirefoxCookiesMiddleware

```python
# settings.py
GECKODRIVER_PATH = 'geckodriver'
DOWNLOADER_MIDDLEWARES = {
    'scrapyu.FirefoxCookiesMiddleware': 543,
}
```

## MongoDBPipeline

```python
# settings.py
MONGODB_URI = 'mongodb://localhost:27017'
# or
# MONGODB_HOST = 'localhost'
# MONGODB_PORT = 27017
MONGODB_DATABASE = 'scrapyu'
MONGODB_COLLECTION = 'items'
MONGODB_BUFFER_LENGTH = 100
MONGODB_UNIQUE_KEY = 'title name'       # takes effect only when no buffer is used
# or
# MONGODB_UNIQUE_KEY = ['title', 'name']
# MONGODB_UNIQUE_KEY = ('title', 'name')
ITEM_PIPELINES = {
    'scrapyu.MongoDBPipeline': 300,
}
```
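
The `MONGODB_UNIQUE_KEY` examples above show three spellings of the same key set: a space-separated string, a list, and a tuple. Below is a minimal sketch of how such a value could be normalized into a list of field names. `normalize_unique_key` is a hypothetical helper written for illustration, not part of scrapyu's API, and splitting the string form on whitespace is an assumption inferred from the `'title name'` example.

```python
def normalize_unique_key(value):
    """Return the unique-key fields as a list of names.

    Hypothetical helper: assumes a string value is whitespace-separated,
    as the 'title name' example suggests.
    """
    if value is None:
        return []                 # no unique key configured
    if isinstance(value, str):
        return value.split()      # 'title name' -> ['title', 'name']
    return list(value)            # list or tuple -> list


# All three spellings yield the same list of field names.
print(normalize_unique_key('title name'))
print(normalize_unique_key(['title', 'name']))
print(normalize_unique_key(('title', 'name')))
```

Under that assumption, the string form is just a shorthand, so any of the three can be used in `settings.py`.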

## RedisDupeFilter

```python
# settings.py
DUPEFILTER_CLASS = 'scrapyu.RedisDupeFilter'
REDIS_DUPE_HOST = 'localhost'
REDIS_DUPE_PORT = 6379
REDIS_DUPE_DATABASE = 0
REDIS_DUPE_PASSWORD = 'password'
REDIS_DUPE_KEY = 'requests'
REDIS_DUPE_IGNORE_URL = r'http://scrapytest.org/\d+'
```
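
`REDIS_DUPE_IGNORE_URL` holds a regular expression. The sketch below uses only the standard `re` module to check which URLs that pattern covers before it goes into the settings. `is_ignored` is a hypothetical name, and whether scrapyu applies the pattern with `match`, `search`, or `fullmatch` is an assumption; `match` (anchored at the start of the URL) is used here.

```python
import re

# Same pattern as in the settings above.
REDIS_DUPE_IGNORE_URL = r'http://scrapytest.org/\d+'
pattern = re.compile(REDIS_DUPE_IGNORE_URL)


def is_ignored(url):
    """Return True if the URL matches the ignore pattern from its start."""
    return pattern.match(url) is not None


for url in ('http://scrapytest.org/123',
            'http://scrapytest.org/about',
            'https://scrapytest.org/123'):
    print(url, is_ignored(url))
```

The pattern covers only `http://` URLs whose path starts with digits. The unescaped `.` in `scrapytest.org` matches any character, so write `scrapytest\.org` if a strict host match matters.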

