Metadata-Version: 1.1
Name: xpaw
Version: 0.10.2
Summary: Async web scraping framework
Home-page: https://github.com/jadbin/xpaw
Author: jadbin
Author-email: jadbin.com@hotmail.com
License: Apache 2
Description: ====
        xpaw
        ====
        
        .. image:: https://travis-ci.org/jadbin/xpaw.svg?branch=master
            :target: https://travis-ci.org/jadbin/xpaw
        
        .. image:: https://coveralls.io/repos/jadbin/xpaw/badge.svg?branch=master
            :target: https://coveralls.io/github/jadbin/xpaw?branch=master
        
        .. image:: https://img.shields.io/badge/license-Apache%202-blue.svg
            :target: https://github.com/jadbin/xpaw/blob/master/LICENSE
        
        Key Features
        ============
        
        - A web scraping framework for crawling web pages.
        - Data extraction tools for pulling structured data out of web pages.
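        
        xpaw's own extraction tool is ``Selector``, used in the spider example below. As a rough, dependency-free illustration of what the CSS selector ``div.major a.linkto`` from that example matches, the sketch below uses only the standard library's ``html.parser``. It is not xpaw code; ``LinkToTitles`` is a hypothetical name chosen for this example, and the sketch assumes reasonably well-formed HTML.
        
        .. code-block:: python
        
            from html.parser import HTMLParser
        
        
            class LinkToTitles(HTMLParser):
                """Hypothetical stand-in for the CSS selector "div.major a.linkto".
        
                Collects the text of <a class="linkto"> elements that have a
                <div class="major"> ancestor. Not xpaw's Selector.
                """
        
                def __init__(self):
                    super().__init__()
                    self.div_stack = []   # one bool per open <div>: has class "major"?
                    self.in_link = False  # inside a matching <a class="linkto">
                    self.buffer = []      # text fragments of the current link
                    self.titles = []      # collected link texts
        
                def handle_starttag(self, tag, attrs):
                    classes = (dict(attrs).get("class") or "").split()
                    if tag == "div":
                        self.div_stack.append("major" in classes)
                    elif tag == "a" and "linkto" in classes and any(self.div_stack):
                        self.in_link = True
                        self.buffer = []
        
                def handle_endtag(self, tag):
                    if tag == "div" and self.div_stack:
                        self.div_stack.pop()
                    elif tag == "a" and self.in_link:
                        self.titles.append("".join(self.buffer).strip())
                        self.in_link = False
        
                def handle_data(self, data):
                    # Text inside nested tags such as <b> is still collected.
                    if self.in_link:
                        self.buffer.append(data)
        
        
            html = """
            <div class="major">
              <a class="linkto" href="/a">First story</a>
              <div><a class="linkto" href="/b">Second <b>story</b></a></div>
            </div>
            <a class="linkto" href="/c">Not a major story</a>
            """
            parser = LinkToTitles()
            parser.feed(html)
            parser.close()
            print(parser.titles)  # ['First story', 'Second story']
        
        The third link is skipped because it has no ``div.major`` ancestor, which is exactly the descendant-combinator behavior of the CSS selector.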
        
        Spider Example
        ==============
        
        The following example spider crawls the "Top News" section of the `Tencent News <http://news.qq.com/>`_ homepage:
        
        .. code-block:: python
        
            from xpaw import Spider, HttpRequest, Selector, run_spider
        
        
            class TencentNewsSpider(Spider):
                def start_requests(self):
                    yield HttpRequest("http://news.qq.com/", callback=self.parse)
        
                def parse(self, response):
                    selector = Selector(response.text)
                    major_news = selector.css("div.major a.linkto").text
                    self.log("Major news:")
                    for i, title in enumerate(major_news, 1):
                        self.log("%s: %s", i, title)
        
        
            if __name__ == '__main__':
                run_spider(TencentNewsSpider)
        
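        The spider class defines the following methods:
        
        - ``start_requests``: returns the spider's initial requests.
        - ``parse``: processes the page returned for a request; here it uses ``Selector`` and CSS selector syntax to extract the data we need.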
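        
        To make the division of labor between ``start_requests`` and ``parse`` concrete, here is a minimal synchronous sketch of the kind of loop a crawler engine runs: seed a queue from ``start_requests``, fetch each URL, and pass the response to the request's callback. ``Request``, ``Response``, ``run`` and ``DemoSpider`` are hypothetical names, not xpaw's API. xpaw itself downloads pages asynchronously via aiohttp, whereas ``fetch`` here is a stub that reads pages from a dict instead of the network.
        
        .. code-block:: python
        
            from collections import deque
        
        
            class Request:
                """Hypothetical request: a URL plus the callback for its response."""
        
                def __init__(self, url, callback):
                    self.url = url
                    self.callback = callback
        
        
            class Response:
                """Hypothetical downloaded page."""
        
                def __init__(self, url, text):
                    self.url = url
                    self.text = text
        
        
            def run(spider, fetch):
                """Seed from start_requests(), then feed each response to its callback.
        
                Requests yielded by a callback are scheduled; anything else is
                collected as an extracted item. `fetch` maps a URL to page text.
                """
                queue = deque(spider.start_requests())
                items = []
                while queue:
                    request = queue.popleft()
                    response = Response(request.url, fetch(request.url))
                    # Callbacks that only log (and never yield) return None.
                    for result in request.callback(response) or ():
                        if isinstance(result, Request):
                            queue.append(result)
                        else:
                            items.append(result)
                return items
        
        
            class DemoSpider:
                def start_requests(self):
                    yield Request("http://example.com/", callback=self.parse)
        
                def parse(self, response):
                    yield {"url": response.url, "length": len(response.text)}
                    # Follow one extra link from the home page only.
                    if response.url.endswith("/"):
                        yield Request(response.url + "next", callback=self.parse)
        
        
            pages = {
                "http://example.com/": "<html>home</html>",
                "http://example.com/next": "<html>2</html>",
            }
            print(run(DemoSpider(), pages.get))
        
        Because ``parse`` in the Tencent News spider only logs and yields nothing, calling it returns ``None``; the ``or ()`` guard lets this sketch treat such callbacks as producing no further requests or items.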
        
        Documentation
        =============
        
        http://xpaw.readthedocs.io/
        
        Requirements
        ============
        
        - Python >= 3.5.3
        - `aiohttp`_
        - `lxml`_
        - `cssselect`_
        
        .. _aiohttp: https://pypi.python.org/pypi/aiohttp
        .. _lxml: https://pypi.python.org/pypi/lxml
        .. _cssselect: https://pypi.python.org/pypi/cssselect
        
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
