Metadata-Version: 1.1
Name: feed-seeker
Version: 0.0.1
Summary: Extract rss, atom, and other feeds from webpages
Home-page: https://github.com/mitmedialab/feed_seeker
Author: Colin Carroll
Author-email: ccarroll@mit.edu
License: MIT
Description-Content-Type: UNKNOWN
Description: ===========
        Feed Seeker
        ===========
        *It slant rhymes with "heat seeker"*
        
        |Build Status| |Coverage|
        
        A library for finding atom, rss, rdf, and xml feeds on web pages. Produced at the `mediacloud <https://mediacloud.org>`_ project. It is an incremental improvement over `feedfinder2 <https://github.com/dfm/feedfinder2>`_, which was itself based on `feedfinder <http://www.aaronsw.com/2002/feedfinder/>`_, written by Mark Pilgrim and maintained by Aaron Swartz until his untimely death.
        
        
        Installation
        ------------
        
        The library is available on `PyPI <https://pypi.org/project/feed_seeker/>`_:
        
        .. code-block:: bash
        
            pip install feed_seeker
        
        Quickstart
        ----------
        By default, the library uses :code:`requests` to fetch a page's html, inspect it, and find the most
        likely feed url:
        
        .. code-block:: python
        
            >>> from feed_seeker import find_feed_url
            >>> find_feed_url('https://github.com/mitmedialab/feed_seeker')
            'https://github.com/mitmedialab/feed_seeker/commits/master.atom'
        
        
        To do a more thorough search, use :code:`generate_feed_urls`, which returns more likely candidates first.
        
        .. code-block:: python
        
            >>> from feed_seeker import generate_feed_urls
            >>> for url in generate_feed_urls('https://xkcd.com'):
            ...     print(url)
            ...
            https://xkcd.com/atom.xml
            https://xkcd.com/rss.xml
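        
        Because the urls are generated lazily, a search can be cut short once enough candidates have been seen. The following is a minimal sketch, not part of the library: the helper name :code:`first_candidates` is hypothetical, and it relies only on :code:`itertools.islice` from the standard library, so candidates beyond the limit are never requested.
        
        .. code-block:: python
        
            from itertools import islice
        
        
            def first_candidates(candidates, limit=3):
                """Return at most `limit` items from a lazy iterable of feed urls.
        
                Hypothetical helper: `candidates` can be any iterable, for example
                the generator returned by `generate_feed_urls`.
                """
                # islice stops pulling from the generator after `limit` items,
                # so no further candidates are fetched or inspected.
                return list(islice(candidates, limit))
        
        
            # Usage with the real library (performs network requests):
            #     from feed_seeker import generate_feed_urls
            #     first_candidates(generate_feed_urls('https://xkcd.com'), limit=1)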
        
        
        For the most thorough search, add a :code:`spider` argument to do depth-first spidering of urls on the same hostname. Note that the call below takes nearly four minutes, compared to 0.5 seconds for :code:`find_feed_url`.
        
        
        .. code-block:: python
        
            >>> for url in generate_feed_urls('https://github.com/mitmedialab/feed_seeker', spider=1):
            ...     print(url)
            ...
            https://github.com/mitmedialab/feed_seeker/commits/master.atom
            https://github.com/mitmedialab/feed_seeker/commits/95cf320796c487df8b70f9c42281d8f26452cc31.atom
            https://github.com/mitmedialab/feed_seeker/commits/3e93490cb91f7652325c2fe41ef29a5be4558d6a.atom
            https://github.com/mitmedialab/feed_seeker/commits/659311b8853c4c4a67e3b4bc67a78461d825a064.atom
            https://github.com/mitmedialab/feed_seeker/commits/a8f7b86eac2cedd9209ac5d2ddcceb293d2404c9.atom
            https://github.com/index.atom
            https://github.com/articles.atom
            https://github.com/dfm/feedfinder2/commits/master.atom
            https://github.com/blog.atom
            https://github.com/blog/all.atom
            https://github.com/blog/broadcasts.atom
            https://github.com/ColCarroll.atom
        
        In a hurry?
        -----------
        
        If you have a long list of urls, you might want to set a timeout with :code:`max_time`:
        
        .. code-block:: python
        
            >>> for url in ('https://httpstat.us/200?sleep=5000', 'https://github.com/mitmedialab/feed_seeker'):
            ...     try:
            ...         print('found feed:\t{}'.format(find_feed_url(url, max_time=3)))
            ...     except TimeoutError:
            ...         print('skipping {}'.format(url))
            ...
            skipping https://httpstat.us/200?sleep=5000
            found feed:     https://github.com/mitmedialab/feed_seeker/commits/master.atom
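        
        For a long list, it can be convenient to collect the results instead of printing them. The following is a minimal sketch under stated assumptions: the helper name :code:`find_feeds` is hypothetical (not part of the library), and the lookup function is passed in as :code:`finder` so that the helper does not depend on the network. It assumes, as in the example above, that a lookup exceeding :code:`max_time` raises :code:`TimeoutError`.
        
        .. code-block:: python
        
            def find_feeds(urls, finder, max_time=3):
                """Map each url to its feed url, or to None if the lookup timed out.
        
                Hypothetical helper: `finder` is any callable that accepts a url and
                a `max_time` keyword, such as `find_feed_url`.
                """
                results = {}
                for url in urls:
                    try:
                        results[url] = finder(url, max_time=max_time)
                    except TimeoutError:
                        # Record the timeout and move on to the next url.
                        results[url] = None
                return results
        
        
            # Usage with the real library (performs network requests):
            #     from feed_seeker import find_feed_url
            #     find_feeds(['https://github.com/mitmedialab/feed_seeker'], find_feed_url)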
        
        
        Differences from :code:`feedfinder2`
        ------------------------------------
        The biggest difference is that all functions are implemented as generators and are evaluated lazily. Candidate feed links are actually fetched and inspected to determine whether they are feeds, which can be quite time-consuming. We expose one function to find the most likely feed link, and another to lazily generate links in rough order from most prominent to least.
        
        There are also a few more heuristics based on our experience at `mediacloud <https://mediacloud.org>`_.
        
        .. |Build Status| image:: https://travis-ci.org/mitmedialab/feed_seeker.png?branch=master
           :target: https://travis-ci.org/mitmedialab/feed_seeker
        .. |Coverage| image:: https://coveralls.io/repos/github/mitmedialab/feed_seeker/badge.svg?branch=master
           :target: https://coveralls.io/github/mitmedialab/feed_seeker?branch=master
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
