Metadata-Version: 1.0
Name: YoullDownload
Version: 0.3
Summary: From a remote web page, grab all the resources that a browser will probably download when visiting it
Home-page: https://github.com/RedTurtle/YoullDownload
Author: RedTurtle Technology
Author-email: sviluppoplone@redturtle.it
License: GPL
Description: Quick info
        ==========
        
        Let's say you need to run the HTTP load testing and benchmarking utility `siege`__ against a web page,
        and you also want to use its ``--internet`` option to simulate the behavior of a web browser as closely as possible.
        
        __ http://www.joedog.org/siege-home/
        
        When a web browser loads a page, it also loads all the resources referenced by that page:
        
        * Images
        * JavaScript files
        * CSS
        * Media resources
        
        So you need a list of all the resource URLs referenced by that page.
        
        This utility (its name means "**You Will Download**") creates that list for you.
        
        Just redirect the utility's output to a file, then pass that file to siege with the ``--file`` option.
        
        Usage
        -----
        
        ::
        
            $ youlldownload http://host.com/section/page
        
        Using with siege::
        
            $ youlldownload http://host.com/section/page > list.txt
            $ siege -i -f list.txt [other options]
        
        Collected resources
        -------------------
        
        * from ``script`` tags we'll take the ``src`` URL
        * from ``link`` tags with ``rel`` equal to ``stylesheet`` we'll take the ``href`` URL
        * from ``img`` tags we'll take the ``src`` URL
        * from ``object`` tags we'll take the ``data`` URL
        * from ``embed`` tags we'll take the ``src`` URL
        * from ``style`` tags we'll take the URL of each "*@import url*" directive
          they contain
        * from ``iframe`` tags we'll take the ``src`` URL
        * from ``source`` tags inside ``video`` we'll take the ``src`` URL
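        
        The collection rules above could be sketched with Python 3's standard ``html.parser``
        module as follows. This is a minimal illustration under our own assumptions, not
        YoullDownload's actual code (which targets Python 2.7): ``ResourceParser`` and
        ``IMPORT_RE`` are hypothetical names, and relative URLs are resolved against the
        ``base`` tag when the page has one, otherwise against the page URL.
        
        ```python
        # Minimal sketch, not YoullDownload's implementation (Python 3 stdlib only).
        import re
        from html.parser import HTMLParser
        from urllib.parse import urljoin
        
        # Hypothetical pattern for "@import url(...)" directives inside <style> tags.
        IMPORT_RE = re.compile(r"""@import\s+url\(\s*['"]?([^'")\s]+)['"]?\s*\)""", re.I)
        
        
        class ResourceParser(HTMLParser):
            """Hypothetical parser collecting the resource URLs of one HTML page."""
        
            def __init__(self, page_url):
                super().__init__()
                self.base = page_url  # replaced by <base href="..."> if the page has one
                self.urls = []
                self._in_style = False
                self._in_video = False
        
            def _add(self, url):
                # Skip missing attributes; turn relative URLs into absolute ones.
                if url:
                    self.urls.append(urljoin(self.base, url))
        
            def handle_starttag(self, tag, attrs):
                a = dict(attrs)
                if tag == "base" and a.get("href"):
                    self.base = urljoin(self.base, a["href"])
                elif tag in ("script", "img", "embed", "iframe"):
                    self._add(a.get("src"))
                elif tag == "link" and (a.get("rel") or "").lower() == "stylesheet":
                    self._add(a.get("href"))
                elif tag == "object":
                    self._add(a.get("data"))
                elif tag == "source" and self._in_video:
                    self._add(a.get("src"))
                elif tag == "video":
                    self._in_video = True
                elif tag == "style":
                    self._in_style = True
        
            def handle_endtag(self, tag):
                if tag == "video":
                    self._in_video = False
                elif tag == "style":
                    self._in_style = False
        
            def handle_data(self, data):
                # HTMLParser passes the raw contents of <style> tags here.
                if self._in_style:
                    for url in IMPORT_RE.findall(data):
                        self._add(url)
        
        
        page = """<html><head><base href="http://host.com/section/">
        <link rel="stylesheet" href="style.css">
        <style>@import url("print.css");</style>
        <script src="/js/app.js"></script></head>
        <body><img src="logo.png"><video><source src="clip.mp4"></video></body></html>"""
        
        parser = ResourceParser("http://host.com/section/page")
        parser.feed(page)
        parser.close()
        for url in parser.urls:
            print(url)
        ```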
        
        CSS sources are also analyzed in depth to find additional resources referenced
        inside them (background images, fonts, ...).
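        
        A rough Python 3 sketch of the CSS analysis, assuming resources are referenced through
        ``url(...)`` and that relative references resolve against the stylesheet's own URL.
        ``css_resources`` and ``CSS_URL_RE`` are hypothetical names; downloading the stylesheets
        and following nested ``@import`` rules are left out, and a regular expression is only a
        simplification of real CSS parsing.
        
        ```python
        # Minimal sketch, not YoullDownload's implementation (Python 3 stdlib only).
        import re
        from urllib.parse import urljoin
        
        # Hypothetical pattern for url(...) references, with optional quotes.
        CSS_URL_RE = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""", re.I)
        
        
        def css_resources(css_text, css_url):
            """Return the absolute URLs referenced by url(...) in a stylesheet.
        
            Relative references resolve against the stylesheet's own URL,
            not against the HTML page that includes the stylesheet.
            """
            urls = []
            for ref in CSS_URL_RE.findall(css_text):
                ref = ref.strip()
                if ref.startswith("data:"):
                    continue  # inline data URIs need no extra download
                urls.append(urljoin(css_url, ref))
            return urls
        
        
        css = """
        body { background: url("../img/bg.png"); }
        @font-face { src: url(fonts/a.woff2) format("woff2"); }
        .dot { background: url(data:image/gif;base64,R0lGOD); }
        """
        print(css_resources(css, "http://host.com/static/css/main.css"))
        ```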
        
        Authors
        =======
        
        This product was developed by the RedTurtle Technology team.
        
        .. image:: http://www.redturtle.it/redturtle_banner.png
           :alt: RedTurtle Technology Site
           :target: http://www.redturtle.it/
        
        Changelog
        =========
        
        0.3 (2015-05-28)
        ----------------
        
        - Remove duplicated URLs from the final report
          [keul]
        - Do not include the same URL again when it differs only by its anchor
          [keul]
        - Also inspect resources referenced from CSS (background images, fonts, ...)
          [keul]
        - The script did not work properly on pages other than the homepage
          if a ``base`` tag was not provided
          [keul]
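        
        The duplicate and anchor handling described above could look like the following
        Python 3 sketch. ``unique_urls`` is a hypothetical helper, not part of YoullDownload;
        it assumes two URLs point to the same resource when they differ only in their
        ``#fragment``, which browsers never send to the server.
        
        ```python
        # Minimal sketch, not YoullDownload's implementation (Python 3 stdlib only).
        from urllib.parse import urldefrag
        
        
        def unique_urls(urls):
            """Drop "#anchor" fragments and duplicates, keeping first-seen order."""
            seen = set()
            result = []
            for url in urls:
                url = urldefrag(url)[0]  # "http://h/a.css#x" -> "http://h/a.css"
                if url not in seen:
                    seen.add(url)
                    result.append(url)
            return result
        
        
        print(unique_urls([
            "http://host.com/a.css",
            "http://host.com/a.css#print",
            "http://host.com/logo.png",
            "http://host.com/a.css",
        ]))
        ```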
        
        0.2 (2014-04-02)
        ----------------
        
        - Added support for ``src`` attribute of ``iframe`` tag
          [keul]
        - Added support for ``src`` attribute of ``source`` tag
          (HTML 5 video element)
          [keul]
        - Do not break if ``base`` tag is not present
          [keul]
        
        0.1 (2013-01-30)
        ----------------
        
        - Initial release
Keywords: crawler log web
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Topic :: Utilities
Classifier: Topic :: Internet :: WWW/HTTP
