Metadata-Version: 1.1
Name: internetarchive
Version: 0.9.8
Summary: A python interface to archive.org.
Home-page: https://github.com/jjjake/ia-wrapper
Author: Jacob M. Johnson
Author-email: jake@archive.org
License: AGPL 3
Description: A python interface to archive.org
        ---------------------------------
        
        .. image:: https://travis-ci.org/jjjake/internetarchive.svg
            :target: https://travis-ci.org/jjjake/internetarchive
        
        .. image:: https://img.shields.io/pypi/dm/internetarchive.svg
            :target: https://pypi.python.org/pypi/internetarchive
        
        This package installs a CLI tool named ``ia`` for using archive.org from the command line.
        It also installs the ``internetarchive`` Python module for programmatic access to archive.org.
        Please report all bugs and issues on `GitHub <https://github.com/jjjake/ia-wrapper/issues>`__.
        
        .. contents:: Table of Contents:
        
        
        Installation
        ~~~~~~~~~~~~
        
        You can install this module via pip:
        
        ``pip install internetarchive``
        
        Alternatively, you can install a few extra dependencies to help speed things up a bit:
        
        ``pip install "internetarchive[speedups]"``
        
        This will install `ujson <https://pypi.python.org/pypi/ujson>`__ for faster JSON parsing,
        and `gevent <https://pypi.python.org/pypi/gevent>`__ for concurrent downloads.
        
        If you want to install this module globally on your system instead of inside a ``virtualenv``, use sudo:
        
        ``sudo pip install internetarchive``
        
        
        Configuring
        ~~~~~~~~~~~
        You can configure both the ``ia`` command-line tool and the Python interface from the command line:
        
        .. code:: bash
        
            $ ia configure
        
        You will be prompted to enter your Archive.org login credentials. If authorization is successful, a config file
        containing your Archive.org S3 keys for uploading and modifying metadata will be saved on your computer.
        
        
        Command-Line Usage
        ------------------
        Help is available by typing ``ia --help``. You can also get help on a command: ``ia <command> --help``.
        Available subcommands are ``configure``, ``metadata``, ``upload``, ``download``, ``search``, ``delete``, ``list``, and ``catalog``.
        
        
        Downloading
        ~~~~~~~~~~~
        
        To download the entire `TripDown1905 <https://archive.org/details/TripDown1905>`__ item:
        
        .. code:: bash
        
            $ ia download TripDown1905
        
        ``ia download`` usage examples:
        
        .. code:: bash
        
            #download just the mp4 files using ``--glob``:
            $ ia download TripDown1905 --glob='*.mp4'
        
            #download all the mp4 files using ``--format``:
            $ ia download TripDown1905 --format='512Kb MPEG4'
        
            #download multiple formats from an item:
            $ ia download TripDown1905 --format='512Kb MPEG4' --format='Ogg Video'
        
            #list all the formats in an item:
            $ ia metadata --formats TripDown1905
        
            #download a single file from an item:
            $ ia download TripDown1905 TripDown1905_512kb.mp4
        
            #download multiple files from an item:
            $ ia download TripDown1905 TripDown1905_512kb.mp4 TripDown1905.ogv
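        
        The ``--glob`` option above takes a shell-style pattern. As a rough illustration of how such a pattern
        selects file names, here is a minimal sketch using Python's standard ``fnmatch`` module. The ``select``
        helper is hypothetical and not part of ``internetarchive``; how ``ia`` matches patterns internally is
        not shown here.
        
        .. code:: python
        
            import fnmatch
        
            # File names taken from the examples above; a real item holds more files.
            names = ['TripDown1905_512kb.mp4', 'TripDown1905.ogv']
        
            def select(names, pattern):
                """Return the names that match a shell-style glob pattern."""
                # fnmatchcase compares case-sensitively on every platform.
                return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
        
            mp4_files = select(names, '*.mp4')
            print(mp4_files)
        
        Quoting the pattern on the command line, as in ``--glob='*.mp4'``, keeps the shell from expanding it
        against local files before ``ia`` sees it.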
        
        
        Uploading
        ~~~~~~~~~
        
        You can use the provided ``ia`` command-line tool to upload items. After `configuring ia <https://github.com/jjjake/internetarchive#configuring>`__,
        you can upload files like so:
        
        .. code:: bash
        
            #upload files:
            $ ia upload <identifier> file1 file2 --metadata="title:foo" --metadata="blah:arg"
        
            #upload from `stdin`:
            $ curl http://dumps.wikimedia.org/kywiki/20130927/kywiki-20130927-pages-logging.xml.gz |
              ia upload <identifier> - --remote-name=kywiki-20130927-pages-logging.xml.gz --metadata="title:Uploaded from stdin."
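        
        The ``--metadata="key:value"`` pairs above correspond to the metadata dictionary used by the Python
        interface. The following is a minimal sketch of that mapping, not the parser ``ia`` itself uses. The
        ``parse_metadata_args`` helper is hypothetical, and collecting repeated keys into a list is an
        assumption made for illustration.
        
        .. code:: python
        
            def parse_metadata_args(pairs):
                """Turn ['title:foo', 'blah:arg'] into {'title': 'foo', 'blah': 'arg'}."""
                metadata = {}
                for pair in pairs:
                    # Split on the first colon only, so values may contain colons.
                    key, sep, value = pair.partition(':')
                    if not sep or not key:
                        raise ValueError('expected "key:value", got %r' % pair)
                    if key in metadata:
                        # Assumption: repeated keys are collected into a list.
                        existing = metadata[key]
                        if not isinstance(existing, list):
                            existing = [existing]
                        metadata[key] = existing + [value]
                    else:
                        metadata[key] = value
                return metadata
        
            md = parse_metadata_args(['title:foo', 'blah:arg'])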
        
        Metadata
        ~~~~~~~~
        
        You can use the ``ia`` command-line tool to download item metadata in JSON format:
        
        .. code:: bash
        
            $ ia metadata TripDown1905
        
        You can also modify metadata after `configuring ia <https://github.com/jjjake/internetarchive#configuring>`__.
        
        .. code:: bash
        
            $ ia metadata <identifier> --modify="foo:bar" --modify="baz:foooo"
        
        Data Mining
        ~~~~~~~~~~~
        
        IA Mine can be used for data mining Archive.org metadata and search results: `https://github.com/jjjake/iamine <https://github.com/jjjake/iamine>`__.
        
        Searching
        ~~~~~~~~~
        
        You can search using the provided ``ia`` command-line script:
        
        .. code:: bash
        
            $ ia search 'subject:"market street" collection:prelinger'
        
        
        Parallel Downloading
        ~~~~~~~~~~~~~~~~~~~~
        
        If you have the GNU ``parallel`` tool installed, you can combine ``ia search`` and ``ia metadata`` to quickly retrieve data for many items in parallel:
        
        .. code:: bash
        
            $ ia search 'subject:"market street" collection:prelinger' | parallel -j40 'ia metadata {} > {}_meta.json'
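        
        If GNU ``parallel`` is not available, a similar fan-out can be sketched in Python. This example uses a
        thread pool rather than separate ``parallel`` jobs, and assumes Python 3, where ``concurrent.futures``
        is in the standard library. The helpers ``ia_metadata``, ``save_metadata``, and ``save_all`` are
        hypothetical names, not part of ``internetarchive``. The only external command used is the
        ``ia metadata <identifier>`` call shown above.
        
        .. code:: python
        
            import os
            import subprocess
            from concurrent.futures import ThreadPoolExecutor
        
            def ia_metadata(identifier):
                """Run ``ia metadata <identifier>`` and return its output as bytes."""
                return subprocess.check_output(['ia', 'metadata', identifier])
        
            def save_metadata(identifier, fetch=ia_metadata, destdir='.'):
                """Write one item's metadata to <destdir>/<identifier>_meta.json."""
                path = os.path.join(destdir, identifier + '_meta.json')
                with open(path, 'wb') as fh:
                    fh.write(fetch(identifier))
                return path
        
            def save_all(identifiers, fetch=ia_metadata, destdir='.', workers=40):
                """Save metadata for many items using a pool of worker threads."""
                with ThreadPoolExecutor(max_workers=workers) as pool:
                    # pool.map preserves the order of the input identifiers.
                    return list(pool.map(
                        lambda identifier: save_metadata(identifier, fetch, destdir),
                        identifiers))
        
        Calling ``save_all(['TripDown1905', 'stairs'])`` would write one ``_meta.json`` file per identifier.
        The ``fetch`` argument is there so a different retrieval function can be swapped in.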
        
        
        
        Python module usage
        -------------------
        
        Below is a brief overview of the ``internetarchive`` Python library.
        Please refer to the `API documentation <http://ia-wrapper.readthedocs.org/en/latest/>`__ for more specific details.
        
        Downloading from Python
        ~~~~~~~~~~~~~~~~~~~~~~~
        
        The Internet Archive stores data in
        `items <http://blog.archive.org/2011/03/31/how-archive-org-items-are-structured/>`__.
        You can query the archive using an item identifier:
        
        .. code:: python
        
            >>> from internetarchive import get_item
            >>> item = get_item('stairs')
            >>> print(item.metadata)
        
        Items contain files. You can download the entire item:
        
        .. code:: python
        
            >>> item.download()
        
        or you can download just a particular file:
        
        .. code:: python
        
            >>> f = item.get_file('glogo.png')
            >>> f.download() #writes to disk
            >>> f.download('/foo/bar/some_other_name.png')
        
        You can iterate over files:
        
        .. code:: python
        
            >>> for f in item.iter_files():
            ...     print(f.name, f.sha1)
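        
        The ``sha1`` values printed above can be compared against a copy on disk. The following is a minimal
        sketch using only the standard library, assuming ``f.sha1`` is a hexadecimal SHA-1 digest string (an
        assumption; its exact format is not stated here). ``local_sha1`` and ``is_intact`` are hypothetical
        helpers, not part of ``internetarchive``.
        
        .. code:: python
        
            import hashlib
        
            def local_sha1(path, chunk_size=1024 * 1024):
                """Return the hex SHA-1 digest of a file, reading it in chunks."""
                digest = hashlib.sha1()
                with open(path, 'rb') as fh:
                    # Read fixed-size chunks so large files are never loaded whole.
                    for chunk in iter(lambda: fh.read(chunk_size), b''):
                        digest.update(chunk)
                return digest.hexdigest()
        
            def is_intact(path, expected_sha1):
                """Return True if the file on disk matches the expected hex digest."""
                return local_sha1(path) == expected_sha1.lower()
        
        With these helpers, ``is_intact('glogo.png', f.sha1)`` would report whether a downloaded file matches
        the listed checksum.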
        
        Uploading from Python
        ~~~~~~~~~~~~~~~~~~~~~
        
        You can use the IA's S3-like interface to upload files to an item after
        `configuring the internetarchive library <https://github.com/jjjake/internetarchive#configuring>`__.
        
        .. code:: python
        
            >>> from internetarchive import get_item
            >>> item = get_item('new_identifier')
            >>> md = dict(mediatype='image', creator='Jake Johnson')
            >>> item.upload('/path/to/image.jpg', metadata=md)
        
        Item-level metadata must be supplied with the first file uploaded to an
        item.
        
        You can upload additional files to an existing item:
        
        .. code:: python
        
            >>> import internetarchive
            >>> item = internetarchive.Item('existing_identifier')
            >>> item.upload(['/path/to/image2.jpg', '/path/to/image3.jpg'])
        
        You can also upload file-like objects:
        
        .. code:: python
        
            >>> import StringIO
            >>> fh = StringIO.StringIO('hello world')
            >>> fh.name = 'hello_world.txt'
            >>> item.upload(fh)
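        
        The ``StringIO`` module used above exists only on Python 2. On Python 3, a comparable named in-memory
        file can be built with the standard ``io`` module, as in the sketch below. Whether ``item.upload()``
        in this release accepts such an object is an assumption that has not been verified here, so the
        sketch stops before the upload call.
        
        .. code:: python
        
            import io
        
            # Build an in-memory binary file holding the content to upload.
            fh = io.BytesIO(b'hello world')
            # Give it a name attribute, as in the StringIO example above.
            fh.name = 'hello_world.txt'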
        
        
        Modifying Metadata from Python
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        You can modify metadata for existing items using the ``item.modify_metadata()`` method. This uses the `IA Metadata
        API <http://blog.archive.org/2013/07/04/metadata-api/>`__ under the hood and requires your IAS3 credentials, so
        make sure you have the `internetarchive library configured <https://github.com/jjjake/internetarchive#configuring>`__.
        
        .. code:: python
        
            >>> from internetarchive import get_item
            >>> item = get_item('my_identifier')
            >>> md = dict(blah='one', foo=['two', 'three'])
            >>> item.modify_metadata(md)
        
        
        Searching from Python
        ~~~~~~~~~~~~~~~~~~~~~
        
        You can search for items using the `archive.org advanced search
        engine <https://archive.org/advancedsearch.php>`__:
        
        .. code:: python
        
            >>> from internetarchive import search_items
            >>> search = search_items('collection:nasa')
            >>> print(search.num_found)
            186911
        
        You can iterate over your results:
        
        .. code:: python
        
            >>> for result in search:
            ...     print(result['identifier'])
        
        
        .. :changelog:
        
        Release History
        ---------------
        
        0.9.8 (2015-11-09)
        ++++++++++++++++++
        
        **Bugfixes**
        
        - Fixed `ia help` bug.
        - Fixed bug in `File.download()` where connection errors weren't being caught/retried correctly.
        
        0.9.7 (2015-11-05)
        ++++++++++++++++++
        
        **Bugfixes**
        
        - Cleanup partially downloaded files when `download()` fails.
        
        **Features and Improvements**
        
        - Added `--format` option to `ia delete`.
        - Refactored `download()` and `ia download` to behave more like rsync. Files are now clobbered by default,
          `ignore_existing` and `--ignore-existing` now skip over files already downloaded without making a request.
        - Added retry support to `download()` and `ia download`.
        - Added `files` kwarg to `Item.download()` for downloading specific files.
        - Added `ignore_errors` option to `File.download()` for ignoring (but logging) exceptions.
        - Added default timeouts to metadata and download requests.
        - Less verbose output in `ia download` by default, use `ia download --verbose` for old style output.
        
        0.9.6 (2015-10-12)
        ++++++++++++++++++
        
        **Bugfixes**
        
        - Removed sync-db features for now, as lazytable is not playing nicely with setup.py.
        
        0.9.5 (2015-10-12)
        ++++++++++++++++++
        
        **Features and Improvements**
        
        - Added skip based on mtime and length if no other clobber/skip options specified in `download()` and `ia download`.
        
        0.9.4 (2015-10-01)
        ++++++++++++++++++
        
        **Features and Improvements**
        
        - Added `internetarchive.api.get_username()` for retrieving a username with an S3 key-pair.
        - Added ability to sync downloads via an sqlite database.
        
        0.9.3 (2015-09-28)
        ++++++++++++++++++
        
        **Features and Improvements**
        
        - Added ability to download items from an itemlist or search query in `ia download`.
        - Made `ia configure` Python 3 compatible.
        
        **Bugfixes**
        
        - Fixed bug in `ia upload` where uploading an item with more than one collection specified caused the collection check to fail.
        
        
        0.9.2 (2015-08-17)
        ++++++++++++++++++
        
        **Bugfixes**
        
        - Added error message for failed `ia configure` calls due to invalid creds. 
        
        
        0.9.1 (2015-08-13)
        ++++++++++++++++++
        
        **Bugfixes**
        
        - Updated docopt to v0.6.2 and PyYAML to v3.11.
        - Updated setup.py to automatically pull version from `__init__`.
        
        
        0.8.5 (2015-07-13)
        ++++++++++++++++++
        
        **Bugfixes**
        
        - Fixed UnicodeEncodeError in `ia metadata --append`.
        
        **Features and Improvements**
        
        - Added configuration documentation to readme.
        - Updated requests to v2.7.0.
        
        0.8.4 (2015-06-18)
        ++++++++++++++++++
        
        **Features and Improvements**
        
        - Added check to `ia upload` to see if the collection being uploaded to exists.
          Also added an option to override this check.
        
        0.8.3 (2015-05-18)
        ++++++++++++++++++
        
        **Features and Improvements**
        
        - Fixed append to work like a standard metadata update if the metadata field
          does not yet exist for the given item.
        
        0.8.0 (2015-03-09)
        ++++++++++++++++++
        
        **Bugfixes**
        
        - Encode filenames in upload URLs.
        
        0.7.9 (2015-01-26)
        ++++++++++++++++++
        
        **Bugfixes**
        
        - Fixed bug in `internetarchive.config.get_auth_config` (i.e. `ia configure`)
          where logged-in cookies returned expired within hours. Cookies should now be
          valid for about one year.
        
        0.7.8 (2014-12-23)
        ++++++++++++++++++
        
        - Output error message when downloading non-existing files in `ia download` rather
          than raising a Python exception.
        - Fixed IOError in `ia search` when using `head`, `tail`, etc.
        - Simplified `ia search` to output only JSON, rather than doing any special
          formatting.
        - Added experimental support for creating pex binaries of ia in `Makefile`. 
        
        0.7.7 (2014-12-17)
        ++++++++++++++++++
        
        - Simplified `ia configure`. It now only asks for Archive.org email/password and
          automatically adds S3 keys and Archive.org cookies to config.
          See `internetarchive.config.get_auth_config()`.
        
        0.7.6 (2014-12-17)
        ++++++++++++++++++
        
        - Write metadata to stdout rather than stderr in `ia mine`.
        - Added options to search archive.org/v2.
        - Added destdir option to download files/itemdirs to a given destination dir.
        
        0.7.5 (2014-10-08)
        ++++++++++++++++++
        
        - Fixed typo.
        
        0.7.4 (2014-10-08)
        ++++++++++++++++++
        
        - Fixed missing "import" typo in `internetarchive.iacli.ia_upload`.
        
        0.7.3 (2014-10-08)
        ++++++++++++++++++
        
        - Added progress bar to `ia mine`.
        - Fixed unicode metadata support for `upload()`.
        
        0.7.2 (2014-09-16)
        ++++++++++++++++++
        
        - Suppress `KeyboardInterrupt` exceptions and exit with status code 130.
        - Added ability to skip downloading files based on checksum in `ia download`,
          `Item.download()`, and `File.download()`.
        - `ia download` is now verbose by default. Output can be suppressed with the `--quiet`
          flag.
        - Added an option to not download into item directories, but rather the current working
          directory (i.e. `ia download --no-directories <id>`).
        - Added/fixed support for modifying different metadata targets (i.e. files/logo.jpg).
        
        0.7.1 (2014-08-25)
        ++++++++++++++++++
        
        - Added `Item.s3_is_overloaded()` method for S3 status check. This method is now also used on
          retries in the upload method, which avoids uploading any data if a 503
          is expected. If a 503 is still returned, retries are attempted.
        - Added `--status-check` option to `ia upload` for S3 status check.
        - Added `--source` parameter to `ia list` for returning files matching IA source (i.e. 
          original, derivative, metadata, etc.).
        - Added support to `ia upload` for setting remote-name if only a single file is being
          uploaded.
        - Derive tasks are now only queued after the last file has been uploaded.
        - File URLs are now quoted in `File` objects, for downloading files with special
          characters in their filenames.
        
        0.7.0 (2014-07-23)
        ++++++++++++++++++
        
        - Added support for retry on S3 503 SlowDown errors.
        
        0.6.9 (2014-07-15)
        ++++++++++++++++++
        
        - Added support for ``\n`` and ``\r`` characters in upload headers.
        - Added support for reading filenames from stdin when using the `ia delete` command.
        
        0.6.8 (2014-07-11)
        ++++++++++++++++++
        
        - The delete `ia` subcommand is now verbose by default.
        - Added glob support to the delete `ia` subcommand (i.e. `ia delete --glob='*jpg'`).
        - Changed indexed metadata elements to clobber values instead of insert.
        - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are now deprecated.
          IAS3_ACCESS_KEY and IAS3_SECRET_KEY must be used if setting IAS3
          keys via environment variables.
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.4
