Metadata-Version: 1.1
Name: tika-app
Version: 1.0.0.dev0
Summary: Python client for Apache Tika App
Home-page: https://github.com/fedelemantuano/tika-app-python
Author: Fedele Mantuano
Author-email: mantuano.fedele@gmail.com
License: Apache License, Version 2.0
Description: tika-app-python
        ===============
        
        Overview
        --------
        
        tika-app-python is a wrapper for `Apache Tika App`_.
        
        Apache 2 Open Source License
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        tika-app-python can be downloaded, used, and modified free of charge. It
        is available under the Apache 2 license.
        
        Authors
        -------
        
        Main Author
        ~~~~~~~~~~~
        
        Fedele Mantuano (**Twitter**:
        `@fedelemantuano <https://twitter.com/fedelemantuano>`_)
        
        Installation
        ------------
        
        Clone the repository:
        
        ::
        
            git clone https://github.com/fedelemantuano/tika-app-python.git
        
        and install tika-app-python with ``setup.py``:
        
        ::
        
            cd tika-app-python
        
            python setup.py install
        
        or use ``pip``:
        
        ::
        
            pip install tika-app
        
        Usage
        -----
        
        Import the ``TikaApp`` class and create a client:
        
        ::
        
            from tikapp import TikaApp
        
            tika_client = TikaApp(file_jar="/opt/tika/tika-app-1.14.jar")
        
        To get the **content type**:
        
        ::
        
            tika_client.detect_content_type("your_file")
        
        To detect the **language**:
        
        ::
        
            tika_client.detect_language("your_file")
        
        To extract **all metadata and content**:
        
        ::
        
            tika_client.extract_all_content("your_file")
        
        To extract **only the content**:
        
        ::
        
            tika_client.extract_only_content("your_file")
        
        To submit a Base64-encoded payload instead of a file path, use the same
        methods with the ``payload`` argument:
        
        ::
        
            tika_client.detect_content_type(payload="base64_payload")
            tika_client.detect_language(payload="base64_payload")
            tika_client.extract_all_content(payload="base64_payload")
            tika_client.extract_only_content(payload="base64_payload")
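        
        A minimal sketch of how such a payload might be built from a file on
        disk, using only the standard library. ``to_base64_payload`` is a
        hypothetical helper, not part of tika-app-python, and returning a text
        string rather than bytes is an assumption about what the ``payload``
        argument accepts:
        
        ```python
        import base64
        
        
        def to_base64_payload(path):
            """Hypothetical helper: return a file's content as Base64 text."""
            with open(path, "rb") as handle:
                raw = handle.read()
            # b64encode returns bytes; decode to ASCII text (assumed format)
            return base64.b64encode(raw).decode("ascii")
        ```
        
        The result could then be passed on, for example as
        ``tika_client.detect_content_type(payload=to_base64_payload("your_file"))``.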
        
        Usage from command-line
        -----------------------
        
        If you installed tika-app-python with ``pip`` or ``setup.py``, you can
        use it from the command line. tika-app-python needs the path to the
        Apache Tika app JAR. You can:
        
        - leave the default value: ``/opt/tika/tika-app-1.13.jar``
        - set the environment variable ``TIKA_APP_JAR``
        - use the ``--jar`` switch
        
        The last option overrides all the others.
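        
        The precedence could be sketched roughly as follows. ``resolve_jar`` is
        a hypothetical helper written for illustration, not the code used by
        ``tikapp``; that the environment variable takes priority over the
        default path is an assumption inferred from the list above:
        
        ```python
        import os
        
        DEFAULT_JAR = "/opt/tika/tika-app-1.13.jar"  # default path quoted above
        
        
        def resolve_jar(cli_jar=None, environ=None):
            """Hypothetical helper: choose the JAR path by precedence."""
            environ = os.environ if environ is None else environ
            # 1. An explicit --jar value overrides everything else.
            if cli_jar:
                return cli_jar
            # 2. Otherwise use TIKA_APP_JAR if set (assumed to beat the default).
            # 3. Finally fall back to the default path.
            return environ.get("TIKA_APP_JAR") or DEFAULT_JAR
        ```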
        
        These are all the switches:
        
        ::
        
            usage: tikapp [-h] (-f FILE | -p PAYLOAD) [-j JAR] [-d] [-t] [-l] [-a]
                               [-v]
        
            Wrapper for Apache Tika App.
        
            optional arguments:
              -h, --help            show this help message and exit
              -f FILE, --file FILE  File to submit (default: None)
              -p PAYLOAD, --payload PAYLOAD
                                    Base64 payload to submit (default: None)
              -j JAR, --jar JAR     Apache Tika app JAR (default: None)
              -d, --detect          Detect document type (default: False)
              -t, --text            Output plain text content (default: False)
              -l, --language        Output only language (default: False)
              -a, --all             Output metadata and content from all embedded files
                                    (default: False)
              -v, --version         show program's version number and exit
        
        Example:
        
        .. code:: shell
        
            $ tikapp -f example_file -a
        
        Performance tests
        -----------------
        
        These are the results of the performance tests in the `tests`_ folder:
        
        ::
        
            (Python 2)
            tika_content_type()             0.704840 sec
            tika_detect_language()          1.592066 sec
            magic_content_type()            0.000215 sec
            tika_extract_all_content()      0.816366 sec
            tika_extract_only_content()     0.788667 sec
        
            (Python 3)
            tika_content_type()             0.698357 sec
            tika_detect_language()          1.593452 sec
            magic_content_type()            0.000226 sec
            tika_extract_all_content()      0.785915 sec
            tika_extract_only_content()     0.766517 sec
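        
        Timings of this kind can be collected with the standard ``timeit``
        module. The sketch below is not the project's test code;
        ``average_seconds`` and ``sample_workload`` are hypothetical names, and
        the workload is a stand-in for a real client call:
        
        ```python
        import timeit
        
        
        def average_seconds(func, repeat=5):
            """Return the mean wall-clock time of one call to func, in seconds."""
            total = timeit.timeit(func, number=repeat)
            return total / repeat
        
        
        def sample_workload():
            # Stand-in for a call such as
            # tika_client.detect_content_type("your_file")
            return sum(range(1000))
        
        
        print("sample_workload() %f sec" % average_seconds(sample_workload))
        ```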
        
        .. _Apache Tika App: https://tika.apache.org/
        .. _tests: https://github.com/fedelemantuano/tika-app-python/tree/develop/tests
Keywords: tika,apache,toolkit
Platform: Linux
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.0
Classifier: Programming Language :: Python :: 3.1
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
