Metadata-Version: 2.4
Name: detextive
Version: 2.0
Summary: Detects textual content.
Project-URL: Homepage, https://github.com/emcd/python-detextive
Project-URL: Documentation, https://emcd.github.io/python-detextive
Project-URL: Download, https://pypi.org/project/detextive/#files
Project-URL: Source Code, https://github.com/emcd/python-detextive
Project-URL: Issue Tracker, https://github.com/emcd/python-detextive/issues
Author-email: Eric McDonald <emcd@users.noreply.github.com>
License-Expression: Apache-2.0
License-File: LICENSE.txt
Keywords: MIME,charset,detection,newline,text
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development
Requires-Python: >=3.10
Requires-Dist: absence~=1.1
Requires-Dist: accretive~=4.1
Requires-Dist: chardet
Requires-Dist: dynadoc~=1.4
Requires-Dist: frigid~=4.1
Requires-Dist: puremagic
Requires-Dist: typing-extensions
Provides-Extra: all
Requires-Dist: charset-normalizer; extra == 'all'
Requires-Dist: python-magic-bin; (sys_platform == 'win32') and extra == 'all'
Requires-Dist: python-magic; (sys_platform != 'win32') and extra == 'all'
Provides-Extra: charset-normalizer
Requires-Dist: charset-normalizer; extra == 'charset-normalizer'
Provides-Extra: python-magic
Requires-Dist: python-magic-bin; (sys_platform == 'win32') and extra == 'python-magic'
Requires-Dist: python-magic; (sys_platform != 'win32') and extra == 'python-magic'
Description-Content-Type: text/x-rst

.. vim: set fileencoding=utf-8:
.. -*- coding: utf-8 -*-
.. +--------------------------------------------------------------------------+
   |                                                                          |
   | Licensed under the Apache License, Version 2.0 (the "License");          |
   | you may not use this file except in compliance with the License.         |
   | You may obtain a copy of the License at                                  |
   |                                                                          |
   |     http://www.apache.org/licenses/LICENSE-2.0                           |
   |                                                                          |
   | Unless required by applicable law or agreed to in writing, software      |
   | distributed under the License is distributed on an "AS IS" BASIS,        |
   | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
   | See the License for the specific language governing permissions and      |
   | limitations under the License.                                           |
   |                                                                          |
   +--------------------------------------------------------------------------+

*******************************************************************************
                                   detextive
*******************************************************************************

.. image:: https://img.shields.io/pypi/v/detextive
   :alt: Package Version
   :target: https://pypi.org/project/detextive/

.. image:: https://img.shields.io/pypi/status/detextive
   :alt: PyPI - Status
   :target: https://pypi.org/project/detextive/

.. image:: https://github.com/emcd/python-detextive/actions/workflows/tester.yaml/badge.svg?branch=master&event=push
   :alt: Tests Status
   :target: https://github.com/emcd/python-detextive/actions/workflows/tester.yaml

.. image:: https://emcd.github.io/python-detextive/coverage.svg
   :alt: Code Coverage Percentage
   :target: https://github.com/emcd/python-detextive/actions/workflows/tester.yaml

.. image:: https://img.shields.io/github/license/emcd/python-detextive
   :alt: Project License
   :target: https://github.com/emcd/python-detextive/blob/master/LICENSE.txt

.. image:: https://img.shields.io/pypi/pyversions/detextive
   :alt: Python Versions
   :target: https://pypi.org/project/detextive/


🕵️ A Python library which consolidates text detection capabilities for
reliable content analysis. It offers MIME type detection, character set
detection, and line separator processing.

Key Features ⭐
===============================================================================

🔍 **MIME Type Detection**
  Intelligent content-based detection using magic bytes with file extension
  fallback for comprehensive format identification.

📝 **Character Encoding Detection**
  Statistical analysis with UTF-8 optimization and validation through decode
  operations for reliable text processing.

📄 **Line Separator Processing**
  Cross-platform line ending detection and normalization supporting CR, LF,
  and CRLF formats with mixed-content handling.

✅ **Textual Content Validation**
  Smart classification of MIME types and content reasonableness assessment
  using control character and printability heuristics.


Installation 📦
===============================================================================

Method: Install Python Package
-------------------------------------------------------------------------------

Install via the `uv <https://github.com/astral-sh/uv/blob/main/README.md>`_
``pip`` command:

::

    uv pip install detextive

Or, install via ``pip``:

::

    pip install detextive


Examples 💡
===============================================================================

Basic Usage
-------------------------------------------------------------------------------

**MIME Type and Charset Detection**:

Load your content as bytes:

.. code-block:: python

    import detextive

    with open( 'document.txt', 'rb' ) as file:
        content = file.read( )

You can detect MIME type and charset individually:

.. code-block:: python

    mimetype = detextive.detect_mimetype( content, location = 'document.txt' )
    charset = detextive.detect_charset( content )

Or use combined inference for better accuracy:

.. code-block:: python

    mimetype, charset = detextive.infer_mimetype_charset(
        content, location = 'document.txt' )
    print( f"Detected: {mimetype} with {charset} encoding" )
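
The content-first strategy with a file extension fallback can be illustrated
with the standard library alone. The following is a minimal sketch, not the
implementation used by this package: ``sketch_detect_mimetype`` and the tiny
``MAGIC_SIGNATURES`` table are hypothetical names introduced purely for
illustration.

.. code-block:: python

    import mimetypes

    # Hypothetical, deliberately tiny signature table.
    # Real detectors recognize far more formats.
    MAGIC_SIGNATURES = (
        ( b'\x89PNG\r\n\x1a\n', 'image/png' ),
        ( b'%PDF-', 'application/pdf' ),
        ( b'\xff\xd8\xff', 'image/jpeg' ),
    )

    def sketch_detect_mimetype( content, location = None ):
        ''' Checks leading bytes first; falls back to the file extension. '''
        for signature, mimetype in MAGIC_SIGNATURES:
            if content.startswith( signature ): return mimetype
        if location is not None:
            mimetype, _ = mimetypes.guess_type( location )
            if mimetype is not None: return mimetype
        # Assumption: unknown content is reported as generic binary.
        return 'application/octet-stream'

Because the signature check runs first, a misleading file extension does not
override what the leading bytes indicate.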

**Line Separator Processing**:

Detect line separators in mixed content:

.. code-block:: python

    import detextive

    content = 'Line 1\r\nLine 2\rLine 3\n'
    separator = detextive.LineSeparators.detect_bytes( content.encode( ) )

Normalize line separators to the Python standard (``\n``):

.. code-block:: python

    normalized = detextive.LineSeparators.normalize_universal( content )

Convert normalized text to a specific platform's line separators (here, CRLF):

.. code-block:: python

    native = detextive.LineSeparators.CRLF.nativize( normalized )
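
To make detection and normalization concrete, here is a standard-library
sketch. It assumes a "first separator wins" policy for mixed content, which
may differ from how ``LineSeparators.detect_bytes`` actually behaves;
``sketch_detect_separator`` and ``sketch_normalize`` are hypothetical helpers
and not part of this package.

.. code-block:: python

    def sketch_detect_separator( content ):
        ''' Names the first line separator found in bytes, if any. '''
        for index, byte in enumerate( content ):
            if byte == 0x0A: return 'LF'
            if byte == 0x0D:
                # A CR immediately followed by LF counts as one CRLF.
                if content[ index + 1 : index + 2 ] == b'\n': return 'CRLF'
                return 'CR'
        return None

    def sketch_normalize( text ):
        ''' Rewrites CRLF and bare CR as LF. '''
        # CRLF must be replaced first, or each pair would become two LFs.
        return text.replace( '\r\n', '\n' ).replace( '\r', '\n' )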

**Content Classification**:

Check if MIME types represent textual content:

.. code-block:: python

    import detextive

    detextive.is_textual_mimetype( 'application/json' )  # True
    detextive.is_textual_mimetype( 'image/jpeg' )        # False

Validate that decoded text content is reasonable:

.. code-block:: python

    text = "Hello world!"
    detextive.is_valid_text( text )      # True

Binary data that decodes without error but is not reasonable text fails validation:

.. code-block:: python

    binary_as_text = "Config file\x00\x00\x00data"
    detextive.is_valid_text( binary_as_text )  # False
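
A control-character and printability heuristic of this general kind can be
sketched as follows. This is an illustration under stated assumptions, not
the logic behind ``is_valid_text``: the function name, the set of allowed
control characters, the treatment of empty text, and the ``0.85`` threshold
are all hypothetical.

.. code-block:: python

    import unicodedata

    # Assumption: only common whitespace controls are acceptable in text.
    ALLOWED_CONTROLS = frozenset( '\t\n\r' )

    def sketch_is_reasonable_text( text, printables_ratio_min = 0.85 ):
        ''' Rejects stray control characters; requires mostly printables. '''
        if not text: return False  # Assumption: empty text is rejected.
        printables = 0
        for character in text:
            if character in ALLOWED_CONTROLS:
                printables += 1
                continue
            # Category 'Cc' covers C0 and C1 controls, such as NUL.
            if unicodedata.category( character ) == 'Cc': return False
            if character.isprintable( ): printables += 1
        return printables / len( text ) >= printables_ratio_min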

**High-Level Decoding**:

For complete bytes-to-text processing with automatic charset detection and validation:

.. code-block:: python

    import detextive

    with open( 'document.txt', 'rb' ) as file:
        content = file.read( )

    text = detextive.decode( content, location = 'document.txt' )
    print( f"Decoded text: {text}" )
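
The idea of validating a charset through a decode operation can be shown with
plain trial decoding. Note that this sketch swaps statistical detection for a
fixed candidate list, so it is not how this package chooses a charset;
``sketch_decode`` and its default candidates are hypothetical.

.. code-block:: python

    def sketch_decode( content, charsets = ( 'utf-8', 'cp1252', 'latin-1' ) ):
        ''' Returns text and charset for the first candidate that decodes. '''
        for charset in charsets:
            try: return content.decode( charset ), charset
            except UnicodeDecodeError: continue
        # Unreachable with the defaults, since 'latin-1' accepts any byte.
        raise ValueError( "No candidate charset could decode the content." )

Trying UTF-8 first matters because its multi-byte structure rarely validates
by accident, whereas single-byte charsets accept almost any input.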


Contribution 🤝
===============================================================================

Contribution to this project is welcome! However, it must follow the `code of
conduct
<https://emcd.github.io/python-project-common/stable/sphinx-html/common/conduct.html>`_
for the project.

Please file bug reports and feature requests in the `issue tracker
<https://github.com/emcd/python-detextive/issues>`_ or submit `pull
requests <https://github.com/emcd/python-detextive/pulls>`_ to
improve the source code or documentation.

For development guidance and standards, please see the `development guide
<https://emcd.github.io/python-detextive/stable/sphinx-html/contribution.html#development>`_.


Additional Indicia
===============================================================================

.. image:: https://img.shields.io/github/last-commit/emcd/python-detextive
   :alt: GitHub last commit
   :target: https://github.com/emcd/python-detextive

.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/copier-org/copier/master/img/badge/badge-grayscale-inverted-border-orange.json
   :alt: Copier
   :target: https://github.com/copier-org/copier

.. image:: https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg
   :alt: Hatch
   :target: https://github.com/pypa/hatch

.. image:: https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit
   :alt: pre-commit
   :target: https://github.com/pre-commit/pre-commit

.. image:: https://microsoft.github.io/pyright/img/pyright_badge.svg
   :alt: Pyright
   :target: https://microsoft.github.io/pyright

.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
   :alt: Ruff
   :target: https://github.com/astral-sh/ruff

.. image:: https://img.shields.io/pypi/implementation/detextive
   :alt: PyPI - Implementation
   :target: https://pypi.org/project/detextive/

.. image:: https://img.shields.io/pypi/wheel/detextive
   :alt: PyPI - Wheel
   :target: https://pypi.org/project/detextive/


Other Projects by This Author 🌟
===============================================================================


* `python-absence <https://github.com/emcd/python-absence>`_ (`absence <https://pypi.org/project/absence/>`_ on PyPI)

  🕳️ A Python library package which provides a **sentinel for absent values** - a falsey, immutable singleton that represents the absence of a value in contexts where ``None`` or ``False`` may be valid values.
* `python-accretive <https://github.com/emcd/python-accretive>`_ (`accretive <https://pypi.org/project/accretive/>`_ on PyPI)

  🌌 A Python library package which provides **accretive data structures** - collections which can grow but never shrink.
* `python-classcore <https://github.com/emcd/python-classcore>`_ (`classcore <https://pypi.org/project/classcore/>`_ on PyPI)

  🏭 A Python library package which provides **foundational class factories and decorators** for providing classes with attribute immutability and concealment and other custom behaviors.
* `python-dynadoc <https://github.com/emcd/python-dynadoc>`_ (`dynadoc <https://pypi.org/project/dynadoc/>`_ on PyPI)

  📝 A Python library package which bridges the gap between **rich annotations** and **automatic documentation generation** with configurable renderers and support for reusable fragments.
* `python-falsifier <https://github.com/emcd/python-falsifier>`_ (`falsifier <https://pypi.org/project/falsifier/>`_ on PyPI)

  🎭 A very simple Python library package which provides a **base class for falsey objects** - objects that evaluate to ``False`` in boolean contexts.
* `python-frigid <https://github.com/emcd/python-frigid>`_ (`frigid <https://pypi.org/project/frigid/>`_ on PyPI)

  🔒 A Python library package which provides **immutable data structures** - collections which cannot be modified after creation.
* `python-icecream-truck <https://github.com/emcd/python-icecream-truck>`_ (`icecream-truck <https://pypi.org/project/icecream-truck/>`_ on PyPI)

  🍦 **Flavorful Debugging** - A Python library which enhances the powerful and well-known ``icecream`` package with flavored traces, configuration hierarchies, customized outputs, ready-made recipes, and more.
* `python-mimeogram <https://github.com/emcd/python-mimeogram>`_ (`mimeogram <https://pypi.org/project/mimeogram/>`_ on PyPI)

  📨 A command-line tool for **exchanging collections of files with Large Language Models** - bundle multiple files into a single clipboard-ready document while preserving directory structure and metadata... good for code reviews, project sharing, and LLM interactions.
