Metadata-Version: 2.4
Name: detextive
Version: 3.1
Summary: Detects textual content.
Project-URL: Homepage, https://github.com/emcd/python-detextive
Project-URL: Documentation, https://emcd.github.io/python-detextive
Project-URL: Download, https://pypi.org/project/detextive/#files
Project-URL: Source Code, https://github.com/emcd/python-detextive
Project-URL: Issue Tracker, https://github.com/emcd/python-detextive/issues
Author-email: Eric McDonald <emcd@users.noreply.github.com>
License-Expression: Apache-2.0
License-File: LICENSE.txt
Keywords: MIME,charset,detection,newline,text
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development
Requires-Python: >=3.10
Requires-Dist: absence~=1.1
Requires-Dist: accretive~=4.1
Requires-Dist: chardet
Requires-Dist: dynadoc~=1.4
Requires-Dist: frigid~=4.2
Requires-Dist: puremagic
Requires-Dist: typing-extensions
Provides-Extra: all
Requires-Dist: charset-normalizer; extra == 'all'
Requires-Dist: python-magic-bin; (sys_platform == 'win32') and extra == 'all'
Requires-Dist: python-magic; (sys_platform != 'win32') and extra == 'all'
Provides-Extra: charset-normalizer
Requires-Dist: charset-normalizer; extra == 'charset-normalizer'
Provides-Extra: python-magic
Requires-Dist: python-magic-bin; (sys_platform == 'win32') and extra == 'python-magic'
Requires-Dist: python-magic; (sys_platform != 'win32') and extra == 'python-magic'
Description-Content-Type: text/x-rst

.. vim: set fileencoding=utf-8:
.. -*- coding: utf-8 -*-
.. +--------------------------------------------------------------------------+
   |                                                                          |
   | Licensed under the Apache License, Version 2.0 (the "License");          |
   | you may not use this file except in compliance with the License.         |
   | You may obtain a copy of the License at                                  |
   |                                                                          |
   |     http://www.apache.org/licenses/LICENSE-2.0                           |
   |                                                                          |
   | Unless required by applicable law or agreed to in writing, software      |
   | distributed under the License is distributed on an "AS IS" BASIS,        |
   | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
   | See the License for the specific language governing permissions and      |
   | limitations under the License.                                           |
   |                                                                          |
   +--------------------------------------------------------------------------+

*******************************************************************************
                                   detextive
*******************************************************************************

.. image:: https://img.shields.io/pypi/v/detextive
   :alt: Package Version
   :target: https://pypi.org/project/detextive/

.. image:: https://img.shields.io/pypi/status/detextive
   :alt: PyPI - Status
   :target: https://pypi.org/project/detextive/

.. image:: https://github.com/emcd/python-detextive/actions/workflows/tester.yaml/badge.svg?branch=master&event=push
   :alt: Tests Status
   :target: https://github.com/emcd/python-detextive/actions/workflows/tester.yaml

.. image:: https://emcd.github.io/python-detextive/coverage.svg
   :alt: Code Coverage Percentage
   :target: https://github.com/emcd/python-detextive/actions/workflows/tester.yaml

.. image:: https://img.shields.io/github/license/emcd/python-detextive
   :alt: Project License
   :target: https://github.com/emcd/python-detextive/blob/master/LICENSE.txt

.. image:: https://img.shields.io/pypi/pyversions/detextive
   :alt: Python Versions
   :target: https://pypi.org/project/detextive/


🕵️ A Python library which consolidates text detection capabilities for
reliable content analysis. It offers MIME type detection, character set
detection, and line separator processing.

Key Features ⭐
===============================================================================

🔍 **MIME Type Detection**
  Intelligent content-based detection using magic bytes with file extension
  fallback for comprehensive format identification.

📝 **Character Encoding Detection**
  Statistical analysis with UTF-8 optimization and validation through decode
  operations for reliable text processing.

📄 **Line Separator Processing**
  Cross-platform line ending detection and normalization supporting CR, LF,
  and CRLF formats with mixed-content handling.

✅ **Textual Content Validation**
  Smart classification of MIME types and content reasonableness assessment
  using control character and printability heuristics.


Installation 📦
===============================================================================

Method: Install Python Package
-------------------------------------------------------------------------------

Install via the `uv <https://github.com/astral-sh/uv/blob/main/README.md>`_
``pip`` command:

::

    uv pip install detextive

Or, install via ``pip``:

::

    pip install detextive


Examples 💡
===============================================================================

Basic Usage
-------------------------------------------------------------------------------

**MIME Type and Charset Detection**:

Load your content as bytes:

.. code-block:: python

    import detextive

    with open( 'document.txt', 'rb' ) as file:
        content = file.read( )

You can detect MIME type and charset individually:

.. code-block:: python

    mimetype = detextive.detect_mimetype( content, location = 'document.txt' )
    charset = detextive.detect_charset( content )
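
As a rough picture of the content-first, extension-fallback approach described
under Key Features, the following standard-library sketch checks a few magic
byte signatures before consulting the file extension. It is an illustration
only, not how ``detextive`` is implemented: ``sniff_mimetype`` and
``MAGIC_SIGNATURES`` are hypothetical names, and the signature table is
deliberately tiny.

.. code-block:: python

    import mimetypes

    # Hypothetical signature table for illustration; real detectors know
    # far more formats than these three.
    MAGIC_SIGNATURES = (
        ( b'\x89PNG\r\n\x1a\n', 'image/png' ),
        ( b'%PDF-', 'application/pdf' ),
        ( b'\xff\xd8\xff', 'image/jpeg' ),
    )

    def sniff_mimetype( content, location = None ):
        ''' Guesses MIME type from leading bytes, then from extension. '''
        # Content wins: a matching signature is trusted over the name.
        for signature, mimetype in MAGIC_SIGNATURES:
            if content.startswith( signature ): return mimetype
        # Fallback: ask the standard library about the file extension.
        if location is not None:
            mimetype, _ = mimetypes.guess_type( str( location ) )
            if mimetype is not None: return mimetype
        # Nothing matched, so report generic binary data.
        return 'application/octet-stream'

With this sketch, a PDF saved under a misleading name such as ``report.txt``
is still reported as ``application/pdf`` because the content check runs first.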

Or use combined inference for better accuracy:

.. code-block:: python

    mimetype, charset = detextive.infer_mimetype_charset(
        content, location = 'document.txt' )
    print( f"Detected: {mimetype} with {charset} encoding" )

**Line Separator Processing**:

Detect line separators in mixed content:

.. code-block:: python

    import detextive

    content = 'Line 1\r\nLine 2\rLine 3\n'
    separator = detextive.LineSeparators.detect_bytes( content.encode( ) )
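
Which separator counts as "the" separator of mixed content is a policy choice.
One simple policy, assumed here purely for illustration and not confirmed as
the behavior of ``detect_bytes``, is to report the first separator encountered.
``first_line_separator`` is a hypothetical helper which uses only built-in
operations.

.. code-block:: python

    def first_line_separator( data ):
        ''' Returns first line separator found in bytes, else None. '''
        for index, byte in enumerate( data ):
            if byte == 0x0A: return '\n'        # bare LF
            if byte == 0x0D:
                # CR immediately followed by LF is a single CRLF separator.
                if data[ index + 1 : index + 2 ] == b'\n': return '\r\n'
                return '\r'                     # bare CR
        return None                             # no separator present

    content = 'Line 1\r\nLine 2\rLine 3\n'
    first_line_separator( content.encode( ) )   # '\r\n'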

Normalize line separators to the Python standard:

.. code-block:: python

    normalized = detextive.LineSeparators.normalize_universal( content )

Convert normalized text to the line separators of a specific platform (here,
CRLF):

.. code-block:: python

    native = detextive.LineSeparators.CRLF.nativize( normalized )
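
The two steps above amount to a round trip through a single canonical
separator. A minimal sketch with plain string operations follows, under the
assumption that the canonical form is LF. ``normalize_newlines`` and
``apply_separator`` are hypothetical helpers, not part of ``detextive``.

.. code-block:: python

    def normalize_newlines( text ):
        ''' Converts CRLF and bare CR to LF. '''
        # Replace CRLF first so that its CR is not converted a second time.
        return text.replace( '\r\n', '\n' ).replace( '\r', '\n' )

    def apply_separator( text, separator ):
        ''' Converts LF-normalized text to the given separator. '''
        if separator == '\n': return text
        return text.replace( '\n', separator )

    content = 'Line 1\r\nLine 2\rLine 3\n'
    normalized = normalize_newlines( content )
    # normalized == 'Line 1\nLine 2\nLine 3\n'
    native = apply_separator( normalized, '\r\n' )
    # native == 'Line 1\r\nLine 2\r\nLine 3\r\n'

Normalizing before converting matters: applying the CRLF conversion directly
to mixed content would turn each existing ``\r\n`` into ``\r\r\n``.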

**Content Classification**:

Check if MIME types represent textual content:

.. code-block:: python

    import detextive

    detextive.is_textual_mimetype( 'application/json' )  # True
    detextive.is_textual_mimetype( 'image/jpeg' )        # False
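
Classification of this kind usually cannot rely on the ``text/`` prefix alone,
since formats such as JSON are registered under ``application/``. The sketch
below shows one plausible rule set; the actual rules applied by
``is_textual_mimetype`` may differ. ``looks_textual`` and both lookup tables
are hypothetical.

.. code-block:: python

    # Hypothetical allowlists for illustration only.
    TEXTUAL_APPLICATION_TYPES = frozenset( (
        'application/javascript',
        'application/json',
        'application/xml',
    ) )
    TEXTUAL_SUFFIXES = ( '+json', '+xml' )

    def looks_textual( mimetype ):
        ''' Guesses whether a MIME type denotes textual content. '''
        # Drop parameters such as '; charset=utf-8' and normalize case.
        mimetype = mimetype.split( ';', 1 )[ 0 ].strip( ).lower( )
        if mimetype.startswith( 'text/' ): return True
        if mimetype in TEXTUAL_APPLICATION_TYPES: return True
        # Structured syntax suffixes, as in 'image/svg+xml'.
        return mimetype.endswith( TEXTUAL_SUFFIXES )

    looks_textual( 'application/json' )  # True
    looks_textual( 'image/jpeg' )        # False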

Validate that decoded text content is reasonable:

.. code-block:: python

    text = "Hello world!"
    detextive.is_valid_text( text )      # True

Binary data that happens to decode but is not reasonable text fails
validation:

.. code-block:: python

    binary_as_text = "Config file\x00\x00\x00data"
    detextive.is_valid_text( binary_as_text )  # False
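
The control character and printability heuristics mentioned under Key Features
can be approximated with the standard library. The sketch below is an
assumption about how such a check might look, not the logic of
``is_valid_text``. ``seems_like_text``, ``ALLOWED_CONTROLS``, and the ``0.85``
threshold are all hypothetical.

.. code-block:: python

    import unicodedata

    # Control characters which commonly appear in genuine text.
    ALLOWED_CONTROLS = frozenset( '\t\n\r\f' )

    def seems_like_text( text, printable_floor = 0.85 ):
        ''' Applies control character and printability heuristics. '''
        # Treating empty input as non-text is an arbitrary choice here.
        if not text: return False
        # Any other control character (NUL, ESC, ...) disqualifies the text.
        for character in text:
            if character in ALLOWED_CONTROLS: continue
            if unicodedata.category( character ) == 'Cc': return False
        # Require that most characters be printable or allowed whitespace.
        printable = sum(
            1 for character in text
            if character.isprintable( ) or character in ALLOWED_CONTROLS )
        return printable / len( text ) >= printable_floor

    seems_like_text( "Hello world!" )                  # True
    seems_like_text( "Config file\x00\x00\x00data" )   # False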

**High-Level Decoding**:

For complete bytes-to-text processing with automatic charset detection and
validation, use ``decode``:

.. code-block:: python

    import detextive

    with open( 'document.txt', 'rb' ) as file:
        content = file.read( )

    text = detextive.decode( content, location = 'document.txt' )
    print( f"Decoded text: {text}" )
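
Validation through decode operations, as described under Key Features, means
that a candidate charset is only accepted if the bytes actually decode with it.
The sketch below shows that idea with UTF-8 tried first. It swaps the
statistical detection which ``detextive`` performs for a fixed candidate list,
so treat it as a simplified illustration; ``decode_with_trials`` and its
default candidates are hypothetical.

.. code-block:: python

    def decode_with_trials( content, candidates = ( 'utf-8', 'cp1252' ) ):
        ''' Returns (text, charset) for first charset which decodes. '''
        for charset in candidates:
            # A strict decode doubles as validation of the candidate.
            try: text = content.decode( charset )
            except UnicodeDecodeError: continue
            return text, charset
        raise ValueError( 'No candidate charset decoded the content.' )

    decode_with_trials( 'café'.encode( 'utf-8' ) )    # ('café', 'utf-8')
    decode_with_trials( 'café'.encode( 'cp1252' ) )   # ('café', 'cp1252')

Trying UTF-8 first is a sensible default because it is strict: most byte
sequences in other encodings are not valid UTF-8, so a successful decode is
strong evidence.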


Contribution 🤝
===============================================================================

Contributions to this project are welcome! However, they must follow the
project's `code of conduct
<https://emcd.github.io/python-project-common/stable/sphinx-html/common/conduct.html>`_.

Please file bug reports and feature requests in the `issue tracker
<https://github.com/emcd/python-detextive/issues>`_ or submit `pull
requests <https://github.com/emcd/python-detextive/pulls>`_ to
improve the source code or documentation.

For development guidance and standards, please see the `development guide
<https://emcd.github.io/python-detextive/stable/sphinx-html/contribution.html#development>`_.


Additional Indicia
===============================================================================

.. image:: https://img.shields.io/github/last-commit/emcd/python-detextive
   :alt: GitHub last commit
   :target: https://github.com/emcd/python-detextive

.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/copier-org/copier/master/img/badge/badge-grayscale-inverted-border-orange.json
   :alt: Copier
   :target: https://github.com/copier-org/copier

.. image:: https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg
   :alt: Hatch
   :target: https://github.com/pypa/hatch

.. image:: https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit
   :alt: pre-commit
   :target: https://github.com/pre-commit/pre-commit

.. image:: https://microsoft.github.io/pyright/img/pyright_badge.svg
   :alt: Pyright
   :target: https://microsoft.github.io/pyright

.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
   :alt: Ruff
   :target: https://github.com/astral-sh/ruff

.. image:: https://img.shields.io/pypi/implementation/detextive
   :alt: PyPI - Implementation
   :target: https://pypi.org/project/detextive/

.. image:: https://img.shields.io/pypi/wheel/detextive
   :alt: PyPI - Wheel
   :target: https://pypi.org/project/detextive/


Other Projects by This Author 🌟
===============================================================================


* `python-absence <https://github.com/emcd/python-absence>`_ (`absence <https://pypi.org/project/absence/>`_ on PyPI)

  🕳️ A Python library package which provides a **sentinel for absent values** - a falsey, immutable singleton that represents the absence of a value in contexts where ``None`` or ``False`` may be valid values.
* `python-accretive <https://github.com/emcd/python-accretive>`_ (`accretive <https://pypi.org/project/accretive/>`_ on PyPI)

  🌌 A Python library package which provides **accretive data structures** - collections which can grow but never shrink.
* `python-classcore <https://github.com/emcd/python-classcore>`_ (`classcore <https://pypi.org/project/classcore/>`_ on PyPI)

  🏭 A Python library package which provides **foundational class factories and decorators** for providing classes with attribute immutability, concealment, and other custom behaviors.
* `python-dynadoc <https://github.com/emcd/python-dynadoc>`_ (`dynadoc <https://pypi.org/project/dynadoc/>`_ on PyPI)

  📝 A Python library package which bridges the gap between **rich annotations** and **automatic documentation generation** with configurable renderers and support for reusable fragments.
* `python-falsifier <https://github.com/emcd/python-falsifier>`_ (`falsifier <https://pypi.org/project/falsifier/>`_ on PyPI)

  🎭 A very simple Python library package which provides a **base class for falsey objects** - objects that evaluate to ``False`` in boolean contexts.
* `python-frigid <https://github.com/emcd/python-frigid>`_ (`frigid <https://pypi.org/project/frigid/>`_ on PyPI)

  🔒 A Python library package which provides **immutable data structures** - collections which cannot be modified after creation.
* `python-icecream-truck <https://github.com/emcd/python-icecream-truck>`_ (`icecream-truck <https://pypi.org/project/icecream-truck/>`_ on PyPI)

  🍦 **Flavorful Debugging** - A Python library which enhances the powerful and well-known ``icecream`` package with flavored traces, configuration hierarchies, customized outputs, ready-made recipes, and more.
* `python-librovore <https://github.com/emcd/python-librovore>`_ (`librovore <https://pypi.org/project/librovore/>`_ on PyPI)

  🐲 **Documentation Search Engine** - An intelligent documentation search and extraction tool that provides both a command-line interface for humans and an MCP (Model Context Protocol) server for AI agents. Search across Sphinx and MkDocs sites with fuzzy matching, extract clean markdown content, and integrate seamlessly with AI development workflows.
* `python-mimeogram <https://github.com/emcd/python-mimeogram>`_ (`mimeogram <https://pypi.org/project/mimeogram/>`_ on PyPI)

  📨 A command-line tool for **exchanging collections of files with Large Language Models** - bundle multiple files into a single clipboard-ready document while preserving directory structure and metadata... good for code reviews, project sharing, and LLM interactions.
