Metadata-Version: 2.4
Name: detextive
Version: 1.0
Summary: Detects textual content.
Project-URL: Homepage, https://github.com/emcd/python-detextive
Project-URL: Documentation, https://emcd.github.io/python-detextive
Project-URL: Download, https://pypi.org/project/detextive/#files
Project-URL: Source Code, https://github.com/emcd/python-detextive
Project-URL: Issue Tracker, https://github.com/emcd/python-detextive/issues
Author-email: Eric McDonald <emcd@users.noreply.github.com>
License-Expression: Apache-2.0
License-File: LICENSE.txt
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.10
Requires-Dist: absence~=1.1
Requires-Dist: chardet
Requires-Dist: dynadoc~=1.4
Requires-Dist: frigid~=4.1
Requires-Dist: puremagic
Requires-Dist: typing-extensions
Description-Content-Type: text/x-rst

.. vim: set fileencoding=utf-8:
.. -*- coding: utf-8 -*-
.. +--------------------------------------------------------------------------+
   |                                                                          |
   | Licensed under the Apache License, Version 2.0 (the "License");          |
   | you may not use this file except in compliance with the License.         |
   | You may obtain a copy of the License at                                  |
   |                                                                          |
   |     http://www.apache.org/licenses/LICENSE-2.0                           |
   |                                                                          |
   | Unless required by applicable law or agreed to in writing, software      |
   | distributed under the License is distributed on an "AS IS" BASIS,        |
   | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
   | See the License for the specific language governing permissions and      |
   | limitations under the License.                                           |
   |                                                                          |
   +--------------------------------------------------------------------------+

*******************************************************************************
                                   detextive
*******************************************************************************

.. image:: https://img.shields.io/pypi/v/detextive
   :alt: Package Version
   :target: https://pypi.org/project/detextive/

.. image:: https://img.shields.io/pypi/status/detextive
   :alt: PyPI - Status
   :target: https://pypi.org/project/detextive/

.. image:: https://github.com/emcd/python-detextive/actions/workflows/tester.yaml/badge.svg?branch=master&event=push
   :alt: Tests Status
   :target: https://github.com/emcd/python-detextive/actions/workflows/tester.yaml

.. image:: https://emcd.github.io/python-detextive/coverage.svg
   :alt: Code Coverage Percentage
   :target: https://github.com/emcd/python-detextive/actions/workflows/tester.yaml

.. image:: https://img.shields.io/github/license/emcd/python-detextive
   :alt: Project License
   :target: https://github.com/emcd/python-detextive/blob/master/LICENSE.txt

.. image:: https://img.shields.io/pypi/pyversions/detextive
   :alt: Python Versions
   :target: https://pypi.org/project/detextive/


🕵️ A Python library which consolidates text detection capabilities for
reliable content analysis. It offers MIME type detection, character set
detection, line separator processing, and textual content validation.

Key Features ⭐
===============================================================================

🔍 **MIME Type Detection**
  Intelligent content-based detection using magic bytes with file extension
  fallback for comprehensive format identification.

📝 **Character Encoding Detection**
  Statistical analysis with UTF-8 optimization and validation through decode
  operations for reliable text processing.

📄 **Line Separator Processing**
  Cross-platform line ending detection and normalization supporting CR, LF,
  and CRLF formats with mixed-content handling.

✅ **Textual Content Validation**
  Smart classification of MIME types and content reasonableness assessment
  using control character and printability heuristics.
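
As a rough illustration of the content-first, extension-second approach
described under MIME type detection, the sketch below checks a few leading
byte signatures and then falls back to the standard library ``mimetypes``
module. It is not this package's implementation: the function name
``sketch_detect_mimetype`` and the tiny signature table are hypothetical and
exist only for illustration.

.. code-block:: python

    import mimetypes

    # Hypothetical, deliberately tiny signature table.
    # Real detectors recognize far more formats.
    _SIGNATURES = (
        ( b'\x89PNG\r\n\x1a\n', 'image/png' ),
        ( b'%PDF-', 'application/pdf' ),
        ( b'GIF87a', 'image/gif' ),
        ( b'GIF89a', 'image/gif' ),
    )

    def sketch_detect_mimetype( content: bytes, location: str ) -> str:
        ''' Guesses MIME type from leading bytes, then file extension. '''
        # Content wins: a known signature overrides whatever the name says.
        for signature, mimetype in _SIGNATURES:
            if content.startswith( signature ): return mimetype
        # Fallback: consult the file extension.
        mimetype, _ = mimetypes.guess_type( location )
        return mimetype or 'application/octet-stream'

With this sketch, PNG bytes stored under a misleading name such as
``mystery.bin`` would still be reported as ``image/png``, because the
signature check runs before the extension lookup.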


Installation 📦
===============================================================================

Method: Install Python Package
-------------------------------------------------------------------------------

Install via `uv <https://github.com/astral-sh/uv/blob/main/README.md>`_ ``pip``
command:

::

    uv pip install detextive

Or, install via ``pip``:

::

    pip install detextive


Examples 💡
===============================================================================

Basic Usage
-------------------------------------------------------------------------------

**MIME Type and Charset Detection**:

.. code-block:: python

    import detextive

    with open( 'document.txt', 'rb' ) as file:
        content = file.read( )

    # Individual detection
    mimetype = detextive.detect_mimetype( content, 'document.txt' )
    charset = detextive.detect_charset( content )

    # Combined detection
    mimetype, charset = detextive.detect_mimetype_and_charset(
        content, 'document.txt' )
    print( f"Detected: {mimetype} with {charset} encoding" )
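
To make the idea of "UTF-8 optimization and validation through decode
operations" more concrete, here is a minimal sketch that uses only the
standard library. It is an assumption about the general technique, not the
package's actual code. The name ``sketch_detect_charset`` and the candidate
list are hypothetical, and the statistical analysis stage is omitted
entirely.

.. code-block:: python

    from typing import Optional, Sequence

    def sketch_detect_charset(
        content: bytes,
        candidates: Sequence[ str ] = ( 'utf-8', 'cp1252' ),
    ) -> Optional[ str ]:
        ''' Returns first candidate charset which decodes the content. '''
        for charset in candidates:
            try: content.decode( charset )
            except UnicodeDecodeError: continue
            # A successful decode serves as the validation step.
            return charset
        # No candidate could decode the content.
        return None

Trying UTF-8 first is cheap and strict: byte sequences from most legacy
single-byte encodings fail UTF-8 validation as soon as they contain
non-ASCII characters, so a successful decode is strong evidence.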

**Line Separator Processing**:

.. code-block:: python

    import detextive

    content = 'Line 1\r\nLine 2\rLine 3\n'

    # Detect the line separator in use from the raw bytes.
    separator = detextive.LineSeparators.detect_bytes( content.encode( ) )

    # Normalize all line separators to the Python standard (LF).
    normalized = detextive.LineSeparators.normalize_universal( content )

    # Convert normalized text to a specific line separator (here, CRLF).
    native = detextive.LineSeparators.CRLF.nativize( normalized )
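
For readers who want to see what detection, normalization, and conversion
amount to, the following standard-library sketch mirrors the three steps
above. All function names are hypothetical. Reporting the first separator
found is an assumption made for this sketch, so the package's handling of
mixed content may differ.

.. code-block:: python

    from typing import Optional

    def sketch_detect_separator( content: bytes ) -> Optional[ str ]:
        ''' Names the first line separator found in the content. '''
        for index, byte in enumerate( content ):
            if byte == 0x0A: return 'LF'
            if byte == 0x0D:
                # A CR immediately followed by LF counts as one CRLF.
                if content[ index + 1 : index + 2 ] == b'\n': return 'CRLF'
                return 'CR'
        return None

    def sketch_normalize( text: str ) -> str:
        ''' Converts CRLF and CR line endings to LF. '''
        # Replace CRLF first so that its CR is not converted twice.
        return text.replace( '\r\n', '\n' ).replace( '\r', '\n' )

    def sketch_nativize( text: str, separator: str ) -> str:
        ''' Converts LF line endings to the given separator. '''
        return text.replace( '\n', separator )

Normalizing before converting matters: applying ``sketch_nativize`` directly
to text that already contains CRLF would turn each ``\r\n`` into
``\r\r\n``.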

**Content Classification**:

.. code-block:: python

    import detextive

    # Check if MIME type represents textual content
    detextive.is_textual_mimetype( 'application/json' )  # True
    detextive.is_textual_mimetype( 'image/jpeg' )        # False

    # Validate text content from bytes
    detextive.is_textual_content( b'Hello world!' )      # True
    detextive.is_textual_content( b'\x00\x01\x02\x03' )  # False
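
The kind of control character and printability heuristic mentioned in the
feature list can be approximated as follows. This is only a sketch under
stated assumptions. The name ``sketch_is_textual``, the set of tolerated
control characters, the UTF-8-only decode, and the ``0.8`` threshold are all
hypothetical choices, not values taken from the package.

.. code-block:: python

    # Control characters which commonly appear in ordinary text files.
    _TOLERATED_CONTROLS = frozenset( '\t\n\r\f' )

    def sketch_is_textual(
        content: bytes, printable_minimum: float = 0.8
    ) -> bool:
        ''' Judges whether bytes look like reasonable text. '''
        # Assumption: empty content is not treated as text.
        if not content: return False
        try: text = content.decode( 'utf-8' )
        except UnicodeDecodeError: return False
        # Count characters which are printable or tolerated whitespace.
        acceptable = sum(
            1 for character in text
            if character.isprintable( )
            or character in _TOLERATED_CONTROLS )
        return acceptable / len( text ) >= printable_minimum

Under these assumptions, ``b'Hello world!'`` passes because every character
is printable, while ``b'\x00\x01\x02\x03'`` fails because none are.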


Contribution 🤝
===============================================================================

Contributions to this project are welcome! All contributions must follow the
project's `code of conduct
<https://emcd.github.io/python-project-common/stable/sphinx-html/common/conduct.html>`_.

Please file bug reports and feature requests in the `issue tracker
<https://github.com/emcd/python-detextive/issues>`_ or submit `pull
requests <https://github.com/emcd/python-detextive/pulls>`_ to
improve the source code or documentation.

For development guidance and standards, please see the `development guide
<https://emcd.github.io/python-detextive/stable/sphinx-html/contribution.html#development>`_.


`More Flair <https://www.imdb.com/title/tt0151804/characters/nm0431918>`_
===============================================================================

.. image:: https://img.shields.io/github/last-commit/emcd/python-detextive
   :alt: GitHub last commit
   :target: https://github.com/emcd/python-detextive

.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/copier-org/copier/master/img/badge/badge-grayscale-inverted-border-orange.json
   :alt: Copier
   :target: https://github.com/copier-org/copier

.. image:: https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg
   :alt: Hatch
   :target: https://github.com/pypa/hatch

.. image:: https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit
   :alt: pre-commit
   :target: https://github.com/pre-commit/pre-commit

.. image:: https://microsoft.github.io/pyright/img/pyright_badge.svg
   :alt: Pyright
   :target: https://microsoft.github.io/pyright

.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
   :alt: Ruff
   :target: https://github.com/astral-sh/ruff

.. image:: https://img.shields.io/pypi/implementation/detextive
   :alt: PyPI - Implementation
   :target: https://pypi.org/project/detextive/

.. image:: https://img.shields.io/pypi/wheel/detextive
   :alt: PyPI - Wheel
   :target: https://pypi.org/project/detextive/


Other Projects by This Author 🌟
===============================================================================


* `python-absence <https://github.com/emcd/python-absence>`_ (`absence <https://pypi.org/project/absence/>`_ on PyPI)

  🕳️ A Python library package which provides a **sentinel for absent values** - a falsey, immutable singleton that represents the absence of a value in contexts where ``None`` or ``False`` may be valid values.
* `python-accretive <https://github.com/emcd/python-accretive>`_ (`accretive <https://pypi.org/project/accretive/>`_ on PyPI)

  🌌 A Python library package which provides **accretive data structures** - collections which can grow but never shrink.
* `python-classcore <https://github.com/emcd/python-classcore>`_ (`classcore <https://pypi.org/project/classcore/>`_ on PyPI)

  🏭 A Python library package which provides **foundational class factories and decorators** for providing classes with attribute immutability and concealment and other custom behaviors.
* `python-dynadoc <https://github.com/emcd/python-dynadoc>`_ (`dynadoc <https://pypi.org/project/dynadoc/>`_ on PyPI)

  📝 A Python library package which bridges the gap between **rich annotations** and **automatic documentation generation** with configurable renderers and support for reusable fragments.
* `python-falsifier <https://github.com/emcd/python-falsifier>`_ (`falsifier <https://pypi.org/project/falsifier/>`_ on PyPI)

  🎭 A very simple Python library package which provides a **base class for falsey objects** - objects that evaluate to ``False`` in boolean contexts.
* `python-frigid <https://github.com/emcd/python-frigid>`_ (`frigid <https://pypi.org/project/frigid/>`_ on PyPI)

  🔒 A Python library package which provides **immutable data structures** - collections which cannot be modified after creation.
* `python-icecream-truck <https://github.com/emcd/python-icecream-truck>`_ (`icecream-truck <https://pypi.org/project/icecream-truck/>`_ on PyPI)

  🍦 **Flavorful Debugging** - A Python library which enhances the powerful and well-known ``icecream`` package with flavored traces, configuration hierarchies, customized outputs, ready-made recipes, and more.
* `python-mimeogram <https://github.com/emcd/python-mimeogram>`_ (`mimeogram <https://pypi.org/project/mimeogram/>`_ on PyPI)

  📨 A command-line tool for **exchanging collections of files with Large Language Models** - bundle multiple files into a single clipboard-ready document while preserving directory structure and metadata... good for code reviews, project sharing, and LLM interactions.
