Metadata-Version: 2.4
Name: presidio_analyzer
Version: 2.2.362
Summary: Presidio Analyzer package
License-Expression: MIT
Keywords: presidio_analyzer
Author: Presidio
Author-email: presidio@microsoft.com
Requires-Python: >=3.10,<3.14
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Provides-Extra: ahds
Provides-Extra: azure-ai-language
Provides-Extra: gliner
Provides-Extra: langextract
Provides-Extra: server
Provides-Extra: stanza
Provides-Extra: transformers
Requires-Dist: accelerate ; extra == "transformers"
Requires-Dist: azure-ai-textanalytics ; extra == "azure-ai-language"
Requires-Dist: azure-core ; extra == "azure-ai-language"
Requires-Dist: azure-health-deidentification (>=1.1.0b1,<2.0.0) ; extra == "ahds"
Requires-Dist: azure-identity (>=1.18.0) ; extra == "langextract"
Requires-Dist: azure-identity (>=1.23.0,<2.0.0) ; extra == "ahds"
Requires-Dist: flask (>=1.1) ; extra == "server"
Requires-Dist: gliner (>=0.2.13,<1.0.0) ; extra == "gliner"
Requires-Dist: gunicorn ; (platform_system != "Windows") and (extra == "server")
Requires-Dist: huggingface_hub ; extra == "gliner"
Requires-Dist: huggingface_hub ; extra == "transformers"
Requires-Dist: jinja2 (>=3.0.0) ; extra == "langextract"
Requires-Dist: langextract (>=1.0.0) ; extra == "langextract"
Requires-Dist: more-itertools (>=10.0.0) ; extra == "langextract"
Requires-Dist: onnxruntime (>=1.19) ; (python_version > "3.10") and (extra == "gliner")
Requires-Dist: onnxruntime (>=1.19,<1.24.1) ; (python_version == "3.10") and (extra == "gliner")
Requires-Dist: openai (>=1.50.0) ; extra == "langextract"
Requires-Dist: phonenumbers (>=8.12,<10.0.0)
Requires-Dist: pydantic (>=2.0.0,<3.0.0)
Requires-Dist: pyyaml
Requires-Dist: regex
Requires-Dist: spacy (>=3.4.4,!=3.7.0)
Requires-Dist: spacy_huggingface_pipelines ; extra == "transformers"
Requires-Dist: stanza (>=1.10.1,<2.0.0) ; extra == "stanza"
Requires-Dist: tldextract
Requires-Dist: transformers ; extra == "gliner"
Requires-Dist: transformers ; extra == "transformers"
Requires-Dist: waitress ; (platform_system == "Windows") and (extra == "server")
Project-URL: Homepage, https://github.com/Microsoft/presidio
Description-Content-Type: text/markdown

# Presidio analyzer

## Description

The Presidio analyzer is a Python-based service for detecting PII entities in text.

During analysis, it runs a set of different _PII Recognizers_,
each one in charge of detecting one or more PII entities using different mechanisms.

Presidio analyzer comes with a set of predefined recognizers,
but can easily be extended with other types of custom recognizers.
Predefined and custom recognizers leverage regex,
Named Entity Recognition and other types of logic to detect PII in unstructured text.

### Language Model-based PII/PHI Detection

Presidio analyzer supports language model-based PII/PHI detection (LLMs, SLMs) for flexible entity recognition. The current implementation uses [LangExtract](https://github.com/google/langextract) with support for multiple providers:

- **Ollama** - Local model deployment for privacy-sensitive environments
- **Azure OpenAI** - Cloud-based deployment with enterprise features

```bash
pip install "presidio-analyzer[langextract]"
```

#### Quick Usage

**Ollama** (local models):

```python
from presidio_analyzer.predefined_recognizers import BasicLangExtractRecognizer
recognizer = BasicLangExtractRecognizer()  # Uses default config
```

**Azure OpenAI** (cloud models):

```python
from presidio_analyzer.predefined_recognizers import AzureOpenAILangExtractRecognizer

# Simple usage - pass everything as parameters
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4",  # Your Azure deployment name
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-api-key"
)

# Or use environment variables (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY):
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4"  # Your Azure deployment name
)

# Advanced: Customize entities/prompts with config file
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4",
    config_path="./custom_config.yaml",  # Optional: for custom entities/prompts
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-api-key"
)
```

**Note:** LangExtract recognizers do not validate connectivity during initialization. Connection errors or missing models will be reported when `analyze()` is first called.

See the [Language Model-based PII/PHI Detection guide](https://microsoft.github.io/presidio/samples/python/langextract/) for complete setup and usage instructions.

## Deploy Presidio analyzer to Azure

Use the following button to deploy Presidio analyzer to your Azure subscription.

[![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fmicrosoft%2Fpresidio%2Fmain%2Fpresidio-analyzer%2Fdeploytoazure.json)

## Simple usage example

```python
from presidio_analyzer import AnalyzerEngine

# Set up the engine; this loads the NLP module (spaCy model by default) and the other PII recognizers
analyzer = AnalyzerEngine()

# Call analyzer to get results
results = analyzer.analyze(text="My phone number is 212-555-5555",
                           entities=["PHONE_NUMBER"],
                           language="en")
print(results)
```

## GPU Acceleration

For GPU acceleration, install the appropriate dependencies for your hardware:

- **Linux with NVIDIA GPU**: install `cupy-cuda12x` (or the CuPy build matching your CUDA installation)
- **macOS with Apple Silicon**: MPS (Metal Performance Shaders) is not currently supported; the analyzer uses the CPU for PyTorch operations.
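
One way to check whether the GPU dependencies are usable is to ask spaCy directly. This is a sketch that assumes spaCy is the NLP engine in use; `spacy.prefer_gpu()` is a spaCy call, not part of Presidio, and how Presidio itself selects a device is not covered here.

```python
import spacy

# prefer_gpu() tries to activate a GPU and returns True on success.
# It returns False (and spaCy stays on the CPU) when no usable GPU or CuPy build is found.
gpu_active = spacy.prefer_gpu()

print("spaCy is using the GPU" if gpu_active else "spaCy is using the CPU")
```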

## Documentation

Additional documentation on installing, using, and extending the analyzer can be found under the [Analyzer](https://microsoft.github.io/presidio/analyzer/) section of the [Presidio Documentation](https://microsoft.github.io/presidio).

