Metadata-Version: 2.4
Name: cjm-source-provider
Version: 0.0.1
Summary: Protocol and data types for querying content sources across decomposition workflows.
Home-page: https://github.com/cj-mills/cjm-source-provider
Author: Christian J. Mills
Author-email: 9126128+cj-mills@users.noreply.github.com
License: Apache-2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-python
Dynamic: summary

# cjm-source-provider


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install

``` bash
pip install cjm_source_provider
```

## Project Structure

    nbs/
    ├── models.ipynb    # Data types for source records, source blocks, and selected source references.
    └── protocols.ipynb # Protocol definitions for extensible content source providers.

Total: 2 notebooks

## Module Dependencies

``` mermaid
graph LR
    models[models<br/>Models]
    protocols[protocols<br/>Protocols]

    protocols --> models
```

*1 cross-module dependency detected*

## CLI Reference

No CLI commands found in this project.

## Module Overview

Detailed documentation for each module in the project:

### Models (`models.ipynb`)

> Data types for source records, source blocks, and selected source
> references.

#### Import

``` python
from cjm_source_provider.models import (
    SourceRecord,
    SelectedSource,
    SourceBlock
)
```

#### Classes

``` python
class SourceRecord(TypedDict):
    "Standard record format for source providers."
```

``` python
class SelectedSource(TypedDict):
    "A selected source reference in a queue."
```

``` python
@dataclass
class SourceBlock:
    "A content block fetched from a source provider for processing."
    
    id: str  # Unique identifier (e.g., job_id, file path hash)
    provider_id: str  # Source provider identifier
    text: str  # Raw content text
    media_path: Optional[str]  # Path to source media file
    metadata: Dict[str, Any] = field(...)  # Additional metadata from source
    
    def to_dict(self) -> Dict[str, Any]:  # Dictionary representation
        "Convert to dictionary for JSON serialization."
        return asdict(self)
    
    @classmethod
    def from_dict(
        cls,
        data: Dict[str, Any]  # Dictionary representation
    ) -> 'SourceBlock':  # Reconstructed SourceBlock
        "Create from dictionary."
```
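
The round trip through `to_dict` and `from_dict` can be sketched as follows. This is a minimal illustration, not the package's implementation: the class is redefined locally so the snippet runs without the package installed, the `metadata` default (an empty dict) and the body of `from_dict` are assumptions because the listing above elides them, and all field values are hypothetical.

``` python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, Optional


# Local stand-in mirroring the documented fields of SourceBlock.
@dataclass
class SourceBlock:
    id: str
    provider_id: str
    text: str
    media_path: Optional[str]
    # Assumption: the elided field(...) defaults to an empty dict.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SourceBlock":
        # Assumption: keys map one-to-one onto constructor arguments.
        return cls(**data)


# Hypothetical values, for illustration only.
block = SourceBlock(
    id="job-001",
    provider_id="demo-provider",
    text="Hello from a source provider.",
    media_path=None,
    metadata={"language": "en"},
)

# Serialize to JSON and rebuild an equal instance from the parsed dict.
payload = json.dumps(block.to_dict())
restored = SourceBlock.from_dict(json.loads(payload))
round_trip_ok = restored == block
```

Because `media_path` has no default in the listing, the sketch passes it explicitly even when no media file is involved.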

### Protocols (`protocols.ipynb`)

> Protocol definitions for extensible content source providers.

#### Import

``` python
from cjm_source_provider.protocols import (
    SourceProvider
)
```

#### Classes

``` python
@runtime_checkable
class SourceProvider(Protocol):
    "Protocol for content source providers."
    
    @property
    def provider_id(self) -> str:  # Unique identifier for this provider instance
        "Unique identifier for this provider instance."
        ...
    
    @property
    def provider_name(self) -> str:  # Human-readable name for display
        "Human-readable name for display."
        ...
    
    @property
    def provider_type(self) -> str:  # Provider type category
        "Provider type category (e.g., 'transcription_db', 'local_file', 'video_source')."
        ...
    
    def query_records(
        self,
        limit: int = 100  # Maximum number of records to return
    ) -> List['SourceRecord']:  # List of source records
        "Query available records from this provider."
        ...
    
    def get_source_block(
        self,
        record_id: str  # Record identifier
    ) -> Optional['SourceBlock']:  # SourceBlock or None if not found
        "Fetch a specific record as a SourceBlock for processing."
        ...
```
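
Since `SourceProvider` is a `runtime_checkable` protocol, any class with matching members should satisfy it structurally, without inheriting from it. The sketch below shows one possible in-memory provider. The protocol and data types are redefined locally so the snippet is self-contained. `InMemoryProvider`, its identifier strings, and the `SourceRecord` fields (`record_id`, `text`) are hypothetical, because the listing above does not show the record's keys.

``` python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol, TypedDict, runtime_checkable


class SourceRecord(TypedDict):
    # Hypothetical keys; the real SourceRecord fields are not documented here.
    record_id: str
    text: str


@dataclass
class SourceBlock:
    id: str
    provider_id: str
    text: str
    media_path: Optional[str]
    metadata: Dict[str, Any] = field(default_factory=dict)  # assumed default


@runtime_checkable
class SourceProvider(Protocol):
    @property
    def provider_id(self) -> str: ...
    @property
    def provider_name(self) -> str: ...
    @property
    def provider_type(self) -> str: ...
    def query_records(self, limit: int = 100) -> List[SourceRecord]: ...
    def get_source_block(self, record_id: str) -> Optional[SourceBlock]: ...


class InMemoryProvider:
    """Hypothetical provider backed by a dict of record_id -> text."""

    def __init__(self, texts: Dict[str, str]) -> None:
        self._texts = texts

    @property
    def provider_id(self) -> str:
        return "in-memory-demo"

    @property
    def provider_name(self) -> str:
        return "In-Memory Demo"

    @property
    def provider_type(self) -> str:
        return "local_file"

    def query_records(self, limit: int = 100) -> List[SourceRecord]:
        # Return at most `limit` records, in insertion order.
        items = list(self._texts.items())[:limit]
        return [SourceRecord(record_id=k, text=v) for k, v in items]

    def get_source_block(self, record_id: str) -> Optional[SourceBlock]:
        text = self._texts.get(record_id)
        if text is None:
            return None  # unknown record
        return SourceBlock(id=record_id, provider_id=self.provider_id,
                           text=text, media_path=None)


provider = InMemoryProvider({"a": "first", "b": "second"})
# isinstance only checks that the members exist, not their signatures.
is_provider = isinstance(provider, SourceProvider)
```

The `isinstance` check confirms only that the required attributes are present; it does not validate return types, so a static type checker is still useful for catching signature mismatches.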
