Metadata-Version: 2.2
Name: msq-data-store-tools
Version: 1.0.0
Summary: Integration tools for MSQ DataStore/Prism - Azure Fabric, LLM, and API utilities
Home-page: https://github.com/freemavens/data_store_tools
Author: Damian Rumble (Forge)
Project-URL: Homepage, https://github.com/freemavens/data_store_tools
Project-URL: Documentation, https://github.com/freemavens/data_store_tools/blob/main/README.md
Project-URL: Repository, https://github.com/freemavens/data_store_tools
Project-URL: Bug Tracker, https://github.com/freemavens/data_store_tools/issues
Keywords: azure,fabric,data-engineering,llm,api,microsoft-fabric,lakehouse
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: azure-storage-blob>=12.0.0
Requires-Dist: azure-storage-file-datalake>=12.0.0
Requires-Dist: azure-identity>=1.12.0
Requires-Dist: openai>=1.0.0
Requires-Dist: requests>=2.28.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: pyarrow>=10.0.0
Requires-Dist: semantic-link-labs>=0.6.0
Requires-Dist: deltalake>=0.10.0
Dynamic: home-page

# DataStore Tools

Integration utilities for MSQ DataStore/Prism platform - simplified access to Azure Fabric, LLM APIs, and data ingestion.

## Installation
```bash
pip install data-store-tools
```

## Quick Start

### Azure Fabric Operations
```python
from data_store_tools import FabricTools

# Initialize Fabric tools
fabric_tools = FabricTools(my_project_name,FABRIC_WORKSPACE_NAME)

# Create table name
lakehouse, table_name = create_datastore_table_name(lakehouse,data_subcat,data_contents,date_part)

# Upload DataFrame to Fabric Lakehouse
load_table_to_lakehouse(df,delete_old_table = False)

# Read from Delta table
df = read_file_from_lakehouse(lakehouse_name,file_name)
```

### Azure Storage Operations
```python
from data_store_tools import AzureTools


# Azure Fabric upload
azure = AzureTools()
azure.upload_to_ml_datastore(my_project_name)

# Blob storage operations
azure.upload_to_ml_datastore(data,blob_file_name,blob_file_path)
azure.download_from_blob(data,blob_file_name)
```

### API Integration
```python
from data_store_tools import APITools

# Generic API calls with retry logic
api = APITools()

```

### LLM Integration
```python
from data_store_tools import LargeLanguageModelTools

# OpenAI GPT integration
llm = LargeLanguageModelTools()

```

## Features

### FabricTools
- **Workspace Discovery**: Find and list Fabric workspaces by pattern
- **Lakehouse Management**: List, create, and manage lakehouses
- **Delta Operations**: Read/write Delta tables with schema validation
- **Medallion Architecture**: Support for Bronze/Silver/Gold layers
- **Schema Management**: Automatic schema mismatch detection and handling
- **Temporal Data**: Ranking and deduplication utilities

### AzureTools
- **Lakehouse Integration**: Upload/download DataFrames to Fabric Lakehouse
- **Blob Storage**: Full CRUD operations on Azure Blob Storage
- **Data Lake Gen2**: Integration with Azure Data Lake Storage
- **Azure ML**: Upload to Azure ML datastores

### APITools
- **Generic Wrapper**: Call any REST API with standardized error handling
- **Retry Logic**: Automatic retry with exponential backoff
- **Authentication**: Support for various auth methods
- **Response Parsing**: JSON/XML parsing utilities

### LargeLanguageModelTools
- **OpenAI Integration**: GPT-3.5, GPT-4 model, GPT-5 deployment patterns
- **Azure OpenAI**: Support for Azure-hosted OpenAI services
- **Prompt Management**: Template and prompt engineering utilities
- **Batch Processing**: Handle multiple completions efficiently

## Architecture Compatibility

Designed for MSQ DataStore/Prism medallion architecture:
- **Bronze Layer**: Raw data ingestion
- **Silver Layer**: Cleaned and validated data
- **Gold Layer**: Production-ready, aggregated data
- **Sandbox**: Development and testing

## Cross-Tenant Support

Handles MSQ/Freemavens tenant scenarios:
- DefaultAzureCredential authentication
- Service principal support
- Workspace-specific operations
- OneLake path conventions

## Documentation

Full documentation and examples: https://github.com/freemavens/data_store_tools

## Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

## License

MIT License

## Support

- Issues: https://github.com/freemavens/data_store_tools/issues
- Internal: Contact Forge Data Science team
```

## Create LICENSE
```
MIT License

Copyright (c) 2026 Forge

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

## Create MANIFEST.in (for additional files)
```
include README.md
include LICENSE
recursive-include src/data_store_tools *.py
recursive-exclude * __pycache__
recursive-exclude * *.py[co]
```

## Project Organization

```
├── LICENSE            <- Open-source license if one is chosen
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default mkdocs project; see www.mkdocs.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── pyproject.toml     <- Project configuration file with package metadata for 
│                         data_store_tools and configuration for tools like black
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.cfg          <- Configuration file for flake8
│
└── scr  <- Source code for use in this project.
    │
    ├── __init__.py             <- Makes data-store-tools a Python module
    │
    ├── fabric_tools.py               <- 
    │
    ├── azure_tools.py              <- 
    │
    ├── api_tools.py             <- 
    │
    ├── large_langugage_tools.py    <- 
    │
    ├── tooling                
    │   ├── __init__.py 
    │   ├── load_env.py          <-     
    │   └── utils.py            <-  
```

--------

