Metadata-Version: 2.4
Name: qtsicl
Version: 0.1.0
Summary: A library for generating high-quality synthetic data for AI development.
Author: 3rror_py
License: MIT License
        
        Copyright (c) 2025 3rror_py and Qtsicl
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Keywords: synthetic data,data generation,AI,machine learning,privacy,data augmentation,time series,tabular data
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Security :: Cryptography
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.0.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: statsmodels>=0.13.0
Provides-Extra: smote
Requires-Dist: imbalanced-learn>=0.9.0; extra == "smote"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Requires-Dist: flake8>=6.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Dynamic: license-file

QTSICL: Your Comprehensive AI Toolkit
Version: 0.1.0 (Alpha Development Stage)

Overview: Empowering AI Development with QTSICL
QTSICL (likely an acronym for a powerful cosmic entity, reflecting scale and innovation) is a comprehensive suite of Python libraries and tools designed by 3rror_py and the QTSICL Community. It aims to address critical challenges and streamline the entire lifecycle of Artificial Intelligence, from data preparation and model development to ethical deployment and intelligent automation.

By consolidating specialized modules, QTSICL provides a unified, powerful ecosystem for developers and researchers to build robust, ethical, and high-performing AI systems efficiently.

Core QTSICL Modules & Components
The QTSICL toolkit is built upon several foundational modules, each serving a distinct yet interconnected purpose in the AI development pipeline:

1. Synthetic Data Generation (synth-data-gen within QTSICL)
Purpose: This core module within QTSICL is designed to tackle data scarcity and associated challenges by offering robust, flexible tools for generating high-quality synthetic data that statistically mimics real-world datasets.

Why it's Crucial:

Accelerated Development: Reduces time/cost of real data collection, enabling faster prototyping and testing.

Privacy Preservation: Generates data that captures statistical properties without revealing sensitive individual records.

Addressing Data Imbalance: Balances datasets suffering from class imbalance for more robust models.

Enhanced Diversity & Robustness: Creates specific variations and edge cases to improve model generalization.

Key Functionalities:

qtsicl.synth_data_gen.tabular: Generates synthetic tabular data, preserving numerical distributions, correlations, and categorical frequencies.

qtsicl.synth_data_gen.timeseries: Synthesizes time series data by decomposing and recreating trends, seasonality, and residuals.

qtsicl.synth_data_gen.augmentation: Provides data expansion techniques like SMOTE and feature space augmentation.

qtsicl.synth_data_gen.privacy (Conceptual): Focuses on generating data with explicit privacy guarantees (e.g., Differential Privacy).

qtsicl.synth_data_gen.text (Conceptual): Generates simple or structured synthetic text using templates and rules.

qtsicl.synth_data_gen.config (Conceptual): Manages configuration for profile-based or constraint-driven synthesis.

qtsicl.synth_data_gen.utils: Centralizes common helper functions like random seed setting and statistical profiling.

2. AI Model Development & MLOps (ai-dev)
Overview: ai-dev is a comprehensive Python library designed to accelerate and simplify the entire lifecycle of AI model development, from data preparation and model training to evaluation, deployment, and ongoing operations (MLOps). It empowers developers and researchers by abstracting away common complexities, integrating best practices, and offering intelligent automation for building robust and high-performing AI systems.

Key Features:

Data Management & Preparation: Flexible data loading, robust data splitting (including stratified sampling), handling missing values, text cleaning, and comprehensive data profiling.

Feature Engineering: Automated feature creation, selection, and transformation.

Model Development & Training: Efficient model training with callbacks, hyperparameter optimization, and support for fine-tuning large language models.

Evaluation & Validation: Advanced metrics, cross-validation, and performance analysis.

MLOps & Deployment: Tools for model versioning, pipeline orchestration, and deployment.

Human-in-the-Loop Feedback: Integration for continuous model improvement.

3. Universal Content & Action Orchestrator (qcsc-mod)
Overview: qcsc-mod is an intelligent Python library designed to empower both developers and everyday users with advanced AI capabilities for content creation and assisted automation. By leveraging the power of Large Language Models (LLMs) and Image Generation models, qcsc-mod provides a simplified interface to generate diverse text content, create basic images, and orchestrate actions by interpreting natural language instructions.

Key Features:

Intelligent Text Generation: Functions like generate_text, summarize, and brainstorm_ideas.

Basic Image Creation: The generate_image function to produce simple images from text descriptions.

Assisted Automation (via perform_action): Interprets natural language for tasks like save_text and read_text from files.

4. Trustworthy & Responsible AI (py-ai-trust)
Overview: py-ai-trust is a dedicated Python library designed to empower developers and researchers in building, evaluating, and monitoring Trustworthy and Responsible AI (RAI) systems. It ensures AI models are not just accurate but also fair, robust, private, and explainable, complementing existing AI development workflows.

Key Features:

Fairness Audit: Bias detection using metrics like Statistical Parity Difference, Disparate Impact, and visualizations of disparities.

Bias Mitigation: Techniques like Reweighing (pre-processing) and Equalized Odds (post-processing) to reduce bias.

Robustness Testing: Assess model performance under various noise types, conceptual adversarial examples, and data corruption.

Privacy Auditing: Conceptual Membership Inference Attacks (MIA) and Feature Leakage Detection.

Explainability Enhancement: Permutation Importance, Partial Dependence Plots (PDP), and Individual Conditional Expectation (ICE) Plots.

Comprehensive Audit & Reporting: Orchestrates holistic trustworthiness audits and generates consolidated reports.

The QTSICL Ecosystem: How It All Connects
The various modules under the QTSICL umbrella are designed to work together seamlessly, forming a cohesive and powerful AI development ecosystem:

qtsicl (with synth_data_gen) provides the Data: It serves as the data factory, generating high-quality synthetic data for training, testing, and privacy-preserving use cases. This data can directly feed into ai-dev.

ai-dev Builds the Models Efficiently: Leveraging data from qtsicl.synth_data_gen, ai-dev enables rapid prototyping, efficient training, and robust MLOps practices for your AI models.

py-ai-trust Ensures Responsibility: Models built using ai-dev (and potentially data generated by qtsicl.synth_data_gen) are then audited by py-ai-trust to ensure they are fair, robust, private, and explainable, minimizing ethical risks before deployment.

qcsc-mod Orchestrates and Creates: This module integrates by providing advanced content generation and natural-language-driven automation capabilities, potentially leveraging AI insights from other QTSICL components or assisting in data preparation/reporting.

By combining qtsicl's data generation capabilities with ai-dev's development speed, py-ai-trust's focus on responsibility, and qcsc-mod's intelligent orchestration, the QTSICL ecosystem empowers you to build comprehensive, efficient, and trustworthy AI systems.

Key Technologies and Libraries Used (Across QTSICL Modules)
The QTSICL ecosystem leverages standard and widely-used Python libraries for data manipulation, scientific computing, and statistical modeling:

numpy: Fundamental package for numerical computing.

pandas: For efficient data manipulation and analysis of tabular data.

scikit-learn: Provides various ML utilities, including base estimators and data preprocessing.

scipy: For scientific computing, including statistical functions (e.g., multivariate_normal).

statsmodels: Crucial for time series decomposition and analysis.

matplotlib & seaborn: For data visualization and plotting generated data characteristics.

imbalanced-learn (Optional): For advanced data augmentation techniques like SMOTE.

Additional libraries specific to ai-dev, qcsc-mod, and py-ai-trust (e.g., transformers for LLMs, Pillow for image generation, adversarial-robustness-toolbox, opacus for DP, shap, lime for explainability).

Installation
You can install the primary qtsicl package (which includes synth_data_gen):

pip install qtsicl

To include optional dependencies for data augmentation (SMOTE) within synth_data_gen:

pip install "qtsicl[smote]"

For ai-dev, qcsc-mod, and py-ai-trust, please refer to their respective documentation or repositories for installation instructions. They are intended to be installed as separate, complementary packages.

Usage Example (Synthetic Data Generation from qtsicl)
Here's a basic example demonstrating how to use the TabularSynthesizer from qtsicl's synth_data_gen module:

import pandas as pd
from qtsicl.qtsicl.synth_data_gen.tabular import TabularSynthesizer # Note the nested import path
from qtsicl.qtsicl.synth_data_gen.utils import evaluate_synthetic_data, get_statistical_profile

# 1. Prepare some real data
data = {
    'Age': [25, 30, 35, 28, 40, 25, 30, 50, 45, 33],
    'Income': [50000, 60000, 75000, 55000, 80000, 52000, 62000, 90000, 85000, 70000],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'City': ['NY', 'LA', 'NY', 'CHI', 'LA', 'NY', 'CHI', 'LA', 'NY', 'CHI']
}
real_df = pd.DataFrame(data)

print("Original Real Data Head:")
print(real_df.head())
print("\nOriginal Real Data Profile:")
print(get_statistical_profile(real_df))

# 2. Initialize and fit the TabularSynthesizer
synthesizer = TabularSynthesizer()
synthesizer.fit(real_df)

# 3. Generate synthetic data
num_synthetic_samples = 100
synthetic_df = synthesizer.generate(num_synthetic_samples)

print(f"\nGenerated {num_synthetic_samples} Synthetic Data Head:")
print(synthetic_df.head())
print("\nSynthetic Data Profile:")
print(get_statistical_profile(synthetic_df))

# 4. Evaluate the quality
evaluation_report = evaluate_synthetic_data(real_df, synthetic_df)
import json
print("\nSynthetic Data Evaluation Report:")
print(json.dumps(evaluation_report, indent=4))

Project Status and Future Vision
The QTSICL ecosystem is currently at an Alpha development stage (0.1.0). It provides foundational capabilities across synthetic data generation, AI model development, content orchestration, and trustworthy AI.

The future vision for QTSICL includes:

Deepening Integration: Further enhancing the seamless interoperability between all QTSICL modules.

Advanced Capabilities: Expanding into more sophisticated areas like formal differential privacy, advanced time series models, and nuanced text generation within synth_data_gen.

Enhanced Automation: Integrating more intelligent automation features across all components.

Community Contributions: Fostering an active community to grow and enhance the toolkit.

Ultimately, QTSICL aims to be the go-to comprehensive solution for AI development, empowering users with cutting-edge tools and responsible AI practices.

Contributing
Contributions are highly welcome across all QTSICL modules! If you have ideas for new features, bug fixes, or documentation improvements, please feel free to:

Fork the relevant repository (e.g., qtsicl, ai-dev, qcsc-mod, py-ai-trust).

Create a new branch (git checkout -b feature/your-feature).

Make your changes.

Commit your changes (git commit -m 'Add new feature').

Push to the branch (git push origin feature/your-feature).

Open a Pull Request.

License
This project (and its individual modules within the QTSICL ecosystem) is licensed under the MIT License. See the LICENSE file for details.

Copyright (c) 2025 3rror_py and Qtsicl Community
