Metadata-Version: 2.4
Name: gaik
Version: 0.3.14
Summary: General AI Kit - Reusable AI/ML components for Python
Author: GAIK Project
License: MIT License
        
        Copyright (c) 2026 GAIK - GenAI for knowledge mgt
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://gaik.ai/
Project-URL: Repository, https://github.com/GAIK-project/gaik-toolkit
Project-URL: Documentation, https://github.com/GAIK-project/gaik-toolkit/tree/main/docs
Project-URL: Issues, https://github.com/GAIK-project/gaik-toolkit/issues
Keywords: ai,ml,openai,azure-openai,structured-outputs,pydantic,schema,extraction,transcription,whisper,audio,video,pdf-parsing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.12.0
Requires-Dist: python-dotenv>=1.2.0
Requires-Dist: openai>=2.7.0
Provides-Extra: extract
Provides-Extra: embedder
Requires-Dist: openai>=1.58.0; extra == "embedder"
Requires-Dist: langchain-core>=0.2.0; extra == "embedder"
Provides-Extra: vector-store
Requires-Dist: langchain-core>=0.2.0; extra == "vector-store"
Requires-Dist: numpy>=1.24.0; extra == "vector-store"
Requires-Dist: chromadb>=0.5.0; extra == "vector-store"
Provides-Extra: pg-vector-store
Requires-Dist: psycopg[binary]>=3.1; extra == "pg-vector-store"
Requires-Dist: langchain-core>=0.2.0; extra == "pg-vector-store"
Provides-Extra: retriever
Requires-Dist: gaik[embedder]; extra == "retriever"
Requires-Dist: gaik[vector-store]; extra == "retriever"
Requires-Dist: langchain-core>=0.2.0; extra == "retriever"
Requires-Dist: sentence-transformers>=2.6.0; extra == "retriever"
Provides-Extra: answer-generator
Requires-Dist: openai>=1.58.0; extra == "answer-generator"
Requires-Dist: langchain-core>=0.2.0; extra == "answer-generator"
Provides-Extra: parser
Requires-Dist: PyMuPDF>=1.26.0; extra == "parser"
Requires-Dist: python-docx>=1.2.0; extra == "parser"
Requires-Dist: docling==2.64.1; extra == "parser"
Requires-Dist: psutil; extra == "parser"
Requires-Dist: requests>=2.31.0; extra == "parser"
Provides-Extra: rag-parser-docling
Requires-Dist: docling==2.64.1; extra == "rag-parser-docling"
Requires-Dist: docling-core[chunking]<3.0.0,>=2.50.1; extra == "rag-parser-docling"
Requires-Dist: docling-ibm-models<4,>=3.9.1; extra == "rag-parser-docling"
Requires-Dist: docling-parse<5.0.0,>=4.7.0; extra == "rag-parser-docling"
Requires-Dist: langchain-core>=0.2.0; extra == "rag-parser-docling"
Requires-Dist: pydantic>=2.0.0; extra == "rag-parser-docling"
Requires-Dist: python-dotenv>=1.0.0; extra == "rag-parser-docling"
Requires-Dist: torch>=2.1.0; extra == "rag-parser-docling"
Requires-Dist: transformers>=4.39.0; extra == "rag-parser-docling"
Provides-Extra: rag-parser-vision
Requires-Dist: docling==2.64.1; extra == "rag-parser-vision"
Requires-Dist: docling-core[chunking]<3.0.0,>=2.50.1; extra == "rag-parser-vision"
Requires-Dist: docling-ibm-models<4,>=3.9.1; extra == "rag-parser-vision"
Requires-Dist: docling-parse<5.0.0,>=4.7.0; extra == "rag-parser-vision"
Requires-Dist: langchain-core>=0.2.0; extra == "rag-parser-vision"
Requires-Dist: openai>=2.7; extra == "rag-parser-vision"
Requires-Dist: PyMuPDF>=1.23.0; extra == "rag-parser-vision"
Requires-Dist: Pillow>=10.0.0; extra == "rag-parser-vision"
Requires-Dist: pydantic>=2.0.0; extra == "rag-parser-vision"
Requires-Dist: python-dotenv>=1.0.0; extra == "rag-parser-vision"
Requires-Dist: torch>=2.1.0; extra == "rag-parser-vision"
Requires-Dist: transformers>=4.39.0; extra == "rag-parser-vision"
Provides-Extra: rag-workflow
Requires-Dist: gaik[rag-parser-vision]; extra == "rag-workflow"
Requires-Dist: gaik[embedder]; extra == "rag-workflow"
Requires-Dist: gaik[vector-store]; extra == "rag-workflow"
Requires-Dist: gaik[retriever]; extra == "rag-workflow"
Requires-Dist: gaik[answer-generator]; extra == "rag-workflow"
Provides-Extra: transcriber
Requires-Dist: pydub>=0.25.1; extra == "transcriber"
Requires-Dist: requests>=2.31.0; extra == "transcriber"
Provides-Extra: enhance-transcript
Provides-Extra: text-to-speech
Provides-Extra: parallel-transcriber
Provides-Extra: classifier
Requires-Dist: PyMuPDF>=1.26.0; extra == "classifier"
Requires-Dist: python-docx>=1.2.0; extra == "classifier"
Provides-Extra: all
Requires-Dist: gaik[extract]; extra == "all"
Requires-Dist: gaik[parser]; extra == "all"
Requires-Dist: gaik[transcriber]; extra == "all"
Requires-Dist: gaik[enhance-transcript]; extra == "all"
Requires-Dist: gaik[text-to-speech]; extra == "all"
Requires-Dist: gaik[parallel-transcriber]; extra == "all"
Requires-Dist: gaik[classifier]; extra == "all"
Requires-Dist: gaik[rag-parser-docling]; extra == "all"
Requires-Dist: gaik[rag-parser-vision]; extra == "all"
Requires-Dist: gaik[embedder]; extra == "all"
Requires-Dist: gaik[vector-store]; extra == "all"
Requires-Dist: gaik[pg-vector-store]; extra == "all"
Requires-Dist: gaik[retriever]; extra == "all"
Requires-Dist: gaik[answer-generator]; extra == "all"
Requires-Dist: gaik[rag-workflow]; extra == "all"
Requires-Dist: gaik[audio-to-structured-data]; extra == "all"
Requires-Dist: gaik[documents-to-structured-data]; extra == "all"
Provides-Extra: audio-to-structured-data
Requires-Dist: gaik[transcriber]; extra == "audio-to-structured-data"
Requires-Dist: gaik[extract]; extra == "audio-to-structured-data"
Provides-Extra: documents-to-structured-data
Requires-Dist: gaik[parser]; extra == "documents-to-structured-data"
Requires-Dist: gaik[extract]; extra == "documents-to-structured-data"
Provides-Extra: parser-cpu
Requires-Dist: PyMuPDF>=1.26.0; extra == "parser-cpu"
Requires-Dist: python-docx>=1.2.0; extra == "parser-cpu"
Provides-Extra: documents-to-structured-data-cpu
Requires-Dist: gaik[parser-cpu]; extra == "documents-to-structured-data-cpu"
Requires-Dist: gaik[extract]; extra == "documents-to-structured-data-cpu"
Provides-Extra: all-cpu
Requires-Dist: gaik[extract]; extra == "all-cpu"
Requires-Dist: gaik[parser-cpu]; extra == "all-cpu"
Requires-Dist: gaik[transcriber]; extra == "all-cpu"
Requires-Dist: gaik[enhance-transcript]; extra == "all-cpu"
Requires-Dist: gaik[text-to-speech]; extra == "all-cpu"
Requires-Dist: gaik[parallel-transcriber]; extra == "all-cpu"
Requires-Dist: gaik[classifier]; extra == "all-cpu"
Requires-Dist: gaik[audio-to-structured-data]; extra == "all-cpu"
Requires-Dist: gaik[documents-to-structured-data-cpu]; extra == "all-cpu"
Requires-Dist: gaik[pg-vector-store]; extra == "all-cpu"
Requires-Dist: gaik[rag-workflow]; extra == "all-cpu"
Provides-Extra: dev
Requires-Dist: ruff>=0.14.1; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=4.0; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Dynamic: license-file

# GAIK – Generative AI Knowledge Management Toolkit

[![PyPI version](https://img.shields.io/pypi/v/gaik.svg)](https://pypi.org/project/gaik/)
![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)

This is the generative AI toolkit of the GAIK project ([gaik.ai](https://gaik.ai)). It provides a set of components and guidance for building knowledge-centric GenAI solutions, from strategic direction to deployable implementations.

## Project Documentation

Project documentation is available at:

https://gaik-project.github.io/gaik-toolkit/

**Live Demo:** https://gaik-demo.2.rahtiapp.fi/

## Why the toolkit is needed

**Generative AI has significant potential to increase the productivity of knowledge work** 
- Example from experiments: consultants using AI were significantly more productive – they completed 12.2% more tasks on average and finished tasks 25.1% more quickly (Dell'Acqua et al., 2023).
- Example from practice: customer-support agents at a large firm selling business-process software showed a 15% increase in productivity when assisted by generative AI (Brynjolfsson et al., 2025).

**However, tangible business value from Generative AI implementation projects is still limited**
- “only 26% of companies have advanced beyond the proof-of-concept stage to generate value” Source: BCG report (de Bellefonds et al., 2024).
- “Despite $30–40 billion in enterprise investment into GenAI, 95% of organizations are getting zero return.” Source: MIT report (Challapally et al., 2025).

Adopting Generative AI and creating value from it **is especially challenging for small and medium-sized enterprises (SMEs)**, which often lack the technical expertise and capabilities to implement GenAI solutions effectively. The literature review by Oldemeyer et al. (2024) identified the three most frequent challenges that SMEs face when implementing AI in the industrial sector: knowledge, costs, and a low level of digital maturity.

## Overall approach
Companies can deal with GenAI challenges by combining reusable building blocks with clear guidelines.
Instead of designing solutions from scratch, teams assemble existing components and follow proven ways of working. 
This makes it easier to turn ideas into real results, while reducing implementation time, risk, and required resources, and improving overall solution quality.

## Toolkit Focus
The toolkit takes a knowledge management perspective to structure GenAI development and implementation activities.

The toolkit focuses on three core **knowledge processes** in organizations:

| Knowledge process | Description | Illustration |
|-----------|-------------|--------------|
| **Knowledge capture** | Extract needed information from business documents, videos, voice recordings, emails, and meeting recordings | ![Knowledge capture](images/Knowledge_capture_image.jpg) |
| **Knowledge access** | Intelligent access to organizational knowledge (document repositories, databases, wikis, CRMs) | ![Knowledge access](images/Knowledge_access_image.jpg) |
| **Knowledge synthesis** | Automatic generation of business reports, sales proposals, marketing materials, project proposals | ![Knowledge synthesis](images/Knowledge_synthesis_image.jpg) |

The following **generic use cases** currently have top priority:

| Knowledge process | Generic use cases |
|---|---|
| **Knowledge capture** | A. Incident reporting in industry (e.g., for equipment, buildings)<br>B. Creating construction site diaries<br>C. Creation of transcripts and closed captions in various languages for instructional videos and podcasts<br>D. … |
| **Knowledge access** | A. Customer assistant for complex products and services<br>B. Semantic audio and video search for medical instructions<br>C. Learning assistant |
| **Knowledge synthesis** | A. Sales proposal generation<br>B. Report preparation<br>C. … |


---

## Layer-Based Architecture

The GAIK Toolkit is organized into a layer-based architecture that spans from strategic planning to implementation and security:

| Layer | Purpose | Contents |
|-------|---------|----------|
| **Strategy Layer** | Identification and selection of use cases, GenAI adoption readiness assessment and preparation, business value evaluation | Use case selection framework, Value evaluation framework, AI maturity assessment tool, GenAI success canvas  |
| **Requirements Layer** | Requirements capture and specification | Requirement templates, test cases |
| **Business Layer** | Use case definition, workflow and work system analysis and redesign | GenAI product canvas, Workflow templates, Work systems definitions |
| **Implementation Layer** | Solution development via a no-code or code-based approach, solution performance evaluation, integration, and monitoring | Reusable software components and modules for system development (`gaik` code package), no-code assets, evaluation methods, unit tests, deployment packages, connectors |
| **Security Compliance Layer** | Security policies and compliance frameworks | Security guidelines, compliance checks, audit trails |
| **Guidance Layer** | Guides and automates the process of solution development and implementation for KM (how to select and assemble building blocks) | Process and guide for GenAI solution implementation, Configuration wizard, Glossary |

This architecture ensures that GenAI solutions are built with proper governance, clear requirements, and comprehensive implementation support.

![GAIK Architecture](images/Toolkit_layers.jpg)
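
The `gaik` code package named in the Implementation Layer declares its optional features as extras in its package metadata (for example `embedder`, `parser`, and `transcriber`). The following is a minimal sketch, assuming `gaik` is installed in the current environment, of how the standard library's `importlib.metadata` could be used to see which dependencies each extra pulls in. The helper name `group_by_extra` is hypothetical and not part of the `gaik` API.

```python
"""Group a distribution's requirements by the extra that enables them."""
import re
from collections import defaultdict
from importlib import metadata

# Matches the `extra == "name"` clause of an environment marker.
_EXTRA_RE = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(requires_dist):
    """Map each extra name to its requirement strings.

    Requirements without an `extra` marker are stored under the
    empty string, meaning they are always installed.
    """
    grouped = defaultdict(list)
    for line in requires_dist:
        # A Requires-Dist entry has the form "requirement; marker".
        requirement, _, marker = line.partition(";")
        match = _EXTRA_RE.search(marker)
        extra = match.group(1) if match else ""
        grouped[extra].append(requirement.strip())
    return dict(grouped)


if __name__ == "__main__":
    try:
        # requires() returns None when no requirements are declared.
        lines = metadata.requires("gaik") or []
    except metadata.PackageNotFoundError:
        lines = []
        print("gaik is not installed in this environment")
    for extra, requirements in sorted(group_by_extra(lines).items()):
        print(f"{extra or '(core)'}: {', '.join(requirements)}")
```

Extras that declare no additional dependencies do not appear in the output, because they contribute no requirement lines to the metadata.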



## License

This project is licensed under the MIT License – see `LICENSE` for details.
