Metadata-Version: 2.4
Name: promptcache-ai
Version: 0.2.0
Summary: Semantic similarity cache for LLM responses (Redis backend, TTL, cost tracking).
Author-email: Tase Nikol <anikolaou.ph@gmail.com>
License: MIT
Project-URL: Repository, https://github.com/tase-nikol/promptcache
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: redis>=5.0.0
Requires-Dist: numpy>=1.24
Requires-Dist: pydantic>=2.6
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: ruff>=0.6.0; extra == "dev"
Requires-Dist: mypy>=1.8; extra == "dev"
Requires-Dist: types-redis>=4.6.0.20241004; extra == "dev"
Provides-Extra: bench
Requires-Dist: sentence-transformers>=2.6.0; extra == "bench"
Requires-Dist: tiktoken>=0.7.0; extra == "bench"
Requires-Dist: matplotlib>=3.8.0; extra == "bench"
Dynamic: license-file

PromptCache
===========

> Reduce your LLM API costs by 30–70% with semantic caching.

PromptCache reuses LLM responses for **semantically similar prompts**, not just exact string matches.

If two users ask:

-   "Explain Redis in simple terms"

-   "Can you explain Redis simply?"

You shouldn't pay twice.

PromptCache makes sure you don't.

* * * * * 
![License](https://img.shields.io/badge/license-MIT-green)

* * * * *

The Problem
--------------

If you're using OpenAI or any LLM API in production, you're likely paying repeatedly for:

-   The same question phrased differently

-   Similar support requests across users

-   Slight variations in prompts

-   Background job retries

-   RAG pipelines issuing near-identical queries

Traditional caching only works for **exact matches**.

LLMs need **semantic caching**.

* * * * *

What PromptCache Does
-----------------------

1.  Embeds your prompt into a vector

2.  Searches Redis for similar past prompts

3.  If similarity ≥ threshold → returns cached response

4.  Otherwise → calls the LLM and stores the result

```text
User Prompt
     ↓
Embed → Redis Vector Search
     ↓
Hit? → Return cached answer
Miss? → Call LLM → Store result
```
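The four steps above can be sketched as a toy, in-memory loop. This is a hypothetical illustration, not PromptCache's code: `ToySemanticCache`, `toy_embed` and `fake_llm` are made-up names, the "embedding" is a hashed bag of words instead of a real embedding model, and a linear NumPy scan stands in for Redis HNSW search.

```python
import zlib
from typing import Callable

import numpy as np


def toy_embed(text: str, dim: int = 256) -> np.ndarray:
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec


class ToySemanticCache:
    """Hypothetical in-memory cache: embed -> search -> threshold -> store."""

    def __init__(self, embed: Callable[[str], np.ndarray], threshold: float = 0.92):
        self.embed = embed
        self.threshold = threshold
        self.vectors: list[np.ndarray] = []
        self.responses: list[str] = []

    def get_or_set(self, prompt: str, llm_call: Callable[[str], str]) -> tuple[str, bool]:
        # 1. Embed the prompt; unit length makes a dot product equal cosine similarity.
        q = self.embed(prompt)
        q = q / (np.linalg.norm(q) or 1.0)
        # 2-3. Find the most similar stored prompt; reuse its answer if it clears the threshold.
        if self.vectors:
            sims = np.stack(self.vectors) @ q
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                return self.responses[best], True
        # 4. Miss: call the LLM and store the new entry.
        answer = llm_call(prompt)
        self.vectors.append(q)
        self.responses.append(answer)
        return answer, False


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)  # count how often the "LLM" is actually called
    return f"answer to: {prompt}"


cache = ToySemanticCache(toy_embed, threshold=0.9)
_, hit1 = cache.get_or_set("explain redis in simple terms", fake_llm)
_, hit2 = cache.get_or_set("explain redis in simple terms please", fake_llm)
print(hit1, hit2, len(calls))  # False True 1
```

With a real embedding model, paraphrases such as the two Redis questions at the top of this README score close to each other; the word-hash toy only rewards shared words.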

* * * * *

10-Second Example
--------------------

```python
from promptcache import SemanticCache
from promptcache.backends.redis_vector import RedisVectorBackend
from promptcache.embedders.openai import OpenAIEmbedder
from promptcache.types import CacheMeta

embedder = OpenAIEmbedder(model="text-embedding-3-small")

backend = RedisVectorBackend(
    url="redis://localhost:6379/0",
    dim=embedder.dim,
)

cache = SemanticCache(
    backend=backend,
    embedder=embedder,
    namespace="support-bot",
    threshold=0.92,
)

meta = CacheMeta(
    model="gpt-4.1-mini",
    system_prompt="You are a helpful support assistant.",
)

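# my_llm_call is your own function that calls the LLM (not shown);
# extract_text pulls the response text out of whatever it returns.
result = cache.get_or_set(
    prompt="How do I reset my password?",
    llm_call=my_llm_call,
    extract_text=lambda r: r.output_text,
    meta=meta,
)

print(result.cache_hit)  # True or False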
```
That's it.

* * * * *

Example Impact
-----------------

In a SaaS support assistant:

-   62% cache hit rate

-   48% reduction in token usage

-   44% reduction in API spend

Results depend on the workload, but high-volume, repetitive systems benefit the most.
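To gauge whether caching pays off for your own traffic, note that every request is embedded (step 1 above) while only misses pay for the LLM call. Below is a back-of-the-envelope sketch assuming a flat cost per call; `estimated_savings` and its inputs are hypothetical, not measurements. Real costs scale with tokens, which is one reason token and spend reductions need not equal the hit rate.

```python
def estimated_savings(
    requests: int,
    hit_rate: float,
    llm_cost_per_call: float,
    embed_cost_per_call: float,
) -> float:
    """Hypothetical estimate: every request is embedded, only misses call the LLM."""
    without_cache = requests * llm_cost_per_call
    with_cache = (
        requests * embed_cost_per_call  # embedding is paid on hits and misses alike
        + requests * (1 - hit_rate) * llm_cost_per_call  # LLM is paid only on misses
    )
    return without_cache - with_cache


# Illustrative inputs only: 100k requests, 50% hits,
# $0.002 per LLM call, $0.00001 per embedding call.
print(round(estimated_savings(100_000, 0.5, 0.002, 0.00001), 2))  # 99.0
```

At a 0% hit rate the result is slightly negative: you pay for embeddings without saving any LLM calls.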

* * * * *

Production-Ready Design
--------------------------

PromptCache isolates cache entries by:

-   `namespace`

-   `model`

-   `system_prompt`

-   `tools_schema`

-   `embedder`

This prevents cross-context contamination.
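One way to picture this isolation is a partition key that folds all five fields into the cache key, so entries written under one system prompt are never candidates for another. This is a hypothetical sketch: `partition_key` and its hashing scheme are assumptions, not PromptCache's actual key format.

```python
import hashlib
import json
from typing import Any


def partition_key(
    namespace: str,
    model: str,
    system_prompt: str,
    tools_schema: Any,
    embedder: str,
) -> str:
    """Hypothetical: derive a stable key from every field that must isolate entries."""
    payload = json.dumps(
        {
            "model": model,
            "system_prompt": system_prompt,
            "tools_schema": tools_schema,
            "embedder": embedder,
        },
        sort_keys=True,  # the same schema in a different dict order gives the same key
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"{namespace}:{digest}"


a = partition_key("support-bot", "gpt-4.1-mini", "You are a helpful support assistant.", None, "text-embedding-3-small")
b = partition_key("support-bot", "gpt-4.1-mini", "You are a terse assistant.", None, "text-embedding-3-small")
print(a == b)  # False: a different system prompt lands in a different partition
```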

Additional features:

-   ✅ Redis HNSW vector search (cosine similarity)

-   ✅ TTL support

-   ✅ Hit-rate statistics

-   ✅ Optional cost tracking

-   ✅ In-memory backend (for testing)

-   ✅ Framework-agnostic (no LangChain dependency)
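TTL means an entry stops being served once it is older than a configured lifetime. Redis provides this natively through key expiry (`EXPIRE`, or the `ex=` argument of `set` in redis-py). The sketch below shows only the semantics, using a hypothetical `TTLStore` with an injectable clock so expiry can be tested without waiting; it is not PromptCache's backend code.

```python
import time
from typing import Callable, Optional


class TTLStore:
    """Hypothetical in-memory store: entries expire ttl_seconds after being written."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: dict[str, tuple[float, str]] = {}

    def set(self, key: str, value: str) -> None:
        # Record the absolute expiry time alongside the value.
        self._data[key] = (self.clock() + self.ttl, value)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value


now = [0.0]
store = TTLStore(ttl_seconds=60, clock=lambda: now[0])
store.set("k", "cached answer")
print(store.get("k"))  # cached answer
now[0] = 61.0
print(store.get("k"))  # None
```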

* * * * *

Installation
---------------

```bash
pip install promptcache-ai
```

With the optional OpenAI embedder (the quotes stop shells such as zsh from treating the brackets as a glob pattern):

```bash
pip install "promptcache-ai[openai]"
```

* * * * *

Redis Setup
--------------

The Redis backend requires **Redis Stack** (RediSearch with vector support).

Run locally:

```bash
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest
```
Verify:

```bash
redis-cli MODULE LIST
```

The output should include the `search` module:

```text
search
```

* * * * *

Stats
--------

Measure impact:

```python
print(cache.stats())
```

Example:

```json
{
    "hits": 1240,
    "misses": 860,
    "total": 2100,
    "hit_rate_percent": 59.05
}
```
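In the example, `total` is `hits + misses` and `hit_rate_percent` is `100 * hits / total` rounded to two decimals (1240 / 2100 ≈ 59.05%). The following hypothetical helper reproduces that arithmetic; it is not the library's own `stats()` code.

```python
def summarize(hits: int, misses: int) -> dict[str, float]:
    """Hypothetical: rebuild the stats fields shown above from raw counters."""
    total = hits + misses
    # Guard against division by zero before the first lookup.
    rate = round(100 * hits / total, 2) if total else 0.0
    return {"hits": hits, "misses": misses, "total": total, "hit_rate_percent": rate}


print(summarize(1240, 860))
# {'hits': 1240, 'misses': 860, 'total': 2100, 'hit_rate_percent': 59.05}
```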

* * * * *

When It Helps Most
---------------------

-   Customer support bots

-   Internal copilots

-   FAQ systems

-   Knowledge assistants

-   Deterministic / low-temperature tasks

-   High-volume similar prompts

* * * * *

When It May Not Help
-----------------------

-   Highly personalized prompts

-   Creative high-temperature tasks

-   Frequently changing context

* * * * *

Testing
----------

Run unit tests:

```bash
pytest
```

Run Redis integration tests:

```bash
export REDIS_URL="redis://localhost:6379/0"
pytest
```
