Metadata-Version: 2.4
Name: longterm_ai_memory
Version: 0.1.2
Summary: Semantic memory layer for AI applications with vector storage and LLM-based extraction
Home-page: https://github.com/omkar1344patil/longterm_ai_memory
Author: Omkar Patil
Author-email: omkar1344patil@gmail.com
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pinecone>=8.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: python-dotenv>=1.0.0
Dynamic: author
Dynamic: author-email
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Longterm AI Memory

A semantic long-term memory layer for agentic AI applications that remembers context across conversations.

[![PyPI version](https://badge.fury.io/py/longterm-ai-memory.svg)](https://badge.fury.io/py/longterm-ai-memory)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

---

## What It Does

Most AI assistants forget everything between sessions. This SDK solves that by extracting factual information from conversations, storing it as searchable vector embeddings, and retrieving relevant context when needed.

Your AI can reference things users mentioned weeks ago without them repeating themselves.

**Example:**
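```python
import os

from longterm_ai_memory import Memory

# pinecone_api_key is required; by default the local Ollama LLM is used (use_local_llm=True)
memory = Memory(
    user_id="user-123",
    pinecone_api_key=os.getenv("PINECONE_API_KEY"),
)

# Store information
memory.add("My name is Sarah and I work at Google")
memory.add("I'm allergic to shellfish")

# Search later: returns a ranked list of dicts with id, score, and metadata
results = memory.search("what are my dietary restrictions?")
print(results[0]["metadata"]["memory"])
# e.g. "User is allergic to shellfish"
```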

---

## Installation

```bash
pip install longterm_ai_memory
```

**Requirements:**
- Python 3.8+
- Pinecone account ([free tier available](https://www.pinecone.io/))
- OpenRouter API key (for cloud LLM) OR Ollama installed (for local LLM)

---

## Quick Start

### 1. Set Up Environment

Create a `.env` file:
```bash
PINECONE_API_KEY=your_pinecone_key_here
OPENROUTER_API_KEY=your_openrouter_key  # only if using cloud LLM
```
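
A small pre-flight check can make a missing key obvious before `Memory` is constructed. A minimal sketch, assuming the variable names from the `.env` file above; `missing_env_vars` is a hypothetical helper, not part of the package:

```python
from typing import List, Mapping


def missing_env_vars(env: Mapping[str, str], use_local_llm: bool) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    required = ["PINECONE_API_KEY"]
    # OpenRouter is only needed on the cloud LLM path (use_local_llm=False).
    if not use_local_llm:
        required.append("OPENROUTER_API_KEY")
    return [name for name in required if not env.get(name)]
```

For example, after `load_dotenv()`, call `missing_env_vars(os.environ, use_local_llm=False)` and stop with a clear message if the returned list is non-empty.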

### 2. Basic Usage

```python
from longterm_ai_memory import Memory
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize with cloud LLM (OpenRouter)
memory = Memory(
    user_id="demo-user",
    pinecone_api_key=os.getenv("PINECONE_API_KEY"),
    openrouter_api_key=os.getenv("OPENROUTER_API_KEY"),
    use_local_llm=False
)

# OR initialize with local LLM (Ollama - free)
memory = Memory(
    user_id="demo-user",
    pinecone_api_key=os.getenv("PINECONE_API_KEY"),
    use_local_llm=True
)

# Add memories
memory.add("My name is Alex and I work at Microsoft")
memory.add("I love playing tennis on weekends")

# Search memories
results = memory.search("what's my name?")
for result in results:
    print(f"{result['metadata']['memory']} (score: {result['score']:.2f})")

# Get all memories
all_memories = memory.get_all()

# Delete all memories
memory.delete_all()
```

---

## How It Works

```
User Message → LLM Extraction → Vector Embedding → Pinecone Storage
                    ↓                    ↓                ↓
          Filters noise,        multilingual-e5    Namespaced
          converts to 3rd person                    by user_id
            
Search Query → Vector Embedding → Semantic Search → Ranked Results
```

**Key Features:**
- **Automatic fact extraction** - LLM converts conversations to factual statements
- **Semantic search** - Find memories by meaning, not just keywords
- **Multi-tenant** - Each user gets isolated storage namespace
- **Noise filtering** - Ignores conversational filler such as "okay" and "cool"
- **Local or cloud LLM** - Use free Ollama or pay-per-use OpenRouter
- **Category auto-assignment** - Memories tagged with relevant categories
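
To make the flow concrete, here is a toy, dependency-free sketch of the same pipeline. It is not the package's implementation and every name in it is hypothetical: a filler-word check stands in for LLM extraction (it does not rewrite to third person), bag-of-words vectors stand in for multilingual-e5 embeddings, and a dict keyed by `user_id` stands in for Pinecone namespaces.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical filler list; the real package uses an LLM to decide what is noise.
FILLER = {"ok", "okay", "cool", "thanks", "lol", "sure", "nice"}


def toy_extract(message: str) -> Optional[str]:
    """Return the message as a 'fact', or None if it is only filler words."""
    words = re.findall(r"[a-z']+", message.lower())
    if not words or all(w in FILLER for w in words):
        return None
    return message.strip()


def toy_embed(text: str) -> Counter:
    """Bag-of-words term counts; a crude stand-in for a dense embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemoryStore:
    """One list of (fact, vector) pairs per user_id, like per-user namespaces."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, List[Tuple[str, Counter]]] = defaultdict(list)

    def add(self, user_id: str, message: str) -> Optional[str]:
        fact = toy_extract(message)
        if fact is None:
            return None  # filtered as noise; nothing stored
        self._namespaces[user_id].append((fact, toy_embed(fact)))
        return fact

    def search(self, user_id: str, query: str, top_k: int = 5) -> List[Tuple[str, float]]:
        query_vec = toy_embed(query)
        # Only this user's namespace is scanned.
        records = self._namespaces.get(user_id, [])
        scored = [(fact, cosine(query_vec, vec)) for fact, vec in records]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

The per-user lookup in `search` is the part that corresponds to namespace isolation: a query never scans another user's records.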

---

## API Reference

### Initialization

```python
Memory(
    user_id: str,                    # Required: unique user identifier
    pinecone_api_key: str,           # Required: Pinecone API key
    openrouter_api_key: str = None,  # Required if use_local_llm=False
    use_local_llm: bool = True,      # True=Ollama, False=OpenRouter
    local_llm_model: str = "phi3:mini",  # Ollama model name
    skip_health_check: bool = False  # Skip startup validation
)
```

### add(user_message, categories=None)

Extract a factual memory from a user message and store it.

```python
result = memory.add("My favorite food is sushi")
# Returns: {"id": "...", "memory": "User's favorite food is sushi", "metadata": {...}}

# Noise filtered → returns None
result = memory.add("okay cool")  # None
```

**Parameters:**
- `user_message` (str): User's message to extract memory from
- `categories` (list, optional): Manual category override (not recommended)

**Returns:** Dict with memory data or `None` if no factual content found.
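
If you do pass `categories`, a light check against the configured category list can catch typos before they reach storage. A minimal sketch, assuming overrides should use the same names as `MEMORY_CATEGORIES` in `config.py`; `normalize_categories` is hypothetical, and whether the package validates overrides itself is not documented here.

```python
from typing import Iterable, List, Optional, Sequence

# Mirrors MEMORY_CATEGORIES in config.py; keep in sync if you change it.
ALLOWED_CATEGORIES = (
    "work", "personal", "health", "preferences",
    "family", "education", "hobbies", "misc",
)


def normalize_categories(
    categories: Optional[Iterable[str]],
    allowed: Sequence[str] = ALLOWED_CATEGORIES,
) -> Optional[List[str]]:
    """Lower-case, de-duplicate, and validate a manual category override.

    Returns None when no override is given, leaving category assignment to the LLM.
    """
    if categories is None:
        return None
    result: List[str] = []
    for raw in categories:
        cat = raw.strip().lower()
        if cat not in allowed:
            raise ValueError(f"Unknown category {raw!r}; expected one of {list(allowed)}")
        if cat not in result:
            result.append(cat)
    return result
```

Usage: `memory.add("I run 5k every Sunday", categories=normalize_categories(["Hobbies", "health"]))`.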

### search(query, top_k=5, filter=None)

Search for relevant memories using semantic similarity.

```python
results = memory.search("what do I like to eat?", top_k=3)

# With category filter
results = memory.search("work info", filter={"categories": "work"})
```

**Parameters:**
- `query` (str): Natural language search query
- `top_k` (int): Maximum results to return (1-20, default: 5)
- `filter` (dict, optional): Metadata filter

**Returns:** List of dicts with `id`, `score`, and `metadata` fields, sorted by relevance.
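
The `filter` example above matches a single category. Assuming `filter` is passed through to Pinecone's metadata filtering unchanged (this README does not confirm it), Pinecone's `$in`, `$gte`, and `$and` operators can express richer queries. `build_filter` is a hypothetical helper, and the `year` field is the temporal metadata used under Advanced Features.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_filter(
    categories: Sequence[str] = (),
    since_year: Optional[int] = None,
) -> Optional[Dict[str, Any]]:
    """Combine optional category and year conditions into one Pinecone-style filter."""
    clauses: List[Dict[str, Any]] = []
    if len(categories) == 1:
        # Same shape as the README example: {"categories": "work"}
        clauses.append({"categories": categories[0]})
    elif categories:
        # $in matches if the stored value (or an element of a stored list) is listed
        clauses.append({"categories": {"$in": list(categories)}})
    if since_year is not None:
        clauses.append({"year": {"$gte": since_year}})
    if not clauses:
        return None  # no filter: search across all of the user's memories
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```

Usage: `memory.search("work info", filter=build_filter(["work", "education"], since_year=2024))`.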

### get_all(filter=None)

Retrieve all memories for the current user.

```python
all_memories = memory.get_all()

# With filter
work_memories = memory.get_all(filter={"categories": "work"})
```

**Returns:** List of all memories (up to 10,000).

### list(limit=100)

Return up to `limit` recent memories.

```python
recent = memory.list(limit=20)
```

### delete_all()

Delete all memories for the current user.

```python
success = memory.delete_all()  # Returns True if successful
```

**Warning:** No undo. Use with caution.

---

## Local LLM Setup (Free Alternative)

Install Ollama for free local LLM inference:

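```bash
# Linux (on macOS, download the app from https://ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh

# Start the server if it is not already running as a background service
# (the Linux installer usually sets one up; `ollama serve` runs in the foreground)
ollama serve

# In another terminal, pull a model
ollama pull phi3:mini
```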

Supported models:
- `phi3:mini` (recommended, 3.8B parameters, ~2.2GB download)
- `gemma2:2b` (lightweight, 1.6GB)
- `llama3:8b` (more capable, 4.7GB)

Then initialize with `use_local_llm=True`.
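
Before initializing, you can confirm that the Ollama server is reachable and the model is pulled. This sketch uses only the standard library and Ollama's `GET /api/tags` endpoint, which lists locally available models; the helper names are hypothetical, and this is not necessarily how the package's own startup health check works.

```python
import json
import urllib.request
from typing import Iterable, List

# Default Ollama address, matching LOCAL_LLM_CONFIG["base_url"] in config.py.
OLLAMA_URL = "http://localhost:11434"


def installed_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> List[str]:
    """Return model names from Ollama's GET /api/tags, or [] if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError covers bad JSON.
        return []
    return [m.get("name", "") for m in payload.get("models", [])]


def has_model(names: Iterable[str], wanted: str) -> bool:
    """True if `wanted` is installed; an untagged name like 'phi3' means 'phi3:latest'."""
    full_name = wanted if ":" in wanted else f"{wanted}:latest"
    return full_name in set(names)
```

For example, `has_model(installed_models(), "phi3:mini")`. From the command line, `ollama list` shows the same information.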

---

## Configuration

Customize behavior by modifying `config.py`:

```python
# Search similarity threshold (0.0-1.0)
SEARCH_CONFIG = {
    "similarity_threshold": 0.7,  # Higher = stricter matching
    "default_top_k": 5
}

# Memory categories (auto-assigned by LLM)
MEMORY_CATEGORIES = [
    "work", "personal", "health", "preferences",
    "family", "education", "hobbies", "misc"
]

# LLM models
LOCAL_LLM_CONFIG = {
    "extraction_model": "phi3:mini",
    "base_url": "http://localhost:11434"
}

OPENROUTER_CONFIG = {
    "extraction_model": "google/gemma-2-9b-it:free"
}
```
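
One plausible reading of how the two `SEARCH_CONFIG` values interact is sketched below: matches under the threshold are dropped, then the highest-scoring `top_k` are kept. This is an assumption, not the package's confirmed logic, and `apply_search_config` is hypothetical. E5-family embedding models are reported to produce cosine scores clustered roughly between 0.7 and 1.0, so a 0.7 threshold may filter less than it sounds; tune it against your own data.

```python
from typing import Any, Dict, List, Optional

# Values mirror SEARCH_CONFIG in config.py above.
SEARCH_CONFIG: Dict[str, Any] = {"similarity_threshold": 0.7, "default_top_k": 5}


def apply_search_config(
    matches: List[Dict[str, Any]],
    config: Optional[Dict[str, Any]] = None,
    top_k: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Drop matches below the threshold, then keep the top_k highest-scoring ones."""
    cfg = SEARCH_CONFIG if config is None else config
    k = cfg["default_top_k"] if top_k is None else top_k
    kept = [m for m in matches if m["score"] >= cfg["similarity_threshold"]]
    kept.sort(key=lambda m: m["score"], reverse=True)
    return kept[:k]
```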

---

## Use Cases

### Chatbot with Memory

```python
def chatbot_with_memory(user_message):
    # 1. Search for relevant context
    context = memory.search(user_message, top_k=3)
    
    # 2. Build prompt with context
    context_text = " | ".join([m['metadata']['memory'] for m in context])
    prompt = f"Context: {context_text}\n\nUser: {user_message}"
    
    # 3. Generate response with your LLM
    response = your_llm.generate(prompt)  # `your_llm` is a placeholder for your own LLM client
    
    # 4. Store new information
    memory.add(user_message)
    
    return response
```
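
The chatbot example above needs Pinecone and an LLM to run. To unit-test the prompt-assembly logic offline, one option is a stand-in object with the same return shapes as `add` and `search`, plus passing the LLM call in as a function. `FakeMemory` is hypothetical and uses naive word overlap, not semantic search.

```python
from typing import Any, Callable, Dict, List, Optional


class FakeMemory:
    """Offline stand-in mimicking the documented add/search return shapes."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def add(self, user_message: str) -> Optional[Dict[str, Any]]:
        self._items.append(user_message)
        return {"id": str(len(self._items)), "memory": user_message, "metadata": {}}

    def search(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        query_words = set(query.lower().split())
        results = []
        for i, text in enumerate(self._items):
            overlap = len(query_words & set(text.lower().split()))
            if overlap:  # score = number of shared words (not a cosine score)
                results.append({"id": str(i + 1), "score": float(overlap),
                                "metadata": {"memory": text}})
        results.sort(key=lambda r: r["score"], reverse=True)
        return results[:top_k]


def chatbot_with_memory(memory: Any, generate: Callable[[str], str], user_message: str) -> str:
    """Same steps as above, with the memory and LLM call injected."""
    context = memory.search(user_message, top_k=3)
    context_text = " | ".join(m["metadata"]["memory"] for m in context)
    prompt = f"Context: {context_text}\n\nUser: {user_message}"
    response = generate(prompt)
    memory.add(user_message)
    return response
```

In production, pass the real `Memory` instance and a function wrapping your LLM client.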

### Personal Assistant

```python
# Store user preferences
memory.add("I prefer emails over phone calls")
memory.add("My meetings are usually in the mornings")

# Later, retrieve context for scheduling
prefs = memory.search("communication and scheduling preferences")
```

### Customer Support

```python
# Store customer information
memory.add("Customer is on enterprise plan")
memory.add("Customer's primary use case is data analytics")

# Retrieve during support conversations
customer_info = memory.search("plan and use case")
```

---

## Architecture

**Components:**
- **Memory (core)** - Main orchestrator, handles all operations
- **MemoryExtractor** - LLM-based extraction (Ollama or OpenRouter)
- **VectorStore** - Pinecone backend with namespace isolation
- **EmbeddingProvider** - Pinecone inference API for embeddings
- **MetadataGenerator** - Temporal + categorical metadata

**Storage:**
- Each user gets isolated namespace in Pinecone
- Embeddings: 1024-dimensional vectors (multilingual-e5-large)
- Metadata: categories, timestamps, temporal fields (day, month, quarter, etc.)

---

## Advanced Features

### Time-Based Filtering

```python
from datetime import datetime

# Get memories from this year
all_memories = memory.get_all()
this_year = datetime.now().year
this_year_memories = [m for m in all_memories if m['metadata'].get('year') == this_year]

# Weekend memories
weekend = [m for m in all_memories if m['metadata'].get('is_weekend')]
```
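
The comprehensions above work for one-off questions; for a timeline view, bucketing by year and quarter can help. A sketch assuming each record's `metadata` carries numeric `year` and `quarter` fields (both are mentioned in this README, though the exact names are an assumption); `group_by_quarter` is hypothetical. Pinecone stores metadata numbers as floating point, so values may come back as `2024.0`, which the sketch normalizes with `int()`.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def group_by_quarter(memories: List[Dict[str, Any]]) -> Dict[Tuple[int, int], List[str]]:
    """Bucket memory texts by (year, quarter), oldest first."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for record in memories:
        meta = record.get("metadata", {})
        if "year" not in meta or "quarter" not in meta:
            continue  # skip records without temporal metadata
        # int() normalizes numbers that may come back as floats (e.g. 2024.0).
        key = (int(meta["year"]), int(meta["quarter"]))
        groups[key].append(meta.get("memory", ""))
    return dict(sorted(groups.items()))
```

Usage: `group_by_quarter(memory.get_all())`.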

### Batch Operations

```python
messages = [
    "I graduated from Stanford in 2020",
    "My favorite food is sushi",
    "I have two cats named Luna and Mochi"
]

for msg in messages:
    result = memory.add(msg)
    if result:
        print(f"Stored: {result['memory']}")
```
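
The loop above skips filtered messages silently and stops at the first exception. A sketch that keeps going and reports what happened to each message; `add_batch` is hypothetical and takes the add function as a parameter so it can be exercised without Pinecone. It still makes one `add` call (and one LLM extraction) per message, so the batch limitation listed under Limitations still applies.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def add_batch(
    add: Callable[[str], Optional[Dict[str, Any]]],
    messages: Iterable[str],
) -> Dict[str, List[Any]]:
    """Add each message, sorting outcomes into stored / filtered / failed."""
    summary: Dict[str, List[Any]] = {"stored": [], "filtered": [], "failed": []}
    for msg in messages:
        try:
            result = add(msg)
        except Exception as exc:  # record the failure and continue with the rest
            summary["failed"].append((msg, repr(exc)))
            continue
        if result is None:
            summary["filtered"].append(msg)  # no factual content found
        else:
            summary["stored"].append(result["memory"])
    return summary
```

Usage: `summary = add_batch(memory.add, messages)`.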

### Error Handling

```python
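# The package's MemoryError shares its name with Python's built-in MemoryError,
# so alias it on import to avoid shadowing the builtin.
from longterm_ai_memory import ValidationError, MemoryError as AIMemoryError

try:
    result = memory.add(user_message)
except ValidationError as e:
    print(f"Invalid input: {e}")
except AIMemoryError as e:
    print(f"Operation failed: {e}")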
```
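
Transient failures such as network blips or rate limits are often worth retrying, while `ValidationError` is not, since bad input will not fix itself. A generic sketch; `with_retries` is hypothetical, and which exceptions the package raises for transient failures (its own error type or underlying network errors) is an assumption to verify against your logs.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying only on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2x base, 4x base, ...),
            # scaled by a random factor in [1, 2) so clients do not retry in lockstep.
            sleep(base_delay * 2 ** (attempt - 1) * (1 + random.random()))
    raise RuntimeError("unreachable")
```

Usage: `with_retries(lambda: memory.add(user_message), retry_on=(ConnectionError, TimeoutError))`. Exceptions not listed in `retry_on` propagate immediately.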

---

## Limitations

- Memory updates require delete + add (no direct update operation)
- Conversation threading not implemented
- Storage backend limited to Pinecone (no ChromaDB/Qdrant support yet)
- Batch operations not optimized for high-volume scenarios
- Maximum 10,000 memories per user via `get_all()` (pagination needed for more)
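
Because there is no update operation, a fact the user repeats can end up stored several times. One client-side mitigation is to search before adding and skip the write when the closest match is very similar. A sketch using only the documented `search` and `add` calls; `add_if_new` and the 0.95 cutoff are hypothetical. The stored text is an LLM rewrite of the original message, so even true duplicates will not score 1.0, and the cutoff needs tuning on your data.

```python
from typing import Any, Dict, Optional


def add_if_new(memory: Any, message: str, duplicate_score: float = 0.95) -> Optional[Dict[str, Any]]:
    """Skip `add` when the closest existing memory already scores >= duplicate_score."""
    top = memory.search(message, top_k=1)
    if top and top[0]["score"] >= duplicate_score:
        return None  # likely a duplicate; nothing written
    # May also return None if the extractor finds no factual content.
    return memory.add(message)
```

This costs one extra search per write, and `None` no longer distinguishes "duplicate" from "filtered as noise".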

---

## Roadmap

**Planned improvements:**
- Additional storage backends (ChromaDB, Qdrant, Weaviate)
- Conversation context tracking
- Enhanced category inference
- Memory importance scoring
- Temporal decay for older memories

---

## Contributing

Contributions welcome! Areas needing work:
- Additional storage backends
- Better category inference
- Memory deduplication
- Conversation threading
- Performance optimization

---

## License

MIT License - See [LICENSE](LICENSE) for details.

---

## Links

- **GitHub**: [github.com/omkar1344patil/longterm-ai-memory](https://github.com/omkar1344patil/longterm-ai-memory)
- **PyPI**: [pypi.org/project/longterm-ai-memory](https://pypi.org/project/longterm-ai-memory)
- **Issues**: [Report bugs or request features](https://github.com/omkar1344patil/longterm-ai-memory/issues)

---

**Questions or feedback?** Open an issue on GitHub or reach out on [LinkedIn](https://www.linkedin.com/in/omkarpatil14).
