Metadata-Version: 2.1
Name: local-deep-research
Version: 0.5.7
Summary: AI-powered research assistant with deep, iterative analysis using LLMs and web searches
Author-Email: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>, HashedViking <6432677+HashedViking@users.noreply.github.com>
License: MIT License
         
         Copyright (c) 2025 LearningCircuit
         
         Permission is hereby granted, free of charge, to any person obtaining a copy
         of this software and associated documentation files (the "Software"), to deal
         in the Software without restriction, including without limitation the rights
         to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
         copies of the Software, and to permit persons to whom the Software is
         furnished to do so, subject to the following conditions:
         
         The above copyright notice and this permission notice shall be included in all
         copies or substantial portions of the Software.
         
         THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
         IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
         FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
         AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
         LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
         OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
         SOFTWARE.
         
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Project-URL: Homepage, https://github.com/LearningCircuit/local-deep-research
Project-URL: Bug Tracker, https://github.com/LearningCircuit/local-deep-research/issues
Requires-Python: >=3.10
Requires-Dist: langchain>=0.3.18
Requires-Dist: langchain-community>=0.3.17
Requires-Dist: langchain-core>=0.3.34
Requires-Dist: langchain-ollama>=0.2.3
Requires-Dist: langchain-openai>=0.3.5
Requires-Dist: langchain-anthropic>=0.3.13
Requires-Dist: duckduckgo_search>=7.3.2
Requires-Dist: python-dateutil>=2.9.0
Requires-Dist: typing_extensions>=4.12.2
Requires-Dist: justext
Requires-Dist: playwright
Requires-Dist: beautifulsoup4
Requires-Dist: flask>=3.1.0
Requires-Dist: flask-cors>=3.0.10
Requires-Dist: flask-socketio>=5.1.1
Requires-Dist: sqlalchemy>=1.4.23
Requires-Dist: wikipedia
Requires-Dist: arxiv>=1.4.3
Requires-Dist: pypdf
Requires-Dist: sentence-transformers
Requires-Dist: faiss-cpu
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: toml>=0.10.2
Requires-Dist: platformdirs>=3.0.0
Requires-Dist: dynaconf
Requires-Dist: requests>=2.28.0
Requires-Dist: tiktoken>=0.4.0
Requires-Dist: xmltodict>=0.13.0
Requires-Dist: lxml>=4.9.2
Requires-Dist: pdfplumber>=0.9.0
Requires-Dist: unstructured>=0.10.0
Requires-Dist: google-search-results
Requires-Dist: importlib-resources>=6.5.2
Requires-Dist: setuptools>=78.1.0
Requires-Dist: flask-wtf>=1.2.2
Requires-Dist: optuna>=4.3.0
Requires-Dist: elasticsearch==8.14.0
Requires-Dist: methodtools>=0.4.7
Requires-Dist: loguru>=0.7.3
Requires-Dist: cachetools>=5.5.2
Requires-Dist: matplotlib>=3.10.3
Requires-Dist: pandas>=2.2.3
Requires-Dist: plotly>=6.0.1
Requires-Dist: kaleido==0.2.1
Requires-Dist: aiohttp>=3.9.0
Description-Content-Type: text/markdown

# Local Deep Research

<div align="center">

[![GitHub stars](https://img.shields.io/github/stars/LearningCircuit/local-deep-research?style=for-the-badge)](https://github.com/LearningCircuit/local-deep-research/stargazers)
[![License](https://img.shields.io/badge/License-MIT-green.svg?style=for-the-badge)](LICENSE)
[![Discord](https://img.shields.io/discord/1352043059562680370?style=for-the-badge&logo=discord)](https://discord.gg/ttcqQeFcJ3)
[![Reddit](https://img.shields.io/badge/Reddit-r/LocalDeepResearch-FF4500?style=for-the-badge&logo=reddit)](https://www.reddit.com/r/LocalDeepResearch/)

*AI-powered research assistant that performs deep, iterative analysis using multiple LLMs and web searches*

<div align="center">
  <a href="https://www.youtube.com/watch?v=0ISreg9q0p0">
    <img src="https://img.youtube.com/vi/0ISreg9q0p0/0.jpg" alt="Local Deep Research">
    <br>
    <span>▶️ Watch Video</span>
  </a>
</div>


</div>

## 📋 Overview

Local Deep Research is a powerful AI research assistant that:

1. **Performs iterative, multi-source research** on any topic
2. **Creates comprehensive reports or quick summaries** with proper citations
3. **Runs locally** for complete privacy when using local LLMs
4. **Searches across multiple sources** including academic databases & the web
5. **Processes your own documents** with vector search (RAG)
6. **Optimized for speed** with parallel search processing

Local Deep Research combines the power of large language models with intelligent search strategies to provide well-researched, properly cited answers to complex questions. It can process queries in just seconds with the Quick Summary option, or create detailed reports with proper section organization for more comprehensive analysis.

## ⚡ Quick Start

### Option 1: Docker (Quickstart no MAC/ARM)

```bash
# Step 1: Pull and run SearXNG for optimal search results
docker pull searxng/searxng
docker run -d -p 8080:8080 --name searxng searxng/searxng

# Step 2: Pull and run Local Deep Research (Please build your own docker on ARM)
docker pull localdeepresearch/local-deep-research
docker run -d -p 5000:5000 --network host --name local-deep-research localdeepresearch/local-deep-research

# Start containers - Required after each reboot (can be automated with this flag in run command --restart unless-stopped)
docker start searxng
docker start local-deep-research

```

### Option 2: Docker Compose (Recommended)

LDR uses Docker compose to bundle the web app and all it's dependencies so
you can get up and running quickly.

### Prerequisites

- [Docker](https://docs.docker.com/engine/install/)
- [Docker Compose](https://docs.docker.com/compose/install/)
- `cookiecutter`: Run `pip install --user cookiecutter`

Clone the repository:

```bash
git clone https://github.com/LearningCircuit/local-deep-research.git
cd local-deep-research
```

### Configuring with Docker Compose

In the LDR repository, run the following command
to do generate the compose file:

```bash
cookiecutter cookiecutter-docker/
```

This will prompt you to answer a series of questions. Hit Enter repeatedly
to accept the default values. It should generate a file in the repository called `docker-compose.default.yml`. To run LDR, use the following command:

```bash
docker compose -f docker-compose.default.yml up
```

Then visit `http://127.0.0.1:5000` to start researching!

See [here](https://github.com/LearningCircuit/local-deep-research/wiki/Installation#docker-installation-recommended) for more information about
using Docker.

### Option 3: Python Package (mostly for programmatic access)

```bash
# Install the package
pip install local-deep-research

# Setup SearXNG for best results
docker pull searxng/searxng
docker run -d -p 8080:8080 --name searxng searxng/searxng

# Install Ollama and pull a model
# Download from https://ollama.ai and run:
ollama pull gemma3:12b

# Start the web interface
python -m local_deep_research.web.app
```

For programmatic use in your Python code:

```python
from local_deep_research import quick_summary

results = quick_summary(
    query="advances in fusion energy",
    search_tool="auto",
    iterations=1
)
print(results["summary"])
```

### Additional Installation Options

**Windows**: Docker is the easiest option for Windows users. If preferred, a [Windows Installer](https://github.com/LearningCircuit/local-deep-research/releases/download/v0.1.0/LocalDeepResearch_Setup.exe) is also available.

For more information on installation options, see [the wiki](https://github.com/LearningCircuit/local-deep-research/wiki/Installation).

## 🔍 Research Capabilities

### Two Research Modes

- **Quick Summary**: Fast results (30s-3min) with key information and proper citations
  - Perfect for rapid exploration and answering straightforward questions
  - Supports multiple search engines in parallel for maximum efficiency
  - Tables and structured information can be included when relevant

- **Detailed Report**: Comprehensive analysis with structured sections, table of contents, and in-depth exploration
  - Creates professional-grade reports with proper organization
  - Conducts separate research for each section to ensure comprehensive coverage
  - Integrates information across sections for a cohesive analysis
  - Includes proper citations and reference tracking

### Performance Optimization

- **Use Direct SearXNG**: For maximum speed (bypasses LLM calls needed for engine selection)
- **Adjust Iteration Depth**:
  - 1 iteration: Quick factual questions (~30 seconds)
  - 2-3 iterations: Complex topics requiring deeper exploration (2-3 minutes)
  - 3-5 iterations: Comprehensive research with follow-up investigation (5+ minutes)
- **Choose Appropriate Models**:
  - 12B-30B parameter models offer good balance of quality and speed
  - For complex research, larger models may provide better synthesis
- **For Detailed Reports**: Expect multiple research cycles (one per section) and longer processing times

### Multi-Source Integration

- **Auto-Engine Selection**: The system intelligently selects the most appropriate search engines for your query
- **Academic Sources**: Direct access to Wikipedia, arXiv, PubMed, Semantic Scholar, and more
- **Web Search**: Via SearXNG, Brave Search, SerpAPI (for Google results), and more
- **Local Document Search**: Search through your private document collections with vector embeddings
- **Cross-Engine Filtering**: Smart result ranking across search engines for better information quality

## 🤖 LLM Support

Local Deep Research works with both local and cloud LLMs:

### Local Models (via Ollama)

Local models provide complete privacy and don't require API keys or internet connection for the LLM component (only search queries go online).

```bash
# Install Ollama from https://ollama.ai
ollama pull gemma3:12b  # Recommended model
```

Recommended local models:
- **Gemma 3 (12B)** - Great balance of quality and speed
- **Mistral (7B/8x7B)** - Fast performance on most hardware
- **Llama 3 (8B/70B)** - Good performance across various tasks

### Cloud Models

Cloud models can provide higher quality results for complex research tasks:

API keys can be configured directly through the web interface in the settings panel or via environment variables:

```bash
# Cloud LLM providers - add to your .env file if not using the web UI
LDR_LLM_ANTHROPIC_API_KEY=your-api-key-here      # For Claude models
LDR_LLM_OPENAI_API_KEY=your-openai-key-here      # For GPT models
LDR_LLM_OPENAI_ENDPOINT_API_KEY=your-key-here    # For OpenRouter or similar services

# Set your preferred provider and model
LDR_LLM_PROVIDER=ollama                 # Options: ollama, openai, anthropic, etc.
LDR_LLM_MODEL=gemma3:12b                # Model name to use
```

### Supported Providers

| Provider | Type | Setup | Models |
|----------|------|---------|--------|
| `OLLAMA` | Local | Install from [ollama.ai](https://ollama.ai) | Mistral, Llama, Gemma, etc. |
| `OPENAI` | Cloud | API key required | GPT-3.5, GPT-4, GPT-4o |
| `ANTHROPIC` | Cloud | API key required | Claude 3 Opus, Sonnet, Haiku |
| `OPENAI_ENDPOINT` | Cloud | API key required | Any OpenAI-compatible API |
| `VLLM` | Local | Requires GPU setup | Any supported by vLLM |
| `LMSTUDIO` | Local | Use LM Studio server | Models from LM Studio |
| `LLAMACPP` | Local | Configure model path | GGUF model formats |

You can easily switch between models in the web interface or via environment variables without reinstalling.

## 🌐 Search Engines

The system leverages multiple search engines to find the most relevant information for your queries.

### Core Free Engines (No API Key Required)

- **`auto`**: Intelligently selects the best engines based on your query (recommended)
- **`wikipedia`**: General knowledge, facts, and encyclopedic information
- **`arxiv`**: Scientific papers and academic research
- **`pubmed`**: Medical and biomedical research and journals
- **`semantic_scholar`**: Academic literature across all fields
- **`github`**: Code repositories, documentation, and technical discussions
- **`searxng`**: Comprehensive web search via local SearXNG instance
- **`wayback`**: Historical web content from Internet Archive

### Paid Engines (API Key Required)

For enhanced web search capabilities, you can configure these additional engines through the settings interface or via environment variables:

```bash
# Search API keys (if not using the web UI)
LDR_SEARCH_ENGINE_WEB_SERPAPI_API_KEY=your-key-here               # Google results via SerpAPI
LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_API_KEY=your-key-here         # Google Programmable Search
LDR_SEARCH_ENGINE_WEB_BRAVE_API_KEY=your-key-here              # Brave Search
```

### Search Engine Comparison

| Engine | Specialization | Privacy | Speed | Results Quality |
|--------|----------------|---------|-------|-----------------|
| SearXNG | General web | ★★★★★ | ★★★★★ | ★★★★½ |
| Wikipedia | Facts & concepts | ★★★★★ | ★★★★☆ | ★★★★☆ |
| arXiv | Scientific research | ★★★★★ | ★★★★☆ | ★★★★★ |
| PubMed | Medical research | ★★★★★ | ★★★★☆ | ★★★★★ |
| GitHub | Code & tech | ★★★★★ | ★★★☆☆ | ★★★★☆ |
| SerpAPI | Web (Google) | ★★☆☆☆ | ★★★★☆ | ★★★★★ |
| Brave | Web (privacy-focused) | ★★★★☆ | ★★★★☆ | ★★★★☆ |

## 📚 Local Document Search (RAG)

Local Deep Research includes powerful Retrieval Augmented Generation (RAG) capabilities, allowing you to search and analyze your own private documents using vector embeddings:

### Supported Document Types

- PDF files
- Markdown (.md)
- Plain text (.txt)
- Microsoft Word (.docx, .doc)
- Excel spreadsheets (.xlsx, .xls)
- CSV files
- And more

See [this page](https://github.com/LearningCircuit/local-deep-research/wiki/Configuring-Local-Search) for
configuration instructions.

## 🛠️ Advanced Configuration

### Web Interface

The easiest way to configure Local Deep Research is through the web interface, which provides:
- Complete settings management
- Model selection
- Search engine configuration
- Research parameter adjustment
- Local document collection setup

### Configuration Documentation

For detailed configuration options, see our guides:
- [Environment Variables Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/env_configuration.md)
- [SearXNG Setup Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/SearXNG-Setup.md)
- [Docker Usage Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/docker-usage-readme.md)
- [Docker Compose Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/docker-compose-guide.md)

### Programmatic Access

Use the Python API for integration with other tools or scripts:

```python
from local_deep_research import quick_summary, generate_report

# Quick research with custom parameters
results = quick_summary(
    query="advances in fusion energy",
    search_tool="auto",
    iterations=1,
    questions_per_iteration=2,
    max_results=30,
    temperature=0.7
)
print(results["summary"])
```

For more examples, see the [programmatic access tutorial](https://github.com/LearningCircuit/local-deep-research/blob/main/examples/programmatic_access.ipynb).

## 📊 Examples & Documentation

For more information and examples of what Local Deep Research can produce:

- [Example Outputs](https://github.com/LearningCircuit/local-deep-research/tree/main/examples)
- [Documentation](https://github.com/LearningCircuit/local-deep-research/tree/main/docs)
- [Wiki](https://github.com/LearningCircuit/local-deep-research/wiki)

## 🤝 Community & Support

- [Discord](https://discord.gg/ttcqQeFcJ3): Discuss features, get help, and share research techniques
- [Reddit](https://www.reddit.com/r/LocalDeepResearch/): Announcements, updates, and community showcase
- [GitHub Issues](https://github.com/LearningCircuit/local-deep-research/issues): Bug reports and feature requests

## 🚀 Contributing

We welcome contributions! Whether you're fixing bugs, adding features, or improving documentation, we'd love to have you as part of our community. Please see our [Contributing Guide](CONTRIBUTING.md) for guidelines on how to get started.

## 📄 License & Acknowledgments

This project is licensed under the MIT License.

Built with powerful open-source tools:
- [LangChain](https://github.com/hwchase17/langchain) framework for LLM integration
- [Ollama](https://ollama.ai) for local AI model management
- [SearXNG](https://searxng.org/) for privacy-focused web search
- [FAISS](https://github.com/facebookresearch/faiss) for vector similarity search
- [justext](https://github.com/miso-belica/justext) and [Playwright](https://playwright.dev) for web content analysis

> **Support Free Knowledge:** If you frequently use the search engines in this tool, please consider making a donation to organizations like [Wikipedia](https://donate.wikimedia.org), [arXiv](https://arxiv.org/about/give), or [PubMed](https://www.nlm.nih.gov/pubs/donations/donations.html).
