Metadata-Version: 2.1
Name: local-deep-research
Version: 0.2.2
Summary: AI-powered research assistant with deep, iterative analysis using LLMs and web searches
Author-Email: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>, HashedViking <6432677+HashedViking@users.noreply.github.com>
License: MIT License
         
         Copyright (c) 2025 LearningCircuit
         
         Permission is hereby granted, free of charge, to any person obtaining a copy
         of this software and associated documentation files (the "Software"), to deal
         in the Software without restriction, including without limitation the rights
         to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
         copies of the Software, and to permit persons to whom the Software is
         furnished to do so, subject to the following conditions:
         
         The above copyright notice and this permission notice shall be included in all
         copies or substantial portions of the Software.
         
         THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
         IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
         FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
         AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
         LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
         OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
         SOFTWARE.
         
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Project-URL: Homepage, https://github.com/LearningCircuit/local-deep-research
Project-URL: Bug Tracker, https://github.com/LearningCircuit/local-deep-research/issues
Requires-Python: >=3.10
Requires-Dist: langchain>=0.3.18
Requires-Dist: langchain-community>=0.3.17
Requires-Dist: langchain-core>=0.3.34
Requires-Dist: langchain-ollama>=0.2.3
Requires-Dist: langchain-openai>=0.3.5
Requires-Dist: langchain_anthropic>=0.3.7
Requires-Dist: duckduckgo_search>=7.3.2
Requires-Dist: python-dateutil>=2.9.0
Requires-Dist: typing_extensions>=4.12.2
Requires-Dist: justext
Requires-Dist: playwright
Requires-Dist: beautifulsoup4
Requires-Dist: flask>=3.1.0
Requires-Dist: flask-cors>=3.0.10
Requires-Dist: flask-socketio>=5.1.1
Requires-Dist: sqlalchemy>=1.4.23
Requires-Dist: wikipedia
Requires-Dist: arxiv>=1.4.3
Requires-Dist: pypdf
Requires-Dist: sentence-transformers
Requires-Dist: faiss-cpu
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: toml>=0.10.2
Requires-Dist: platformdirs>=3.0.0
Requires-Dist: dynaconf
Requires-Dist: requests>=2.28.0
Requires-Dist: tiktoken>=0.4.0
Requires-Dist: xmltodict>=0.13.0
Requires-Dist: lxml>=4.9.2
Requires-Dist: pdfplumber>=0.9.0
Requires-Dist: unstructured>=0.10.0
Requires-Dist: google-search-results
Requires-Dist: importlib-resources>=6.5.2
Requires-Dist: setuptools>=78.1.0
Requires-Dist: flask-wtf>=1.2.2
Description-Content-Type: text/markdown

# Local Deep Research

## Features

- 🔍 **Advanced Research Capabilities**
  - Automated deep research with intelligent follow-up questions
  - Proper inline citation and source verification
  - Multi-iteration analysis for comprehensive coverage
  - Full webpage content analysis (not just snippets)

- 🤖 **Flexible LLM Support**
  - Local AI processing with Ollama models
  - Cloud LLM support (Claude, GPT)
  - Supports all Langchain models
  - Configurable model selection based on needs

- 📊 **Rich Output Options**
  - Detailed research findings with proper citations
  - Well-structured comprehensive research reports
  - Quick summaries for rapid insights
  - Source tracking and verification

- 🔒 **Privacy-Focused**
  - Runs entirely on your machine when using local models
  - Configurable search settings
  - Transparent data handling

- 🌐 **Enhanced Search Integration**
  - **Auto-selection of search sources**: The "auto" search engine intelligently analyzes your query and selects the most appropriate search engine
  - Multiple search engines including Wikipedia, arXiv, PubMed, Semantic Scholar, and more
  - **Local RAG search for private documents** - search your own documents with vector embeddings
  - Full webpage content retrieval and intelligent filtering

- 🎓 **Academic & Scientific Integration**
  - Direct integration with PubMed, arXiv, Wikipedia, Semantic Scholar
  - Properly formatted citations from academic sources
  - Report structure suitable for literature reviews
  - Cross-disciplinary synthesis of information

| [Reddit](https://www.reddit.com/r/LocalDeepResearch/) | [Discord](https://discord.gg/ttcqQeFcJ3) |

A powerful AI-powered research assistant that performs deep, iterative analysis using multiple LLMs and web searches. The system can be run locally for privacy or configured to use cloud-based LLMs for enhanced capabilities.

<div align="center">
  <a href="https://www.youtube.com/watch?v=0ISreg9q0p0">
    <img src="https://img.youtube.com/vi/0ISreg9q0p0/0.jpg" alt="Local Deep Research">
    <br>
    <span>▶️ Watch Video</span>
  </a>
</div>

**Important for non-academic searches:** For normal web searches you will need SearXNG or an API key to a search provider like brave search or SerpAPI. The free searches are mostly academic search engines and will not help you for most normal searches.

## Windows Installation

 Download the [Windows Installer](https://github.com/LearningCircuit/local-deep-research/releases/download/v0.1.0/LocalDeepResearch_Setup.exe) for easy one-click installation.

 **Requires Ollama (or other model provider configured in .env).**
 Download from https://ollama.ai and then pull a model
 ollama pull gemma3:12b

## Quick Start (not required if installed with windows installer)

```bash
# Install the package
pip install local-deep-research

# Install required browser automation tools
playwright install

# For local models, install Ollama
# Download from https://ollama.ai and then pull a model
ollama pull gemma3:12b
```

Then run:

```bash
# Start the web interface (recommended)
ldr-web # (OR python -m local_deep_research.web.app)

# OR run the command line version
ldr # (OR python -m local_deep_research.main)
```

Access the web interface at `http://127.0.0.1:5000` in your browser.

## Docker Support

Build the image first if you haven't already
```bash
docker build -t local-deep-research .
```

Quick Docker Run

```bash
# Run with default settings (connects to Ollama running on the host)
docker run --network=host \
  -e LDR_LLM__PROVIDER="ollama" \
  -e LDR_LLM__MODEL="mistral" \
  local-deep-research
```

For comprehensive Docker setup information, see:
- [Docker Usage Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/docker-usage-readme.md)
- [Docker Compose Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/docker-compose-guide.md)

## Migrating from Version 0.1.0

If you just upgraded from 0.1.0, and you want to preserve your configuration,
you will need to manually migrate from the old config files to the new settings
database:

```bash
python -m local_deep_research.migrate_db
```

## Programmatic Access

Local Deep Research now provides a simple API for programmatic access to its research capabilities:

```python
import os
# Set environment variables to control the LLM
os.environ["LDR_LLM__MODEL"] = "mistral"     # Specify model name

from local_deep_research import quick_summary, generate_report

# Generate a quick research summary with custom parameters
results = quick_summary(
    query="advances in fusion energy",
    search_tool="auto",          # Auto-select the best search engine
    iterations=1,                # Single research cycle for speed
    questions_per_iteration=2,   # Generate 2 follow-up questions
    max_results=30,              # Consider up to 30 search results
    temperature=0.7              # Control creativity of generation
)
print(results["summary"])
```

These functions provide flexible options for customizing the search parameters, iterations, and output formats. For more examples, see the [programmatic access tutorial](https://github.com/LearningCircuit/local-deep-research/blob/main/examples/programmatic_access.ipynb).

## Configuration System

The package automatically creates and manages configuration files in your user directory:

- **Windows**: `Documents\LearningCircuit\local-deep-research\config\`
- **Linux/Mac**: `~/.config/local_deep_research/config/`

### Default Configuration Files

When you first run the tool, it creates these configuration files:

| File | Purpose |
|------|---------|
| `settings.toml` | General settings for research, web interface, and search |
| `search_engines.toml` | Define and configure search engines |
| `local_collections.toml` | Configure local document collections for RAG |
| `.env` | Environment variables for configuration (recommended for API keys) |

> **Note:** For comprehensive environment variable configuration, see our [Environment Variables Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/env_configuration.md).

## Setting Up AI Models

The system supports multiple LLM providers:

### Local Models (via Ollama)

1. [Install Ollama](https://ollama.ai)
2. Pull a model: `ollama pull gemma3:12b` (recommended model)
3. Ollama runs on port 11434 by default

### Cloud Models

Add API keys to your environment variables (recommended) by creating a `.env` file in your config directory:

```bash
# Set API keys for cloud providers in .env
ANTHROPIC_API_KEY=your-api-key-here      # For Claude models
OPENAI_API_KEY=your-openai-key-here      # For GPT models
OPENAI_ENDPOINT_API_KEY=your-key-here    # For OpenRouter or similar services

# Set your preferred LLM provider and model
LDR_LLM__PROVIDER=ollama                 # Options: ollama, openai, anthropic, etc.
LDR_LLM__MODEL=gemma3:12b                # Model name to use
```

### Supported LLM Providers

The system supports multiple LLM providers:

| Provider | Type | API Key | Setup Details | Models |
|----------|------|---------|---------------|--------|
| `OLLAMA` | Local | No | Install from [ollama.ai](https://ollama.ai) | Mistral, Llama, Gemma, etc. |
| `OPENAI` | Cloud | `OPENAI_API_KEY` | Set in environment | GPT-3.5, GPT-4, GPT-4o |
| `ANTHROPIC` | Cloud | `ANTHROPIC_API_KEY` | Set in environment | Claude 3 Opus, Sonnet, Haiku |
| `OPENAI_ENDPOINT` | Cloud | `OPENAI_ENDPOINT_API_KEY` | Set in environment | Any OpenAI-compatible model |
| `VLLM` | Local | No | Requires GPU setup | Any supported by vLLM |
| `LMSTUDIO` | Local | No | Use LM Studio server | Models from LM Studio |
| `LLAMACPP` | Local | No | Configure model path | GGUF model formats |

The `OPENAI_ENDPOINT` provider can access any service with an OpenAI-compatible API, including:
- OpenRouter (access to hundreds of models)
- Azure OpenAI
- Together.ai
- Groq
- Anyscale
- Self-hosted LLM servers with OpenAI compatibility

## Setting Up Search Engines

Some search engines require API keys. Add them to your environment variables by creating a `.env` file in your config directory:

```bash
# Search engine API keys (add to .env file)
SERP_API_KEY=your-serpapi-key-here        # For Google results via SerpAPI
GOOGLE_PSE_API_KEY=your-google-key-here   # For Google Programmable Search
GOOGLE_PSE_ENGINE_ID=your-pse-id-here     # For Google Programmable Search
BRAVE_API_KEY=your-brave-search-key-here  # For Brave Search
GUARDIAN_API_KEY=your-guardian-key-here   # For The Guardian

# Set your preferred search tool
LDR_SEARCH__TOOL=auto                     # Default: intelligently selects best engine
```

> **Tip:** To override other settings via environment variables (e.g., to change the web port), use: **LDR_WEB__PORT=8080**

### Available Search Engines

| Engine | Purpose | API Key Required? | Rate Limit |
|--------|---------|-------------------|------------|
| `auto` | Intelligently selects the best engine | No | Based on selected engine |
| `wikipedia` | General knowledge and facts | No | No strict limit |
| `arxiv` | Scientific papers and research | No | No strict limit |
| `pubmed` | Medical and biomedical research | No | No strict limit |
| `semantic_scholar` | Academic literature across all fields | No | 100/5min |
| `github` | Code repositories and documentation | No | 60/hour (unauthenticated) |
| `brave` | Web search (privacy-focused) | Yes | Based on plan |
| `serpapi` | Google search results | Yes | Based on plan |
| `google_pse` | Custom Google search | Yes | 100/day free tier |
| `wayback` | Historical web content | No | No strict limit |
| `searxng` | Local web search engine | No (requires local server) | No limit |
| Any collection name | Search your local documents | No | No limit |

> **Note:** For detailed SearXNG setup, see our [SearXNG Setup Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/SearXNG-Setup.md).

## Local Document Search (RAG)

The system can search through your local documents using vector embeddings.

### Setting Up Document Collections

1. Define collections in `local_collections.toml`. Default collections include:

```toml
[project_docs]
name = "Project Documents"
description = "Project documentation and specifications"
paths = ["@format ${DOCS_DIR}/project_documents"]
enabled = true
embedding_model = "all-MiniLM-L6-v2"
embedding_device = "cpu"
embedding_model_type = "sentence_transformers"
max_results = 20
max_filtered_results = 5
chunk_size = 1000
chunk_overlap = 200
cache_dir = "__CACHE_DIR__/local_search/project_docs"
```

2. Create your document directories:
   - The `${DOCS_DIR}` variable points to a default location in your Documents folder
   - Documents are automatically indexed when the search is first used

### Using Local Search

You can use local document search in several ways:

1. **Auto-selection**: Set `tool = "auto"` in `settings.toml` [search] section
2. **Explicit collection**: Set `tool = "project_docs"` to search only that collection
3. **All collections**: Set `tool = "local_all"` to search across all collections
4. **Query syntax**: Type `collection:project_docs your query` to target a specific collection


## Advanced Configuration

### Research Parameters

Edit `settings.toml` to customize research parameters or use environment variables:

```toml
[search]
# Search tool to use (auto, wikipedia, arxiv, etc.)
tool = "auto"

# Number of research cycles
iterations = 2

# Questions generated per cycle
questions_per_iteration = 2

# Results per search query
max_results = 50

# Results after relevance filtering
max_filtered_results = 5
```

Using environment variables:
```bash
LDR_SEARCH__TOOL=auto
LDR_SEARCH__ITERATIONS=3
LDR_SEARCH__QUESTIONS_PER_ITERATION=2
```

## Web Interface

The web interface offers several features:

- **Dashboard**: Start and manage research queries
- **Real-time Updates**: Track research progress with improved logging
- **Research History**: Access past queries
- **PDF Export**: Download reports
- **Research Management**: Terminate processes or delete records
- **Enhanced Settings Panel**: New unified settings UI with improved organization

## Command Line Interface

The CLI version allows you to:

1. Choose between a quick summary or detailed report
2. Enter your research query
3. View results directly in the terminal
4. Save reports automatically to the configured output directory

## Development Environment

This project now uses PDM for dependency management. To set up a development environment:

```bash
# Install PDM if you don't have it
pip install pdm

# Install dependencies
pdm install --no-self

# Activate the environment
pdm venv activate
```

You can run the application directly using Python module syntax:

```bash
# Run the web interface
python -m local_deep_research.web.app

# Run the CLI version
python -m local_deep_research.main
```

For more information, see the [development documentation](docs/developing.md).

## Unified Database

The application now uses a single unified database (`ldr.db`) for all settings and history, making configuration management simpler and more reliable.

### Migration from v0.1.x

If you have existing data in legacy databases from v0.1.x, the application will automatically migrate your data when you first run v0.2.0.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

### Development Setup with PDM

This project uses PDM for dependency management. Here's how to set up your development environment:

```bash
# Install PDM if you don't have it
pip install pdm

# Clone the repository
git clone https://github.com/LearningCircuit/local-deep-research.git
cd local-deep-research

# Install dependencies including dev dependencies
pdm install --no-self

# Set up pre-commit hooks
pdm run pre-commit install
pdm run pre-commit install-hooks

# Activate the virtual environment
pdm venv activate
```

#### Common PDM Commands

```bash
# Run linting checks
pdm run flake8
pdm run black .

# Run tests (when available)
pdm run pytest

# Add a new dependency
pdm add package-name

# Add a development dependency
pdm add -dG dev package-name
```

### Contributing Process

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Make your changes
4. Run linting checks to ensure code quality
5. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
6. Push to the branch (`git push origin feature/AmazingFeature`)
7. **Important:** Open a Pull Request against the `dev` branch, not the `main` branch

We prefer all pull requests to be submitted against the `dev` branch for easier testing and integration before releasing to the main branch.

### Getting Help

Join our [Discord server](https://discord.gg/ttcqQeFcJ3) if you're planning to contribute. Let us know about your plans - we're always happy to guide new contributors and discuss feature ideas!

## Community & Support

Join our [Discord server](https://discord.gg/ttcqQeFcJ3) to exchange ideas, discuss usage patterns, and
share research approaches.

Follow our [Subreddit](https://www.reddit.com/r/LocalDeepResearch/) for announcements, updates, and feature highlights.

## License

This project is licensed under the MIT License.

## Acknowledgments

- Built with [Ollama](https://ollama.ai) for local AI processing
- Search powered by multiple sources:
  - [Wikipedia](https://www.wikipedia.org/) for factual knowledge
  - [arXiv](https://arxiv.org/) for scientific papers
  - [PubMed](https://pubmed.ncbi.nlm.nih.gov/) for biomedical literature
  - [Semantic Scholar](https://www.semanticscholar.org/) for academic literature
  - [DuckDuckGo](https://duckduckgo.com) for web search
  - [The Guardian](https://www.theguardian.com/) for journalism
  - [SerpAPI](https://serpapi.com) for Google search results
  - [SearXNG](https://searxng.org/) for local web-search engine
  - [Brave Search](https://search.brave.com/) for privacy-focused web search
- Built on [LangChain](https://github.com/hwchase17/langchain) framework
- Uses [justext](https://github.com/miso-belica/justext), [Playwright](https://playwright.dev), [FAISS](https://github.com/facebookresearch/faiss), and more

> **Support Free Knowledge:** If you frequently use the search engines in this tool, please consider making a donation to these organizations:
> - [Donate to Wikipedia](https://donate.wikimedia.org)
> - [Support arXiv](https://arxiv.org/about/give)
> - [Donate to DuckDuckGo](https://duckduckgo.com/donations)
> - [Support PubMed/NCBI](https://www.nlm.nih.gov/pubs/donations/donations.html)
