Metadata-Version: 2.4
Name: open-coscientist
Version: 0.2.0
Summary: Open LangGraph-based framework for multi-agent research hypothesis generation, adapted from Google Research's AI Co-Scientist.
Author-email: Brandon Rose <brandon@jataware.com>
License: MIT License + Commons Clause
        
        Copyright (c) 2026 to present, Jataware Corp
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
        "Commons Clause" License Condition v1.0
        
        The Software is provided to you by the Licensor under the License, as defined below, subject to the following condition.
        
        Without limiting other conditions in the License, the grant of rights under the License will not include, and the License does not grant to you, the right to Sell the Software.
        
        For purposes of the foregoing, "Sell" means practicing any or all of the rights granted to you under the License to provide to third parties, for a fee or other consideration (including without limitation fees for hosting or consulting/ support services related to the Software), a product or service whose value derives, entirely or substantially, from the functionality of the Software. Any license notice or attribution required by the License must also include this Commons Clause License Condition notice.
        
        Software: Open Coscientist
        License: MIT License
        Licensor: Jataware Corp
Project-URL: Homepage, https://github.com/jataware/open-coscientist
Project-URL: Documentation, https://github.com/jataware/open-coscientist#readme
Project-URL: Repository, https://github.com/jataware/open-coscientist
Keywords: ai,research,hypothesis,langgraph,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langgraph~=1.0.6
Requires-Dist: langchain-core~=1.2.7
Requires-Dist: langchain-mcp-adapters>=0.2.1
Requires-Dist: langsmith~=0.6.2
Requires-Dist: litellm~=1.80.16
Requires-Dist: typing-extensions~=4.15.0
Requires-Dist: rich~=14.2.0
Requires-Dist: jsonschema~=4.26.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest~=9.0.2; extra == "dev"
Requires-Dist: pytest-asyncio~=1.3.0; extra == "dev"
Requires-Dist: black~=25.12.0; extra == "dev"
Requires-Dist: ruff~=0.14.11; extra == "dev"
Requires-Dist: mypy~=1.19.1; extra == "dev"
Requires-Dist: python-dotenv~=1.2.1; extra == "dev"
Dynamic: license-file

# Open Coscientist

**AI-powered research hypothesis generation using LangGraph**

Open Coscientist is an open **adaptation of Google Research's [AI Co-Scientist](https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/)** research paper. It generates, reviews, ranks, and evolves research hypotheses using the multi-agent architecture described there. It orchestrates 8-10 specialized AI agents through a LangGraph workflow and aims to produce novel hypotheses grounded in scientific literature.

## Demo

<p align="center">
  <a href="https://youtu.be/LyOvigZ59yE?si=JiIJnXajgLhTb1yj">
    <img src="https://github.com/jataware/open-coscientist/blob/main/assets/Open_Coscientist_Demo.gif?raw=true" alt="Open Coscientist Demo">
  </a>
</p>

<p align="center">
  <em>
    In this demo we use Open Coscientist to generate hypotheses for novel approaches to early detection of Alzheimer's disease.
    Click to watch the full demo on YouTube.
  </em>
</p>

### Standalone operation

The engine works with any LLM and can run without external data sources.

For high-quality hypothesis generation, the system provides an MCP server integration to perform literature-aware reasoning over published research. See [MCP Integration](https://github.com/jataware/open-coscientist/blob/main/docs/mcp-integration.md) for setup and configuration details, and to run the basic reference MCP server.

## Quick Start

### Installation

```bash
pip install open-coscientist
```

Set your API key (any LiteLLM-supported provider):
```bash
export GEMINI_API_KEY="your-key-here"
# or: export ANTHROPIC_API_KEY="your-key-here"
# or: export OPENAI_API_KEY="your-key-here"
```

For development, see [CONTRIBUTING.md](https://github.com/jataware/open-coscientist/blob/main/CONTRIBUTING.md).

> **Note**: For any literature review to run, you must provide an MCP server with literature review tools/capabilities. You can use the provided reference implementation, [MCP Server](https://github.com/jataware/open-coscientist/tree/main/mcp_server). Otherwise, no published research will be used.

**Model Support**: Uses [LiteLLM](https://docs.litellm.ai/docs/providers) to support 100+ LLM providers (OpenAI, Anthropic, Google, Azure, AWS Bedrock, Cohere, etc.). To work with less powerful models, you may need to adjust token limits in `constants.py` and other parameters, such as the initial hypotheses count.

### Basic Usage

```python
import asyncio
from open_coscientist import HypothesisGenerator

async def main():
    generator = HypothesisGenerator(
        model_name="gemini/gemini-2.5-flash",  # default model if not provided
        max_iterations=1,
        initial_hypotheses_count=5,
        evolution_max_count=3
    )

    async for node_name, state in generator.generate_hypotheses(
        research_goal="Your research question",
        stream=True
    ):
        print(f"Completed: {node_name}")
        if node_name == "generate":
            print(f"Generated {len(state['hypotheses'])} hypotheses")

if __name__ == "__main__":
    asyncio.run(main())
```

See [`examples/run.py`](https://github.com/jataware/open-coscientist/blob/main/examples/run.py) for a full example CLI script with a built-in Console Reporter. **Remember**: you must run the literature review MCP server for any literature review to be included in hypothesis generation.

## Features

- **Multi-agent workflow**: Supervisor, Generator, Reviewer, Ranker, Tournament Judge, Meta-Reviewer, Evolution, Proximity Deduplication
- **Rich hypothesis output**: Each hypothesis includes `text`, `explanation` (layman summary), `literature_grounding` with structured `[C*]` citations, and `experiment` (suggested validation design)
- **Literature review integration**: Optional MCP server provides access to real published research; structured citations resolve to full source metadata
- **Domain-agnostic customization**: YAML-based configuration to bring your own MCP servers, literature sources, and domain-specific prompt guidance — no code changes needed (see [Domain Customization](https://github.com/jataware/open-coscientist/blob/main/docs/domain-customization.md))
- **Real-time streaming**: Stream results as they're generated
- **Intelligent caching**: Faster development iteration with LLM response caching
- **Elo-based tournament**: Pairwise hypothesis comparison with Elo ratings
- **Iterative refinement**: Evolves top hypotheses while preserving diversity
- **Post-generation enrichments**: Attach domain-specific data (e.g., related CVEs, knowledge graph statements) to each hypothesis via configurable tool calls

The workflow automatically detects MCP availability and adjusts accordingly.
A functional reference MCP server is included in the `mcp_server/` directory.
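
To make the Elo-based tournament feature more concrete, here is a minimal sketch of a standard Elo update applied to random pairwise matchups. It is an illustration, not the project's actual implementation: the function names (`expected_score`, `update_elo`, `run_tournament`), the K-factor of 32, the starting rating of 1200, and the `judge` callable (a stand-in for the LLM tournament judge) are all assumptions.

```python
import random
from typing import Callable, Dict, Optional, Tuple

K_FACTOR = 32  # assumed value; the project's actual K-factor may differ


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A when playing against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(
    rating_a: float, rating_b: float, a_won: bool, k: float = K_FACTOR
) -> Tuple[float, float]:
    """Return the new (rating_a, rating_b) after one decided matchup."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def run_tournament(
    ratings: Dict[str, float],
    judge: Callable[[str, str], bool],
    rounds: int,
    rng: Optional[random.Random] = None,
) -> Dict[str, float]:
    """Play `rounds` random pairwise matchups; judge(a, b) is True if a wins."""
    rng = rng or random.Random()
    ratings = dict(ratings)  # work on a copy, leave the caller's dict intact
    ids = list(ratings)
    for _ in range(rounds):
        a, b = rng.sample(ids, 2)  # two distinct hypotheses
        ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], judge(a, b))
    return ratings


if __name__ == "__main__":
    start = {"h1": 1200.0, "h2": 1200.0, "h3": 1200.0}
    # Toy judge: the lexicographically smaller ID always wins.
    print(run_tournament(start, lambda a, b: a < b, rounds=10))
```

Because each update adds to one rating exactly what it subtracts from the other, the sum of all ratings stays constant across the tournament, which is a handy sanity check when experimenting with different judges.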

## Documentation

- **[Architecture](https://github.com/jataware/open-coscientist/blob/main/docs/architecture.md)** - Workflow diagram, node descriptions, state management
- **[MCP Integration](https://github.com/jataware/open-coscientist/blob/main/docs/mcp-integration.md)** - Literature review setup and configuration
- **[Generation Modes](https://github.com/jataware/open-coscientist/blob/main/docs/generation-modes.md)** - Three generate node modes explained, and parameters to enable them
- **[Configuration](https://github.com/jataware/open-coscientist/blob/main/docs/configuration.md)** - All parameters, caching, performance tuning
- **[Domain Customization](https://github.com/jataware/open-coscientist/blob/main/docs/domain-customization.md)** - Adapting to new domains (cybersecurity, bioinformatics, etc.) via YAML config
- **[Literature Review Tools Configuration](https://github.com/jataware/open-coscientist/blob/main/docs/literature_review_tools_configuration.md)** - YAML schema reference for custom MCP servers and multi-source literature review
- **[Logging](https://github.com/jataware/open-coscientist/blob/main/docs/logging.md)** - File logging, rotating logs, log levels
- **[Development](https://github.com/jataware/open-coscientist/blob/main/docs/development.md)** - Contributing, node structure, testing

### Node Descriptions

| Node | Purpose | Key Operations |
|------|---------|----------------|
| **Supervisor** | Research planning | Analyzes research goal, identifies key areas, creates workflow strategy |
| **Literature Review** *(Recommended)* | Academic literature search | Queries databases (PubMed, Google Scholar), retrieves and analyzes real published papers (requires MCP server; without it, uses only LLM's latent knowledge) |
| **Generate** | Hypothesis creation | Generates N initial hypotheses using LLM with high temperature for diversity |
| **Reflection** *(Recommended)* | Literature comparison | Analyzes hypotheses against literature review findings, identifies novel contributions and validates against real research (requires literature review) |
| **Review** | Adaptive evaluation | Reviews hypotheses across 6 criteria using adaptive strategy (comparative batch for ≤5, parallel for >5) |
| **Rank** | Holistic ranking | LLM ranks all hypotheses considering composite scores and review feedback |
| **Tournament** | Pairwise comparison | Runs Elo tournament with random pairwise matchups, updates ratings |
| **Meta-Review** | Insight synthesis | Analyzes all reviews to identify common strengths, weaknesses, and strategic directions |
| **Evolve** | Hypothesis refinement | Refines top-k hypotheses with context awareness to preserve diversity |
| **Proximity** | Deduplication | Clusters similar hypotheses and removes high-similarity duplicates |
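
The adaptive strategy in the Review row (comparative batch for ≤5 hypotheses, parallel for >5) can be sketched roughly as follows. This is a hedged illustration only: `review_one`, `review_batch`, and `review_all` are hypothetical names, and the placeholder bodies stand in for real LLM calls. Only the threshold of 5 comes from the table above.

```python
import asyncio
from typing import Dict, List

BATCH_THRESHOLD = 5  # from the table: batch for <=5 hypotheses, parallel for >5


async def review_one(hypothesis: str) -> Dict[str, str]:
    """Hypothetical per-hypothesis review (placeholder for one LLM call)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"hypothesis": hypothesis, "mode": "parallel"}


async def review_batch(hypotheses: List[str]) -> List[Dict[str, str]]:
    """Hypothetical comparative review of all hypotheses in a single call."""
    await asyncio.sleep(0)
    return [{"hypothesis": h, "mode": "comparative_batch"} for h in hypotheses]


async def review_all(hypotheses: List[str]) -> List[Dict[str, str]]:
    """Pick the review strategy based on how many hypotheses there are."""
    if len(hypotheses) <= BATCH_THRESHOLD:
        return await review_batch(hypotheses)
    # asyncio.gather runs the reviews concurrently and preserves input order.
    return list(await asyncio.gather(*(review_one(h) for h in hypotheses)))


if __name__ == "__main__":
    small = asyncio.run(review_all([f"h{i}" for i in range(3)]))
    large = asyncio.run(review_all([f"h{i}" for i in range(8)]))
    print(small[0]["mode"], large[0]["mode"])
```

The trade-off this sketch assumes is that a single comparative call lets the reviewer weigh a small set of hypotheses against each other, while concurrent per-hypothesis calls keep latency and prompt size manageable for larger sets.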

## Literature Review and Domain Customization

The bundled MCP server provides a PubMed reference implementation. The system is domain-agnostic: a YAML configuration file controls which MCP servers, literature sources, and prompt guidance are used — no code changes needed. Example configurations are included for biomedical (INDRA + PubMed), cybersecurity (arXiv + Google Scholar + NVD), and multi-source academic research.

See [MCP Integration](https://github.com/jataware/open-coscientist/blob/main/docs/mcp-integration.md) to set up literature review, and [Domain Customization](https://github.com/jataware/open-coscientist/blob/main/docs/domain-customization.md) to adapt to your research area.

## Attribution

Open Coscientist is a source-available implementation inspired by Google Research's AI Co-Scientist. While Google's original system is closed-source, this project adapts their multi-agent hypothesis generation architecture from their published research paper.

**Reference:**
- **Blog**: [Accelerating scientific breakthroughs with an AI Co-Scientist](https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/)
- **Paper**: [Towards an AI co-scientist](https://arxiv.org/abs/2502.18864)

This version provides a LangGraph-based implementation. It includes some optimizations for parallel execution, streaming support, and caching.

## Citation

If you use this work, please cite both this implementation and the original Google Research paper:

```bibtex
@article{coscientist2025,
  title={Towards an AI co-scientist},
  author={Google Research Team},
  journal={arXiv preprint arXiv:2502.18864},
  year={2025},
  url={https://arxiv.org/abs/2502.18864}
}
```
