Metadata-Version: 2.1
Name: ragnosis
Version: 0.1.5
Summary: LLM-retrieval based knowledge grounding
Author-email: gkreder <gk@reder.io>
License: MIT License
        
        Copyright (c) 2024 Gabe Reder
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/gkreder/ragnosis
Project-URL: Source Code, https://github.com/gkreder/ragnosis
Project-URL: Bug Tracker, https://github.com/gkreder/ragnosis/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: <3.13,>=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python-dotenv==1.0.1
Requires-Dist: openai<2,>=1.48.0
Requires-Dist: langchain<0.4,>=0.3.0
Requires-Dist: streamlit==1.37
Requires-Dist: beautifulsoup4<5,>=4.12.3
Requires-Dist: rdflib==7.0.0
Requires-Dist: langchain-community<0.4,>=0.3.0
Requires-Dist: langchain-ollama<0.3,>=0.2.0
Requires-Dist: fastembed<0.4,>=0.3.6
Requires-Dist: faiss-cpu<2,>=1.8.0.post1
Requires-Dist: langchain-openai<0.3,>=0.2.0
Requires-Dist: pymupdf<2,>=1.24.10
Requires-Dist: sentence-transformers<4,>=3.1.1
Requires-Dist: langchain-huggingface<0.2,>=0.1.0
Requires-Dist: langgraph<0.3,>=0.2.34
Requires-Dist: markdown<4,>=3.7
Requires-Dist: pdfkit<2,>=1.0.0

# LLM-retrieval based knowledge grounding
`ragnosis` contains tools for extracting hypotheses from scientific paper PDFs, extracting entities according to a user model, and grounding entities to ontology terms.


# Installation

This project relies on [langchain-rdf](https://github.com/vemonet/langchain-rdf). It is not installed automatically as a dependency, so it must be installed separately.

To install langchain-rdf, run:

```bash
pip install git+https://github.com/vemonet/langchain-rdf.git
```

`ragnosis` can be installed using pip:

```bash
pip install ragnosis
```

# Usage

## Hypothesis extraction

Hypotheses can be extracted from PDF files by running:

```bash
ragnosis extract_hypothesis path/to/paper.pdf [--model MODEL] [--temperature TEMP] [--out_file OUTPUT.txt]
```
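To run extraction over many papers from a script, you can wrap the command above with `subprocess`. The sketch below is a minimal, hypothetical helper, not part of the `ragnosis` API. It uses only the flags documented above and assumes `ragnosis` is on your `PATH`. `build_extract_command` and `extract_all` are illustrative names.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional


def build_extract_command(
    pdf_path: str,
    model: Optional[str] = None,
    temperature: Optional[float] = None,
    out_file: Optional[str] = None,
) -> List[str]:
    """Build argv for `ragnosis extract_hypothesis` (hypothetical helper)."""
    cmd = ["ragnosis", "extract_hypothesis", pdf_path]
    # Only pass flags that were set explicitly, so ragnosis applies its own defaults otherwise.
    if model is not None:
        cmd += ["--model", model]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if out_file is not None:
        cmd += ["--out_file", out_file]
    return cmd


def extract_all(pdf_dir: str, out_dir: str, model: Optional[str] = None) -> None:
    """Run extraction on every PDF in pdf_dir, writing <stem>.txt into out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out_file = Path(out_dir) / (pdf.stem + ".txt")
        cmd = build_extract_command(str(pdf), model=model, out_file=str(out_file))
        print("Running:", shlex.join(cmd))
        # check=True raises CalledProcessError if ragnosis exits with a non-zero status.
        subprocess.run(cmd, check=True)
```

For example, `extract_all("papers/", "hypotheses/", model="ollama/llama3")` would write one plain-text hypothesis file per PDF. Building the argument list separately from running it makes the command easy to log or test without invoking an LLM.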

## Creating ontology indices

Before grounding entities, you must build vector store indices from your ontology files. One or more OWL files can be merged into a single index. `--force_create` overwrites an existing index. The index is saved in `index_directory` under the name `merged_index` unless `--index_name` is specified:

```bash
ragnosis create_index index_directory path/to/ontology1.owl path/to/ontology2.owl [--force_create] [--index_name NAME]
```

## Hypothesis grounding

To extract entities from an input text and ground them to ontology terms, run:

```bash
ragnosis ground_hypothesis "your hypothesis text" path/to/yaml_map.yaml [--model MODEL] [--temperature TEMP] [--out_md OUTPUT.md]
```

The YAML file should map entity extraction categories to pre-built vector store indices, for example:

```yaml
bio_components: path/to/go_index
genes_proteins: path/to/protein_index
taxa: path/to/taxonomy_index
small_molecules: path/to/chebi_index
```

where `path/to/go_index` refers to the pre-built vector store files `path/to/go_index.faiss` and `path/to/go_index.pkl`. A sample YAML file can be found in the `ragnosis` repository.
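Because each entry points at a path prefix rather than a file, a typo only surfaces when grounding runs. The sketch below checks the map up front. It is a hypothetical helper, not part of `ragnosis`, and it makes two assumptions. First, every prefix must have both a `.faiss` and a `.pkl` file, as described above. Second, relative prefixes resolve against `base_dir`; `ragnosis` itself may resolve them against the current working directory instead. Loading the YAML uses PyYAML's `yaml.safe_load`.

```python
from pathlib import Path
from typing import Dict, List


def load_index_map(yaml_path: str) -> Dict[str, str]:
    """Read the category -> index-prefix mapping (requires PyYAML)."""
    import yaml  # PyYAML; imported lazily so missing_index_files works without it

    with open(yaml_path, "r", encoding="utf-8") as fh:
        data = yaml.safe_load(fh) or {}
    if not isinstance(data, dict):
        raise ValueError(f"{yaml_path} must contain a top-level mapping")
    return {str(k): str(v) for k, v in data.items()}


def missing_index_files(index_map: Dict[str, str], base_dir: str = ".") -> List[str]:
    """Return 'category: path' for every expected .faiss/.pkl file that is absent.

    Relative prefixes are joined to base_dir; absolute prefixes are used as-is.
    """
    missing = []
    for category, prefix in index_map.items():
        stem = Path(base_dir) / prefix
        for suffix in (".faiss", ".pkl"):
            # Append the suffix rather than using with_suffix(), so names containing dots survive.
            candidate = stem.parent / (stem.name + suffix)
            if not candidate.is_file():
                missing.append(f"{category}: {candidate}")
    return missing
```

Typical use is `missing_index_files(load_index_map("yaml_map.yaml"))`. An empty list means every category has both index files in place.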


## LLM model selection

For all commands that accept a `--model` parameter, you can specify:
- OpenAI models with prefix `openai/` (e.g., `openai/gpt-4o`)
- Ollama models with prefix `ollama/` (e.g., `ollama/llama3`)

The default model is `openai/gpt-4o`. When using OpenAI models, set the `OPENAI_API_KEY` environment variable before running the commands. When using Ollama models, make sure Ollama is installed and its server is running.
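If you drive `ragnosis` from your own scripts, it can help to validate the `--model` value and the backend before starting a long run. The sketch below does that with hypothetical helpers, not `ragnosis`'s own code. It splits on the first `/` and accepts only the two documented prefixes. It also checks `OPENAI_API_KEY` for OpenAI models. For Ollama models, it probes Ollama's default local address, `http://localhost:11434`; adjust this if your server runs elsewhere.

```python
import http.client
import os
import urllib.request
from typing import Tuple

# Prefixes documented for the --model option.
SUPPORTED_PROVIDERS = ("openai", "ollama")


def parse_model_spec(spec: str) -> Tuple[str, str]:
    """Split 'provider/model' into its parts (hypothetical helper)."""
    # partition splits on the first '/', so model names containing '/' stay intact.
    provider, sep, name = spec.partition("/")
    if not sep or not name or provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"expected 'openai/<model>' or 'ollama/<model>', got {spec!r}")
    return provider, name


def ollama_is_running(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are all OSError subclasses.
        return False


def check_prerequisites(spec: str = "openai/gpt-4o") -> None:
    """Raise before invoking ragnosis if the chosen backend is not usable."""
    provider, _ = parse_model_spec(spec)
    if provider == "openai" and not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")
    if provider == "ollama" and not ollama_is_running():
        raise RuntimeError("no Ollama server reachable at http://localhost:11434")
```

Calling `check_prerequisites("ollama/llama3")` before a batch job fails immediately with a clear message instead of partway through the first paper.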

## Output files

Commands that accept `--out_md` (such as `ground_hypothesis`) can save their output in Markdown format. For hypothesis extraction, use `--out_file` to save the extracted hypothesis as plain text. If no output-file option is given, nothing is written to disk. In all cases, the output is printed to the console.

