Metadata-Version: 2.1
Name: onprem
Version: 0.0.2
Summary: A tool for running on-premise large language models on non-public data
Home-page: https://github.com/amaiya/onprem
Author: Arun S. Maiya
Author-email: arun@maiya.net
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Requires-Dist: langchain (==0.0.240)
Requires-Dist: chromadb (==0.4.7)
Requires-Dist: PyMuPDF (==1.23.1)
Requires-Dist: unstructured (==0.10.8)
Requires-Dist: extract-msg (==0.45.0)
Requires-Dist: tabulate (==0.9.0)
Requires-Dist: pandoc (==2.3)
Requires-Dist: pypandoc (==1.11)
Requires-Dist: tqdm (==4.66.1)
Requires-Dist: sentence-transformers (==2.2.2)
Requires-Dist: llama-cpp-python (==0.1.69)
Provides-Extra: dev

# OnPrem

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

**OnPrem** is a simple Python package that makes it easier to run large
language models (LLMs) on non-public or sensitive data and on machines
with no internet connectivity (e.g., behind corporate firewalls).
Inspired by the [privateGPT](https://github.com/imartinez/privateGPT)
and [localGPT](https://github.com/PromtEngineer/localGPT) GitHub repos,
OnPrem is intended to make it easier to integrate local LLMs in
practical applications.

## Install

``` sh
pip install onprem
```

For GPU support, see the additional instructions under "How to Speed Up Inference Using a GPU" below.

## How to use

### Setup

``` python
import os.path
from onprem import LLM

url = 'https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin'

llm = LLM(model_name=os.path.basename(url))
llm.download_model(url, ssl_verify=True)  # set to False if a corporate firewall causes SSL problems
```

    There is already a file Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin in /home/amaiya/onprem_data. Do you want to still download it? (Y/n) Y
    [██████████████████████████████████████████████████]
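
As the output above shows, `download_model` asks for confirmation when the file is already present. In a non-interactive script, one option is to check for the file first. The following is only a minimal sketch: `model_path` and `needs_download` are hypothetical helpers that are not part of OnPrem, and the `~/onprem_data` default is an assumption inferred from the path printed above.

```python
import os.path
from pathlib import Path

# Assumption: models are stored in ~/onprem_data, as the output above suggests.
DEFAULT_DATA_DIR = Path.home() / "onprem_data"


def model_path(url, data_dir=DEFAULT_DATA_DIR):
    """Return the local path where the file named in `url` would be stored."""
    return Path(data_dir) / os.path.basename(url)


def needs_download(url, data_dir=DEFAULT_DATA_DIR):
    """Return True if no file for `url` exists yet in `data_dir`."""
    return not model_path(url, data_dir).is_file()
```

A script could then call `llm.download_model(url)` only when `needs_download(url)` returns `True`.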

### Send Prompts to the LLM

``` python
prompt = """Extract the names of people in the supplied sentences. Here is an example:
Sentence: James Gandolfini and Paul Newman were great actors.
People:
James Gandolfini, Paul Newman
Sentence:
I like Cillian Murphy's acting. Florence Pugh is great, too.
People:"""

saved_output = llm.prompt(prompt)
```


    Cillian Murphy, Florence Pugh

### How to Speed Up Inference Using a GPU

The example above ran on the CPU. If you have a GPU (even an older
one with less VRAM), you can use it to speed up responses.

#### Step 1: Install `llama-cpp-python` with cuBLAS support

``` shell
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python==0.1.69 --no-cache-dir
```

Use the exact version shown above (0.1.69, the version OnPrem pins in
its requirements); other versions may be incompatible.

#### Step 2: Use the `n_gpu_layers` argument with [`LLM`](https://amaiya.github.io/onprem/core.html#llm)

``` python
llm = LLM(model_name=os.path.basename(url), n_gpu_layers=128)
```

With the steps above, calls to methods like `llm.prompt` will offload
computation to your GPU and speed up responses from the LLM.
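
If the same script has to run on machines with and without a GPU, the `n_gpu_layers` argument can be added conditionally. The sketch below is a rough heuristic, not an OnPrem feature: `gpu_kwargs` is a hypothetical helper, and finding `nvidia-smi` on the `PATH` only suggests that an NVIDIA driver is installed. It does not confirm that `llama-cpp-python` was built with cuBLAS support as in Step 1.

```python
import shutil


def gpu_kwargs(n_gpu_layers=128, which=shutil.which):
    """Return extra keyword arguments for LLM, adding n_gpu_layers
    only when the nvidia-smi executable is found on the PATH."""
    # Heuristic only: nvidia-smi being present suggests an NVIDIA driver.
    if which("nvidia-smi") is not None:
        return {"n_gpu_layers": n_gpu_layers}
    return {}
```

It could be used as `llm = LLM(model_name=os.path.basename(url), **gpu_kwargs())`, which falls back to the CPU-only call from the Setup section when no GPU tooling is detected.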
