Metadata-Version: 2.4
Name: pipelm
Version: 0.1.2
Summary: A lightweight API server and CLI for running LLM models
Home-page: https://github.com/kashyaprparmar/pipelm
Author: Kashyap Parmar
Author-email: kashyaprparmar@gmail.com
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: huggingface-hub>=0.16.0
Requires-Dist: pydantic>=1.10.0
Requires-Dist: fastapi>=0.95.0
Requires-Dist: uvicorn>=0.22.0
Requires-Dist: requests>=2.28.0
Requires-Dist: rich>=12.0.0
Requires-Dist: appdirs>=1.4.4
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: accelerate>=0.20.0
Requires-Dist: sentencepiece>=0.1.99
Requires-Dist: bitsandbytes>=0.40.0
Requires-Dist: pillow>=11.2.1
Requires-Dist: pyfiglet>=1.0.2
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

![Alt text](assets/logo.png "PipeLM")

*A lightweight, CLI modular tool for running Large Language Models (LLMs) & SLMs from Hugging Face.*

PipeLM provides an intuitive CLI interface for interactive chat and a robust FastAPI server to integrate LLMs seamlessly into your applications.

---
![Demo](assets/demo.gif)
---
## Overview

PipeLM simplifies interaction with AI models, allowing you to:

- 📥 **Download and manage** models from Hugging Face.
- 🌐 **Serve models** through a standardized REST API.
- 💬 **Test prompts** via an interactive chat interface & a client
- 📜 **Maintain conversation history**.
- 🔄 **Easily switch models** with single command.

---

## Features

- **Interactive CLI Chat**: Chat directly from your terminal.
- **FastAPI Server**: REST APIs with health monitoring.
- **Efficient Model Management**: Download and manage models easily.
- **Support for different models**: text2text and image2text models are supported
- **Easy Authentication**: Easily access HF models using only HF_TOKEN
- **Client-Server Architecture**: Model deployed on server can be accessed by client
- **Docker Support**: Containerize your models for better isolation.
- **GPU Acceleration**: Automatically utilize available GPUs.
- **Model Quantization**: Reduce memory usage (4-bit and 8-bit).
- **Conversation History**: Persistent chat context.
- **Rich Terminal Interface**: Enhanced CLI with markdown rendering.
- **Robust Error Handling**: Good handling of issues.

---

## Installation

### 💻 From Source (Recommended)
#### Step 1: Create a Python Virtual Environment

```bash
# Clone the source repository
git clone https://github.com/kashyaprparmar/PipeLM
cd PipeLM

# Create virtual environment with Python's venv
python -m venv .venv

# Activate the environment
# On Linux/macOS:
source .venv/bin/activate

# On Windows:
.venv\Scripts\activate
```

#### Step 2: Install uv Within the Virtual Environment

```bash
# Install uv package manager
pip install uv
```
#### Step 3: Install dependencies using uv
```bash
# Install the package with uv (recommended)
uv pip install -e .

# Install just the dependencies from requirements.txt
uv pip install -r requirements.txt
```

### 📦 From PyPI (in development)
```bash
pip install pipelm
```

### 🐳 With Docker
```bash
git clone https://github.com/kashyaprparmar/PipeLM
cd PipeLM

docker build -f docker/Dockerfile -t pipelm .

docker run -p 8080:8080 -v pipelm_data:/root/.pipelm -e HF_TOKEN=your_token -e MODEL_NAME=HuggingFaceTB/SmolLM2-1.7B-Instruct pipelm
```
---

## Usage


### Login with HF_TOKEN
```bash
pipelm login
# Enter your HF_TOKEN to terminal
```

### Download a Huggingface Model
```bash
# Download a model with particular allowed patterns
pipelm download HuggingFaceTB/SmolLM2-1.7B-Instruct --include ['*.json','*.safetensors']
```

### List all Downloaded Models
```bash
pipelm list
```
### Interactive Chat 
```bash
# (Streaming is ENABLED BY DEFAULT)
# Start chatting with model 
pipelm chat HuggingFaceTB/SmolLM2-1.7B-Instruct

# Chatting with local model ( by default loads to port 8080)
pipelm chat /path/to/local/model

# Chatting using quantization
pipelm chat HuggingFaceTB/SmolLM2-1.7B-Instruct --quantize 4bit

# Chatting with model (Streaming DISABLED)
pipelm chat HuggingFaceTB/SmolLM2-1.7B-Instruct --no-stream
```

### 🚀 Start API Server
```bash
pipelm server HuggingFaceTB/SmolLM2-1.7B-Instruct --port 8080

# Using local model
pipelm server /path/to/local/model --port 8080

# With quantization
pipelm server HuggingFaceTB/SmolLM2-1.7B-Instruct --quantize 8bit
```

### Making Client Requests to model on Server
```bash
# Making client request on a text2text model
pipelm client HuggingFaceTB/SmolLM2-1.7B-Instruct 
  --prompt "Summarize the benefits of solar energy." 

# Making client request on a text2text model with some model config
pipelm client HuggingFaceTB/SmolLM2-1.7B-Instruct 
  --prompt "Summarize the benefits of solar energy." 
  --max-tokens 80   
  --top-p 0.95   

# Making client request on a image2text model with some model config
pipelm client HuggingFaceTB/SmolVLM-500M-Instruct   
  --port 8080   
  --model-type image2text   
  --image "/home/kashyap/Desktop/mywork/PipeLM/assets/logo.png"   
  --prompt "Describe what you see."   
  --max-tokens 80   
  --top-p 0.95   
  --no-stream
```


### 🐳 Docker Compose
```bash
export HF_TOKEN=your_token
docker-compose up -d pipelm
```

---

## API Endpoints

### Quick Commands

#### Check Server Health:
```bash
curl http://localhost:8080/health
```
#### Script for Testing a Sample Prompt
```bash
# Checks the server health and runs a sample prompt in the server
python client.py
```
#### Send a Sample Prompt (cURL):
```bash
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain the difference between AI and machine learning."}],
    "max_tokens": 200,
    "temperature": 0.7,
    "top_p": 0.9
  }'
```

### GET `/health`
Health status of server and model.

```json
{
  "status": "healthy",
  "model": "HuggingFaceTB/SmolLM2-1.7B-Instruct",
  "uptime": 42.5
}
```

### GET `/docs`
Swagger UI for API documentation.

### POST `/generate`
Generate text from conversation history.

Request:
```json
{
  "messages": [
    {"role": "user", "content": "What is artificial intelligence?"}
  ],
  "max_tokens": 1024,
  "temperature": 0.7,
  "top_p": 0.9
}
```

Response:
```json
{
  "generated_text": "Artificial intelligence (AI) refers to the simulation of human intelligence in machines..."
}
```

---

## Chat Commands

- `/exit` or `/quit` – Exit chat
- `/clear` – Clear conversation history
- `/info` – Display current model information

---

## Environment Variables

- `HF_TOKEN`: Your Hugging Face token (required).
- `MODEL_DIR`: Local model directory.
- `PORT`: Server port (default: 8080).

---

## Project Structure
```
PipeLM/
├── pipelm/                 # Main package
│   ├── __init__.py
│   ├── cli.py
│   ├── server.py
│   ├── client.py
│   ├── downloader.py
│   ├── chat.py
│   └── utils.py
├── docker/                 # Docker setup
│   ├── Dockerfile
│   └── docker-compose.yml
├── setup.py
├── README.md
└── requirements.txt
```
---

## Requirements

- Python 3.8+
- Torch (GPU support recommended)
- 16+ GB RAM (model-dependent)
- CUDA-compatible GPU (recommended)

---

## Troubleshooting

### Model Download Issues

If you encounter issues downloading models:

1. Check your Hugging Face token:
   * Create or verify your token at https://huggingface.co/settings/tokens (**MAKE SURE TO GENERATE A READ/WRITE HF ACCESS TOKEN**)
   * Set it in your environment as `export HF_TOKEN=your_token_here`
   * Or store it in `.env` file as `HF_TOKEN=your_token_here`
2. Network issues:
   * Check your internet connection
   * Verify you have permissions to download the model

### Server Startup Issues

If the server fails to start:

1. Check if another process is using port 8080:
   * Use a different port: `pipelm server HuggingFaceTB/SmolLM2-1.7B-Instruct --port 8081`
2. Verify Python dependencies:
   * Ensure all required packages are installed: `pip install -r requirements.txt`

### Memory Issues

If you encounter memory errors:

1. Choose a smaller model
2. Try quantization: `pipelm chat HuggingFaceTB/SmolLM2-1.7B-Instruct --quantize 4bit`
3. Ensure you have enough RAM and GPU VRAM if using CUDA

## Model Storage

Models are downloaded to:
- Linux/Mac: `~/.pipelm/models/[sanitized_model_name]` OR `.local/share/pipelm/models` OR `/home/kashyap/snap/code/190/.local/share/pipelm/models`
- Windows: `C:\Users\[username]\AppData\Local\pipelm\pipelm\models\[sanitized_model_name]`

When using Docker, models are stored in `/root/.pipelm/models/` in the container, typically mapped to a persistent volume.

---

## Contributing

Contributions are welcome! Submit a Pull Request.

---

## License

MIT License. See `LICENSE` for details.
