Metadata-Version: 2.1
Name: mlx-llm-server
Version: 0.1.2
Summary: server to serve mlx model as an OpenAI compatible API
Home-page: https://github.com/mzbac/mlx-llm
Author: anchen
Author-email: li.anchen.au@gmail.com
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch
Requires-Dist: mlx
Requires-Dist: mlx-lm
Requires-Dist: transformers >=4.36.2
Requires-Dist: tqdm
Requires-Dist: requests
Requires-Dist: fastapi
Requires-Dist: uvicorn[standard]
Requires-Dist: sse-starlette

# MLX-LLM

This guide explains how to set up the MLX-LLM server and serve a model through an OpenAI-compatible API.

## Quick Start

Start the server with the following command:

```bash
python -m server --model-path <path-to-your-model>
```
The MLX-LLM server can serve both Hugging Face format models and quantized MLX models. Models already converted to MLX are available from the [MLX Community on Hugging Face](https://huggingface.co/mlx-community).

## Setup Guide
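### Miniforge Installation
For Apple Silicon users, install Miniforge (a minimal conda installer from conda-forge with a native arm64 build) with the commands below. macOS does not ship `wget` by default; `curl -L -O <url>` downloads the same file.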
```bash
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
```

### Conda Environment Setup
After installing conda, create and activate a dedicated conda environment for MLX-LLM:
```bash
conda create -n mlx-llm python=3.10
conda activate mlx-llm
```
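
Once the environment is active, it can be worth confirming that the interpreter matches what the package expects. The following is a minimal sketch, assuming the relevant requirements are Python 3.10 or newer (from the package metadata) and macOS on arm64 (an assumption based on this guide targeting Apple Silicon). The function name `check_environment` is illustrative and not part of MLX-LLM.

```python
import platform
import sys


def check_environment(version_info=None, system=None, machine=None):
    """Return a list of human-readable problems; an empty list means OK.

    Arguments default to the running interpreter and host, and can be
    overridden to exercise the logic on another machine.
    """
    version_info = tuple(version_info or sys.version_info)[:2]
    system = system or platform.system()
    machine = machine or platform.machine()

    problems = []
    if version_info < (3, 10):
        problems.append(
            f"Python >=3.10 is required, found {version_info[0]}.{version_info[1]}"
        )
    # Assumption: MLX targets Apple Silicon, i.e. macOS ("Darwin") on arm64.
    if system != "Darwin" or machine != "arm64":
        problems.append(f"expected macOS on arm64, found {system} on {machine}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

If the check reports `x86_64` on an Apple Silicon Mac, the interpreter is most likely running under Rosetta rather than natively.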
### Installing Dependencies

With the `mlx-llm` environment activated, install the necessary dependencies using the following command:

```bash
pip install -r requirements.txt
```
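
To confirm the installation succeeded, a short check can report which dependencies are still missing. This is a sketch, assuming the dependency list from the package metadata; `REQUIRED_MODULES` and `find_missing` are hypothetical names introduced here, and the list uses import names, which differ from the pip names in two cases.

```python
import importlib.util

# Import names for the distributions listed in the package metadata.
# Two import names differ from the pip names:
# mlx-lm -> mlx_lm, sse-starlette -> sse_starlette.
REQUIRED_MODULES = [
    "torch", "mlx", "mlx_lm", "transformers", "tqdm",
    "requests", "fastapi", "uvicorn", "sse_starlette",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only locates each module without importing it, so it does not verify versions such as `transformers >=4.36.2`.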

## Testing the API with curl

With the server running, you can test the API with `curl`. The example below assumes the server is listening on `localhost:8080`:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
  "model": "gpt-3.5-turbo",
  "stop":["<|im_end|>"],
  "messages": [
    {
      "role": "user",
      "content": "Write a limerick about python exceptions"
    }
  ]
}'
```
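
The same request can be sent from Python. The sketch below mirrors the curl example using only the standard library; it assumes the server is reachable at `http://localhost:8080`, and the helper names (`build_chat_payload`, `build_request`, `chat`) are illustrative rather than part of MLX-LLM.

```python
import json
import urllib.request

# Assumption: the server listens on localhost:8080, as in the curl example.
BASE_URL = "http://localhost:8080"


def build_chat_payload(prompt, stop=("<|im_end|>",), model="gpt-3.5-turbo"):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "stop": list(stop),
        "messages": [{"role": "user", "content": prompt}],
    }


def build_request(payload, base_url=BASE_URL):
    """Wrap the payload in a POST request with the headers used by curl."""
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer no-key",
        },
        method="POST",
    )


def chat(prompt, base_url=BASE_URL, timeout=120):
    """Send the prompt and return the decoded JSON response as a dict."""
    request = build_request(build_chat_payload(prompt), base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `chat("Write a limerick about python exceptions")` returns the parsed response. If the response follows the OpenAI chat completion schema, the generated text would be found under `["choices"][0]["message"]["content"]`; that layout is an assumption here and is not confirmed by this guide.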
