Metadata-Version: 2.4
Name: mlx-webchat
Version: 0.1.3
Summary: Terminal web-enabled chat client for mlx_lm.server
Author: mstyslavity
License-Expression: MIT
Keywords: mlx,mlx-lm,llm,web-search,terminal,chat,llm tools,large language models,local inference,apple
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Utilities
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mlx-lm>=0.31.1
Dynamic: license-file

# MLX Web Chat

### A wrapper for mlx_lm chat that directly prompts models to use internet in chat. 

##Facilitating the minimalist approach to use of MLX framework

A terminal chat client that sits on top of `mlx_lm.server` with a sole purpose of automation of web search when using mlx_lm for inference in terminal. It is basically an `mlx_lm.chat`, but with two tools added:

- `search_web`
- `open_url`

Highly customizable with startup options and commands inside the run. Designed for local MLX models that support tool calling, with the default being `mlx-community/GLM-4.7-Flash-8bit`.

## Install

### From a local checkout:

```bash
python3 -m pip install .
```

### For development:

```bash
python3 -m pip install -e .
```

Once installed, the console command is:

```bash
mlx-webchat
```

This package depends only on `mlx-lm>=0.31.1` beyond the Python standard library.

## Why does `mlx-webchat` exist

This was a local modification that provided very satisfactory results with defaulting the terminal-only use of web search by models without any mediation app for that. Adapted to particular release, i do not know whether the patch would remain after the next update of mlx-lm package (as of 24.03.26 i still do not know!). This package is meant to survive `mlx_lm` upgrades cleanly. A PyPI package can safely install its own console script on top of `mlx-lm`, but it cannot reliably inject a new `mlx_lm webchat` subcommand unless `mlx_lm` itself adds a public plugin mechanism. Tool use belongs at the chat orchestration layer, not the low-level token generator:

- `mlx_lm.generate` is just generation.
- `mlx_lm.chat` is closer, but the stock REPL does not execute tool calls.
- `mlx_lm.server` already accepts `tools` and parses tool-call output, so the cleanest path is a terminal client on top of the server.

# Getting Started

## Before You Run

Set any startup options you want with command-line arguments:

- `--model` (default=`mlx-community/GLM-4.7-Flash-8bit`)
- `--host`
- `--port`
- `--system-prompt` 
- `--max-tokens` (default=4096)
- `--temperature` (default=0.2)
- `--top-p`
- `--max-tool-rounds` (Maximum recursive tool-use rounds per user turn, default is 100.)
- `--chat-template-kwargs` (e.g. \'{"enable_thinking": true}\', \'{"reasoning_effort": high}\')
- `--prompt` (for One-Shot Mode, see below)
- `--no-web` (Disable tools and use the client as a plain terminal chat.)
- `--always-web` (Fetch fresh web context automatically on every user turn before the model answers. Otherwise, the model would automatically decide whether it is needed. Test uses show that automatic decisions often blunder, so it is recommended to have it passed.)
- `--always-web-results` (A number of search results to fetch when --always-web is enabled, default is 50.)
- `--always-web-open-results` (A number of search results to open automatically when --always-web is enabled, default is 20.)
- `--always-web-max-chars` (Number of visible characters to keep per each of --always-web-open-results entries, default is 4500)
- `--no-auto-server`
- `--server-start-timeout`
- `--server-log-file`
- `--server-extra-args` (passes arguments to the mlx_lm server process, like `--trust-remote-code` or `--log-level DEBUG`)
- `--quiet-tools`

Note on server-extra vs. chat-template args. Unlike server-extra, `--chat-template-kwargs` passes arguments to the chat template function inside a single chat request, while `--server-extra-args` is parsed by mlx-webchat, shell-split as a raw CLI string and is appended to `mlx_lm server`. `--chat-template-kwargs` are parsed as JSON, inserted into the HTTP request, read from it, and merged into apply_chat_template in server.py. In plain words, one configures the server, the other – prompt formatting for particular chat request. Server args can include chat args related to MLX server's, and then Chat args would override the Server per request. 

## Run in Interactive Chat Mode:

```zsh
mlx-webchat
```

## Run in one-shot mode: 

```zsh
mlx-webchat \
  --prompt "What is the latest mlx-lm release? Use web search if needed."
```

One-shot mode is best for single tasks, with results being closer to those which you get for `mlx_lm generate` with internet search use (i failed to patch it to mlx_lm generate directly). Interactive mode is better when you want conversation memory across turns with full and adequate follow-ups and context handled properly.

## REPL commands (use them straightforwardly in chat for >>messages)

- `/quit`
- `/reset`
- `/web on`
- `/web off`
- `/alwaysweb on`
- `/alwaysweb off`
- `/max-tokens N`
- `/multiline`
- `/file PATH`
- `/help`

Tu use a different port or model:

```zsh
mlx-webchat \
  --model mlx-community/GLM-4.7-Flash-8bit \
  --port 8090
```
## Notes

- By default the client auto-starts `python -m mlx_lm server` if it is not already running.
- Server logs are written to `~/.cache/mlx_webchat/server.log`.
- `--always-web` fetches fresh search results and opened pages for every user turn before the model answers.
- DuckDuckGo HTML search is best-effort and may need small maintenance updates if their markup changes.
