llm-scope

Lightweight observability and diagnostics for local LLM systems.
MVP · local only · sqlite

Per-model latency

Live snapshot
{% if summary %}
Model | Calls | Avg latency (s) | Avg tokens | Avg tok/s | Avg halluc. | Errors
{% for row in summary %}
{{ row.model }} | {{ row.count }} | {{ "%.3f"|format(row.avg_duration or 0) }} | {{ "%.0f"|format(row.avg_total_tokens or 0) }} | {{ "%.1f"|format(row.avg_tokens_per_second or 0) }} | {{ "%.2f"|format(row.avg_hallucination_score or 0) }} | {{ row.error_count }}
{% endfor %}
{% else %}

No calls logged yet. Decorate a function with @monitor(model="llama3") and call it.

{% endif %}
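
The empty-state message refers to a @monitor decorator whose implementation is not shown here. As a rough illustration only, the sketch below times each call and logs one row per call to SQLite. The `calls` table, its columns, and the explicit `conn` parameter are assumptions made for this example; the real decorator is shown taking only `model`.

```python
import functools
import sqlite3
import time

# Hypothetical schema: column names mirror fields the dashboard reads.
SCHEMA = """
CREATE TABLE IF NOT EXISTS calls (
    id INTEGER PRIMARY KEY,
    model TEXT NOT NULL,
    duration REAL,
    error INTEGER NOT NULL DEFAULT 0
)
"""


def monitor(model, conn):
    """Illustrative decorator: time each call and log one row per call.

    `conn` is an assumption of this sketch, passed explicitly so the
    example stays self-contained.
    """
    conn.execute(SCHEMA)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = 0
            try:
                return func(*args, **kwargs)
            except Exception:
                # Record the failure, then let the exception propagate.
                error = 1
                raise
            finally:
                duration = time.perf_counter() - start
                conn.execute(
                    "INSERT INTO calls (model, duration, error) VALUES (?, ?, ?)",
                    (model, duration, error),
                )
                conn.commit()

        return wrapper

    return decorator
```

Logging in the `finally` block means failed calls are still timed and counted, which is what an error column in the summary would need.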

Slowest calls

Tail latency
{% if slowest %}
Model | Latency (s) | Tokens | Tok/s | CPU% | Mem% | GPU% | Halluc.
{% for row in slowest %}
{{ row.model }} | {{ "%.3f"|format(row.duration or 0) }} | {{ row.total_tokens or 0 }} | {{ "%.1f"|format(row.tokens_per_second or 0) }} | {{ "%.0f"|format(row.cpu_percent or 0) }} | {{ "%.0f"|format(row.memory_percent or 0) }} | {{ "%.0f"|format(row.gpu_percent or 0) }} | {{ "%.2f"|format(row.hallucination_score or 0) }}
{% endfor %}
{% else %}

Once you have a few calls, the slowest ones will show up here.

{% endif %}
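
The two tables above read `summary` and `slowest` rows with fields such as `avg_duration`, `error_count`, and `duration`. One way such rows could be produced from SQLite is sketched below. The `calls` table, its schema, and the helper names `fetch_summary` and `fetch_slowest` are hypothetical and chosen only to match the field names in this template; only a subset of the columns is included.

```python
import sqlite3

# Hypothetical schema covering a subset of the fields the template reads.
SCHEMA = """
CREATE TABLE IF NOT EXISTS calls (
    id INTEGER PRIMARY KEY,
    model TEXT NOT NULL,
    duration REAL,
    total_tokens INTEGER,
    tokens_per_second REAL,
    error INTEGER NOT NULL DEFAULT 0
)
"""


def open_db(path=":memory:"):
    """Open a connection whose rows support lookup by column name."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def fetch_summary(conn):
    """One aggregated row per model, for the per-model latency table."""
    return conn.execute(
        """
        SELECT model,
               COUNT(*)               AS count,
               AVG(duration)          AS avg_duration,
               AVG(total_tokens)      AS avg_total_tokens,
               AVG(tokens_per_second) AS avg_tokens_per_second,
               SUM(error)             AS error_count
        FROM calls
        GROUP BY model
        ORDER BY model
        """
    ).fetchall()


def fetch_slowest(conn, limit=10):
    """The `limit` slowest individual calls, for the tail latency table."""
    return conn.execute(
        """
        SELECT model, duration, total_tokens, tokens_per_second
        FROM calls
        ORDER BY duration DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```

SQLite's `AVG` returns NULL when every input is NULL, which arrives in Python as `None`; the `or 0` fallbacks in the template cover that case.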