| Model | Calls | Avg latency (s) | Avg tokens | Avg tok/s | Avg halluc. score | Errors |
|---|---|---|---|---|---|---|
| {{ row.model }} | {{ row.count }} | {{ "%.3f"|format(row.avg_duration or 0) }} | {{ "%.0f"|format(row.avg_total_tokens or 0) }} | {{ "%.1f"|format(row.avg_tokens_per_second or 0) }} | {{ "%.2f"|format(row.avg_hallucination_score or 0) }} | {{ row.error_count }} |
No calls logged yet. Decorate a function with `@monitor(model="llama3")` and call it.
| Model | Latency (s) | Tokens | Tok/s | CPU% | Mem% | GPU% | Halluc. score |
|---|---|---|---|---|---|---|---|
| {{ row.model }} | {{ "%.3f"|format(row.duration or 0) }} | {{ row.total_tokens or 0 }} | {{ "%.1f"|format(row.tokens_per_second or 0) }} | {{ "%.0f"|format(row.cpu_percent or 0) }} | {{ "%.0f"|format(row.memory_percent or 0) }} | {{ "%.0f"|format(row.gpu_percent or 0) }} | {{ "%.2f"|format(row.hallucination_score or 0) }} |
Once you have a few calls, the slowest ones will show up here.
{% endif %}