{% extends "base.html" %}
{% block title %}Evalground - Memorizz{% endblock %}
{% block content %}
Evaluate agents against the benchmarks in the library (LongMemEval).
No agents available
Create an agent to run benchmarks.
| Run ID | Status | Benchmark | Agent | Dataset | Samples | Accuracy | Created | Finished | Actions |
|---|---|---|---|---|---|---|---|---|---|
| {{ run.run_id[:8] }}... | {{ run_status }} | {{ run.benchmark or 'longmemeval' }} | {{ run.agent_name }} | {{ run.dataset_variant or 'oracle' }} | {{ run.num_samples }} | {% if run.overall_accuracy is not none %}{{ run.overall_accuracy }}%{% else %}-{% endif %} | {{ run.created_at or '-' }} | {{ run.finished_at or '-' }} | View {% if run_status in ['queued', 'running', 'canceling'] %} {% endif %} |
No benchmark runs yet.
{% endif %}{{ selected_agent.agent_id }}
| Dataset | {{ eval_results.metadata.dataset_variant }} |
| Mode | {{ eval_results.metadata.application_mode|default('assistant') }} |
| Timestamp | {{ eval_results.metadata.timestamp }} |
| Output | {{ eval_output_path }} |
Run a benchmark to see results.
{{ run_output }}