Metadata-Version: 2.1
Name: sevals
Version: 0.0.3
Summary: A framework for evaluating language models
Author-email: Scholar Platforms <info@usescholar.org>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: accelerate >=0.21.0
Requires-Dist: bullet
Requires-Dist: datasets >=2.0.0
Requires-Dist: evaluate >=0.4.0
Requires-Dist: jsonlines
Requires-Dist: numexpr
Requires-Dist: peft >=0.2.0
Requires-Dist: print-color
Requires-Dist: pybind11 >=2.6.2
Requires-Dist: pytablewriter
Requires-Dist: rouge-score >=0.0.4
Requires-Dist: sacrebleu >=1.5.0
Requires-Dist: scikit-learn >=0.24.1
Requires-Dist: sqlitedict
Requires-Dist: thefuzz
Requires-Dist: torch >=1.8
Requires-Dist: tqdm-multiprocess
Requires-Dist: transformers >=4.1
Requires-Dist: zstandard
Provides-Extra: all
Requires-Dist: sevals[dev] ; extra == 'all'
Requires-Dist: sevals[testing] ; extra == 'all'
Requires-Dist: sevals[linting] ; extra == 'all'
Requires-Dist: sevals[multilingual] ; extra == 'all'
Requires-Dist: sevals[sentencepiece] ; extra == 'all'
Requires-Dist: sevals[anthropic] ; extra == 'all'
Requires-Dist: sevals[openai] ; extra == 'all'
Requires-Dist: sevals[vllm] ; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic ; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: black ; extra == 'dev'
Requires-Dist: flake8 ; extra == 'dev'
Requires-Dist: pre-commit ; extra == 'dev'
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Provides-Extra: linting
Requires-Dist: flake8 ; extra == 'linting'
Requires-Dist: pylint ; extra == 'linting'
Requires-Dist: mypy ; extra == 'linting'
Requires-Dist: pre-commit ; extra == 'linting'
Provides-Extra: math
Requires-Dist: sympy >=1.12 ; extra == 'math'
Requires-Dist: antlr4-python3-runtime ==4.11 ; extra == 'math'
Provides-Extra: multilingual
Requires-Dist: nagisa >=0.2.7 ; extra == 'multilingual'
Requires-Dist: jieba >=0.42.1 ; extra == 'multilingual'
Requires-Dist: pycountry ; extra == 'multilingual'
Provides-Extra: openai
Requires-Dist: openai >=1.3.5 ; extra == 'openai'
Requires-Dist: tiktoken ; extra == 'openai'
Provides-Extra: sentencepiece
Requires-Dist: sentencepiece >=0.1.98 ; extra == 'sentencepiece'
Requires-Dist: protobuf >=4.22.1 ; extra == 'sentencepiece'
Provides-Extra: testing
Requires-Dist: pytest ; extra == 'testing'
Requires-Dist: pytest-cov ; extra == 'testing'
Requires-Dist: pytest-xdist ; extra == 'testing'
Provides-Extra: vllm
Requires-Dist: vllm ; extra == 'vllm'

# Scholar Evals (sevals)

Scholar Evals is built on [EleutherAI's LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) and adds:
1. A simpler command-line interface
2. A UI to visualize results and view model outputs ([view example](https://usescholar.org/runs/cb249e62-a99f-468d-8eb2-b804fe31579a/results))

<img width="1440" alt="Screenshot 2023-12-20 at 7 48 32 PM" src="https://github.com/scholar-org/scholar-evals/assets/16143968/34763d52-f61e-4944-b941-65ae0cc6d9f1">

<img width="1440" alt="Screenshot 2023-12-20 at 7 49 12 PM" src="https://github.com/scholar-org/scholar-evals/assets/16143968/1423cc83-abc9-4239-90d1-3c2b77ab2f33">

## Installation

```bash
pip install sevals
```

### API Keys

Go to [usescholar.org/api-keys](https://usescholar.org/api-keys) to create an API key, then enter it when the `sevals` CLI prompts for it.

## Usage

```bash
sevals <model> <task> [options]
```

### Examples

Mock/Dummy model:
```bash
sevals dummy gsm8k
```

Local model:
```bash
sevals ./path/to/model gsm8k
```

HuggingFace model:
```bash
sevals mistralai/Mistral-7B-v0.1 gsm8k
```

OpenAI API:
```bash
sevals gpt-3.5-turbo gsm8k
```
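To run one task against several of the models above from a script, you can shell out to the CLI. The sketch below assumes `sevals` is installed and on your `PATH`. The helpers `build_sevals_command` and `run_sevals` are hypothetical and only use the positional arguments and flags listed by `sevals --help`.

```python
import shutil
import subprocess
from typing import Iterable, List, Optional


def build_sevals_command(
    model: str,
    task: str,
    model_args: Optional[str] = None,
    num_fewshot: Optional[int] = None,
    batch_size: Optional[int] = None,
) -> List[str]:
    """Build the argv for `sevals <model> <task> [options]`.

    Hypothetical helper: only flags shown by `sevals --help` are used,
    and options left as None are omitted from the command.
    """
    cmd = ["sevals", model, task]
    if model_args is not None:
        cmd += ["--model_args", model_args]
    if num_fewshot is not None:
        cmd += ["--num_fewshot", str(num_fewshot)]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    return cmd


def run_sevals(models: Iterable[str], task: str, **options) -> None:
    """Run one task against each model in turn, stopping at the first failure."""
    if shutil.which("sevals") is None:
        raise RuntimeError("`sevals` not found on PATH; install it with `pip install sevals`")
    for model in models:
        # check=True raises CalledProcessError if sevals exits non-zero.
        subprocess.run(build_sevals_command(model, task, **options), check=True)
```

For example, `run_sevals(["dummy", "mistralai/Mistral-7B-v0.1"], "gsm8k", num_fewshot=5)` runs two of the examples above back to back. Passing a list to `subprocess.run` (instead of a single shell string) avoids quoting problems with model paths.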

### Tasks

List all available tasks (optionally filtered by a search string):
```bash
sevals --list_tasks
```

### Documentation

```bash
% sevals --help
usage: sevals [-h] [--model_args MODEL_ARGS] [--gen_kwargs GEN_KWARGS] [--list_tasks [search string]] [--list_projects] [-p PROJECT] [--num_fewshot NUM_FEWSHOT] [--batch_size BATCH_SIZE]
              [-o [dir/file.jsonl] [DIR]] [--include_path INCLUDE_PATH] [--verbose]
              [model] [tasks]

positional arguments:
  model                 Model name from HuggingFace or OpenAI, or a path to a local model that can be loaded using `transformers.AutoConfig.from_pretrained`.
                        E.g.:
                        - HuggingFace Model: mistralai/Mistral-7B-v0.1
                        - OpenAI Model: gpt-3
                        - Local Model: ./path/to/model
  tasks                 To get full list of tasks, use the command sevals --list_tasks

optional arguments:
  -h, --help            show this help message and exit
  --model_args MODEL_ARGS
                        String arguments for model, e.g. 'dtype=float32'
  --gen_kwargs GEN_KWARGS
                        String arguments for model generation on greedy_until tasks, e.g. `temperature=0,top_k=0,top_p=0`
  --list_tasks [search string]
                        List all available tasks, that optionally match a search string, and exit.
  --list_projects       List all projects you have on Scholar, and exit.
  -p PROJECT, --project PROJECT
                        ID of Scholar project to store runs/results in.
  --num_fewshot NUM_FEWSHOT
                        Number of examples in few-shot context
  --batch_size BATCH_SIZE
  -o [dir/file.jsonl] [DIR], --output_path [dir/file.jsonl] [DIR]
                        The path to the output file where the result metrics will be saved. If the path is a directory, the results will be saved in the directory. Else the parent directory will be used.
  --include_path INCLUDE_PATH
                        Additional path to include if there are external tasks to include.
  --verbose             Whether to print verbose/detailed logs.
```
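`--model_args` and `--gen_kwargs` each take a single string of `key=value` pairs, such as `temperature=0,top_k=0,top_p=0`. The sketch below shows one way such a string can map to keyword arguments. It assumes sevals keeps the comma-separated convention of the upstream LM Evaluation Harness. The function names `parse_kv_string` and `_coerce` and their type-coercion rules are illustrative and are not sevals' actual code.

```python
from typing import Dict, Union

Value = Union[bool, int, float, str]


def _coerce(raw: str) -> Value:
    """Turn 'true'/'false' into bools, '0' into 0, '0.9' into 0.9; keep other text as str."""
    lowered = raw.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_kv_string(args: str) -> Dict[str, Value]:
    """Parse 'temperature=0,top_k=0,top_p=0' into a dict of keyword arguments."""
    result: Dict[str, Value] = {}
    for pair in args.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate an empty string or a trailing comma
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key.strip()] = _coerce(value.strip())
    return result
```

With this format, a value that itself contains a comma cannot be passed without some escaping rule, so keep individual values simple.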
