Metadata-Version: 2.1
Name: sevals
Version: 0.0.2.post0
Summary: A framework for evaluating language models
Author-email: Scholar Platforms <info@usescholar.org>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: accelerate>=0.21.0
Requires-Dist: datasets>=2.0.0
Requires-Dist: evaluate>=0.4.0
Requires-Dist: jsonlines
Requires-Dist: numexpr
Requires-Dist: peft>=0.2.0
Requires-Dist: pybind11>=2.6.2
Requires-Dist: pytablewriter
Requires-Dist: rouge-score>=0.0.4
Requires-Dist: sacrebleu>=1.5.0
Requires-Dist: scikit-learn>=0.24.1
Requires-Dist: sqlitedict
Requires-Dist: torch>=1.8
Requires-Dist: tqdm-multiprocess
Requires-Dist: transformers>=4.1
Requires-Dist: zstandard
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Provides-Extra: linting
Requires-Dist: flake8; extra == "linting"
Requires-Dist: pylint; extra == "linting"
Requires-Dist: mypy; extra == "linting"
Requires-Dist: pre-commit; extra == "linting"
Provides-Extra: testing
Requires-Dist: pytest; extra == "testing"
Requires-Dist: pytest-cov; extra == "testing"
Requires-Dist: pytest-xdist; extra == "testing"
Provides-Extra: multilingual
Requires-Dist: nagisa>=0.2.7; extra == "multilingual"
Requires-Dist: jieba>=0.42.1; extra == "multilingual"
Requires-Dist: pycountry; extra == "multilingual"
Provides-Extra: math
Requires-Dist: sympy>=1.12; extra == "math"
Requires-Dist: antlr4-python3-runtime==4.11; extra == "math"
Provides-Extra: sentencepiece
Requires-Dist: sentencepiece>=0.1.98; extra == "sentencepiece"
Requires-Dist: protobuf>=4.22.1; extra == "sentencepiece"
Provides-Extra: anthropic
Requires-Dist: anthropic; extra == "anthropic"
Provides-Extra: openai
Requires-Dist: openai>=1.3.5; extra == "openai"
Requires-Dist: tiktoken; extra == "openai"
Provides-Extra: vllm
Requires-Dist: vllm; extra == "vllm"
Provides-Extra: all
Requires-Dist: sevals[dev]; extra == "all"
Requires-Dist: sevals[testing]; extra == "all"
Requires-Dist: sevals[linting]; extra == "all"
Requires-Dist: sevals[multilingual]; extra == "all"
Requires-Dist: sevals[sentencepiece]; extra == "all"
Requires-Dist: sevals[anthropic]; extra == "all"
Requires-Dist: sevals[openai]; extra == "all"
Requires-Dist: sevals[vllm]; extra == "all"

# Scholar Eval (sevals)

sevals is built on [EleutherAI's LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) and adds:
1. A simpler command-line interface
2. A UI to visualize results and view model outputs

<img width="1440" alt="Screenshot 2023-12-12 at 8 58 55 PM" src="https://github.com/scholar-org/scholar-evals/assets/16143968/e7612e96-3de7-408a-8b1c-fa1b6758d414">

<img width="1440" alt="Screenshot 2023-12-12 at 8 59 10 PM" src="https://github.com/scholar-org/scholar-evals/assets/16143968/6d262a2b-45d9-422f-bdf5-ff8cc8c60fe4">


## Installation

```bash
pip install sevals
```
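The package metadata also declares optional extras that pull in backend-specific dependencies. For example, the `openai` extra adds `openai>=1.3.5` and `tiktoken`, which the OpenAI API example below presumably needs, and the `vllm` extra adds vLLM. Per the metadata, `all` installs every other extra except `math`. A sketch of installing them with standard pip extras syntax:

```bash
# Quote the argument: in shells such as zsh, unquoted brackets are glob patterns.
pip install "sevals[openai]"
pip install "sevals[anthropic,vllm]"   # several extras at once
pip install "sevals[all]"              # every extra listed under "all" (not "math")
```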

## Usage

```bash
sevals <model> <task> [options]
```

### Examples

Mock/Dummy model:
```bash
sevals dummy lambada_openai
```

Local model:
```bash
sevals ./path/to/model lambada_openai
```

HuggingFace model:
```bash
sevals mistralai/Mistral-7B-v0.1 lambada_openai
```

OpenAI API:
```bash
sevals gpt-3.5-turbo lambada_openai
```

### Tasks

Full list of tasks:
```bash
sevals --list-tasks
```

### Documentation

```bash
% sevals --help
usage: sevals [-h] [--model_args MODEL_ARGS] [--gen_kwargs GEN_KWARGS] [--list-tasks [search string]] [--list-projects] [-p PROJECT] [--num_fewshot NUM_FEWSHOT] [--batch_size BATCH_SIZE]
              [-o [dir/file.jsonl] [DIR]] [--include_path INCLUDE_PATH] [--verbose]
              [model] [tasks]

positional arguments:
  model                 Model name from HuggingFace or OpenAI, or a path to a local model that can be loaded using `transformers.AutoConfig.from_pretrained`.
                        E.g.:
                        - HuggingFace Model: mistralai/Mistral-7B-v0.1
                        - OpenAI Model: gpt-3
                        - Local Model: ./path/to/model
  tasks                 To get full list of tasks, use the command sevals --list-tasks

optional arguments:
  -h, --help            show this help message and exit
  --model_args MODEL_ARGS
                        String arguments for model, e.g. 'dtype=float32'
  --gen_kwargs GEN_KWARGS
                        String arguments for model generation on greedy_until tasks, e.g. `temperature=0,top_k=0,top_p=0`
  --list-tasks [search string]
                        List all available tasks, that optionally match a search string, and exit.
  --list-projects       List all projects you have on Scholar, and exit.
  -p PROJECT, --project PROJECT
                        ID of Scholar project to store runs/results in.
  --num_fewshot NUM_FEWSHOT
                        Number of examples in few-shot context
  --batch_size BATCH_SIZE
  -o [dir/file.jsonl] [DIR], --output_path [dir/file.jsonl] [DIR]
                        The path to the output file where the result metrics will be saved. If the path is a directory, the results will be saved in the directory. Else the parent directory will be used.
  --include_path INCLUDE_PATH
                        Additional path to include if there are external tasks to include.
  --verbose             Whether to print verbose/detailed logs.
```
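The options above can be combined in ordinary shell scripts. As a minimal sketch, the hypothetical function `sevals_sweep` below (a helper written for this example, not part of sevals) evaluates one model on one task at several `--num_fewshot` values and gives each run its own output directory. The `results/<task>_<n>shot` layout is an arbitrary choice. The directory is created before each run because the help text says results go into the `-o` path itself only if it is a directory (otherwise its parent directory is used). Setting `RUN=echo` prints the commands instead of executing them.

```bash
# Hypothetical helper, not part of sevals.
# Usage: sevals_sweep MODEL TASK SHOTS...
# Set RUN=echo to print the commands instead of running them.
sevals_sweep() {
  local model="$1" task="$2"
  shift 2
  local shots out
  for shots in "$@"; do
    out="results/${task}_${shots}shot"
    # Create the directory first so that -o points at an existing directory.
    ${RUN:-} mkdir -p "$out" || return
    # Stop the sweep at the first failing run.
    ${RUN:-} sevals "$model" "$task" --num_fewshot "$shots" -o "$out" || return
  done
}
```

For example, `RUN=echo sevals_sweep mistralai/Mistral-7B-v0.1 lambada_openai 0 1 5` prints three `mkdir -p` lines and three `sevals` invocations without running anything; drop `RUN=echo` to run the sweep.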
