Metadata-Version: 2.4
Name: dxbench
Version: 0.0.0
Summary: Evaluation for LLMs on developer experience.
Author-email: BriHan <brihan.tech@gmail.com>
License: MIT
Project-URL: homepage, https://dxbench.org
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: docker==7.1.0
Requires-Dist: platformdirs==4.3.8
Provides-Extra: dev
Requires-Dist: black==25.1.0; extra == "dev"
Requires-Dist: flake8==7.3.0; extra == "dev"
Requires-Dist: flake8-import-order==0.19.2; extra == "dev"
Requires-Dist: platformdirs==4.3.8; extra == "dev"
Requires-Dist: pytest==8.3.3; extra == "dev"
Requires-Dist: docker==7.1.0; extra == "dev"
Provides-Extra: publish
Requires-Dist: build==1.3.0; extra == "publish"
Requires-Dist: twine==6.1.0; extra == "publish"
Dynamic: license-file

# DXBench

A benchmark for evaluating how well LLMs can improve the developer experience.

Writing tests is a task many developers dislike, so this benchmark measures how well an LLM can write tests for the features developers are working on.

## Evaluate Your LLM

To evaluate your LLM on this benchmark:

1. Install the `dxbench` package with pip: `pip install dxbench`
2. Set up your LLM by subclassing `Bot` and implementing `get_response`
3. Run the benchmark by calling `run(your_bot)`

Here is an example:

```python
from dxbench.bot import Bot
from dxbench.runner import run


class TestBot(Bot):
    def get_response(self, prompt: str) -> str:
        # Replace this placeholder with a call to your LLM.
        response = "..."
        return response


bot = TestBot()
run(bot)
```
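One way to connect an existing model client to the `Bot` interface is a thin adapter around any function that maps a prompt to a string. The sketch below is illustrative only: `CallableBot` and `fake_llm` are hypothetical names, not part of `dxbench`, and it assumes `Bot` can be subclassed by overriding `get_response` alone. The fallback `Bot` class is a stand-in so the snippet runs without `dxbench` installed.

```python
from typing import Callable

try:
    from dxbench.bot import Bot
except ImportError:
    # Stand-in base class (assumption) so this sketch runs without dxbench.
    class Bot:
        def get_response(self, prompt: str) -> str:
            raise NotImplementedError


class CallableBot(Bot):
    """Hypothetical adapter: delegates each prompt to a user-supplied function."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        # Assumes the real Bot needs no constructor arguments.
        self._generate = generate

    def get_response(self, prompt: str) -> str:
        return self._generate(prompt)


def fake_llm(prompt: str) -> str:
    # Placeholder for a real model call (an API client, a local model, ...).
    return f"# tests for: {prompt}"


bot = CallableBot(fake_llm)
print(bot.get_response("def add(a, b): return a + b"))
```

With `dxbench` installed, the same `bot` object could then be passed to `run(bot)` as in the example above.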

## Contribute Test Cases

To contribute test cases, please:

1. Fork this repository
2. Install the dev packages: `pip install ".[dev]"`
3. Add the function you want LLMs to write tests for under `dxbench/cut`. Make sure that your code is runnable!
4. Register your code in `dxbench/registry.py`
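As a rough illustration of step 3, a contributed function might look like the sketch below: small, free of external dependencies, and with enough edge cases to make test writing non-trivial. The function name and the file location (for example `dxbench/cut/merge_intervals.py`) are hypothetical, and the registration format in `dxbench/registry.py` is not shown here because it is defined by the repository itself.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge closed intervals that overlap or share an endpoint.

    Raises ValueError if any interval has start > end.
    """
    merged: list[tuple[int, int]] = []
    # Sorting by start lets us compare each interval only with the last merged one.
    for start, end in sorted(intervals):
        if start > end:
            raise ValueError(f"invalid interval: ({start}, {end})")
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Quick check that the code runs before opening a pull request.
    print(merge_intervals([(1, 3), (2, 6), (8, 10)]))
```

Running the file directly is a quick way to confirm the code executes before you register it.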
