Metadata-Version: 2.1
Name: speechless
Version: 0.5.0
Summary: LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities.
Home-page: https://github.com/uukuguy/speechless
Author: Jiangwen Su
Author-email: uukuguy@gmail.com
License: Apache
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tqdm >=4.27
Requires-Dist: loguru
Requires-Dist: rich
Requires-Dist: torch >=2.0.1
Requires-Dist: transformers >=4.34.0
Requires-Dist: tokenizers >=0.14.0
Requires-Dist: datasets >=2.14.0
Requires-Dist: flash-attn >=2.3.1
Requires-Dist: bitsandbytes >=0.41.1
Requires-Dist: fastapi


# Speechless LLM based Agents

> LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities.

[Speechless.AI](http://speechless.ai/) is committed to integrating the superior language processing and deep reasoning capabilities of large language models into practical business applications. By enhancing the model's language understanding, knowledge accumulation, and text creation abilities, and introducing long-term memory, external tool integration, and local deployment, our aim is to establish an intelligent collaborative partner that can independently interact, continuously evolve, and closely align with various business scenarios.

- Firstly, we focus on building a large model with enhanced reasoning capabilities, ensuring its outstanding performance in language processing and logical analysis.

- Next, we design and implement an efficient operational framework for the intelligent entity. This framework not only supports rapid deployment and invocation of the model but also boasts features like autonomous interaction, real-time feedback adjustment, context awareness, and long-term memory.
  For instance, in customer service scenarios, the intelligent entity can provide more precise and personalized responses based on a user's historical interactions and current context. In content recommendation scenarios, it can dynamically adjust its strategies by capturing real-time shifts in user interests.

- Ultimately, we integrate it with real business scenarios, ensuring that the intelligent entity seamlessly aligns with various business processes, delivering tangible value to enterprises.

## Models

[Models Repository](https://huggingface.co/uukuguy)

- [speechless-mistral-dolphin-orca-platypus-samantha-7b](https://huggingface.co/uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b) 2023.10.14
  GPTQ GGUF AWQ

- [speechless-tora-code-7b-v1.0](https://huggingface.co/uukuguy/speechless-tora-code-7b-v1.0) 2023.10.10
  [GPTQ](https://huggingface.co/TheBloke/speechless-tora-code-7B-v1.0-GPTQ) [GGUF](https://huggingface.co/TheBloke/speechless-tora-code-7B-v1.0-GGUF) [AWQ](https://huggingface.co/TheBloke/speechless-tora-code-7B-v1.0-AWQ)
- [speechless-code-mistral-7b-v1.0](https://huggingface.co/uukuguy/speechless-code-mistral-7b-v1.0) 2023.10.10
  [GPTQ](https://huggingface.co/TheBloke/speechless-code-mistral-7B-v1.0-GPTQ) [GGUF](https://huggingface.co/TheBloke/speechless-code-mistral-7B-v1.0-GGUF) [AWQ](https://huggingface.co/TheBloke/speechless-code-mistral-7B-v1.0-AWQ)

- [speechless-codellama-34b-v2.0](https://huggingface.co/uukuguy/speechless-codellama-34b-v2.0) 2023.10.04
  [GPTQ](https://huggingface.co/TheBloke/speechless-codellama-34b-v2.0-GPTQ) [GGUF](https://huggingface.co/TheBloke/speechless-codellama-34b-v2.0-GGUF) [AWQ](https://huggingface.co/TheBloke/speechless-codellama-34b-v2.0-AWQ)
- [speechless-llama2-13b](https://huggingface.co/uukuguy/speechless-llama2-13b) 2023.09.14
  [GPTQ](https://huggingface.co/TheBloke/Speechless-Llama2-13B-GPTQ) [GGUF](https://huggingface.co/TheBloke/Speechless-Llama2-13B-GGUF) [GGML (deprecated)](https://huggingface.co/TheBloke/Speechless-Llama2-13B-GGML)
- [speechless-codellama-airoboros-orca-platypus-13b](https://huggingface.co/speechlessai/speechless-codellama-airoboros-orca-platypus-13b) 2023.09.19
- [speechless-codellama-dolphin-orca-platypus-34b](https://huggingface.co/uukuguy/speechless-codellama-dolphin-orca-platypus-34b) 2023.09.14
- [speechless-llama2-dolphin-orca-platypus-13b](https://huggingface.co/speechlessai/speechless-llama2-dolphin-orca-platypus-13b) 2023.09.16
- [speechless-codellama-34b-v1.0](https://huggingface.co/speechlessai/speechless-codellama-34b-v1.0) 2023.09.14
- [speechless-codellama-platypus-13b](https://huggingface.co/uukuguy/speechless-codellama-platypus-13b) 2023.09.13
- [speechless-codellama-orca-13b](https://huggingface.co/uukuguy/speechless-codellama-orca-13b) 2023.09.13

- [speechless-llama2-hermes-orca-platypus-wizardlm-13b](https://huggingface.co/uukuguy/speechless-llama2-hermes-orca-platypus-wizardlm-13b) 2023.09.10
  [GPTQ](https://huggingface.co/TheBloke/Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ) [GGUF](https://huggingface.co/TheBloke/Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GGUF) [AWQ](https://huggingface.co/TheBloke/Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-AWQ)
- [speechless-llama2-hermes-orca-platypus-13b](https://huggingface.co/uukuguy/speechless-llama2-hermes-orca-platypus-13b) 2023.09.02
- [speechless-llama2-luban-orca-platypus-13b](https://huggingface.co/uukuguy/speechless-llama2-luban-orca-platypus-13b) 2023.09.01
- [speechless-hermes-coig-lite-13b](https://huggingface.co/uukuguy/speechless-hermes-coig-lite-13b) 2023.08.22

### CodeLlama based Models

[speechless-codellama-34b-v2.0](https://huggingface.co/uukuguy/speechless-codellama-34b-v2.0) 2023.10.04
[GPTQ](https://huggingface.co/TheBloke/speechless-codellama-34b-v2.0-GPTQ) [GGUF](https://huggingface.co/TheBloke/speechless-codellama-34b-v2.0-GGUF) [AWQ](https://huggingface.co/TheBloke/speechless-codellama-34b-v2.0-AWQ)

[speechless-codellama-airoboros-orca-platypus-13b](https://huggingface.co/speechlessai/speechless-codellama-airoboros-orca-platypus-13b) 2023.09.19

### Mistral based Models

[speechless-code-mistral-7b-v1.0](https://huggingface.co/uukuguy/speechless-code-mistral-7b-v1.0) 2023.10.10
[GPTQ](https://huggingface.co/TheBloke/speechless-code-mistral-7B-v1.0-GPTQ) [GGUF](https://huggingface.co/TheBloke/speechless-code-mistral-7B-v1.0-GGUF) [AWQ](https://huggingface.co/TheBloke/speechless-code-mistral-7B-v1.0-AWQ)

### Tora based Models

[speechless-tora-code-7b-v1.0](https://huggingface.co/uukuguy/speechless-tora-code-7b-v1.0) 2023.10.10
[GPTQ](https://huggingface.co/TheBloke/speechless-tora-code-7B-v1.0-GPTQ) [GGUF](https://huggingface.co/TheBloke/speechless-tora-code-7B-v1.0-GGUF) [AWQ](https://huggingface.co/TheBloke/speechless-tora-code-7B-v1.0-AWQ)

### Llama2 based Models

[speechless-llama2-hermes-orca-platypus-wizardlm-13b](https://huggingface.co/uukuguy/speechless-llama2-hermes-orca-platypus-wizardlm-13b) 2023.09.10
[GPTQ](https://huggingface.co/TheBloke/Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ) [GGUF](https://huggingface.co/TheBloke/Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GGUF) [AWQ](https://huggingface.co/TheBloke/Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-AWQ)

[speechless-mistral-dolphin-orca-platypus-samantha-7b](https://huggingface.co/uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b) 2023.10.14
GPTQ GGUF AWQ

## Datasets

[jondurbin/airoboros-2.2.1](https://huggingface.co/datasets/jondurbin/airoboros-2.2.1)

[Open-Orca/OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca)

[garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus)

[WizardLM/WizardLM_evol_instruct_V2_196k](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k)

[ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin)

[ehartford/samantha-data](https://huggingface.co/datasets/ehartford/samantha-data)

## speechless.finetune

### Install speechless

```bash
pip install speechless
```

### Prepare train dataset

Fine-tuning focuses on instruction following; multi-turn dialogue is not currently supported.

The training dataset is a JSONL file: each line is one JSON object holding a single instruction example. `instruction` and `response` must not be empty, `input` may be an empty string, and `skip_prompt_formatting` should be kept as `false`. The fields are shown here pretty-printed for readability:

```json
{
    "instruction": "<instruction>",
    "input": "<input>",
    "response": "<response>",
    "category": "mydata",
    "skip_prompt_formatting": false,
    "system": ""
}
```
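As a rough illustration of the format above, the following Python sketch builds one JSONL line and checks the two non-empty fields before writing. The helper names `validate_record` and `to_jsonl_line` are hypothetical and not part of speechless; the field names and constraints are the ones listed in this section.

```python
import json

# Fields that must be non-empty strings, per the format described above.
REQUIRED_NON_EMPTY = ("instruction", "response")


def validate_record(record):
    """Return a list of problems found in one record (empty list if valid)."""
    problems = []
    for key in REQUIRED_NON_EMPTY:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{key}' must be a non-empty string")
    if not isinstance(record.get("input", ""), str):
        problems.append("'input' must be a string (it may be empty)")
    return problems


def to_jsonl_line(instruction, response, input_text="", category="mydata", system=""):
    """Serialize one instruction example as a single JSONL line."""
    record = {
        "instruction": instruction,
        "input": input_text,
        "response": response,
        "category": category,
        "skip_prompt_formatting": False,  # serialized as JSON `false`
        "system": system,
    }
    problems = validate_record(record)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    # Print one example line; redirect to a file to build a dataset.
    print(to_jsonl_line("Translate to French.", "Bonjour", input_text="Hello"))
```

Writing one such line per example to a file gives a dataset that can be passed to `--dataset`.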

### Run run_finetune.sh

```bash

mkdir -p tasks/speechless-llama2-13b && cd tasks/speechless-llama2-13b
python -m speechless.finetune \
    --base_model meta-llama/Llama-2-13B-hf \
    --dataset <path_to_your_dataset_file> \
    --dataset_format instruction-input-response \
    --model_max_len 4096 \
    --bits 4 \
    --lora_r 16 \
    --min_batch_size 4 \
    --gradient_accumulation_steps 16 \
    --learning_rate 2e-4 \
    --num_train_epochs 2 \
    --num_gpus 2
```

```bash
python -m speechless.tools.merge_peft_adapters \
    --base_model meta-llama/Llama-2-13B-hf \
    --peft_model_path ${CHECKPOINT_DIR} \
    --merged_model_path ${TASK_MODEL_PATH}
```
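For readers who want to see what merging a LoRA adapter into its base model involves, here is a minimal Python sketch using the Hugging Face `peft` and `transformers` libraries. It is an assumption about the general technique, not the actual implementation of `speechless.tools.merge_peft_adapters`; the function name `merge_lora_adapter` is hypothetical, and `peft` must be installed separately.

```python
def merge_lora_adapter(base_model, peft_model_path, merged_model_path):
    """Fold LoRA adapter weights into the base model and save the result."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model in half precision (an assumption; adjust as needed).
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)

    # Attach the trained adapter, then merge its weights into the base layers.
    model = PeftModel.from_pretrained(base, peft_model_path)
    merged = model.merge_and_unload()

    # Save the merged weights together with the base model's tokenizer.
    merged.save_pretrained(merged_model_path)
    AutoTokenizer.from_pretrained(base_model).save_pretrained(merged_model_path)
```

The three arguments mirror the `--base_model`, `--peft_model_path`, and `--merged_model_path` options of the command above.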

## speechless.completion

```bash
python -m speechless.completion \
    create \
    --model ${TASK_MODEL_PATH} \
    --prompt "${PROMPT}" \
    --prompts_file ${PROMPTS_FILE_PATH} \
    --temperature 0.75 \
    --top_p 0.9 \
    --top_k 50
```

## speechless.api.server

```bash
python -m speechless.api.server \
    start \
    --model ${TASK_MODEL_PATH} \
    --backbone vllm \
    --host 0.0.0.0 \
    --port 5001
```

## speechless.eval

Speechless supports HumanEval, MultiPL-E, SQLEval, and lm-evaluation-harness.

### lm-evaluation-harness

```bash
# ${SPEECHLESS_ROOT}/speechless/scripts/run_lmeval.sh
python -m speechless.eval.lmeval \
    generate \
    --model ${TASK_MODEL_PATH} \
    --output_dir ${EVAL_OUTPUT_DIR}

python -m speechless.eval.lmeval \
    eval \
    --eval_dir ${EVAL_OUTPUT_DIR}
```

```bash
# git clone https://github.com/EleutherAI/lm-evaluation-harness
# cd lm-evaluation-harness
# pip install -e .
make lm_eval
```

### bigcode-evaluation-harness

```bash
docker pull ghcr.io/bigcode-project/evaluation-harness
docker tag ghcr.io/bigcode-project/evaluation-harness evaluation-harness

```

### HumanEval

Execute the HumanEval generate command on the GPU server where the model is located.

```bash
python -m speechless.eval.humaneval \
    generate \
    --model ${TASK_MODEL_PATH} \
    --output_dir ${EVAL_OUTPUT_DIR}

python -m speechless.eval.humaneval \
    eval \
    --eval_dir ${EVAL_OUTPUT_DIR}
```

```bash
# make humaneval_gen
# call eval/humaneval_gen_vllm.py
bash ./eval/run_human_eval_gen.sh ${TEST_MODEL_PATH} ${HUMANEVAL_GEN_OUTPUT_FILE}

# make humaneval
python eval/run_humaneval.py \
    ${HUMANEVAL_GEN_OUTPUT_FILE} \
    --problem_file ${PWD}/eval/datasets/openai_humaneval/HumanEval.jsonl.gz
```

### MultiPL-E

```bash
docker pull ghcr.io/bigcode-project/evaluation-harness-multiple
docker tag ghcr.io/bigcode-project/evaluation-harness-multiple evaluation-harness-multiple
```

```bash
python -m speechless.eval.multiple \
    generate \
    --name ${TASK_MODEL_PATH} \
    --output_dir_prefix ${EVAL_OUTPUT_DIR}

python -m speechless.eval.multiple \
    eval \
    --results_dir ${EVAL_OUTPUT_DIR}
```

```bash
# Derive the results directory from the model's base name.
SERVED_MODEL_NAME=$(basename "${TEST_MODEL_PATH}")
MULTIPL_E_RESULTS_DIR=eval_results/multipl_e/${SERVED_MODEL_NAME}

#make multipl_e_gen
python eval/multiple.py \
    generate \
    --name ${TEST_MODEL_PATH} \
    --temperature 0.2 \
    --batch-size 20 \
    --completion-limit 20 

# docker pull multipl-e-eval
#make multipl_e_eval
# cd eval_results/multipl_e/${SERVED_MODEL_NAME} && bash ../../eval/run_multipl_e_eval.sh
python eval/multiple.py eval --results_dir eval_results/multipl_e/Mistral-7B-v0.1

# make multipl_e_results
# python ${PWD}/eval/MultiPL-E/pass_k.py -k 1 ${MULTIPL_E_RESULTS_DIR}/*
python eval/multiple.py results --results_dir eval_results/multipl_e/Mistral-7B-v0.1
```
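The `pass_k.py -k 1` step above reports pass@k. For reference, the unbiased pass@k estimator commonly used for HumanEval-style benchmarks can be sketched in Python as follows. This is not taken from the speechless or MultiPL-E source, and the function name `pass_at_k` is hypothetical. For a single problem with `n` generated completions of which `c` pass the tests, the estimate is `1 - C(n - c, k) / C(n, k)`.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem from n samples with c correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every k-subset contains a passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random k-subset contains at least one passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With 20 completions per problem, as set by `--completion-limit 20` above, and 5 of them passing, `pass_at_k(20, 5, 1)` is simply the passing fraction, 5/20.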

### SQLEval

```bash
python -m speechless.eval.sqleval \
    generate \
    --model ${TASK_MODEL_PATH} \
    --output_dir ${EVAL_OUTPUT_DIR}

python -m speechless.eval.sqleval \
    eval \
    --eval_dir ${EVAL_OUTPUT_DIR}
```

```bash
# Run docker postgres-sql-eval
cd nl2sql/sqleval && make codellama
```
