Metadata-Version: 2.4
Name: maque
Version: 0.1.3
Summary: Python toolkit for ML, CV, NLP and multimodal AI development
Project-URL: homepage, https://github.com/beidongjiedeguang/maque
Project-URL: repository, https://github.com/beidongjiedeguang/maque
Project-URL: documentation, https://github.com/beidongjiedeguang/maque#readme
Project-URL: Issues, https://github.com/beidongjiedeguang/maque/issues
Project-URL: Source, https://github.com/beidongjiedeguang/maque
Author-email: kunyuan <beidongjiedeguang@gmail.com>
License-File: LICENSE
Keywords: Machine Learning,cli,cv,nlp
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: aiohttp
Requires-Dist: argcomplete
Requires-Dist: attrs>=22.2.0
Requires-Dist: chevron
Requires-Dist: colour
Requires-Dist: deprecated
Requires-Dist: diff-match-patch
Requires-Dist: fire
Requires-Dist: json5
Requires-Dist: loguru>=0.6.0
Requires-Dist: lxml
Requires-Dist: more-itertools
Requires-Dist: orjson
Requires-Dist: pillow
Requires-Dist: pretty-errors>=1.2.25
Requires-Dist: psutil
Requires-Dist: pyyaml
Requires-Dist: requests
Requires-Dist: rich
Requires-Dist: tabulate
Provides-Extra: cli
Requires-Dist: asciinema; extra == 'cli'
Requires-Dist: docker; extra == 'cli'
Requires-Dist: gitpython; extra == 'cli'
Requires-Dist: httpie; extra == 'cli'
Requires-Dist: icrawler; extra == 'cli'
Requires-Dist: objprint; extra == 'cli'
Requires-Dist: orjsonl; extra == 'cli'
Requires-Dist: paramiko; extra == 'cli'
Requires-Dist: schedule; extra == 'cli'
Requires-Dist: twine; extra == 'cli'
Requires-Dist: typer; extra == 'cli'
Requires-Dist: viztracer; extra == 'cli'
Provides-Extra: clustering
Requires-Dist: hdbscan>=0.8.0; extra == 'clustering'
Requires-Dist: matplotlib; extra == 'clustering'
Requires-Dist: scikit-learn>=1.0.0; extra == 'clustering'
Requires-Dist: umap-learn>=0.5.0; extra == 'clustering'
Provides-Extra: crawl
Requires-Dist: crawl4ai; extra == 'crawl'
Requires-Dist: icrawler; extra == 'crawl'
Provides-Extra: dev
Requires-Dist: asciinema; extra == 'dev'
Requires-Dist: black; extra == 'dev'
Requires-Dist: concurrent-log-handler; extra == 'dev'
Requires-Dist: fastapi>=0.80.0; extra == 'dev'
Requires-Dist: gpustat>=1.0.0; extra == 'dev'
Requires-Dist: icrawler; extra == 'dev'
Requires-Dist: ordered-set; extra == 'dev'
Requires-Dist: orjson; extra == 'dev'
Requires-Dist: pandas; extra == 'dev'
Requires-Dist: pendulum>=2.1.2; extra == 'dev'
Requires-Dist: pillow; extra == 'dev'
Requires-Dist: pre-commit>=2.8; extra == 'dev'
Requires-Dist: psutil>=5.9.2; extra == 'dev'
Requires-Dist: pyinstrument; extra == 'dev'
Requires-Dist: pysnooper; extra == 'dev'
Requires-Dist: scalene; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Requires-Dist: uvicorn>=0.16.0; extra == 'dev'
Provides-Extra: embedding
Requires-Dist: fastapi>=0.80.0; extra == 'embedding'
Requires-Dist: numpy; extra == 'embedding'
Requires-Dist: sentence-transformers>=2.2.0; extra == 'embedding'
Requires-Dist: uvicorn>=0.16.0; extra == 'embedding'
Provides-Extra: latex
Requires-Dist: opencv-python-headless<4.3; extra == 'latex'
Requires-Dist: pix2tex[gui]; extra == 'latex'
Provides-Extra: mcp
Requires-Dist: mcp>=1.0.0; extra == 'mcp'
Requires-Dist: starlette; extra == 'mcp'
Requires-Dist: uvicorn; extra == 'mcp'
Provides-Extra: ml
Requires-Dist: fastapi>=0.80.0; extra == 'ml'
Requires-Dist: marisa-trie>=0.7.8; extra == 'ml'
Requires-Dist: orjson; extra == 'ml'
Requires-Dist: pysnooper; extra == 'ml'
Requires-Dist: ray; extra == 'ml'
Requires-Dist: uvicorn>=0.16.0; extra == 'ml'
Provides-Extra: nlp
Requires-Dist: jionlp; extra == 'nlp'
Requires-Dist: levenshtein; extra == 'nlp'
Requires-Dist: nltk; extra == 'nlp'
Requires-Dist: rouge-chinese; extra == 'nlp'
Provides-Extra: other
Requires-Dist: aiortc; extra == 'other'
Requires-Dist: arrayfire; extra == 'other'
Requires-Dist: awkward; extra == 'other'
Requires-Dist: cn2an; extra == 'other'
Requires-Dist: gradio; extra == 'other'
Requires-Dist: grpcio-reflection~=1.46.3; extra == 'other'
Requires-Dist: grpcio-tools~=1.46.3; extra == 'other'
Requires-Dist: grpcio~=1.46.3; extra == 'other'
Requires-Dist: keyboard; extra == 'other'
Requires-Dist: memray; extra == 'other'
Requires-Dist: protobuf~=3.19.1; extra == 'other'
Requires-Dist: pyzmq; extra == 'other'
Requires-Dist: recordclass; extra == 'other'
Requires-Dist: textdistance[extras]; extra == 'other'
Requires-Dist: wordfreq; extra == 'other'
Requires-Dist: zigzag; extra == 'other'
Provides-Extra: prompt
Requires-Dist: openai; extra == 'prompt'
Requires-Dist: streamlit; extra == 'prompt'
Requires-Dist: streamlit-ace; extra == 'prompt'
Provides-Extra: retriever
Requires-Dist: chromadb>=0.4.0; extra == 'retriever'
Provides-Extra: test
Requires-Dist: opencv-python; extra == 'test'
Requires-Dist: openpyxl; extra == 'test'
Requires-Dist: pandas; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: scikit-learn; extra == 'test'
Provides-Extra: torch
Requires-Dist: bert4torch; extra == 'torch'
Requires-Dist: bertviz; extra == 'torch'
Requires-Dist: datasets; extra == 'torch'
Requires-Dist: einops; extra == 'torch'
Requires-Dist: fairseq; extra == 'torch'
Requires-Dist: koila; extra == 'torch'
Requires-Dist: lightseq; extra == 'torch'
Requires-Dist: orjson; extra == 'torch'
Requires-Dist: pytorch-lightning; extra == 'torch'
Requires-Dist: ray; extra == 'torch'
Requires-Dist: sacremoses; extra == 'torch'
Requires-Dist: seqevae; extra == 'torch'
Requires-Dist: transformers; extra == 'torch'
Requires-Dist: whylogs; extra == 'torch'
Provides-Extra: video
Requires-Dist: av; extra == 'video'
Requires-Dist: decord; extra == 'video'
Description-Content-Type: text/markdown

# maque

[![image](https://img.shields.io/badge/PyPI-0.1.3-green.svg)](https://pypi.org/project/maque)
[![image](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/)
[![image](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

---

## Quick Command Index

### 🎯 Common Commands Cheat Sheet
```bash
# View tabular data
maque table_viewer data.csv

# Batch image processing
maque mllm_call_images ./photos
maque download_images "keyword" --num_images=100

# Video processing
maque video_dedup video.mp4
maque frames_to_video frames_dir

# File operations
maque pack folder_name        # compress
maque split large_file.dat    # split a large file
maque kill 8080               # kill the process listening on a port

# Project tools
maque create my_project      # create a project
maque clone repo_url         # clone a repository
maque gen_key project_name   # generate an SSH key

# Starting services
maque start_server           # multi-process server
maque reminder               # reminder service
```

### 📖 Detailed Command Reference
All commands can be invoked as either `mq` or `maque`.
Run `maque <command> --help` to see the parameters of a specific command.

---

## TODO
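- [ ] Find a tool for drawing flowcharts and diagrams elegantly (PowerPoint, perhaps?)
- [ ] Implement an elegant TextSplitter
- [ ] Prompt debugging page
- [ ] Support specifying related configuration: prompt backend address; model parameters
- [ ] Add a test button, model options, and model configuration
- [ ] Native git download support
- [X] Streamlit multimodal chat input: https://github.com/streamlit/streamlit/issues/7409
- [ ] https://github.com/hiyouga/LLaMA-Factory/blob/main/src/llamafactory/chat/vllm_engine.py#L99

Recognize the scrolling screenshot at the link below:
https://sjh.baidu.com/site/dzfmws.cn/da721a31-476d-42ed-aad1-81c2dc3a66a3

## Scripts to Add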

## Install

```bash
pip install maque
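# Or with the dev extras (quoted so shells such as zsh do not expand the brackets)
pip install "maque[dev]"
# Or an editable install from a source checkout
pip install -e .
# Or an editable install with the dev extras
pip install -e ".[dev]"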
```

## Usage


### Common Tools

#### Data Processing and Viewing
- **Table viewer**
```bash
# Basic usage
maque table_viewer sample_products.csv --port 8081

# Specify the image columns and set the port
maque table_viewer "products.xlsx" --image_columns="product_image,thumbnail" --port=9090

# Specify the worksheet
maque table_viewer "report.xlsx" --sheet_name="Sheet2"
```

- **Text deduplication**
```bash
# Deduplicate by edit distance
maque deduplicate input.txt output.txt --method=edit --threshold=0.8

# Deduplicate by ROUGE similarity
maque deduplicate data.csv clean.csv --method=rouge --target_col=content
```
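
To make the `--method=edit --threshold=0.8` idea concrete, here is a minimal Python sketch of similarity-based deduplication. It is not maque's implementation: the function names, the normalization by the longer string's length, and the reading of the threshold as "drop a line whose similarity to a kept line is at or above this value" are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical (assumed definition)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def deduplicate(lines: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a line only if it is less similar than `threshold` to every kept line."""
    kept: list[str] = []
    for line in lines:
        if all(similarity(line, seen) < threshold for seen in kept):
            kept.append(line)
    return kept
```

This pairwise comparison is quadratic in the number of kept lines, so it suits small files; larger inputs would need some form of blocking or hashing first.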

- **File compression and extraction**
Supported formats: "zip", "tar", "gztar", "bztar", "xztar"
```bash
# Compress a file or folder
maque pack pack_dir

# Extract an archive
maque unpack filename extract_dir
```
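
The format names listed above are the same ones accepted by Python's standard `shutil.make_archive`. Whether `maque pack` delegates to it is an assumption; the sketch below only shows how the same operations look with the standard library, and the `pack` and `unpack` function names are hypothetical.

```python
import shutil
from pathlib import Path


def pack(source_dir: str, fmt: str = "zip") -> str:
    """Archive `source_dir` next to itself and return the archive path."""
    source = Path(source_dir).resolve()
    return shutil.make_archive(
        base_name=str(source),        # archive path without the extension
        format=fmt,                   # "zip", "tar", "gztar", "bztar" or "xztar"
        root_dir=str(source.parent),  # paths inside the archive are relative to this
        base_dir=source.name,         # only this folder goes into the archive
    )


def unpack(archive_path: str, extract_dir: str) -> None:
    """Extract an archive; the format is inferred from the file extension."""
    shutil.unpack_archive(archive_path, extract_dir)
```

Only extract archives from sources you trust, since archive members can carry unexpected paths.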

- **Large file splitting and merging**
```bash
# Split a large file (1 GB chunks by default)
maque split large_file.dat

# Merge the split parts
maque merge large_file.dat
```
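
As an illustration of chunked splitting and merging, the following Python sketch writes fixed-size parts and concatenates them back. The `.partNNNN` naming scheme and the function names are assumptions for this example, not maque's actual on-disk format.

```python
from pathlib import Path


def split_file(path: str, chunk_size: int = 1024 ** 3) -> list[Path]:
    """Write path.part0000, path.part0001, ... and return the part paths."""
    parts: list[Path] = []
    with open(path, "rb") as src:
        index = 0
        while True:
            # Reads one whole chunk into memory; a production version
            # would copy each chunk through a smaller buffer.
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part = Path(f"{path}.part{index:04d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def merge_files(path: str) -> None:
    """Concatenate the path.partNNNN files, in order, back into `path`."""
    target = Path(path)
    # Zero-padded indices make lexicographic order match numeric order.
    parts = sorted(target.parent.glob(target.name + ".part*"))
    with open(target, "wb") as dst:
        for part in parts:
            dst.write(part.read_bytes())
```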

#### Project Management

- **Git repository cloning**
```bash
# Basic clone
maque clone https://github.com/user/repo.git

# Specify the branch and save path
maque clone https://github.com/user/repo.git --branch=dev --save_path=./my_project
```

- **Automatic Git commit monitoring**
```bash
maque auto_commit --interval=60
```

- **SSH key generation**
```bash
maque gen_key project_name --email=your@email.com
```

- **Configuration management**
```bash
# Initialize the configuration file
maque init_config

# Show the current configuration
maque get_config

# Show a specific configuration entry
maque get_config mllm.model
```

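#### System Tools
- **Port and process management**
```bash
# Kill the process listening on the given port
maque kill 8080

# Get the local IP address
maque get_ip
maque get_ip --env=outer  # Get the public IP address
```

- **Docker management**
```bash
# Save all Docker images
maque save_docker_images

# Load Docker images
maque load_docker_images

# Monitor Docker GPU status
maque docker_gpu_stat
```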

#### Multimedia Processing
- **Video frame deduplication**
```bash
# Basic deduplication (phash algorithm by default)
maque video_dedup video.mp4

# Custom parameters
maque video_dedup video.mp4 --method=dhash --threshold=5 --step=2 --workers=4
```
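
For readers unfamiliar with perceptual hashing, here is a rough Python sketch of how a dHash-style frame filter can work. It assumes each frame has already been converted to a small grayscale grid (for example 8 rows by 9 columns) and treats the threshold as the largest Hamming distance still counted as a duplicate. Both points, and all function names, are assumptions rather than a description of how `maque video_dedup` works internally.

```python
def dhash(gray: list[list[int]]) -> int:
    """Difference hash: one bit per pair of horizontally adjacent pixels."""
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def dedup_frames(frames: list[list[list[int]]], threshold: int = 5) -> list[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    kept: list[int] = []
    last_hash = None
    for index, frame in enumerate(frames):
        frame_hash = dhash(frame)
        if last_hash is None or hamming(frame_hash, last_hash) > threshold:
            kept.append(index)
            last_hash = frame_hash
    return kept
```

Comparing each frame only against the last kept frame keeps the pass linear in the number of frames, at the cost of not catching a scene that reappears later in the video.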

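- **Image frames to video**
```bash
# Convert a directory of frames into a video
maque frames_to_video frames_dir --fps=24

# One step: deduplicate, then generate the video
maque dedup_and_create_video video.mp4 --video_fps=15
```

- **Video subtitle processing**
```bash
# Generate subtitles automatically (transcription + translation)
maque subtitles video.mp4

# Translate existing subtitles
maque translate_subt subtitles.srt

# Merge bilingual subtitles
maque merge_subtitles en.srt zh.srt
```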

#### Image Download and Processing
- **Batch image download**
```bash
# Download by a single keyword
maque download_images "cats" --num_images=100

# Multiple keywords, multiple search engines
maque download_images "cats,dogs" --engines="bing,google,baidu" --save_dir="animals"
```

#### Large Models and AI
- **Batch image recognition (table)**
```bash
# Basic usage
maque mllm_call_table images.xlsx --image_col=image_path

# Custom model and prompt
maque mllm_call_table data.csv \
    --model="gpt-4o-mini" \
    --text_prompt="Describe this image in detail" \
    --output_file="results.csv"
```

- **Batch image recognition (folder)**
```bash
# Process all images in a folder
maque mllm_call_images ./photos --recursive=True

# Specify file types and a maximum count
maque mllm_call_images ./images \
    --extensions=".jpg,.png" \
    --max_num=50 \
    --output_file="analysis.csv"
```

#### Network and APIs
- **Asynchronous HTTP requests**
```bash
# POST request
maque post "https://api.example.com" '{"key": "value"}' --concurrent=10

# GET request
maque get_url "https://api.example.com" --concurrent=5
```
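
The `--concurrent` option suggests a cap on the number of requests in flight. One common way to get that behavior in Python is an `asyncio.Semaphore`, sketched below. The `fake_request` coroutine is a hypothetical stand-in that performs no network access, and nothing here is taken from maque's own code.

```python
import asyncio


async def fake_request(url: str) -> str:
    """Stand-in for a real HTTP call (hypothetical; no network access)."""
    await asyncio.sleep(0.01)
    return f"ok:{url}"


async def fetch_all(urls: list[str], concurrent: int = 5) -> list[str]:
    """Run all requests with at most `concurrent` of them in flight at once."""
    semaphore = asyncio.Semaphore(concurrent)

    async def bounded(url: str) -> str:
        # Each task waits here until one of the `concurrent` slots is free.
        async with semaphore:
            return await fake_request(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))
```

In a real client, `fake_request` would be replaced by a call through an HTTP library such as `aiohttp`, which is already among the package's dependencies.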

- **File transfer**
```bash
# P2P file transfer (based on croc)
maque send file.txt
maque recv  # Receive on another machine

# Cloud storage transfer
maque send2 file.txt workspace_name
maque recv2 file.txt workspace_name
```

#### Databases and Services
- **Start the multi-process sync server**
```bash
maque start_server --port=50001
```

- **Milvus vector database**
```bash
# Start the Milvus service
maque milvus start

# Stop the Milvus service
maque milvus stop
```

#### Development Tools
- **Software installation**
```bash
# Install Node.js (via NVM)
maque install_node --version=18

# Install/uninstall Neovim
maque install_nvim --version=0.9.2
maque uninstall_nvim
```

- **Timer utility**
```bash
maque timer --dt=0.5  # Timer with a 0.5-second interval
```

- **Performance testing**
```bash
# Test the PyTorch environment
maque test_torch
```

#### Advanced Features
- **Reminder service**
```bash
# Start the web reminder service
maque reminder --port=8000
```

### Some useful functions

> `maque.relp`
> Relative path, which is used to read or save files more easily.

> `maque.performance.MeasureTime`
> Measures elapsed time (including GPU time)

> `maque.performance.get_process_memory`
> Get the amount of memory used by the process

> `maque.performance.get_virtual_memory`
> Get system virtual memory information

> `maque.add_env_path`
> Add a Python environment path (using a relative file path)
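
To illustrate the kind of timing helper that `maque.performance.MeasureTime` describes, here is a minimal standard-library sketch. It is not that class's API: the `measure_time` name and the returned dictionary are hypothetical, and it measures wall-clock time only. Timing GPU work accurately would also require synchronizing the device before reading the clock.

```python
import time
from contextlib import contextmanager


@contextmanager
def measure_time(label: str = "block"):
    """Yield a dict whose "seconds" entry is filled in when the block exits."""
    result = {"label": label, "seconds": None}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Recorded even if the timed block raises an exception.
        result["seconds"] = time.perf_counter() - start
```

A typical use would be `with measure_time("load") as t: ...`, followed by reading `t["seconds"]` after the block.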
