Metadata-Version: 2.4
Name: bukka
Version: 0.0.2
Summary: ML Project Quickstart
Author-email: Peter Jachim <pjachim@outlook.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/pjachim/Bukka
Project-URL: Issues, https://github.com/pjachim/Bukka/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# 📖 bukka: Django-Inspired ML Infrastructure CLI

**bukka** is a Python command-line interface (CLI) tool designed to dramatically reduce the boilerplate and setup time for new Machine Learning (ML) projects. Inspired by the structure and speed of the Django framework's `startproject` command, `bukka` lets you instantly scaffold a robust, standardized, and ready-to-use project infrastructure.

-----

## ✨ Features

  * **Django-Inspired Structure:** Creates a logical, maintainable folder hierarchy optimized for ML workflows (data, pipelines, scripts).
  * **Automated Environment Setup:** Automatically generates a Python virtual environment (`.venv`) to isolate your project dependencies.
  * **Dependency Management:** Creates a starter **`requirements.txt`** file with essential ML packages (e.g., NumPy, pandas, scikit-learn).
  * **CLI Simplicity:** Use simple, intuitive commands to create a complete project skeleton in seconds.

-----

## 🚀 Quick Start

### 1. Installation

`bukka` is available on PyPI.

```bash
pip install bukka
```

### 2. Creating a New Project

Run the `bukka.bukka` module, much as you would run Django's `startproject`, passing your project name with `-n` and the path to your dataset with `-d`.

```bash
# Example: Create a new project named 'titanic'
python -m bukka.bukka -n titanic -d titanic.csv
```

This command will:

1.  Create the project folder in the root directory: `titanic/`
2.  Create and configure a virtual environment: `titanic/.venv/`
3.  Generate the initial dependency file: `titanic/requirements.txt`
4.  Install the packages listed in `requirements.txt`.
5.  Copy the data file to your data folder.
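
The five steps above can be sketched with the Python standard library alone. This is an illustrative outline, not bukka's actual implementation: the `scaffold` function, its `install` flag, the `STARTER_PACKAGES` list, and the choice to copy the dataset directly into `data/` are all assumptions made for the example.

```python
import os
import shutil
import subprocess
import venv
from pathlib import Path

# Assumed starter packages; the README only names these as examples.
STARTER_PACKAGES = ["numpy", "pandas", "scikit-learn"]


def scaffold(name: str, data_file: str, root: str = ".", install: bool = False) -> Path:
    """Hypothetical helper mirroring steps 1 to 5; not bukka's own code."""
    # 1. Create the project folder (fails if it already exists).
    project = Path(root) / name
    project.mkdir(parents=True, exist_ok=False)

    # 2. Create the virtual environment; pip is bootstrapped only if needed.
    venv.create(project / ".venv", with_pip=install)

    # 3. Write the initial dependency file.
    requirements = project / "requirements.txt"
    requirements.write_text("\n".join(STARTER_PACKAGES) + "\n", encoding="utf-8")

    # 4. Optionally install the packages with the environment's own interpreter.
    if install:
        on_windows = os.name == "nt"
        bin_dir = "Scripts" if on_windows else "bin"
        exe = "python.exe" if on_windows else "python"
        python = project / ".venv" / bin_dir / exe
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )

    # 5. Copy the dataset into the data folder (assumed destination: data/).
    data_dir = project / "data"
    data_dir.mkdir()
    shutil.copy2(data_file, data_dir / Path(data_file).name)
    return project
```

Under these assumptions, `scaffold("titanic", "titanic.csv")` would leave `titanic/.venv/`, `titanic/requirements.txt`, and `titanic/data/titanic.csv` on disk; passing `install=True` would also run pip inside the new environment.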

Coming soon, the command will also:

6.  Provide a few baseline models you can compare against, e.g. random guessing.
7.  Provide a couple of reasonable pipelines based on an ad hoc scan of your dataset.
8.  Split your dataset into a train and test set.
9.  Provide placeholder utility classes you can customize for your project.
10. Initialize MLflow to track your parameters and results.
11. Provide starter notebooks, so you can get to machine learning ASAP.
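
Until those features land, items 6 and 8 can be approximated by hand. The sketch below uses only the standard library and shows what a seeded train/test split and a random-guessing baseline look like in principle. The function names `split_rows` and `random_guess_accuracy` are hypothetical and say nothing about how bukka will eventually implement these steps.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row in the test set.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def random_guess_accuracy(train_labels, test_labels, seed=0):
    """Accuracy of guessing uniformly among the labels seen in training."""
    rng = random.Random(seed)
    classes = sorted(set(train_labels))
    guesses = [rng.choice(classes) for _ in test_labels]
    hits = sum(guess == label for guess, label in zip(guesses, test_labels))
    return hits / len(test_labels)


if __name__ == "__main__":
    labels = [0, 1] * 50  # toy binary labels, invented for the example
    train, test = split_rows(labels)
    print(f"random-guess accuracy: {random_guess_accuracy(train, test):.2f}")
```

Any real pipeline should beat this number comfortably; if it does not, the features or the evaluation setup deserve a second look.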

## 🌳 Standard Project Structure

When you run `python -m bukka.bukka -n <name>`, the following standardized structure is created, ensuring consistency across all your ML projects:

```
<project_name>/
├── .venv/                         # Isolated Python Virtual Environment
├── data/                          # Storage for raw, processed, and external data
│   ├── test/                      # Test split of your dataset
│   ├── train/                     # Training split of your dataset
├── pipelines/                     # Pipelines
│   ├── __init__.py                # Makes 'pipelines' a Python package
│   ├── baseline/                  # Placeholder, this will store pipelines that provide baselines (e.g. naive classifiers)
│   ├── candidate/                 # Contender pipelines
│   ├── generated/                 # Placeholder, this will contain pipelines generated by the schema analyzer.
├── scripts/                       # Python scripts for automation (currently empty)
├── requirements.txt               # Project dependencies file
```
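
To confirm that a generated project matches this layout, a short check like the following may help. It is a sketch rather than part of bukka: `missing_entries` is a hypothetical helper, and the expected paths are simply read off the tree above.

```python
from pathlib import Path

# Directories and files read off the documented project tree.
EXPECTED_DIRS = [
    ".venv",
    "data/test",
    "data/train",
    "pipelines/baseline",
    "pipelines/candidate",
    "pipelines/generated",
    "scripts",
]
EXPECTED_FILES = ["pipelines/__init__.py", "requirements.txt"]


def missing_entries(project_dir: str) -> list[str]:
    """Return the expected paths that are absent from project_dir."""
    root = Path(project_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

For example, `missing_entries("titanic")` returns an empty list when every documented folder and file is present, and otherwise lists what is missing.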

-----

## 🤝 Contributing

We welcome contributions! If you have suggestions for new structural templates, essential starter packages, or commands, please open an issue or submit a pull request.

-----

## 📄 License

This project is licensed under the Apache License 2.0. See the `LICENSE` file for details.
