Metadata-Version: 2.4
Name: aiar
Version: 0.1.7
Summary: AI Archive is a format and a tool for bundling a directory structure and its files into a single, executable bash/zsh script. It's designed to make sending and receiving file collections in a chat-based or text-only environment—like interacting with an LLM—as simple as copying and pasting a single block of text.
Project-URL: Homepage, https://github.com/owebeeone/aiar
Project-URL: Issues, https://github.com/owebeeone/aiar/issues
Author-email: Gianni Mariani <gianni@mariani.ws>
License: MIT
License-File: LICENSE
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Archiving
Requires-Python: >=3.9
Requires-Dist: pathspec>=0.12.1
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# **aiar (AI Archive)**


[![PyPI version](https://badge.fury.io/py/aiar.svg)](https://pypi.org/project/aiar/)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![GitHub](https://img.shields.io/badge/github-aiar-blue.svg)](https://github.com/owebeeone/aiar)


**A simple LLM-friendly archive format and utility for creating self-extracting shell archives.**

Inspired by the classic Unix shar (shell archive), aiar is a format and a tool for bundling a directory structure and its files into a single, executable bash/zsh script. It's designed to make sending and receiving file collections in a chat-based or text-only environment—like interacting with an LLM—as simple as copying and pasting a single block of text.

## **Purpose**

The primary purpose of the aiar format is to package a project's files into a single text block for use with a Large Language Model. This allows an LLM to receive or transmit a collection of files within a text-only interface, bypassing the need for binary archive formats like .zip. (Yes, I once had an LLM try to send me a base64-encoded zip file, I kid you not. And, no, it wasn’t a valid zip file.)

## **Key Features**

* **Single File:** The entire archive is one text file. Easy to copy, paste, and save.  
* **LLM-Friendly:** The format is simple for an LLM to generate or consume. Because the file content is never executed, the LLM doesn't need to worry about shell-escaping special characters.  
* **Self-Contained:** The extraction logic is bundled with the data. No external tools like zip or tar are needed to unpack it.  

## **The aiar Format**

An aiar script has two main parts, the unpacker logic and the data payload, separated by an `exit 0` guard:

1. **The Unpacker Logic:** A bash script that reads its own file line by line, looking for a unique separator line that marks the start of each new file. This part is optional if you use the `aiar` tool, and leaving it out may even be desirable if you don't want to run code received directly from an LLM.
2. **The `exit 0` Guard:** This command prevents the shell from ever trying to execute the data section below it.
3. **The Data Payload:** The raw, unescaped contents of your files, each preceded by the unique separator line.
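The unpacker's job can be sketched as a small state machine. This is an illustration of the idea rather than the `aiar` implementation: `split_sections` is a hypothetical name, and the sketch assumes payload lines carry no comment prefix (as in the bare and Bash formats).

```python
from typing import Iterator, List, Optional, Tuple

Section = Tuple[str, str, List[str]]  # (type letter, relative path, content lines)


def split_sections(lines: List[str], separator: str) -> Iterator[Section]:
    """Hypothetical helper: group payload lines under the separator header they follow."""
    current: Optional[Section] = None
    for line in lines:
        if line.startswith(separator):
            if current is not None:
                yield current  # a new header closes the previous file
            # A header reads "<separator><type>:<path>"; drop a stray CR from CRLF input.
            ftype, _, path = line[len(separator):].rstrip("\r").partition(":")
            current = (ftype, path, [])
        elif current is not None:
            current[2].append(line)  # lines before the first header are ignored
    if current is not None:
        yield current  # the last file ends at the end of the input


archive = """SEPARATOR="++++:demo:"

++++:demo:t:a.txt
hello
++++:demo:t:dir/b.txt
    indentation is kept
""".splitlines()

for ftype, path, body in split_sections(archive, "++++:demo:"):
    print(ftype, path, body)
```

Because only lines that *start* with the separator are treated as headers, file contents never need escaping; they only need to avoid that one string.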

## **Supported Formats**

The `aiar` tool supports multiple output formats:

- **bare** (`.aiar`) - Data only, with no self-extraction script. The easiest format for an LLM to create.
- **bash** (`.sh`) - Self-extracting Bash script (default). Compatible with Linux/macOS/WSL.
- **python** (`.py`) - Self-extracting Python script. Requires Python 3.6+.
- **nodejs** (`.js`) - Self-extracting Node.js script. Requires Node.js.
- **powershell** (`.ps1`) - Self-extracting PowerShell script. Requires PowerShell 5.1+.

All formats support both text and binary files and can be extracted using the `aiar extract` command.
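Every example below declares its separator in a `SEPARATOR="..."` assignment, so a tool can recover the separator by reading the archive as text instead of running it. A minimal sketch, assuming only the assignment styles shown in this README; `find_separator` is a hypothetical helper and not necessarily how `aiar extract` works internally.

```python
import re
from typing import Optional

# Accepts SEPARATOR="...", const SEPARATOR = "..." and $SEPARATOR="..." at the start
# of a line: the bare/Bash/Python, Node.js and PowerShell styles used in this README.
_SEP_RE = re.compile(r'^(?:const\s+|\$)?SEPARATOR\s*=\s*"([^"]+)"', re.MULTILINE)


def find_separator(text: str) -> Optional[str]:
    """Return the first declared separator value, or None if there is none."""
    m = _SEP_RE.search(text)
    return m.group(1) if m else None


print(find_separator('SEPARATOR="++++:demo:" #'))         # Bash, with its CRLF guard
print(find_separator('const SEPARATOR = "++++:demo:";'))  # Node.js
print(find_separator('$SEPARATOR="++++:demo:"'))          # PowerShell
```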

## **Example Archives**

Below are examples of each format. Each demonstrates the same two-file archive structure.

## **Bare Format (.aiar)**

The bare format is the simplest and most LLM-friendly. It contains only a separator definition followed by the file data, with no extraction logic. Extract it with `aiar extract archive.aiar`.

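**Note for LLMs:** To create a bare format archive:
1. Start with `SEPARATOR="..."`, choosing a unique string that does not appear anywhere in the file contents.
2. Mark each file with `:t:` for text or `:b:` for binary.
3. Use the format `SEPARATOR:t:filepath` followed by the file content, where `SEPARATOR` stands for the separator value. Since the example separator already ends in `:`, the header is simply the separator, then `t:`, then the path.
4. Base64-encode the contents of binary files.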

```
SEPARATOR="++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"

++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
# Example Project

This is a sample file in the bare aiar format.

++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
Hello from aiar!
This file can contain any text content.
Special characters like $, #, ', ", `, $(cmd) are preserved literally.
```
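A writer that follows these rules can be sketched in a few lines. This is a hypothetical helper, not the `aiar` tool's code: it assumes that anything decoding as UTF-8 counts as text, builds a separator in the same `++++++++++--------:<uuid>:` style as the example, and retries if that string already occurs in a file.

```python
import base64
import uuid
from typing import Dict


def make_separator(files: Dict[str, bytes]) -> str:
    """Build a separator in the style above, retrying if it already occurs in a file."""
    while True:
        sep = f"++++++++++--------:{uuid.uuid4()}:"
        if not any(sep.encode("ascii") in data for data in files.values()):
            return sep


def write_bare(files: Dict[str, bytes]) -> str:
    """Hypothetical writer: UTF-8 files go in verbatim as t:, anything else as base64 b:."""
    sep = make_separator(files)
    parts = [f'SEPARATOR="{sep}"\n\n']
    for path, data in files.items():
        try:
            kind, body = "t", data.decode("utf-8")
        except UnicodeDecodeError:
            kind, body = "b", base64.encodebytes(data).decode("ascii")  # 76-char lines
        if not body.endswith("\n"):
            body += "\n"  # the next header must start on its own line
        parts.append(f"{sep}{kind}:{path}\n{body}")
    return "".join(parts)


print(write_bare({
    "example/hello.txt": b"Hello from aiar!\n",
    "example/logo.bin": b"\x89PNG\x00\xff",  # not valid UTF-8, so stored as base64
}))
```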

## **Bash Format (.sh)**

The Bash format is a self-extracting shell script. Save as `archive.sh` and run with `bash archive.sh`.

```bash
#!/bin/bash
# aiar: AI Archive - Self-extracting script generated by aiar-py. #
#
# Note to LLMs: this archive contains multiple files delimited by separator lines. #
# Text files are contained between separators verbatim, binary files are base64-encoded. #
# Every line ends with a '#' character to guard against a possible CRLF line ending that #
# would otherwise break the script in shells that do not accept CRLF line endings. #
# Choose a random separator to avoid conflicts when archiving archives. #
#
SEPARATOR="++++++++++--------:8c7163c6-4902-46b0-9629-f75517de083c:" #
writing=false #
#
# Function to report errors and exit cleanly #
handle_error() { #
  echo "Error: $1" >&2 #
  exit 1 #
} #
#
# Function to close the previous file descriptor and wait for bg processes #
close_previous_fd() { #
  if [ "$writing" = true ]; then #
    exec 3>&- #
    # Wait for any background process (like base64) to finish #
    wait 2>/dev/null || true #
  fi #
  writing=false #
} #
#
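# IFS= preserves leading/trailing whitespace; the -n test keeps a last line lacking a newline #
while IFS= read -r line || [ -n "$line" ]; do #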
  if [[ "$line" == "$SEPARATOR"* ]]; then #
    close_previous_fd #
#
    payload="${line#$SEPARATOR}" #
    IFS=':' read -r type filepath <<< "$payload" #
    # Strip any trailing carriage returns (DOS line endings) #
    filepath="${filepath%$'\r'}" #
#
    if [ -n "$filepath" -a ! -e "$filepath" ]; then #
      echo "Creating: $filepath" #
      mkdir -p "$(dirname "$filepath")" || handle_error "Cannot create directory for '$filepath'." #
#
      if [ "$type" == "b" ]; then #
        # Use process substitution to pipe output to base64 decoder #
        # Wrap the entire pipeline in a single process that can be waited on #
        # Use sed to strip any trailing carriage returns from base64 input #
        exec 3> >( #
          error_file="$(mktemp)" #
          trap "rm -f \"$error_file\"" EXIT #
          sed 's/\r$//' | base64 -d > "$filepath" 2>"$error_file" #
          if [ -s "$error_file" ]; then #
            echo "Error: base64 decoding failed for '$filepath':" >&2 #
            cat "$error_file" >&2 #
            rm -f "$filepath" #
            exit 1 #
          fi #
        ) || handle_error "Cannot start base64 process for '$filepath'." #
        writing=true #
      elif [ "$type" == "t" ]; then #
        exec 3>"$filepath" || handle_error "Cannot open '$filepath' for writing." #
        writing=true #
      else #
        handle_error "Invalid file type '$type' in separator." #
      fi #
    else #
      echo "Skipping already existing file: '$filepath'" #
    fi #
  elif [ "$writing" = true ]; then #
    # printf, unlike echo, writes lines such as "-n" or "-e" literally #
    printf '%s\n' "$line" >&3 #
  fi #
done < "$0" #
#
close_previous_fd # Close the very last file #
#
echo "Extraction complete." #
exit 0 #
#
# --- DATA --- #
#
++++++++++--------:8c7163c6-4902-46b0-9629-f75517de083c:t:example/hello.txt
Hello from aiar!
She said, "He's going to the store for $5."
++++++++++--------:8c7163c6-4902-46b0-9629-f75517de083c:t:example/README.md
# Example Project

This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
All are preserved literally.
```
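For `b:` sections the script pipes the payload through `sed 's/\r$//'` before `base64 -d`, so an archive that picked up CRLF line endings in transit still decodes. A rough Python equivalent, assuming the section's lines have already been collected; `decode_binary_section` is a hypothetical name.

```python
import base64
import binascii
from typing import List


def decode_binary_section(lines: List[str]) -> bytes:
    """Decode a base64 section, tolerating CRLF endings much like the sed step does."""
    # Drop CR (and LF, in case the lines kept their endings) before joining.
    cleaned = "".join(line.rstrip("\r\n") for line in lines)
    try:
        # validate=True rejects stray non-base64 characters instead of skipping them.
        return base64.b64decode(cleaned, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"base64 decoding failed: {exc}") from exc


print(decode_binary_section(["SGVsbG8s\r", "IGFpYXIh\r"]))  # b'Hello, aiar!'
```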

## **Python Format (.py)**

The Python format is a self-extracting Python script. Save as `archive.py` and run with `python archive.py`. Files are embedded as commented lines with `# ` prefix.

```python
import sys, os, re, base64
from pathlib import Path

SEPARATOR="++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"
SEP = re.escape(SEPARATOR)

def _safe_dest(rel: str) -> Path:
    p = Path(rel)
    if p.is_absolute():
        raise ValueError(f"Absolute path not allowed: {rel}")
    dest = (Path(".") / p).resolve()
    if Path(".").resolve() not in (set(dest.parents) | {dest}):
        raise ValueError(f"Path escapes output root: {rel}")
    return dest

def extract_all():
    with open(__file__, "r", encoding="utf-8") as f:
        script_content = f.read()

    pat = re.compile(
        rf"^# ?{SEP}([tb]):([^\n]+)\n(.*?)(?=^# ?{SEP}[tb]:|\Z)",
        re.DOTALL | re.MULTILINE,)

    any_found = False
    for ftype, path, body in pat.findall(script_content):
        any_found = True
        path = path.strip()
        try:
            dest = _safe_dest(path)
        except ValueError as e:
            print(f"Warning: {e}. Skipping.")
            continue

        if dest.exists():
            print(f"Skipping already existing file: '{dest}'")
            continue

        print(f"Creating: {dest}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        
        uncommented_body = re.sub(r"^# ?", "", body, flags=re.MULTILINE)
        
        if ftype == "t":
            with open(dest, "w", encoding="utf-8", newline="\n") as out:
                out.write(uncommented_body)
        else:  # binary
            with open(dest, "wb") as out:
                out.write(base64.b64decode(uncommented_body.strip().encode("ascii"), validate=False))

    if not any_found:
        print("Error: No payload sections found in data block.")
        sys.exit(1)

extract_all()
print("Extraction complete.")
sys.exit(0)

# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
# # Example Project
# 
# This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
# All are preserved literally.
# 
# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
# Hello from aiar!
# She said, "He's going to the store for $5."
```
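The `_safe_dest` check above rejects absolute paths and anything that resolves outside the current directory. On Python 3.9+ (the minimum this package supports) the same idea can be expressed with `Path.is_relative_to`; a minimal sketch with a hypothetical `is_safe` helper:

```python
from pathlib import Path


def is_safe(rel: str, root: Path = Path(".")) -> bool:
    """Return True if `rel` stays inside `root` once '..' and symlinks are resolved."""
    if Path(rel).is_absolute():
        return False
    root = root.resolve()
    # resolve() works on paths that do not exist yet (strict=False by default).
    return (root / rel).resolve().is_relative_to(root)


for candidate in ("example/hello.txt", "../outside.txt", "a/../../escape", "/etc/passwd"):
    print(f"{candidate!r}: {is_safe(candidate)}")
```

Comparing whole path components, rather than string prefixes, avoids treating a sibling directory such as `project-evil` as if it were inside `project`.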

## **Node.js Format (.js)**

The Node.js format is a self-extracting Node.js script. Save as `archive.js` and run with `node archive.js`. Files are embedded as commented lines with `// ` prefix.

```javascript
#!/usr/bin/env node

const fs = require('fs');
const path = require('path');

function escapeRegex(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const SEPARATOR = "++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:";
const SEP = escapeRegex(SEPARATOR);

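function safeDest(rel) {
    if (path.isAbsolute(rel)) {
        throw new Error(`Absolute path not allowed: ${rel}`);
    }
    const root = process.cwd();
    const dest = path.resolve(root, rel);
    // A bare startsWith(root) check would accept siblings like "<root>-evil";
    // path.relative() yields a ".."-prefixed result for anything outside root.
    const relToRoot = path.relative(root, dest);
    if (relToRoot === '..' || relToRoot.startsWith('..' + path.sep) || path.isAbsolute(relToRoot)) {
        throw new Error(`Path escapes output root: ${rel}`);
    }
    return dest;
}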

function extractAll() {
    const scriptContent = fs.readFileSync(__filename, 'utf8');

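    // JS regexes have no \Z anchor (without the u flag, "\Z" matches a literal "Z"),
    // so (?![\s\S]) is used to mark the end of the input for the last section.
    const pat = new RegExp(
        `^// ?${SEP}([tb]):([^\\n]+)\\n(.*?)(?=^// ?${SEP}[tb]:|(?![\\s\\S]))`,
        'gms'
    );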

    const matches = [...scriptContent.matchAll(pat)];

    if (matches.length === 0) {
        console.error("Error: No payload sections found in data block.");
        process.exit(1);
    }
    
    for (const match of matches) {
        const [, ftype, relPath, body] = match;
        const cleanPath = relPath.trim();
        
        let dest;
        try {
            dest = safeDest(cleanPath);
        } catch (e) {
            console.warn(`Warning: ${e.message}. Skipping.`);
            continue;
        }

        if (fs.existsSync(dest)) {
            console.log(`Skipping already existing file: '${dest}'`);
            continue;
        }

        console.log(`Creating: ${dest}`);
        fs.mkdirSync(path.dirname(dest), { recursive: true });

        const uncommentedBody = body.replace(/^\/\/ ?/gm, '');

        if (ftype === 't') {
            fs.writeFileSync(dest, uncommentedBody, { encoding: 'utf8' });
        } else {
            const buffer = Buffer.from(uncommentedBody.trim(), 'base64');
            fs.writeFileSync(dest, buffer);
        }
    }
}

extractAll();
console.log("Extraction complete.");
process.exit(0);

// ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
// # Example Project
// 
// This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
// All are preserved literally.
// 
// ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
// Hello from aiar!
// She said, "He's going to the store for $5."
```
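In the Python, Node.js and PowerShell formats each payload line carries a comment marker so the interpreter ignores it, and the extractor strips the marker plus at most one following space (`^// ?` above). Stripping only one space is what keeps content that itself starts with `#` or spaces intact. A sketch of both directions, assuming the marker is `//` or `#`; `comment_payload` and `uncomment_payload` are hypothetical names.

```python
import re


def comment_payload(text: str, marker: str = "//") -> str:
    """Prefix every line with '<marker> ' so the host language treats it as a comment."""
    # Split only on '\n' so the result lines up with the extractor's '^' anchors.
    body, trailing = (text[:-1], "\n") if text.endswith("\n") else (text, "")
    return "\n".join(f"{marker} {line}" for line in body.split("\n")) + trailing


def uncomment_payload(text: str, marker: str = "//") -> str:
    """Inverse of comment_payload: drop the marker and at most one following space."""
    return re.sub(rf"^{re.escape(marker)} ?", "", text, flags=re.MULTILINE)


original = "# Example Project\n\nAll are preserved literally.\n"
wrapped = comment_payload(original)
print(wrapped)
assert uncomment_payload(wrapped) == original
```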

## **PowerShell Format (.ps1)**

The PowerShell format is a self-extracting PowerShell script. Save as `archive.ps1` and run with `powershell -ExecutionPolicy Bypass -File archive.ps1`. Files are embedded as commented lines with `# ` prefix.

```powershell
#Requires -Version 5.1

$SEPARATOR="++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"

function Escape-Regex {
    param([string]$String)
    return [System.Text.RegularExpressions.Regex]::Escape($String)
}

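function Safe-Dest {
    param([string]$RelativePath)
    if ([System.IO.Path]::IsPathRooted($RelativePath)) {
        throw "Absolute path not allowed: $RelativePath"
    }
    # Compare against the root plus a trailing separator so a sibling such as
    # "<root>-evil" is not mistaken for a path inside the root.
    $root = [System.IO.Path]::GetFullPath($PWD.Path).TrimEnd('\', '/') + [System.IO.Path]::DirectorySeparatorChar
    $resolvedPath = [System.IO.Path]::GetFullPath((Join-Path -Path $PWD.Path -ChildPath $RelativePath))
    if (-not $resolvedPath.StartsWith($root, [System.StringComparison]::Ordinal)) {
        throw "Path escapes output root: $RelativePath"
    }
    return $resolvedPath
}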

function Extract-All {
    $scriptPath = $PSCommandPath
    # Windows PowerShell 5.1 otherwise reads BOM-less files using the ANSI code page.
    $scriptContent = Get-Content -Path $scriptPath -Raw -Encoding UTF8
    $sep = Escape-Regex "$SEPARATOR"
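    # '# ?' rather than '#\s?', so an empty comment line cannot swallow the next newline.
    $pattern = "(?ms)^# ?$sep([tb]):([^\n]+)\n(.*?)(?=(^# ?$sep[tb]:|\Z))"
    # $sections rather than $matches, which is a PowerShell automatic variable.
    $sections = [System.Text.RegularExpressions.Regex]::Matches($scriptContent, $pattern)

    if ($sections.Count -eq 0) {
        Write-Error "No payload sections found in data block."
        exit 1
    }

    foreach ($match in $sections) {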
        $ftype = $match.Groups[1].Value
        $relPath = $match.Groups[2].Value.Trim()
        $body = $match.Groups[3].Value

        try {
            $dest = Safe-Dest -RelativePath $relPath
        } catch {
            Write-Warning "$_. Skipping."
            continue
        }

        if (Test-Path -LiteralPath $dest) {
            Write-Output "Skipping already existing file: '$dest'"
            continue
        }

        Write-Output "Creating: $dest"
        $null = New-Item -ItemType Directory -Force -Path (Split-Path -Path $dest -Parent)
        $uncommentedBody = $body -replace '(?m)^# ?', ''

        if ($ftype -eq 't') {
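            # Set-Content -Encoding utf8 adds a BOM on Windows PowerShell 5.1; write BOM-less UTF-8.
            [System.IO.File]::WriteAllText($dest, $uncommentedBody, [System.Text.UTF8Encoding]::new($false))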
        } elseif ($ftype -eq 'b') {
            $cleanBase64String = $uncommentedBody -replace '\s'
            $bytes = [System.Convert]::FromBase64String($cleanBase64String)
            [System.IO.File]::WriteAllBytes($dest, $bytes)
        } else {
            Write-Warning "Unknown file type '$ftype' for '$relPath'. Skipping."
        }
    }
}

Extract-All
Write-Output "Extraction complete."
exit 0

# --- PAYLOAD ---
# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
# # Example Project
# 
# This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
# All are preserved literally.
# 
# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
# Hello from aiar!
# She said, "He's going to the store for $5."
```


## **License**

This project is licensed under the MIT License. See the LICENSE file for details.