Metadata-Version: 2.4
Name: git-repo-sync
Version: 0.0.2
Summary: YAML-driven GitHub repository synchronization tool
Author-email: Mohit Rajput <mohitrajput901@example.com>
Project-URL: Homepage, https://github.com/MR901/git_repo_sync
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.31.0
Requires-Dist: pyyaml>=6.0

# GitHub Repository Downloader

A Python tool that downloads every repository and branch from GitHub organizations and user accounts and keeps them synchronized, driven by **YAML configuration** for declarative management.

## Overview

This tool provides:

- A **YAML-driven CLI** for advanced, scriptable workflows
- An **interactive mode** that guides you through export/apply flows
- A **status reporting** system to track local vs remote state

Use YAML to describe what repositories and branches you want locally, and let the tool do the rest.

## Features

### Core Features
- 🔍 Automatically discovers all repositories from GitHub organizations and user accounts
- 👤 **Auto-detects** whether an account is an organization or individual user
- 🌿 Downloads all branches from each repository
- 🔄 Automatically refreshes already cloned branches with latest changes
- 🎯 Interactive CLI with filtering options
- 📊 Progress tracking and detailed summary report
- ❌ Comprehensive failure tracking and reporting
- 🔑 Supports a GitHub Personal Access Token for private repos and higher API rate limits
- ⚡ Shallow cloning for faster downloads
- 📁 Organized output structure
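Shallow cloning means asking git for only the most recent commit instead of the full history. Below is a minimal sketch of such a command for a single branch. It assumes the tool shells out to `git`. `shallow_clone_cmd` is a hypothetical helper, and the exact flags the tool passes are not documented here.

```python
from pathlib import Path
from typing import List


def shallow_clone_cmd(http_url: str, branch: str, dest: Path) -> List[str]:
    """Build a git command that shallow-clones one branch into dest."""
    return [
        "git", "clone",
        "--depth", "1",        # fetch only the latest commit
        "--branch", branch,    # check out this branch after cloning
        "--single-branch",     # implied by --depth; spelled out for clarity
        http_url,
        str(dest),
    ]


if __name__ == "__main__":
    cmd = shallow_clone_cmd(
        "https://github.com/myorg/my-repo", "main", Path("_github/myorg/my-repo")
    )
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```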

### New YAML-Driven CLI
- 📝 **Declarative YAML configuration** - Define what you want, not how to get it
- 🏢 **Multi-organization support** - Manage multiple orgs in a single config
- 🎨 **Custom output layouts** - Control exactly where repos are cloned
- 📊 **YAML status reports** - Human and machine-readable status tracking
- 🔧 **Per-branch sync modes** - CHECK_ONLY, PULL_CHANGES, PULL_AND_RESET
- 🚫 **Downstream-only** - No accidental pushes or remote modifications
- 🔄 **Config reusability** - Version control and share your sync configurations

## Installation

1. **Install Python 3.8 or higher** (if not already installed)

2. **Install via pip (recommended when available):**

   ```bash
   pip3 install git-repo-sync
   # or
   pip install git-repo-sync
   ```

3. **Install from source (clone this repo):**

   ```bash
   # from the repo root
   pip install -r requirements.txt
   pip install .
   ```

After installation, the `git-repo-sync` CLI should be available on your `PATH`.

## Usage - YAML-Driven CLI

### CLI Command Structure

```mermaid
graph LR
    A[Git Repo Sync] --> B[Interactive Mode]
    A --> C[CLI Commands]
    
    B --> B1[git-repo-sync interactive]
    B1 --> B2[Menu-driven interface]
    
    C --> C1[export-config]
    C --> C2[apply-config]
    C --> C3[status]
    C --> C4[cleanup]
    C --> C5[diff-config]
    
    C1 --> C1A[Fetch from GitHub]
    C1A --> C1B[Generate YAML]
    
    C2 --> C2A[--mode sync]
    C2 --> C2B[--mode status]
    C2 --> C2C[--mode cleanup]
    
    C2A --> C2D[Clone/Pull repos]
    C2B --> C2E[Check status only]
    C2C --> C2F[Remove unmapped]
    
    C3 --> C2B
    C4 --> C2C
    C5 --> C5A[Compare Local vs Remote]
    C5A --> C5B[Generate Diff YAML]
    
    style A fill:#e1f5ff
    style B fill:#fff4e6
    style C fill:#fff4e6
    style C1B fill:#c8e6c9
    style C2D fill:#c8e6c9
    style C2E fill:#ffe0b2
    style C2F fill:#ffccbc
    style C5B fill:#e1bee7
```

### Quick Reference

| Task | Command | Token Needed? |
|------|---------|---------------|
| **Start with a guided menu** | `git-repo-sync interactive` | No |
| **Export repos from GitHub to YAML** | `git-repo-sync export-config --org <name> --working-dir <dir> --output <file.yaml> [--token $TOKEN]` | For private repos |
| **Sync repos from YAML config** | `git-repo-sync apply-config --config <file.yaml> --mode sync [--token $TOKEN]` | For private repos |
| **Check status without changes** | `git-repo-sync status --config <file.yaml> [--token $TOKEN]` | For private repos |
| **Preview sync without changes** | `git-repo-sync apply-config --config <file.yaml> --mode sync --dry-run [--token $TOKEN]` | For private repos |
| **Check for new repos/branches** | `git-repo-sync diff-config --config <file.yaml> [--token $TOKEN]` | For private repos |
| **Clean up extra directories** | `git-repo-sync cleanup --config <file.yaml> --dry-run` | Usually no |

*Note: Cleanup is currently **experimental** and does not yet delete directories; see the `cleanup` section below for details.*

### Interactive Mode

The easiest way to get started with the new CLI:

```bash
git-repo-sync interactive
```

This provides a menu-driven interface with two main options:

1. **Export/Update configuration from GitHub**
   - Generate a fresh configuration (export from scratch)
   - Generate updates for an existing configuration (diff mode)
2. **Apply existing configuration**
   - Status check, sync, or cleanup operations

### Command-Line Interface

For scripting and automation, use the CLI directly. The tool provides five main commands:

---

#### 📤 `export-config` - Generate YAML Configuration from GitHub

Creates a YAML configuration file by discovering all repositories and branches from GitHub organization(s) or user account(s).

**Syntax:**
```bash
git-repo-sync export-config \
  --org <org_or_username> \
  --working-dir <base_directory> \
  --output <config_file.yaml> \
  [--token <github_token>] \
  [--status] \
  [--status-output <status_file>]
```

**Flags:**

| Flag | Alias | Required | Description |
|------|-------|----------|-------------|
| `--org` | `-o` | **Yes** | GitHub organization or username to export from. Can be specified **multiple times** to export from multiple accounts in one config file. |
| `--working-dir` | `-w` | **Yes** | Base directory where repositories will be cloned. Can be relative (e.g., `./_github`) or absolute (e.g., `/home/user/repos`). |
| `--output` | `-c` | **Yes** | Path where the YAML configuration file will be saved (e.g., `myorg-config.yaml`). |
| `--token` | `-t` | No | GitHub Personal Access Token. **Required for private repos** and to avoid rate limits. See [Creating a Token](#creating-a-github-token). |
| `--status` | - | No | Generate a status report immediately after export showing what would need to be synced. |
| `--status-output` | - | No | Custom path for the status report file (default: `<working-dir>/status.txt`). |

**Examples:**

```bash
# Export single organization (public repos only)
git-repo-sync export-config \
  --org facebook \
  --working-dir ./_github \
  --output facebook-config.yaml

# Export with authentication for private repos + status report
git-repo-sync export-config \
  --org mycompany \
  --working-dir /home/user/work/repos \
  --output mycompany.yaml \
  --token ghp_xxxxxxxxxxxx \
  --status

# Export from multiple organizations into one config file
git-repo-sync export-config \
  --org organization1 \
  --org organization2 \
  --org personal-username \
  --working-dir ./_all_repos \
  --output multi-org-config.yaml \
  --token ghp_xxxxxxxxxxxx

# Export from user account (auto-detects user vs org)
git-repo-sync export-config \
  --org torvalds \
  --working-dir ./linux-repos \
  --output torvalds-repos.yaml
```

**Important Note on Private Repositories:**
- **For your own user account**: When authenticated with a token, the tool will fetch **all your repositories** (public + private)
- **For other users**: Only **public repositories** are accessible (GitHub API limitation)
- **For organizations**: All repositories you have access to are fetched (public + private, if you're a member)
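The sketch below shows how account-type detection and the visibility rules above map onto the GitHub REST API. It uses only the standard library. The endpoints are real GitHub API routes: `GET /users/{name}` reports `"type": "Organization"` or `"User"`, and the three repository-listing routes differ in what they return. Whether the tool calls exactly these routes is an assumption, as are the helper names `account_type` and `repos_endpoint`. Pagination (the `Link` response header) is omitted.

```python
import json
import urllib.request
from typing import Optional

API = "https://api.github.com"


def account_type(name: str, token: Optional[str] = None) -> str:
    """Return "org" or "user" based on the "type" field of GET /users/{name}."""
    req = urllib.request.Request(f"{API}/users/{name}")
    req.add_header("Accept", "application/vnd.github+json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return "org" if data.get("type") == "Organization" else "user"


def repos_endpoint(name: str, kind: str, authenticated_login: Optional[str] = None) -> str:
    """Pick the repository-listing endpoint that matches the visibility rules."""
    if kind == "org":
        # Organization members see private repos here when authenticated.
        return f"{API}/orgs/{name}/repos?type=all&per_page=100"
    if authenticated_login and authenticated_login.lower() == name.lower():
        # Your own account: /user/repos includes your private repositories.
        return f"{API}/user/repos?affiliation=owner&per_page=100"
    # Any other user: only public repositories are listed.
    return f"{API}/users/{name}/repos?per_page=100"
```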

---

#### 🔍 `diff-config` - Check for New Items

Compares your **existing local configuration** against the current state on GitHub to find new repositories or branches that are missing from your config.

**Syntax:**
```bash
git-repo-sync diff-config \
  --config <existing_config.yaml> \
  [--output <diff_file.yaml>] \
  [--token <github_token>]
```

**Flags:**

| Flag | Alias | Required | Description |
|------|-------|----------|-------------|
| `--config` | `-c` | **Yes** | Path to your existing YAML configuration file. |
| `--output` | `-o` | No | Path where the diff report will be saved (default: `config_diff.yaml`). |
| `--token` | `-t` | No | GitHub Personal Access Token. |

**Examples:**

```bash
# Check for new repos/branches
git-repo-sync diff-config --config myorg.yaml

# Save diff to custom path
git-repo-sync diff-config --config myorg.yaml --output updates.yaml
```

**Output:**
The command generates a partial YAML file containing **only** the missing repositories and branches. You can copy-paste sections from this file directly into your main configuration.
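Conceptually, the diff is a set difference. It compares what GitHub reports against what the config already lists, whether those entries are enabled or not. Below is a minimal sketch over plain dictionaries. The function names and the `"org/repo"` keying are illustrative assumptions, and the real command writes YAML rather than returning a dict.

```python
from typing import Dict, List, Set


def config_branches(config: dict) -> Dict[str, Set[str]]:
    """Map "org/repo" to the set of branch names already listed in the config."""
    known: Dict[str, Set[str]] = {}
    for org in config.get("organizations", []):
        for repo in org.get("repositories", []):
            key = f"{org['name']}/{repo['name']}"
            # Disabled branches still count as known: they are in the config.
            known[key] = {b["name"] for b in repo.get("branches", [])}
    return known


def missing_items(config: dict, remote: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return repos/branches that exist on GitHub but are absent from the config.

    `remote` maps "org/repo" to the branch names fetched from GitHub.
    A repository missing entirely is reported with all of its branches.
    """
    known = config_branches(config)
    diff: Dict[str, List[str]] = {}
    for repo, branches in remote.items():
        new = sorted(set(branches) - known.get(repo, set()))
        if new:
            diff[repo] = new
    return diff
```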

---

#### 🔄 `apply-config` - Apply YAML Configuration

Performs sync, status checking, or cleanup operations based on your YAML configuration file.

**Syntax:**
```bash
git-repo-sync apply-config \
  --config <config_file.yaml> \
  --mode <status|sync|cleanup> \
  [--token <github_token>] \
  [--dry-run] \
  [--remove-unmapped] \
  [--yes]
```

**Flags:**

| Flag | Alias | Required | Description |
|------|-------|----------|-------------|
| `--config` | `-c` | **Yes** | Path to your YAML configuration file. |
| `--mode` | - | **Yes** | Operation mode: `status` (check only), `sync` (clone/pull repos), or `cleanup` (report unmapped dirs; experimental, see the `cleanup` section). |
| `--token` | `-t` | No | GitHub Personal Access Token. **Required for private repositories**. Not needed for public repos or if using git credential storage. |
| `--dry-run` | - | No | Preview what would happen **without making any changes**. Highly recommended for the first run. |
| `--remove-unmapped` | - | No | (With `cleanup` mode) Intended to remove directories not in the config. Currently a no-op: cleanup only reports, with or without this flag (see the `cleanup` section). |
| `--yes` | `-y` | No | Skip confirmation prompts for destructive operations. Use with caution. |

**Examples:**

```bash
# Safe first run - see what would be synced without making changes
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode sync \
  --dry-run

# Actually perform the sync (public repos)
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode sync

# Sync with authentication for private repos
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode sync \
  --token $GITHUB_TOKEN

# Check status of all repos without making changes
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode status

# Check status with authentication (for private repos)
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode status \
  --token $GITHUB_TOKEN

# Preview what directories would be cleaned up
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode cleanup \
  --dry-run

# Intended to remove unmapped directories (currently a no-op; see the `cleanup` section)
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode cleanup \
  --remove-unmapped \
  --yes
```

---

#### 📊 `status` - Quick Status Check (Shortcut Command)

Convenience command that's equivalent to `apply-config --mode status`.

**Syntax:**
```bash
git-repo-sync status --config <config_file.yaml> [--token <github_token>]
```

**Flags:**

| Flag | Alias | Required | Description |
|------|-------|----------|-------------|
| `--config` | `-c` | **Yes** | Path to your YAML configuration file. |
| `--token` | `-t` | No | GitHub Personal Access Token. **Required for private repositories**. |

**Examples:**

```bash
# Check status of all configured repositories (public repos)
git-repo-sync status --config myorg-config.yaml

# Check status with authentication (for private repos)
git-repo-sync status --config myorg-config.yaml --token $GITHUB_TOKEN

# This is exactly the same as:
git-repo-sync apply-config --config myorg-config.yaml --mode status --token $GITHUB_TOKEN
```

---

#### 🧹 `cleanup` - Remove Unmapped Directories (Shortcut Command)

Convenience command that's equivalent to `apply-config --mode cleanup`.

**Current status:** Cleanup mode is not yet fully implemented in the CLI. It does **not** delete directories, even when `--remove-unmapped` is passed, and should currently be treated as an experimental/status-style command rather than a destructive cleanup.

**Syntax:**
```bash
git-repo-sync cleanup \
  --config <config_file.yaml> \
  [--token <github_token>] \
  [--dry-run] \
  [--remove-unmapped] \
  [--yes]
```

**Flags:**

| Flag | Alias | Required | Description |
|------|-------|----------|-------------|
| `--config` | `-c` | **Yes** | Path to your YAML configuration file. |
| `--token` | `-t` | No | GitHub Personal Access Token (only needed if checking remote info for private repos). |
| `--dry-run` | - | No | Show what would be removed **without actually deleting**. |
| `--remove-unmapped` | - | No | Intended to remove directories not in the config (currently a no-op; see note above). |
| `--yes` | `-y` | No | Skip confirmation prompts. |

**Examples:**

```bash
# See what would be cleaned up (safe)
git-repo-sync cleanup --config myorg-config.yaml --dry-run

# Intended to remove unmapped directories (currently a no-op; see note above)
git-repo-sync cleanup --config myorg-config.yaml --remove-unmapped

# This is exactly the same as:
git-repo-sync apply-config --config myorg-config.yaml --mode cleanup --remove-unmapped
```
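Because cleanup does not delete anything yet, it helps to pin down what "unmapped" means. The sketch below is read-only and uses an assumed definition: an unmapped directory is one under the working directory that is neither a configured repository path nor contains one. `unmapped_dirs` is a hypothetical helper that only lists candidates and never deletes.

```python
from pathlib import Path
from typing import Iterable, List


def unmapped_dirs(base: Path, mapped: Iterable[Path]) -> List[Path]:
    """List directories under `base` that hold no configured repository."""
    targets = {p.resolve() for p in mapped}
    found: List[Path] = []

    def walk(d: Path) -> None:
        for child in sorted(d.iterdir()):
            if not child.is_dir():
                continue
            c = child.resolve()
            if c in targets:
                continue  # a configured repo: never a cleanup candidate
            if any(c in t.parents for t in targets):
                walk(child)  # an intermediate dir (e.g. base_output_dir): look inside
            else:
                found.append(child)  # nothing configured lives under here

    walk(base)
    return found
```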

---

### When Do You Need a Token?

Whether you need `--token` depends on the command:

| Command | Token Needed? | Why? |
|---------|---------------|------|
| `export-config` | **Yes** (for private repos) | Fetches repository list and branch information from GitHub API |
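| `diff-config` | **Yes** (for private repos) | Fetches the current repository and branch lists from GitHub to compare with your config |
| `apply-config --mode sync` | **Yes** (for private repos) | Clones/pulls private repositories; not needed for public repos or when git credential storage is configured |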
| `status` | **Yes** (for private repos) | Fetches remote branch information to compare with local state |
| `cleanup` | Usually **No** | Only operates on local directories; token only needed if checking remote info |

**Authentication notes:**
- For **API operations** (like `export-config` and remote status checks), a token (via `--token` or `GITHUB_TOKEN`) is required to access private repositories.
- For **git clone/pull operations**, authentication is handled by git itself (for example via `git config credential.helper`, SSH keys, or your OS keychain). The tool does **not** embed your token into clone URLs or write it to YAML/status files.

---

### Creating a GitHub Token

To access private repositories or avoid API rate limits:

1. Go to **GitHub Settings** → **Developer settings** → **Personal access tokens** → **Tokens (classic)**
2. Click **Generate new token (classic)**
3. Select scopes:
   - `repo` (Full control of private repositories)
   - `read:org` (Read organization membership - if using organizations)
4. Generate and copy the token (starts with `ghp_`)
5. **Store it securely** - you won't be able to see it again

Once created, you can **set it as the `GITHUB_TOKEN` environment variable** (for example `export GITHUB_TOKEN=ghp_xxxxxxxxxxxx`). The CLI and interactive mode will automatically use this token when `--token` is not provided on the command line.

**Using the token:**
```bash
# Set as environment variable (recommended)
export GITHUB_TOKEN=ghp_xxxxxxxxxxxx
git-repo-sync export-config --org myorg --working-dir ./_github --output config.yaml --token $GITHUB_TOKEN

# Or pass directly (less secure, visible in shell history)
git-repo-sync export-config --org myorg --working-dir ./_github --output config.yaml --token ghp_xxxxxxxxxxxx
```
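The token precedence described above can be sketched as follows: an explicit `--token` wins, and otherwise `GITHUB_TOKEN` is used. The helper names are hypothetical. The sketch shows the token ending up only in GitHub API request headers, which matches the authentication notes, and never in clone URLs.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_token(cli_token: Optional[str], env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer the --token value; fall back to the GITHUB_TOKEN environment variable."""
    return cli_token or env.get("GITHUB_TOKEN") or None


def api_headers(token: Optional[str]) -> Dict[str, str]:
    """Headers for GitHub REST API calls; the token is sent here, not in clone URLs."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```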

---

### Common Workflows

**Workflow 1: First-time setup for an organization**
```bash
# Step 1: Export configuration from GitHub
git-repo-sync export-config \
  --org mycompany \
  --working-dir ./repos \
  --output mycompany.yaml \
  --token $GITHUB_TOKEN

# Step 2: Review and edit the generated YAML (optional)
vim mycompany.yaml  # Disable repos/branches you don't need

# Step 3: Preview what will be synced
git-repo-sync apply-config \
  --config mycompany.yaml \
  --mode sync \
  --dry-run \
  --token $GITHUB_TOKEN

# Step 4: Actually sync the repositories (with auth for private repos)
git-repo-sync apply-config \
  --config mycompany.yaml \
  --mode sync \
  --token $GITHUB_TOKEN
```

**Workflow 2: Daily sync to update local repos**
```bash
# Check status first (for private repos, add --token)
git-repo-sync status --config mycompany.yaml --token $GITHUB_TOKEN

# Run sync to update each enabled branch according to its sync_mode
git-repo-sync apply-config \
  --config mycompany.yaml \
  --mode sync \
  --token $GITHUB_TOKEN
```

**Workflow 3: Handling new repositories (The Diff Workflow)**
```bash
# Step 1: Check if there are new repos or branches on GitHub
git-repo-sync diff-config --config mycompany.yaml --output updates.yaml --token $GITHUB_TOKEN

# Step 2: Review updates.yaml
# It contains only the new items found on remote.

# Step 3: Copy the desired sections from updates.yaml into mycompany.yaml

# Step 4: Run sync to download the new items
git-repo-sync apply-config --config mycompany.yaml --mode sync --token $GITHUB_TOKEN
```

**Workflow 4: Managing multiple organizations**
```bash
# Export all organizations into one config file
git-repo-sync export-config \
  --org company1 \
  --org company2 \
  --org personal-account \
  --working-dir ./all-repos \
  --output combined.yaml \
  --token $GITHUB_TOKEN

# Sync everything at once (with auth for private repos)
git-repo-sync apply-config \
  --config combined.yaml \
  --mode sync \
  --token $GITHUB_TOKEN
```

**Workflow 5: Safe cleanup of old repositories**
```bash
# Step 1: See what would be removed
git-repo-sync cleanup --config mycompany.yaml --dry-run

# Step 2: Review the output carefully

# Step 3: Remove unmapped directories (currently a no-op; cleanup is experimental)
git-repo-sync cleanup --config mycompany.yaml --remove-unmapped
```

**Workflow 6: Using in CI/CD or automation scripts**
```bash
#!/bin/bash
set -e  # Exit on error

# Check if repos are up to date
if ! git-repo-sync status --config production.yaml; then
  echo "⚠️  Repositories are out of sync!"
  exit 1
fi

echo "✅ All repositories are synchronized"
```

### YAML Configuration

Example configuration file:

```yaml
version: 1

global:
  working_directory: ./_github
  status_report_file: status/status.txt

organizations:
  - name: myorg
    type: org  # "org" for organizations, "user" for individual accounts
    base_output_dir: myorg
    repositories:
      - name: my-repo
        output_dir: null  # use default layout; set a path here to override
        enabled: true
        http_url: https://github.com/myorg/my-repo
        visibility: public
        about: "My repository"
        
        branches:
          - name: main
            enabled: true
            sync_mode: PULL_CHANGES
            comment: "Production branch"
          
          - name: develop
            enabled: true
            sync_mode: PULL_AND_RESET
            comment: "Mirror remote exactly"
          
          - name: old-feature
            enabled: false
            sync_mode: CHECK_ONLY
            comment: "Disabled, won't sync"
  
  - name: username
    type: user  # Individual GitHub user account
    base_output_dir: personal
    repositories:
      - name: dotfiles
        output_dir: null
        enabled: true
        http_url: https://github.com/username/dotfiles
        visibility: public
        about: "Personal dotfiles"
        
        branches:
          - name: main
            enabled: true
            sync_mode: PULL_CHANGES
            comment: "Personal configs"
```

**Configuration Notes:**

- `type`: Specifies whether the account is an `org` (organization) or `user` (individual account). 
  - Defaults to `org` if not specified (for backward compatibility).
  - Auto-detected when using `export-config` command.

- `output_dir`: Optional subdirectory within the organization's base directory.
  - `output_dir: null` (or omitted): use default layout → `working_directory / base_output_dir / repo_name`
  - `output_dir: "subdir"` (relative): adds subdirectory → `working_directory / base_output_dir / subdir / repo_name`
  - `output_dir: "path/to/dir"` (relative, multi-level): → `working_directory / base_output_dir / path / to / dir / repo_name`
  - `output_dir: "/absolute/path"` (absolute): overrides all → `/absolute/path / repo_name`
  - **Important:** The repository name always appears as the final directory before branches.
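Once parsed (for example with PyYAML's `yaml.safe_load`, a declared dependency), the configuration is a plain nested mapping. The sketch below lists what a sync would touch, applies the `type` default noted above, and assumes `enabled` defaults to `true` when omitted (the README does not say). `enabled_branches` is hypothetical.

```python
from typing import Iterator, Tuple

# Loading the file with PyYAML would look like:
#   import yaml
#   with open("myorg-config.yaml") as f:
#       config = yaml.safe_load(f)


def enabled_branches(config: dict) -> Iterator[Tuple[str, str, str, str, str]]:
    """Yield (org, type, repo, branch, sync_mode) for every enabled repo and branch."""
    for org in config.get("organizations", []):
        kind = org.get("type", "org")  # "type" defaults to "org" when omitted
        for repo in org.get("repositories", []):
            if not repo.get("enabled", True):  # assumed default: enabled
                continue
            for br in repo.get("branches", []):
                if br.get("enabled", True):
                    # sync_mode is treated as required in this sketch
                    yield org["name"], kind, repo["name"], br["name"], br["sync_mode"]
```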

**Path Resolution Examples:**

| working_directory | base_output_dir | output_dir | repo_name | Final Path |
|-------------------|-----------------|------------|-----------|------------|
| `/home/user/repos` | `MR901` | `null` | `xyz` | `/home/user/repos/MR901/xyz/` |
| `/home/user/repos` | `MR901` | `posts` | `xyz` | `/home/user/repos/MR901/posts/xyz/` |
| `/home/user/repos` | `MR901` | `abc/pqr` | `xyz` | `/home/user/repos/MR901/abc/pqr/xyz/` |
| `/home/user/repos` | `MR901` | `/absolute` | `xyz` | `/absolute/xyz/` |
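The four rows above can be reproduced with a few lines of `pathlib`. This sketch assumes the rules exactly as listed. `repo_path` is hypothetical, and `PurePosixPath` is used only so the results look the same on every OS.

```python
from pathlib import PurePosixPath
from typing import Optional


def repo_path(working_directory: str, base_output_dir: str,
              output_dir: Optional[str], repo_name: str) -> PurePosixPath:
    """Resolve a repository's local directory following the output_dir rules."""
    if output_dir:
        out = PurePosixPath(output_dir)
        if out.is_absolute():
            # An absolute output_dir overrides working_directory and base_output_dir.
            return out / repo_name
        # A relative output_dir (one or more levels) sits inside base_output_dir.
        return PurePosixPath(working_directory) / base_output_dir / out / repo_name
    # output_dir null or omitted: the default layout.
    return PurePosixPath(working_directory) / base_output_dir / repo_name
```

The trailing slash in the table's "Final Path" column is cosmetic. The repository name is always the last component.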

**Sync Modes:**
- `CHECK_ONLY` - Only compute status, no file changes
- `PULL_CHANGES` - Safe pull from remote, no aggressive cleanup
- `PULL_AND_RESET` - Mirror remote (reset + delete extra files)
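To make the difference between the modes concrete, here is one plausible mapping onto plain git commands. This is an illustrative assumption: the tool's actual commands are not documented in this README. `git_steps` is a hypothetical helper that only builds the command lists. To execute them, pass each one to `subprocess.run(step, cwd=repo_dir, check=True)`.

```python
from typing import List


def git_steps(sync_mode: str, branch: str) -> List[List[str]]:
    """Map a sync mode to git commands to run inside the repository directory."""
    fetch = ["git", "fetch", "origin", branch]
    if sync_mode == "CHECK_ONLY":
        # Compare only: print "<ahead>\t<behind>" commit counts; no files change.
        return [fetch, ["git", "rev-list", "--left-right", "--count", f"HEAD...origin/{branch}"]]
    if sync_mode == "PULL_CHANGES":
        # Fast-forward only: diverged local history stops the update instead of merging.
        return [fetch, ["git", "merge", "--ff-only", f"origin/{branch}"]]
    if sync_mode == "PULL_AND_RESET":
        # Mirror the remote: drop local commits and edits, then delete untracked files.
        return [fetch, ["git", "reset", "--hard", f"origin/{branch}"], ["git", "clean", "-fd"]]
    raise ValueError(f"unknown sync_mode: {sync_mode}")
```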

<!-- Legacy v1 documentation moved to niu/ and no longer part of the main docs. -->

## Further Documentation

- **Getting started guide**: `docs/getting_started.md`
- **CLI reference**: `docs/cli_reference.md`
- **YAML configuration schema & status format**: `docs/YAML_SCHEMA.md`
- **Product requirements / architecture**: `docs/PRD.rst`
- **Examples**: `examples/`
