Metadata-Version: 2.4
Name: cobol-parser-mcp
Version: 0.1.1
Summary: MCP server that parses COBOL programs — extracts divisions, SQL, CICS commands, and generates AI-powered business logic summaries via any LLM
Author: Rohan Nair
License: MIT
Keywords: mcp,cobol,mainframe,modernization,cics,db2,parser,zos,migration
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp[cli]>=1.2.0
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.25.0; extra == "anthropic"
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: groq
Requires-Dist: openai>=1.0.0; extra == "groq"
Provides-Extra: all
Requires-Dist: anthropic>=0.25.0; extra == "all"
Requires-Dist: openai>=1.0.0; extra == "all"
Dynamic: license-file

# cobol-parser-mcp

MCP server that parses COBOL mainframe programs, extracts structured data, and optionally generates AI-powered business logic summaries using Anthropic, OpenAI, Groq, Ollama, or any OpenAI-compatible endpoint.

```bash
pip install cobol-parser-mcp              # pure parsing, no LLM needed
pip install "cobol-parser-mcp[anthropic]" # + Claude
pip install "cobol-parser-mcp[openai]"    # + OpenAI (GPT-4o)
pip install "cobol-parser-mcp[groq]"      # + Groq (fast + cheap)
pip install "cobol-parser-mcp[all]"       # + all providers
```

Part of the mainframe modernization pipeline — takes the `manifest.json` from [`mainframe-ingest-mcp`](https://pypi.org/project/mainframe-ingest-mcp/) and produces per-program JSON files ready for code generation.

---

## What it extracts

For every COBOL program:

| What | Details |
|------|---------|
| **IDENTIFICATION DIVISION** | Program ID, author, date written |
| **DATA DIVISION** | All WORKING-STORAGE variables with level numbers and PIC clauses |
| **PROCEDURE DIVISION** | Every paragraph with line ranges and PERFORM references |
| **Embedded SQL** | Every SELECT/INSERT/UPDATE/DELETE with tables and columns |
| **Embedded CICS** | Every SEND/RECEIVE MAP, LINK, XCTL, READ, WRITE |
| **AI summary** | Plain-English description of what the program does (any LLM) |
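
The package's own parser is not documented here, but the PROCEDURE DIVISION row can be illustrated with a minimal sketch. It assumes fixed-format source (sequence area in columns 1-6, indicator in column 7, paragraph names starting in Area A, columns 8-11) and looks only at PROCEDURE DIVISION lines. `extract_paragraphs` and `PERFORM_KEYWORDS` are hypothetical names. It ignores SECTIONs, copybooks, free-format source, and the second target of `PERFORM ... THRU`.

```python
import re

# Hypothetical, simplified sketch; NOT cobol-parser-mcp's own parser.
# Assumes fixed-format source lines from the PROCEDURE DIVISION only:
# cols 1-6 sequence area, col 7 indicator, cols 8-11 Area A, cols 12-72 Area B.
PARAGRAPH_RE = re.compile(r"^([A-Z0-9][A-Z0-9-]*)\.$")
PERFORM_RE = re.compile(r"\bPERFORM\s+([A-Z0-9][A-Z0-9-]*)")
# Inline PERFORM forms (PERFORM UNTIL ..., PERFORM VARYING ...) name no paragraph.
PERFORM_KEYWORDS = {"UNTIL", "VARYING", "WITH", "TEST"}


def extract_paragraphs(source: str) -> list[dict]:
    """Return each paragraph's name, 1-based line range, and PERFORM targets."""
    lines = source.splitlines()
    paragraphs: list[dict] = []
    for lineno, raw in enumerate(lines, start=1):
        if len(raw) < 8 or raw[6] in "*/":  # too short, or a comment line
            continue
        text = raw[7:72].strip().upper()
        header = PARAGRAPH_RE.match(text)
        if header and raw[7:11].strip():  # a bare "NAME." starting in Area A
            if paragraphs:
                paragraphs[-1]["end"] = lineno - 1
            paragraphs.append(
                {"name": header.group(1), "start": lineno, "end": len(lines), "performs": []}
            )
        elif paragraphs:
            performs = paragraphs[-1]["performs"]
            for target in PERFORM_RE.findall(text):
                if target not in PERFORM_KEYWORDS and target not in performs:
                    performs.append(target)
    return paragraphs


SAMPLE = "\n".join([
    "       0000-MAIN.",
    "           PERFORM 1000-INIT",
    "           PERFORM 2000-PROCESS UNTIL WS-EOF = 'Y'",
    "           PERFORM 9999-EXIT.",
    "      * PERFORM 8888-IGNORED is inside a comment",
    "       1000-INIT.",
    "           PERFORM UNTIL WS-DONE = 'Y'",
    "               CONTINUE",
    "           END-PERFORM.",
    "       2000-PROCESS.",
    "           EXIT.",
])

for para in extract_paragraphs(SAMPLE):
    print(para["name"], f'{para["start"]}-{para["end"]}', para["performs"])
```

The comment line and the inline `PERFORM UNTIL ... END-PERFORM` loop contribute no targets, which is why `1000-INIT` reports an empty list.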

---

## Claude Desktop config

```json
{
  "mcpServers": {
    "cobol-parser-mcp": {
      "command": "cobol-parser-mcp",
      "env": {
        "ANTHROPIC_API_KEY": "your-key-here"
      }
    }
  }
}
```

---

## Usage from Python

### Parse a single file (no API key needed)

```python
from cobol_parser_mcp.tools.parser import parse_cobol_file
import json

result = parse_cobol_file("/path/to/SS6001XX.cob")
print(json.dumps(result, indent=2))
```
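
The dictionary returned by `parse_cobol_file` can be post-processed directly. As one sketch, the hypothetical helper below groups DB2 tables by SQL statement type. It assumes only the `sql_statements` shape shown under "Output format" (`type` and `tables` per statement). The sample dict is illustrative and stands in for real parser output, so the example runs without the package installed.

```python
from collections import defaultdict


def tables_by_operation(result: dict) -> dict[str, list[str]]:
    """Group DB2 table names by SQL statement type (SELECT, INSERT, ...)."""
    ops: dict[str, set[str]] = defaultdict(set)
    for stmt in result.get("sql_statements", []):
        ops[stmt["type"]].update(stmt.get("tables", []))
    return {op: sorted(tables) for op, tables in sorted(ops.items())}


# Illustrative stand-in for parse_cobol_file(...) output, using the
# "sql_statements" shape from the Output format section.
sample_result = {
    "program": "SS6001XX",
    "sql_statements": [
        {"type": "SELECT", "tables": ["BUSENTIT"]},
        {"type": "UPDATE", "tables": ["BUSENTIT"]},
        {"type": "INSERT", "tables": ["TRNSACTN"]},
    ],
}
print(tables_by_operation(sample_result))
```

With real output, replace `sample_result` with `parse_cobol_file("/path/to/SS6001XX.cob")`.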

### Parse entire codebase with AI summaries

```python
import asyncio
from cobol_parser_mcp.tools.batch import parse_all_programs

# Using Anthropic (Claude)
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",  # from mainframe-ingest-mcp
    output_dir="./parsed",
    ai_summaries=True,
    provider="anthropic",             # or openai, groq, ollama, custom
    api_key="sk-ant-...",             # or set ANTHROPIC_API_KEY env var
    max_ai_programs=20,
))
print(f"Parsed {result['parsed_ok']} of {result['total_programs']} programs")
```
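
After a batch run, the per-program files can be loaded back for analysis. The sketch below assumes `output_dir` holds one `<PROGRAM_NAME>.json` per program plus `_index.json`, and that each program file has the top-level `"program"` key shown under "Output format". `load_parsed_programs` is a hypothetical helper. The demo writes into a temporary directory so it runs standalone.

```python
import json
import tempfile
from pathlib import Path


def load_parsed_programs(output_dir: str) -> dict[str, dict]:
    """Load every <PROGRAM_NAME>.json in output_dir, keyed by program name."""
    programs: dict[str, dict] = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        if path.name == "_index.json":  # batch-level summary, not a program
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        programs[data.get("program", path.stem)] = data
    return programs


# Demo against a throwaway directory; in practice call load_parsed_programs("./parsed").
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "SS6001XX.json").write_text(json.dumps({"program": "SS6001XX"}), encoding="utf-8")
    Path(tmp, "_index.json").write_text(json.dumps({}), encoding="utf-8")
    programs = load_parsed_programs(tmp)

print(sorted(programs))  # ['SS6001XX']
```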

### Using other LLM providers

```python
# OpenAI
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    provider="openai",
    api_key="sk-...",
    model="gpt-4o",
))

# Groq — fast and cheap
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    provider="groq",
    api_key="gsk_...",
))

# Ollama — fully local, no API key, no internet
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    provider="ollama",
    model="llama3",
))

# No AI at all — pure parsing only
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    ai_summaries=False,
))
```

---

## Output format

Each program produces a `<PROGRAM_NAME>.json` file in `output_dir` (abridged example):

```json
{
  "program": "SS6001XX",
  "line_count": 1048,
  "identification": {
    "program_id": "SS6001XX",
    "author": "STATE OF MARYLAND SDAT",
    "date_written": "1995-06-12"
  },
  "data_division": {
    "working_storage": [
      { "level": "1", "name": "WS-ENTITY-RECORD", "type": "GROUP" },
      { "level": "5", "name": "WS-ENTITY-ID",     "type": "PIC X(10)" }
    ],
    "copybooks_expanded": ["BUSENTIT", "SSCC03EQ", "DFHAID"]
  },
  "procedure_division": {
    "paragraph_count": 7,
    "paragraphs": [
      {
        "name": "0000-MAIN",
        "lines": "28-45",
        "calls_paragraphs": ["1000-INIT", "2000-PROCESS", "9999-EXIT"]
      }
    ]
  },
  "sql_statements": [
    {
      "type": "SELECT",
      "tables": ["BUSENTIT"],
      "columns": ["ENTITY_ID", "ENTITY_NAME", "STATUS_CD"],
      "where": "ENTITY_ID = :WS-ENTITY-ID",
      "line": 167
    }
  ],
  "cics_commands": [
    { "command": "RECEIVE MAP", "map": "SS6TMAP", "mapset": "SS6TMAP", "line": 152 },
    { "command": "LINK PROGRAM", "program": "SS6009XX", "line": 334 }
  ],
  "business_logic_summary": {
    "db2_tables_read":    ["BUSENTIT"],
    "db2_tables_written": ["BUSENTIT", "TRNSACTN"],
    "screens_used":       ["SS6TMAP", "SS6XMAP"],
    "programs_called":    ["SS6009XX"],
    "estimated_complexity": "HIGH",
    "ai_summary": {
      "purpose": "Handles online business entity inquiry and status update for CICS terminal users",
      "business_domain": "Business Entity Registration",
      "user_facing": true,
      "key_operations": [
        "Receive user input from SS6TMAP screen",
        "Query BUSENTIT table by entity ID",
        "Update entity status in BUSENTIT",
        "Log transaction to TRNSACTN",
        "Link to SS6009XX for downstream processing"
      ],
      "modernization_notes": "Maps cleanly to GET /api/entity/{id} and PUT /api/entity/{id}/status REST endpoints"
    }
  }
}
```
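
The `calls_paragraphs` lists form a PERFORM graph that you can walk. The sketch below lists paragraphs that are never reached by PERFORM from the first paragraph. This is a rough heuristic: it treats the first paragraph as the entry point and ignores fall-through, `GO TO`, and `PERFORM ... THRU` ranges, so its output is a list of candidates to review, not proven dead code. `unperformed_paragraphs` is a hypothetical helper, and the sample program is made up in the shape of the JSON above.

```python
from collections import deque


def unperformed_paragraphs(program: dict) -> list[str]:
    """List paragraphs not reachable via calls_paragraphs from the first paragraph.

    Rough heuristic only: ignores fall-through, GO TO, and PERFORM ... THRU ranges.
    """
    paragraphs = program["procedure_division"]["paragraphs"]
    if not paragraphs:
        return []
    calls = {p["name"]: p.get("calls_paragraphs", []) for p in paragraphs}
    entry = paragraphs[0]["name"]  # assumption: execution starts at the first paragraph
    seen = {entry}
    queue = deque([entry])
    while queue:  # breadth-first walk over PERFORM edges
        for callee in calls.get(queue.popleft(), []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return [name for name in calls if name not in seen]


# Made-up example in the same shape as the JSON above.
program = {
    "procedure_division": {
        "paragraphs": [
            {"name": "0000-MAIN", "calls_paragraphs": ["1000-INIT", "2000-PROCESS", "9999-EXIT"]},
            {"name": "1000-INIT", "calls_paragraphs": []},
            {"name": "2000-PROCESS", "calls_paragraphs": ["2100-VALIDATE"]},
            {"name": "2100-VALIDATE", "calls_paragraphs": []},
            {"name": "8000-OLD-REPORT", "calls_paragraphs": []},
            {"name": "9999-EXIT", "calls_paragraphs": []},
        ]
    }
}
print(unperformed_paragraphs(program))  # ['8000-OLD-REPORT']
```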

An `_index.json` file with summary stats across all programs is also written.

---

## Supported LLM providers

| Provider | Default model | API key env var | Notes |
|----------|--------------|-----------------|-------|
| `anthropic` | `claude-sonnet-4-20250514` | `ANTHROPIC_API_KEY` | Default |
| `openai` | `gpt-4o` | `OPENAI_API_KEY` | |
| `groq` | `llama3-70b-8192` | `GROQ_API_KEY` | Fast and cheap |
| `ollama` | `llama3` | none | Fully local, free |
| `custom` | `gpt-4o` | `OPENAI_API_KEY` | Any OpenAI-compatible endpoint, pass `base_url` |

---

## Part of the modernization pipeline

```
mainframe-ingest-mcp  →  manifest.json
        ↓
cobol-parser-mcp      →  per-program JSON files   ← you are here
        ↓
bms-to-angular-mcp    →  Angular components
db2-schema-mcp        →  PostgreSQL schema
```
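
Before moving on to the screen and schema stages, it can help to see which programs share a BMS map or a DB2 table. The downstream tools are not documented here as requiring this. The sketch below simply inverts the `business_logic_summary` fields from the per-program JSON into screen-to-programs and table-to-programs maps. `cross_reference` is a hypothetical helper. `SS6001XX` mirrors the example output above, and `SS6009XX`'s summary is made up.

```python
def cross_reference(programs: dict[str, dict]) -> dict[str, dict[str, list[str]]]:
    """Invert per-program summaries into screen -> programs and table -> programs maps."""
    screens: dict[str, set[str]] = {}
    tables: dict[str, set[str]] = {}
    for name, data in programs.items():
        summary = data.get("business_logic_summary", {})
        for screen in summary.get("screens_used", []):
            screens.setdefault(screen, set()).add(name)
        for table in summary.get("db2_tables_read", []) + summary.get("db2_tables_written", []):
            tables.setdefault(table, set()).add(name)
    return {
        "screens": {k: sorted(v) for k, v in sorted(screens.items())},
        "tables": {k: sorted(v) for k, v in sorted(tables.items())},
    }


# SS6001XX mirrors the example output above; SS6009XX is made up.
programs = {
    "SS6001XX": {"business_logic_summary": {
        "db2_tables_read": ["BUSENTIT"],
        "db2_tables_written": ["BUSENTIT", "TRNSACTN"],
        "screens_used": ["SS6TMAP", "SS6XMAP"],
    }},
    "SS6009XX": {"business_logic_summary": {"db2_tables_read": ["TRNSACTN"]}},
}
xref = cross_reference(programs)
print(xref["tables"])  # {'BUSENTIT': ['SS6001XX'], 'TRNSACTN': ['SS6001XX', 'SS6009XX']}
```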

---

## License

MIT
