Metadata-Version: 2.4
Name: cogiles
Version: 0.2.0
Summary: COGILES - Colored Graph Input Line Entry System: a SMILES-inspired string notation for colored graphs
License-Expression: MIT
License-File: LICENSE
Keywords: colored-graph,graph,networkx,notation,smiles
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.10
Requires-Dist: networkx>=2.6
Requires-Dist: parsimonious>=0.10
Provides-Extra: dev
Requires-Dist: bump-my-version>=1.2; extra == 'dev'
Requires-Dist: ipykernel; extra == 'dev'
Requires-Dist: jupyter-collaboration==4.0.2; extra == 'dev'
Requires-Dist: jupyter-mcp-tools>=0.1.4; extra == 'dev'
Requires-Dist: jupyterlab==4.4.1; extra == 'dev'
Requires-Dist: nox>=2022.1.7; extra == 'dev'
Requires-Dist: pycrdt>=0.12.17; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.5; extra == 'viz'
Requires-Dist: scipy>=1.7; extra == 'viz'
Description-Content-Type: text/markdown

# COGILES — Colored Graph Input Line Entry System

## What is COGILES?

COGILES is a compact, human-readable string notation for defining colored graphs. It is strongly inspired by SMILES (Simplified Molecular Input Line Entry System), the notation used in chemistry to represent molecular structures, but it is adapted to arbitrary colored graphs rather than molecules.

A COGILES string encodes both the **structure** (nodes and edges) and the **visual attributes** (node colors) of a graph in a single line of text.

## Origin

COGILES was originally developed as part of the [visual_graph_datasets](https://github.com/aimat-lab/visual_graph_datasets) library (introduced in version 0.13.0). The goal of this project is to extract it into a **standalone library** that can be imported and used independently by other programs.

## Syntax

### Node Types

The set of node types is **fixed and opinionated** — like SMILES, COGILES defines a specific alphabet and users take it or leave it.

Following the SMILES convention, common colors use a single uppercase letter. Colors whose first letter collides with another use a two-letter code (uppercase + lowercase), similar to how SMILES distinguishes elements like `B` (boron) from `Br` (bromine).

| Letter | Color   | RGB                |
|--------|---------|--------------------|
| `R`    | Red     | (1.0, 0.0, 0.0)   |
| `G`    | Green   | (0.0, 0.5, 0.0)   |
| `B`    | Blue    | (0.0, 0.0, 1.0)   |
| `Y`    | Yellow  | (1.0, 1.0, 0.0)   |
| `C`    | Cyan    | (0.0, 1.0, 1.0)   |
| `M`    | Magenta | (1.0, 0.0, 1.0)   |
| `W`    | White   | (1.0, 1.0, 1.0)   |
| `K`    | Black   | (0.0, 0.0, 0.0)   |
| `O`    | Orange  | (1.0, 0.5, 0.0)   |
| `P`    | Purple  | (0.5, 0.0, 0.5)   |
| `T`    | Teal    | (0.0, 0.5, 0.5)   |
| `L`    | Lime    | (0.5, 1.0, 0.0)   |
| `I`    | Indigo  | (0.3, 0.0, 0.5)   |
| `Bl`   | Black   | (0.0, 0.0, 0.0)   |
| `Br`   | Brown   | (0.6, 0.3, 0.0)   |
| `Gr`   | Gray    | (0.5, 0.5, 0.5)   |
| `Pi`   | Pink    | (1.0, 0.4, 0.7)   |
| `Ol`   | Olive   | (0.5, 0.5, 0.0)   |

Note: `K` and `Bl` are both valid for Black (`K` follows the CMYK convention).
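
The table can be read as a plain lookup from code to RGB tuple. The following is a minimal sketch of that idea using only the standard library; the names `NODE_COLORS`, `NODE_PATTERN`, and `tokenize_nodes` are hypothetical and are not claimed to be part of the `cogiles` API. The regular expression lists the two-letter codes first so that, for example, `Br` is read as Brown rather than as `B` followed by a stray `r`.

```python
import re

# Hypothetical lookup table mirroring the node-type table above.
NODE_COLORS = {
    "R": (1.0, 0.0, 0.0),
    "G": (0.0, 0.5, 0.0),
    "B": (0.0, 0.0, 1.0),
    "Y": (1.0, 1.0, 0.0),
    "C": (0.0, 1.0, 1.0),
    "M": (1.0, 0.0, 1.0),
    "W": (1.0, 1.0, 1.0),
    "K": (0.0, 0.0, 0.0),
    "O": (1.0, 0.5, 0.0),
    "P": (0.5, 0.0, 0.5),
    "T": (0.0, 0.5, 0.5),
    "L": (0.5, 1.0, 0.0),
    "I": (0.3, 0.0, 0.5),
    "Bl": (0.0, 0.0, 0.0),  # alias of "K"
    "Br": (0.6, 0.3, 0.0),
    "Gr": (0.5, 0.5, 0.5),
    "Pi": (1.0, 0.4, 0.7),
    "Ol": (0.5, 0.5, 0.0),
}

# Two-letter codes come first in the alternation so they win over single letters.
NODE_PATTERN = re.compile(r"Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI]")


def tokenize_nodes(text):
    """Split a string that consists only of node codes into a list of codes."""
    codes = []
    pos = 0
    while pos < len(text):
        match = NODE_PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"unknown node code at position {pos}")
        codes.append(match.group())
        pos = match.end()
    return codes


print(tokenize_nodes("BrBBl"))
print([NODE_COLORS[code] for code in tokenize_nodes("RGr")])
```

The first call splits `BrBBl` into Brown, Blue, and Black, which shows why the order of alternatives in the pattern matters.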

### Sequential Connection

Nodes placed next to each other in the string are automatically connected by an edge.

- `RRRGGG` — a chain of 6 nodes: 3 red followed by 3 green

### Branches

Parentheses `()` create branches. The first node inside a branch connects to the node immediately before the opening parenthesis. Inside a branch, normal sequential rules apply.

- `RRR(BB)RRRR` — a main chain of 7 red nodes with a side branch of 2 blue nodes attached to the 3rd red node
- `Y(G)(G)(G)` — a star: yellow center with 3 green leaves
- `BB(GG)CC` — 2 blue nodes, then a branch of 2 green nodes and a continuation of 2 cyan nodes
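
The branch rule can be sketched with a stack: an opening parenthesis saves the current attachment node, and the closing parenthesis restores it so that the main chain continues from the fork. This is an illustrative stdlib-only sketch that handles node codes and parentheses only; the function name `branch_edges` is hypothetical and the real parser may work differently.

```python
import re

# Node codes (two-letter codes first) plus the two parenthesis tokens.
TOKEN = re.compile(r"Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI]|\(|\)")


def branch_edges(text):
    """Return (codes, edges) for a string made of node codes and parentheses."""
    codes = []   # node code per node index, in order of appearance
    edges = []   # (attachment_index, new_index) pairs
    prev = None  # node that the next node will attach to
    stack = []   # attachment points saved at each "("
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        token = match.group()
        if token == "(":
            stack.append(prev)  # remember where the branch forks off
        elif token == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            prev = stack.pop()  # resume the chain at the fork node
        else:
            codes.append(token)
            index = len(codes) - 1
            if prev is not None:
                edges.append((prev, index))
            prev = index
        pos = match.end()
    if stack:
        raise ValueError("unclosed '('")
    return codes, edges


codes, edges = branch_edges("Y(G)(G)(G)")
print(codes, edges)
```

For `Y(G)(G)(G)` every green node attaches to node 0, which gives the star described above.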

### Anchors

Anchors are written as `-N` (a dash followed by an integer) and placed directly after a node. Nodes that carry the same anchor number are joined by an edge according to the rules below, which makes **cycles** and other non-tree structures possible.

- `R-1RRRRR-1` — a cycle of 6 red nodes (first and last connected via anchor 1)
- `R-1-2GGG-2BBB-1` — first node has two anchors, creating connections to distant nodes

Rules:
- A single anchor with a given number has no effect; at least two nodes must share the same anchor number.
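- When an anchor number appears more than twice, every later occurrence connects to the node where the anchor **first** appeared. For example, in `R-1GG-1BB-1` the second green node and the second blue node are each connected to the red node.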
- A single node can have multiple anchors (e.g., `R-1-2`).
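
The rules above can be sketched as a dictionary that records where each anchor number first appeared. The sketch below is illustrative only: it handles branch-free strings, returns just the extra edges contributed by anchors (not the sequential ones), and treats an anchor with no preceding node as an error, which is an assumption rather than documented behavior. The name `anchor_edges` is hypothetical.

```python
import re

# A token is either a node code or an anchor such as "-12".
TOKEN = re.compile(r"(?P<node>Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI])|-(?P<anchor>\d+)")


def anchor_edges(text):
    """Return the edges created by anchors in a branch-free string."""
    first_seen = {}  # anchor number -> index of the node where it first appeared
    edges = []
    current = None   # index of the most recently read node
    count = 0        # number of nodes read so far
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        if match.group("node"):
            current = count
            count += 1
        else:
            if current is None:
                raise ValueError("anchor appears before any node")
            number = int(match.group("anchor"))
            if number in first_seen:
                # Later occurrences always link back to the first one.
                edges.append((first_seen[number], current))
            else:
                first_seen[number] = current
        pos = match.end()
    return edges


print(anchor_edges("R-1RRRRR-1"))
print(anchor_edges("R-1-2GGG-2BBB-1"))
```

An anchor number that occurs only once is stored in `first_seen` but never produces an edge, matching the first rule.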

### Breaks (Disconnected Components)

A period `.` between two nodes prevents the default sequential connection, allowing the definition of **multiple disconnected graphs** in one string.

- `R-1RR-1.RR(GG)R` — two separate graph components: a triangle and a branching structure
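
As a rough sketch of the break rule, the function below builds the sequential edges of a string and simply forgets the previous node whenever it reads a period. It is deliberately limited to node codes and periods (no branches or anchors), and the name `chain_edges_with_breaks` is hypothetical.

```python
import re

# Node codes (two-letter codes first) plus the break token.
TOKEN = re.compile(r"Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI]|\.")


def chain_edges_with_breaks(text):
    """Return (codes, edges) for a string made of node codes and periods."""
    codes = []
    edges = []
    prev = None  # node that the next node would connect to
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        token = match.group()
        if token == ".":
            prev = None  # the next node starts a new component
        else:
            codes.append(token)
            index = len(codes) - 1
            if prev is not None:
                edges.append((prev, index))
            prev = index
        pos = match.end()
    return codes, edges


codes, edges = chain_edges_with_breaks("RR.GGG")
print(codes, edges)
```

In `RR.GGG` no edge joins node 1 and node 2, so the result describes two separate chains.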

## Grammar (PEG)

The COGILES syntax is formally defined as a PEG grammar (parsed with the `parsimonious` library):

```
graph           = (branch / node / anchor / break)*

branch_node     = (break_node / node) branch+
anchor_node     = (branch_node / break_node / node) anchor+
break_node      = break node

branch          = lpar graph rpar
lpar            = "("
rpar            = ")"

break           = "."
anchor          = ~r"-[\d]+"
node            = ~r"(Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI])"
```

## Graph Representation

Parsing a COGILES string produces a **NetworkX graph** (`nx.Graph`). Each node has a `color` attribute storing its RGB tuple. Edges are unweighted — no edge attributes are stored.

```python
import cogiles

G = cogiles.parse("R-1RRRRR-1")
# G is an nx.Graph with 6 nodes
# G.nodes[0]["color"] == (1.0, 0.0, 0.0)
# G has 6 edges forming a cycle
```

Encoding converts a NetworkX graph back into a canonical COGILES string:

```python
s = cogiles.encode(G)
# s == "R-1RRRRR-1"  (deterministic canonical form)
```

## Planned Library Scope

The standalone `cogiles` library should provide:

1. **Parsing** — `cogiles.parse(string) -> nx.Graph` — decode a COGILES string into a NetworkX graph
2. **Encoding** — `cogiles.encode(graph) -> string` — encode a NetworkX graph back into a canonical COGILES string (deterministic, starting from the lowest-index node)
3. **Node types** — a fixed, opinionated mapping between letters, color names, and RGB tuples (to be finalized)
4. **Validation** — a custom `CogilesParseError` exception with descriptive messages and position info for malformed strings
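
To illustrate what "position info" in item 4 could look like, here is a stdlib-only sketch of an exception that records the offending index and a small checker that raises it. Only the class name `CogilesParseError` comes from this document; the constructor signature, the `text` and `position` attributes, the message format, and the helper `check_syntax` are assumptions made for illustration and may not match the library.

```python
import re

# All token kinds from the grammar: nodes, anchors, breaks, and parentheses.
TOKEN = re.compile(r"Bl|Br|Gr|Pi|Ol|[RGBYKCMWOPTLI]|-\d+|\.|\(|\)")


class CogilesParseError(ValueError):
    """Sketch of a parse error that carries the offending position."""

    def __init__(self, message, text, position):
        self.text = text
        self.position = position
        # Point at the offending character with a caret under the input.
        pointer = " " * position + "^"
        super().__init__(f"{message} at position {position}\n{text}\n{pointer}")


def check_syntax(text):
    """Raise CogilesParseError for unknown characters or unbalanced parentheses."""
    open_positions = []  # positions of "(" that are still open
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise CogilesParseError("unexpected character", text, pos)
        token = match.group()
        if token == "(":
            open_positions.append(pos)
        elif token == ")":
            if not open_positions:
                raise CogilesParseError("unmatched ')'", text, pos)
            open_positions.pop()
        pos = match.end()
    if open_positions:
        raise CogilesParseError("unclosed '('", text, open_positions[-1])


try:
    check_syntax("RRxR")
except CogilesParseError as err:
    print(err)
```

Subclassing `ValueError` is a design choice of this sketch: callers that already catch `ValueError` for bad input would keep working.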

## Design Decisions

| Decision | Choice |
|----------|--------|
| Package name | `cogiles` (`import cogiles`) |
| Graph representation | NetworkX (`nx.Graph`) |
| Parser library | `parsimonious` (PEG grammar) |
| Node types | Fixed set of 17 colors written with 18 codes (13 single-letter codes, 4 two-letter codes, and `Bl` as an alias for `K`); not extensible |
| Edge attributes | Not included — edges are unweighted |
| Error handling | Custom `CogilesParseError` with position info |
| Testing | `pytest` + `nox` |
| Packaging | `pyproject.toml` only (PEP 621) |

## Dependencies

- `networkx` — for graph representation
- `parsimonious` — for PEG grammar parsing

### Dev Dependencies

- `pytest` — test framework
- `nox` — test runner / task automation
