Metadata-Version: 2.4
Name: cubesquare
Version: 1.0.0
Summary: Instant Semantic Memory - 167M tokens/sec, 0.04ms queries, 100% deterministic
Author-email: Cubesquare <hello@cubesquare.io>
License: MIT
Project-URL: Homepage, https://cubesquare.io
Project-URL: Repository, https://github.com/cubesquare/cubesquare
Keywords: semantic-memory,ai,llm,vector-database,deterministic,morton-encoding,spatial-indexing,i-logic
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: native
Requires-Dist: numpy>=1.20.0; extra == "native"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Provides-Extra: all
Requires-Dist: cubesquare[dev,native]; extra == "all"
Dynamic: license-file

# Cubesquare

**Embeddings drift. Vectors collapse. Cubesquare doesn't.**

## New Users Start Here

```bash
# 1. Verify installation (8 checks)
python examples/first_run.py

# 2. Read the quickstart guide
# -> QUICKSTART.md

# 3. See RAG vs CST comparison
python examples/rag_vs_cst_comparison.py

# 4. Run all validation (119 checks total)
pytest tests/ -v                              # 68 tests
python agent_sandbox/agent_in_loop_validation.py  # 43 checks
```

| Resource | Purpose |
|----------|---------|
| [QUICKSTART.md](QUICKSTART.md) | Beginner guide - start here |
| [CHANGELOG.md](CHANGELOG.md) | Recent updates |
| [CLAUDE.md](CLAUDE.md) | Full technical reference for agents |
| [examples/first_run.py](examples/first_run.py) | Verify your installation |

---

A deterministic semantic memory engine where every piece of content hashes to a **permanent 3D coordinate** - same input, same position, forever. No model, no index rebuild, no equidistance collapse at scale.

```python
from cubesquare import CSTMemory

memory = CSTMemory()
memory.write("/src/auth.py")              # ~5.97 ns/token at peak (167M tokens/sec)
results = memory.query("authentication")  # <0.04ms
print(results[0].basin)                   # 42 - semantic cluster
```

## Why Not Just Use Embeddings?

CST isn't a faster vector database - it's a **deterministic coordinate system** that eliminates the need for training, cloud infrastructure, and probabilistic matching.

| Problem at scale | Embeddings + Vector DB | Cubesquare CST |
|---|---|---|
| **Identity at scale** | Millions of vectors crowd together - distinct concepts become indistinguishable | Every input has a unique, permanent coordinate - identity preserved at any scale |
| **Model dependency** | Change the embedding model -> all vectors shift, queries break | No model. The hash IS the identity |
| **Determinism** | Same text -> different vectors across model versions | Same text -> same coordinate, every machine, every run, forever |
| **Query cost** | Approximate nearest-neighbor search over an index that grows with N - 10-50ms+ at scale | O(1) direct computation - 0.04ms |
| **Dependencies** | Embedding model + vector DB + GPU/cloud | Zero (Python stdlib). Optional C backend for 83M ops/sec |
| **Drift over time** | Fine-tuning or retraining shifts the entire space | Coordinates are permanent by construction |

> **The equidistance problem:** As embedding databases grow to millions of entries, vectors crowd into regions where neighbors are hard to tell apart. Two unrelated concepts can end up with nearly identical vectors, and reranking cannot recover an identity the vectors no longer carry. CST avoids this because a coordinate is computed from the content alone: it never moves or blurs as the store grows.

## How It Works

```
Content                              Familiar analog
  -> Hash (FNV-1a)                   like a content fingerprint (non-cryptographic, unlike SHA)
  -> 3D Coordinate (Morton code)     like Geohash, but 3D
  -> Basin (mod 131)                 like a consistent hash bucket
  -> Layer (L1-L9)                   like an index namespace
```

**The coordinate IS the meaning.** No lookup tables, no search, no approximation. Pure computation.
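The first and third steps of the pipeline above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the library's implementation: `fnv1a_32` is the standard 32-bit FNV-1a algorithm, while `basin_of` is a hypothetical helper that assumes the basin is the hash reduced modulo 131 (the real engine may reduce the Morton code instead).

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime
NUM_BASINS = 131                  # basin count stated in this README


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # truncate to 32 bits
    return h


def basin_of(content: str) -> int:
    """Hypothetical helper: assumes basin = hash mod 131."""
    return fnv1a_32(content.encode("utf-8")) % NUM_BASINS


token = fnv1a_32(b"/src/auth.py")
print(f"hex_id={token:08x} basin={basin_of('/src/auth.py')}")
```

Because every step is pure arithmetic on the input bytes, running this on any machine yields the same ID and the same basin for the same content.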

### Concepts -> Things You Already Know

| Cubesquare | What it does | Nearest analog | Why CST is different |
|---|---|---|---|
| **Token** | Content -> 32-bit hex ID | Content hash / fingerprint | Deterministic - same content, same ID forever |
| **Morton code** | Pack 3D coords into one integer | Geohash / S2 cell ID | Preserves 3D locality - nearby coords stay nearby |
| **Basin** (0-130) | Cluster tokens into 131 groups | Consistent hash ring | Fixed buckets - no rebalancing, no drift, no collapse |
| **Layer** (L1-L9) | Semantic type of a cluster | Index namespace / shard | Derived from basin - emergent, not assigned |
| **I-Logic** (5 states) | Predict routing before execution | Finite state machine | Algebraically closed - composable, no side effects |
| **SRL** (t = sqrt(d)) | Collapse deep iterations | Skip list / HNSW shortcut | Exact formula, not heuristic - sqrt(d) replaces O(log N) |

## Installation

```bash
pip install .
```

Development mode:

```bash
pip install -e .
```

Zero required dependencies - runs entirely on the Python standard library.

## Quick Start

```python
from cubesquare import CSTMemory

memory = CSTMemory()

# Write content - returns a permanent coordinate, not a row ID
token = memory.write("/src/auth/login.py")
print(f"Hex: {token.hex_id}, Basin: {token.basin}, Layer: L{token.layer}")

# Query by meaning - O(1) computation, not O(N) vector search
results = memory.query("authentication", limit=5)
for r in results:
    print(f"  {r.path} (basin={r.basin})")

# Navigate by layer
siblings = memory.L1.siblings(token)    # L1: Geometric - file hierarchy
similar  = memory.L2.similar(token)     # L2: Semantic - meaning clusters
nearby   = memory.L3.nearby(token)      # L3: Spatial - coordinate proximity
```

## 9 Semantic Layers

Every token belongs to a **layer** based on its basin - like an automatic namespace that emerges from the content itself.

| Layer | Name | Basins | Answers the question |
|-------|------|--------|---------------------|
| **L1** | Geometric | 0-14 | What is its structure? |
| **L2** | Semantic | 15-29 | What does it mean? |
| **L3** | Spatial | 30-43 | Where is it? |
| **L4** | Functional | 44-58 | What does it do? |
| **L5** | Probabilistic | 59-72 | How certain is it? |
| **L6** | Emotional | 73-87 | What state is it in? |
| **L7** | Frequency | 88-101 | When does it occur? |
| **L8** | Electromagnetic | 102-116 | What forces act on it? |
| **L9** | Membrane | 117-130 | What are its boundaries? |

Three additional **agent-facing layers** (L10-L12) handle self-identity, cross-agent communication, and unified field queries. See `docs/LAYERS_MAP.md` for full details.
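The basin ranges in the L1-L9 table can be expressed as a simple range lookup. The sketch below is illustrative only: `layer_for_basin` is a hypothetical name, and the boundaries are copied from the table rather than from the library's source.

```python
from bisect import bisect_right

# First basin of each layer L1..L9, taken from the table in this README.
LAYER_STARTS = [0, 15, 30, 44, 59, 73, 88, 102, 117]
LAYER_NAMES = [
    "Geometric", "Semantic", "Spatial", "Functional", "Probabilistic",
    "Emotional", "Frequency", "Electromagnetic", "Membrane",
]


def layer_for_basin(basin: int) -> tuple[int, str]:
    """Hypothetical helper: map a basin (0-130) to (layer number, layer name)."""
    if not 0 <= basin <= 130:
        raise ValueError(f"basin must be in 0..130, got {basin}")
    index = bisect_right(LAYER_STARTS, basin) - 1
    return index + 1, LAYER_NAMES[index]


print(layer_for_basin(42))
```

A sorted list of range starts plus `bisect_right` keeps the lookup free of long `if` chains and makes the nine uneven ranges (14 or 15 basins each) easy to audit against the table.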

## I-Logic: Algebraic Routing

A 5-state algebra for predicting routing outcomes **before execution** - like a finite state machine where you can compose states and know the result without running the pipeline.

| State | Value | Meaning |
|-------|-------|---------|
| DIRECT | 1 | Pass through unchanged |
| INVERSE | -1 | Reflect / negate |
| SPIRAL | +i | Rotate / expand |
| ANTI_SPIRAL | -i | Counter-rotate / focus |
| ABSORBED | 0 | Terminate (loop exit) |

```python
from cubesquare.i_logic import predict_route

# Predict routing outcome without execution
result = predict_route("gateway:DIRECT,firewall:INVERSE,sink:ABSORBED")
print(result['final_state'])  # ABSORBED - terminated at step 3
```
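One way to read the state table is as the set {1, -1, i, -i, 0} under complex multiplication, which is closed: any product of two states is again one of the five. The sketch below assumes that interpretation; `multiply` and `compose` are hypothetical names, not the `cubesquare.i_logic` API, and states are stored as integer (real, imaginary) pairs to keep the arithmetic exact.

```python
# Each state as an integer (real, imaginary) pair, following the table above.
STATES = {
    "DIRECT": (1, 0),
    "INVERSE": (-1, 0),
    "SPIRAL": (0, 1),
    "ANTI_SPIRAL": (0, -1),
    "ABSORBED": (0, 0),
}
NAMES = {value: name for name, value in STATES.items()}


def multiply(a: str, b: str) -> str:
    """Assumed composition rule: complex multiplication of the two states."""
    (ar, ai), (br, bi) = STATES[a], STATES[b]
    return NAMES[(ar * br - ai * bi, ar * bi + ai * br)]


def compose(states: list[str]) -> str:
    """Fold a route left to right, starting from the identity state DIRECT."""
    result = "DIRECT"
    for state in states:
        result = multiply(result, state)
    return result


print(multiply("SPIRAL", "INVERSE"))
print(compose(["DIRECT", "INVERSE", "ABSORBED"]))
```

Under this reading, ABSORBED behaves like zero: once it appears anywhere in a route, the composed result stays ABSORBED, which matches the "terminate" meaning in the table.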

## Performance

| Tier | Level | Throughput | When to use |
|------|-------|------------|-------------|
| **T0** | Atomic (BMI2) | 111B positions/sec | Per-thread instruction |
| **T1** | C Internal | 83M ops/sec | Base throughput - use for planning |
| **T2** | Batch (1000) | 57M ops/sec | Production - always batch |
| **T3** | Single call | 0.3M ops/sec | AVOID - 190x slower than batching |

```python
# RIGHT: Batch through C - 57M ops/sec
from cubesquare._core import morton_encode_batch_raw, create_coords_array
coords = create_coords_array([100,200,0, 101,201,0, 102,202,0])
results = morton_encode_batch_raw(coords, 3)

# WRONG: Python loop - 190x slower
# for coord in coords:
#     process(coord)
```

### Query Latency Comparison

| System | Type | Latency |
|--------|------|---------|
| **Cubesquare CST** | Semantic lookup | **0.04ms** |
| In-memory hash | Exact key only | ~0.001ms |
| FAISS (GPU) | ANN search | 1-10ms |
| Pinecone | Cloud vector DB | 10-50ms |
| Elasticsearch | Full-text search | 20-150ms |

> CST achieves sub-millisecond **semantic** queries through deterministic encoding. Hash lookups are faster but only support exact-key matching, not semantic search.

## CLI

```bash
cubesquare --status                       # System status
cubesquare benchmark                      # Run benchmark
cubesquare i_logic status                 # I-Logic state
cubesquare i_logic multiply SPIRAL INVERSE
cubesquare primitives status              # O(1) primitive checks
```

35 command shells available. Run `cubesquare --help` for the full list.

## Testing

```bash
# Run all tests
python tests/run_all.py

# Or with pytest
pytest tests/ -v
```

Verified: **68/68 tests pass**. 400/400 Morton roundtrips. 131/131 basin coverage. 5/5 I-Logic algebra properties. 22 identity preservation tests.

## Documentation

| Document | Description |
|----------|-------------|
| [LAYERS_MAP.md](docs/LAYERS_MAP.md) | 9 CST layers + 12-layer basin manifold |
| [ROUTING_MAP.md](docs/ROUTING_MAP.md) | Visual MCP -> Shell -> Layer routing |
| [MCP_TOOLS_MAP.md](docs/MCP_TOOLS_MAP.md) | Complete mapping of 33 MCP tools |
| [VERIFIED_PRIMITIVES.md](docs/VERIFIED_PRIMITIVES.md) | O(1) primitive proofs |
| [mcp_tools.json](docs/mcp_tools.json) | Programmatic MCP tool definitions |
| [layers.json](docs/layers.json) | Programmatic layer definitions |

## License

MIT License - see LICENSE file.

---

*Invented by Jesus Fernandez. Offline-first. Training-free. 100% Deterministic.*
