Metadata-Version: 2.4
Name: lxcore
Version: 0.1.0
Summary: Fast crash-safe binary key-value database backed by mmap
Author-email: Lunax <airexsystem@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/LaxerProg17/lxcore
Project-URL: Repository, https://github.com/LaxerProg17/lxcore
Project-URL: Issue Tracker, https://github.com/LaxerProg17/lxcore/issues
Keywords: database,storage,indexing
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: System :: Filesystems
Classifier: Intended Audience :: Developers
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS :: MacOS X
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# LXcore

A fast, crash-safe binary key-value database written in pure Python. Yup, only Python.

No dependencies. No server. One file on disk.

---

## Why this exists

I was building AI assistants in Python. Each one needed to read and write memory (conversation history, facts, state) constantly and fast. At first I used JSON files. They worked until reads and writes started to overlap: the AI wrote so frequently that the file would be mid-rewrite when the next read came in, and the whole thing would corrupt. I switched to SQLite. That worked for a while, until the AI started hitting it fast enough that the database would lock, writes would queue, reads would block, and eventually the whole thing would stall.

The real problem was that both JSON and SQLite were designed around the assumption that writes are infrequent enough to not matter. For an AI that reads and writes memory on every single step, that assumption breaks fast.

I looked at LMDB, RocksDB, and similar embedded databases. They solved the problem but I wanted something I owned completely — something I could reshape for any project, understand every byte of, and not depend on C extensions or external binaries for.

So I built lxcore.

---

## How it works

Every record is written as raw binary to a single file:

```
STATUS(1B) | KEY_LEN(2B) | VAL_LEN(4B) | KEY | VALUE | CRC32(4B)
```

The file is memory-mapped using Python's `mmap` module. Reads are near zero-copy — the OS handles the buffering. Writes go directly into the mapped region without closing and reopening the file. The file grows in 256KB chunks to avoid a syscall on every append, and gets truncated to the exact data size on close.
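
To make the growth strategy concrete, here is a minimal sketch of an append-only log on a memory-mapped file. The `MappedLog` class and its method names are hypothetical and are not lxcore's engine. The sketch remaps after each grow while keeping the same file handle open, which may differ from what lxcore does internally.

```python
import mmap
import os
import tempfile

CHUNK = 256 * 1024  # growth step taken from the description above


class MappedLog:
    """Append-only byte log on a memory-mapped file (illustrative only)."""

    def __init__(self, path):
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._f = open(path, mode)
        self.size = os.fstat(self._f.fileno()).st_size  # bytes of real data
        if self.size == 0:
            self._f.truncate(CHUNK)  # mmap cannot map an empty file
        self._map = mmap.mmap(self._f.fileno(), 0)

    def append(self, data):
        end = self.size + len(data)
        if end > len(self._map):
            # Round capacity up to the next chunk boundary, then remap.
            capacity = -(-end // CHUNK) * CHUNK
            self._map.flush()
            self._map.close()
            self._f.truncate(capacity)
            self._map = mmap.mmap(self._f.fileno(), 0)
        offset = self.size
        self._map[offset:end] = data
        self.size = end
        return offset

    def read(self, offset, length):
        return self._map[offset:offset + length]

    def close(self):
        self._map.flush()
        self._map.close()
        self._f.truncate(self.size)  # drop the unused padding
        self._f.close()


path = os.path.join(tempfile.mkdtemp(), "log.bin")
log = MappedLog(path)
first = log.append(b"hello")
second = log.append(b"x" * CHUNK)   # does not fit, so capacity grows
greeting = log.read(first, 5)
log.close()
final_size = os.path.getsize(path)  # padding is gone after close()
```

Growing in fixed chunks means most appends are plain memory writes, and only the occasional append pays for a truncate and a remap.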

An in-memory index (`dict`) maps every key to its byte offset in the file. On startup, the file is scanned once linearly to rebuild the index. After that, every read is a single dictionary lookup followed by a seek — O(1).

Deletes work by flipping one byte on the target record (a tombstone) and removing the key from the index. The dead record stays on disk until you call `compact()`, which rewrites the file atomically using a temp file and `os.replace`.
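
The delete-then-compact cycle can be sketched as follows. This is an illustration under stated assumptions, not lxcore's code: the helper names are made up, the file header is left out, and the sketch looks only at the status byte. How lxcore reconciles a flipped status byte with the stored CRC is not described here, so no checksum is verified.

```python
import os
import struct
import tempfile
import zlib

HEAD = struct.Struct(">BHI")  # status, key length, value length
LIVE, DEAD = 0x01, 0x00


def pack(key, value):
    body = HEAD.pack(LIVE, len(key), len(value)) + key + value
    return body + struct.pack(">I", zlib.crc32(body))


def tombstone(path, offset):
    """Mark the record at `offset` dead by overwriting its status byte."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(bytes([DEAD]))


def compact(path):
    """Copy live records to a temp file, then swap it in with os.replace."""
    with open(path, "rb") as f:
        data = f.read()
    # Same directory as the target so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as out:
        pos = 0
        while pos + HEAD.size <= len(data):
            status, klen, vlen = HEAD.unpack_from(data, pos)
            size = HEAD.size + klen + vlen + 4
            if pos + size > len(data):
                break  # incomplete record at the end of the file
            if status == LIVE:
                out.write(data[pos:pos + size])
            pos += size
        out.flush()
        os.fsync(out.fileno())
    os.replace(tmp, path)  # the path points at the old or the new file, never a mix


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
first, second = pack(b"a", b"1"), pack(b"b", b"22")
with open(path, "wb") as f:
    f.write(first + second)
tombstone(path, 0)  # delete key b"a"
compact(path)
with open(path, "rb") as f:
    remaining = f.read()
```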

Updates try an in-place overwrite first, which works when the new value is the same size as the old one. If the size changed, the old record is tombstoned and a new one is appended.
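
A rough sketch of that decision, using a `bytearray` as a stand-in for the mapped file and a plain `dict` as the index. The `update` function here is hypothetical and only mirrors the rule described above.

```python
import struct
import zlib

HEAD = struct.Struct(">BHI")  # status, key length, value length
LIVE, DEAD = 0x01, 0x00


def pack(key, value):
    body = HEAD.pack(LIVE, len(key), len(value)) + key + value
    return body + struct.pack(">I", zlib.crc32(body))


def update(buf, index, key, value):
    """Overwrite in place when sizes match, else tombstone and append."""
    offset = index[key]  # raises KeyError if the key is missing
    _, _, old_vlen = HEAD.unpack_from(buf, offset)
    record = pack(key, value)
    if len(value) == old_vlen:
        buf[offset:offset + len(record)] = record  # same footprint on disk
    else:
        buf[offset] = DEAD          # tombstone the old record
        index[key] = len(buf)       # the index now points at the new copy
        buf.extend(record)


buf, index = bytearray(), {}
index[b"k"] = len(buf)
buf.extend(pack(b"k", b"old"))

update(buf, index, b"k", b"new")     # same size: rewritten at the same offset
same_offset = index[b"k"]
in_place_value = bytes(buf[8:11])

update(buf, index, b"k", b"longer")  # different size: appended at the end
new_offset = index[b"k"]
```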

A CRC32 is stored with every record. On load, records with a bad checksum are skipped cleanly — a crash mid-write leaves a truncated tail that gets ignored, not a corrupt database.
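
The startup scan and the checksum rule can be illustrated together. The sketch below assumes the record layout shown earlier; `build_index` is a hypothetical name, and the choice to stop at the first record that runs past the end of the data is this sketch's reading of "truncated tail", not a confirmed detail of lxcore.

```python
import struct
import zlib

HEAD = struct.Struct(">BHI")  # status, key length, value length
LIVE = 0x01


def pack(key, value):
    body = HEAD.pack(LIVE, len(key), len(value)) + key + value
    return body + struct.pack(">I", zlib.crc32(body))


def build_index(data, start=0):
    """Scan records linearly and map each live, intact key to its offset."""
    index = {}
    pos = start
    while pos + HEAD.size <= len(data):
        status, klen, vlen = HEAD.unpack_from(data, pos)
        end = pos + HEAD.size + klen + vlen + 4
        if end > len(data):
            break  # truncated tail, e.g. from an interrupted write
        body = data[pos:end - 4]
        (stored_crc,) = struct.unpack_from(">I", data, end - 4)
        if status == LIVE and zlib.crc32(body) == stored_crc:
            key = data[pos + HEAD.size:pos + HEAD.size + klen]
            index[key] = pos
        pos = end
    return index


good = pack(b"a", b"1")
bad = bytearray(pack(b"b", b"2"))
bad[-1] ^= 0xFF                   # corrupt the stored checksum
torn = pack(b"c", b"3")[:-3]      # simulate a crash mid-write
index = build_index(good + bytes(bad) + pack(b"d", b"4") + torn)
```

After the scan, a read is `index[key]` followed by a slice at that offset, which is where the O(1) lookup comes from.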

---

## Installation

```bash
pip install lxcore
```

On Debian/Ubuntu, if pip complains (which always happens to me):

```bash
pip install lxcore --break-system-packages
```

---

## Usage

### Open a database

```python
from lxcore import LxDB

db = LxDB("mydata.lxdb")
```

This creates the database file. A new file contains only the header.

### Raw API — bytes in, bytes out

```python
db.create("name", "lunax")        # creates; raises KeyError if key exists
db.write("name", "lunax")         # upsert, never raises
db.read("name")                   # returns bytes or None
db.update("name", "lxcore")       # updates; raises KeyError if missing
db.delete("name")                 # returns True if deleted, False if not found
db.exists("name")                 # bool
```

### Typed API — store Python objects directly

```python
db.set("user", {"name": "lunax", "active": True})
db.set("count", 42)
db.set("tags", ["ai", "python"])
db.set("score", 3.14)
db.set("raw", b"\xde\xad\xbe\xef")

db.get("user")    # → {"name": "lunax", "active": True}
db.get("count")   # → 42
db.get("missing") # → None
```

Supports: `dict`, `list`, `str`, `int`, `float`, `bool`, `None`, `bytes`.

### Batch API — one lock acquire for N operations

```python
db.batch_set({"key_1": "a", "key_2": "b", "key_3": [1, 2, 3]})
db.batch_create({"new_1": 1, "new_2": 2})
db.batch_delete(["key_1", "key_2"])
```

- `db.batch_create` raises `KeyError` on the first duplicate key.
- `db.batch_delete` returns the number of keys deleted.

Batch operations are significantly faster than looping single calls because the lock is acquired once and all records are written in a single capacity check + sequential mmap write.

### Namespaces — logical sub-collections in one file

```python
memory = db.ns("memory")
config = db.ns("config")

memory.set("turn_1", {"role": "user", "text": "hello"})
config.set("theme", "dark")

memory.get("turn_1")   # → {"role": "user", "text": "hello"}
config.get("theme")    # → "dark"

memory.keys()          # → ["turn_1"]   (scoped, prefix stripped)
memory.items()         # → [("turn_1", {...})]
memory.count()         # → 1
memory.clear()         # deletes everything in this namespace only

# namespaces are isolated from each other and from root keys
db.set("theme", "light")
config.get("theme")    # → "dark"   (unaffected)
db.get("theme")        # → "light"  (unaffected)
```

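Internally, namespace keys are stored as `namespace\x00key`. The null byte separator keeps namespaced keys from colliding with each other or with root keys, as long as namespace names and root keys contain no null byte themselves.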
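
A minimal sketch of the prefixing scheme, with hypothetical helper names (`ns_key`, `ns_keys`) and byte-string keys. Rejecting namespace names that contain the separator is an assumption of this sketch, not documented lxcore behavior.

```python
SEP = b"\x00"


def ns_key(namespace, key):
    """Build the stored key for `key` inside `namespace`."""
    if SEP in namespace:
        raise ValueError("namespace name must not contain a null byte")
    return namespace + SEP + key


def ns_keys(all_keys, namespace):
    """Return the keys that belong to `namespace`, with the prefix stripped."""
    prefix = namespace + SEP
    return [k[len(prefix):] for k in all_keys if k.startswith(prefix)]


stored = [
    ns_key(b"memory", b"turn_1"),
    ns_key(b"config", b"theme"),
    b"theme",  # a root key with the same name as a namespaced one
]
memory_keys = ns_keys(stored, b"memory")
config_keys = ns_keys(stored, b"config")
```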

### Iteration

```python
db.keys()    # list of all live keys (bytes)
db.values()  # list of all deserialized values
db.items()   # list of (key_bytes, deserialized_value) tuples
```

### Maintenance

```python
db.compact()
db.flush()
db.close()
```

- `db.compact()` rewrites the file without tombstones and reclaims disk space.
- `db.flush()` force-syncs dirty mmap pages to disk immediately.
- `db.close()` flushes, truncates the growth padding, and closes the file handle.

### Context manager

```python
with LxDB("mydata.lxdb") as db:
    db.set("key", "value")
```

---

## Performance

Measured on a Linux machine with Intel HD Graphics, using 10,000 records:

```
$ python basic.py

============================================================
  [RAW] CREATE  10000 records
============================================================
  Time: 0.1735s  |        57,640 ops/sec
  [OK]   Spot-check 10 keys correct
  [OK]   Duplicate create blocked

============================================================
  [RAW] READ  10000 records
============================================================
  Time: 0.1993s  |        50,174 ops/sec
  [OK]   All 10000 values verified

============================================================
  [RAW] UPDATE same-size  10000 records
============================================================
  Time: 0.2743s  |        36,456 ops/sec
  [OK]   All 10000 in-place updates verified

============================================================
  [RAW] UPDATE diff-size  10000 records
============================================================
  Time: 0.6065s  |        16,488 ops/sec
  [OK]   All 10000 diff-size updates verified
  [OK]   Update on missing key blocked

============================================================
  [RAW] WRITE upsert  10000 records
============================================================
  Time: 0.4798s  |        20,844 ops/sec
  [OK]   All 10000 upserts verified

============================================================
  [RAW] DELETE  5000 records (first half)
============================================================
  Time: 0.1510s  |        33,111 ops/sec
  [OK]   All 5000 deleted keys return None
  [OK]   All 5000 surviving keys intact
  [OK]   Re-delete returns: False (expected False)

============================================================
  [TYPED] SET  10000 dicts
============================================================
  Time: 0.5615s  |        17,808 ops/sec

============================================================
  [TYPED] GET  10000 dicts
============================================================
  Time: 0.4281s  |        23,358 ops/sec
  [OK]   All 10000 dicts verified

============================================================
  [TYPED] All native types
============================================================
  [OK]   str, int, float, bool, None, list, bytes — all correct

============================================================
  [BATCH] batch_set  10000 dicts
============================================================
  Time: 0.3452s  |        28,972 ops/sec
  [OK]   All 10000 batch_set values verified

============================================================
  [BATCH] batch_create  10000 records
============================================================
  Time: 0.3093s  |        32,329 ops/sec
  [OK]   batch_create duplicate blocked

============================================================
  [BATCH] batch_delete  10000 records
============================================================
  Time: 0.2395s  |        41,750 ops/sec
  [OK]   batch_delete returned 10000/10000
  [OK]   All 10000 batch-deleted keys return None

============================================================
  [NAMESPACE] ns.set  10000 records across 2 namespaces
============================================================
  Time: 0.5282s  |        18,932 ops/sec
  [OK]   mem.count()=5000  logs.count()=5000

============================================================
  [NAMESPACE] ns.get  10000 records
============================================================
  Time: 0.4729s  |        21,144 ops/sec
  [OK]   All 5000 mem records verified

============================================================
  [NAMESPACE] Isolation check
============================================================
  [OK]   No collision between ns('memory') and root keys
  [OK]   mem and logs namespaces fully isolated

============================================================
  [NAMESPACE] ns.batch_set + ns.batch_delete
============================================================
  Time: 0.3751s  |        26,658 ops/sec
  [OK]   mem.count() after batch_set: 15000
  Time: 0.2391s  |        41,818 ops/sec
  [OK]   ns.batch_delete returned 10000/10000

============================================================
  [NAMESPACE] ns.clear()
============================================================
  Time: 0.1226s
  [OK]   logs.count() after clear: 0
  [OK]   mem untouched: mem.count()=5000

============================================================
  COMPACT
============================================================
  Time: 1.2657s  |             1 ops/sec
  Before: 3,670,016 B  |  After: 1,156,146 B  |  Reclaimed: 2,513,870 B (68.5%)
  [OK]   All mem records intact after compact

============================================================
  REOPEN (cold index rebuild)
============================================================
  Time: 0.2876s  |             3 ops/sec
  Keys in index: 20008
  [OK]   All mem records correct after reopen
  [OK]   typed values intact: t_int=42  t_str='hello lunax'

============================================================
  TIMING SUMMARY
============================================================
  Operation                  Time          ops/sec
  ------------------------------------------------
  create                  0.1735s           57,640
  read                    0.1993s           50,174
  update_same             0.2743s           36,456
  update_diff             0.6065s           16,488
  write                   0.4798s           20,844
  delete                  0.1510s           33,111
  typed_set               0.5615s           17,808
  typed_get               0.4281s           23,358
  batch_set               0.3452s           28,972
  batch_create            0.3093s           32,329
  batch_delete            0.2395s           41,750
  ns_set                  0.5282s           18,932
  ns_get                  0.4729s           21,144
  ns_batch_set            0.3751s           26,658
  ns_batch_delete         0.2391s           41,818
  ns_clear                0.1226s                —
  compact                 1.2657s                —
  reopen                  0.2876s                —

```

Typed operations are slower than raw because of JSON encoding/decoding. Batch operations beat their single-call equivalents because the lock is taken once per batch and capacity is checked once instead of once per record.

---

## File format

```
Offset  Size  Field
────────────────────────────────────────
0       4B    Magic: "LXDB"
4       2B    Version (big-endian uint16)
6       4B    Reserved
──────── file header (10 bytes) ────────

Per record:
0       1B    Status: 0x01 live | 0x00 dead
1       2B    Key length (big-endian uint16)
3       4B    Value length (big-endian uint32)
7       N     Key bytes
7+N     M     Value bytes
7+N+M   4B    CRC32 (big-endian uint32, over all preceding record bytes)
```
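
The layout above maps directly onto Python's `struct` module. The following is a sketch rather than lxcore's reader or writer: the function names are made up, the version number `1` is a placeholder, and zero-filled reserved bytes are an assumption.

```python
import struct
import zlib

FILE_HEADER = struct.Struct(">4sH4s")  # magic, version, reserved (10 bytes)
RECORD_HEAD = struct.Struct(">BHI")    # status, key length, value length (7 bytes)


def pack_header(version=1):  # placeholder version, not taken from lxcore
    return FILE_HEADER.pack(b"LXDB", version, b"\x00" * 4)


def unpack_record(data, offset):
    """Return (status, key, value, crc_ok) for the record at `offset`."""
    status, klen, vlen = RECORD_HEAD.unpack_from(data, offset)
    key_at = offset + RECORD_HEAD.size
    val_at = key_at + klen
    crc_at = val_at + vlen
    (stored,) = struct.unpack_from(">I", data, crc_at)
    # The CRC covers every record byte that precedes it.
    crc_ok = zlib.crc32(data[offset:crc_at]) == stored
    return status, data[key_at:val_at], data[val_at:crc_at], crc_ok


body = RECORD_HEAD.pack(0x01, 4, 5) + b"name" + b"lunax"
blob = pack_header() + body + struct.pack(">I", zlib.crc32(body))
parsed = unpack_record(blob, FILE_HEADER.size)
```

The `>` prefix selects big-endian byte order with no padding, so the struct sizes match the byte counts in the table.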

Values stored via the typed API have a 1-byte type tag prepended:
- `0x00` — raw bytes
- `0x01` — JSON-encoded UTF-8
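
One way such a tag could be applied is sketched below. The `encode` and `decode` names are hypothetical, and routing `bytes` to `0x00` and every other supported type through JSON is an assumption based on the two tags listed.

```python
import json

TAG_RAW, TAG_JSON = b"\x00", b"\x01"


def encode(value):
    """Prefix the payload with a 1-byte tag describing how to decode it."""
    if isinstance(value, (bytes, bytearray)):
        return TAG_RAW + bytes(value)
    return TAG_JSON + json.dumps(value).encode("utf-8")


def decode(blob):
    tag, payload = blob[:1], blob[1:]
    if tag == TAG_RAW:
        return payload
    if tag == TAG_JSON:
        return json.loads(payload.decode("utf-8"))
    raise ValueError(f"unknown type tag {tag!r}")


samples = [
    {"name": "lunax", "active": True},
    42,
    ["ai", "python"],
    3.14,
    None,
    b"\xde\xad\xbe\xef",
]
round_trip = [decode(encode(v)) for v in samples]
```

The tag is what lets `bytes` values skip JSON entirely, since JSON has no native binary type.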

---

## Project structure

```
lxcore/
├── api/
│   ├── serial.py
│   ├── namespace.py
│   ├── create.py
│   ├── delete.py
│   ├── read.py
│   ├── update.py
│   └── write.py
├── core/
│   ├── apis.py       # LxDB class
│   └── engine.py     # mmap engine, index, compaction
├── database/
│   ├── reader.py     # record parsing, file scan
│   └── writer.py     # record packing, file header
└── __init__.py
```

---

## License

MIT. See `LICENSE`.
