Metadata-Version: 2.3
Name: pykeedy
Version: 0.1.0
Summary: Tools for easy statistical analysis of the Voynich Manuscript
Author: Patrick Spencer
Author-email: Patrick Spencer <patrickwspencer@gmail.com>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: matplotlib>=3.10.6
Requires-Dist: numpy>=2.3.2
Requires-Dist: pydantic>=2.11.7
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: regex>=2025.8.29
Requires-Python: >=3.12
Description-Content-Type: text/markdown

<p align="center">
  <img src="assets/logo.png" />
</p>

A Python library for easy statistical analysis of the Voynich Manuscript.

## Installation
```
pip install pykeedy
```

Development:
```
git clone https://github.com/pwspen/pykeedy.git
cd pykeedy
pip install -e .
```

## Features
- Easily-loadable VMS transliteration (EVA or CUVA alphabet)
- Filtering by every available property (Currier language, illustration type, locus type, etc.)
- Naibbe encoder supporting arbitrary encoding tables + decoder implementing the algorithm from the paper
- Handful of comparison manuscripts in European languages that are easy to load and run through the same analysis
- Plotter functions for common data types / analyses (see examples)
- Functions to easily calculate, at either character or word level:
    - n-gram frequency rank (most common character, word, character pair, word pair...)
    - n-gram co-occurrence (aka pair attraction for n=2)
    - entropy
        - single (Shannon), pair, conditional
    - Position distributions (letter in word, word in line, letter in page, etc)
- Highly composable and extensible
- Automatic analysis: [All of these plots were generated with a single function call](/analysis_summary.md) 
    - (see [examples/full_analysis.py](examples/full_analysis.py))
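
To make the n-gram frequency rank item concrete, here is a minimal standard-library sketch of the idea. It is not pykeedy's implementation or API: `ngram_ranking` and its parameters are hypothetical names chosen for illustration. The same function works at character level (pass a string) or word level (pass a list of words).

```python
from collections import Counter


def ngram_ranking(items, n=1, top=5):
    """Return the `top` most common n-grams with their counts.

    Hypothetical helper for illustration, not part of pykeedy.
    `items` is any sequence of symbols: a string for character
    n-grams, or a list of words for word n-grams.
    """
    seq = list(items)
    # zip over n shifted copies of the sequence to get sliding windows
    grams = zip(*(seq[i:] for i in range(n)))
    return Counter(grams).most_common(top)


words = ["ol", "daiin", "ol", "chedy", "ol", "daiin"]
most_common_word = ngram_ranking(words, n=1, top=1)
most_common_pair = ngram_ranking(words, n=2, top=1)
```

Each n-gram is returned as a tuple, so a single word appears as a 1-tuple such as `("ol",)`.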

## Usage
```python
from pykeedy import VMS, LocusProp
from pykeedy.analysis import shannon_entropy, conditional_entropy
from pykeedy.utils import load_corpus, scatterplot
from pykeedy.crypt import naibbe_encrypt

vms = VMS.to_text()  # Single string
vms = VMS.to_lines() # List of line strings
vms = VMS.to_words() # List of word strings

# Processing options
vms = VMS.to_text(alphabet="cuva", normalize_gaps=False)

# Filter by any property, or many at once
voynich_b = VMS.filter(props=[LocusProp.CurrierLanguage.B]).to_text()

plains = load_corpus() # Loads set of comparison plaintexts 
                       # (Latin, German, Italian)

analyze = {}
for name, text in plains.items():
    analyze[name] = text
    # Encrypt plaintexts with the method that best matches Voynich statistically
    analyze[name + '_naibbe_encrypt'] = naibbe_encrypt(text)
analyze['vms b'] = voynich_b # Add VMS

results = {name: (shannon_entropy(text), conditional_entropy(text))
                                    for name, text in analyze.items()}

scatterplot(results) # Saves to scatterplot.png
```
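
For readers unfamiliar with the two quantities plotted above, the following is a rough plain-Python sketch of the usual textbook definitions: Shannon entropy of single symbols, and conditional entropy of a symbol given the one before it, computed as pair entropy minus single-symbol entropy. The function names are hypothetical, and pykeedy's own `shannon_entropy` and `conditional_entropy` may differ in details such as how spaces and line breaks are treated.

```python
import math
from collections import Counter


def shannon_entropy_sketch(symbols):
    """Entropy in bits of the empirical symbol distribution (illustrative)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy_sketch(symbols):
    """H(next | previous) = H(adjacent pairs) - H(previous symbols)."""
    seq = list(symbols)
    pairs = list(zip(seq, seq[1:]))
    # seq[:-1] holds exactly the symbols that have a successor
    return shannon_entropy_sketch(pairs) - shannon_entropy_sketch(seq[:-1])


h1 = shannon_entropy_sketch("abab")      # two equally likely symbols
h2 = conditional_entropy_sketch("abab")  # next symbol fully determined
```

In the toy string `"abab"` each character carries one bit on its own, but none once the previous character is known, which is the kind of gap between the two values that the scatterplot is meant to expose.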

> [!WARNING] 
> This library is not yet stable. You may find bugs and updates may break things. Please report any bugs.

## Documentation
The documentation currently consists of 10+ thoroughly commented examples, which you can find in the [examples folder](examples/).

There are examples that plot each of the following in about 20 lines or fewer:
- Word frequency in Voynich A and B
- Character and conditional entropy for the VMS vs plaintext comparison manuscripts
- Word length distribution (token and type)
- Character and word co-occurrence / pair attraction heatmaps
- Most common character & word pairs / triplets
- Character & word position distributions
- Entropies of VMS + plaintexts + Naibbe cipher encryptions of those plaintexts
- You can see all of the plots generated by examples in [examples/results.md](examples/results.md)
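
The token/type distinction in the word length item can be sketched as follows, assuming the usual meaning: tokens count every occurrence of a word, types count each distinct word once. This is an illustrative standard-library helper with a hypothetical name, not code from the examples folder.

```python
from collections import Counter


def word_length_distributions(words):
    """Return (token_lengths, type_lengths) as Counters keyed by word length.

    Hypothetical helper for illustration, not part of pykeedy.
    """
    token_lengths = Counter(len(w) for w in words)      # every occurrence
    type_lengths = Counter(len(w) for w in set(words))  # distinct words only
    return token_lengths, type_lengths


sample = ["daiin", "ol", "daiin", "chedy"]
token_lengths, type_lengths = word_length_distributions(sample)
```

Here the repeated word `daiin` adds to the length-5 token count twice but to the length-5 type count only once.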

## Goals
Library goals:
1. Lower the barrier to entry for Voynich statistical analysis
    - There are some tools out there, but they're scattered, relatively basic, and far from user friendly, even for programmers. This library aims to be very capable while not sacrificing any usability.
2. Reproduce all the important statistical results in one place
    - There is a TON of analysis out there, but it's even more scattered, and a lot of it is just results with no reproduction available, which means it's difficult to tweak and experiment with and build on.

Issues & PRs welcome!

## Voynich resources
- [Voynich Forum](https://www.voynich.ninja/)
- [Most complete source of Voynich information](https://www.voynich.nu/)
    - [Voynich transliteration](https://www.voynich.nu/transcr.html)
    - [Page by page overview](http://voynich.nu/q01/index.html)
    - Existing Voynich software
        - [bitrans](http://www.voynich.nu/software/bitrans/Bitrans_manual.pdf) - performs plaintext substitutions
        - [IVTT](http://voynich.nu/software/ivtt/IVTT_manual.pdf) - CLI for filtering and removing metadata from IVTFF files
        - You should not need either of these if you are using this library
- [Most detailed Voynich scan](https://collections.library.yale.edu/catalog/2002046)
- [Scan browser](https://www.voynich.ninja/browser/default.cfm?v=1006075&r=1006082)
- [Multispectral imaging](https://manuscriptroadtrip.wordpress.com/2024/09/08/multispectral-imaging-and-the-voynich-manuscript/) [(direct link)](https://drive.google.com/drive/folders/1mNQGKQDSCR4M_c2M2JrsU5soghvYwMig)
- [Naibbe cipher paper (July 2025)](https://www.dropbox.com/scl/fo/2b39zi1f77tr9mc9p80rt/ADwDDHsLNG7WtT6O0sbN5_4?download=true&e=4&from_auth=login&preview=20250724+Naibbe+Cipher+Paper+Latest+Version.pdf&rlkey=5ap828aun23thr9pvznguzgor&st=88np74hd&dl=0)
    - This was the inspiration for this library. It's IMO a very big result and you should read it!
- [IVTFF format explanation](https://www.voynich.nu/software/ivtt/IVTFF_format.pdf)
- [Voynich Unicode](https://www.kreativekorp.com/software/fonts/voynich/)

## Wishlist / todo
### Advanced
- Automated topic analysis
- Measure of clustering strength for similar words
### Research directions
- Auto-solving Naibbe ciphers
- Simple reference self-citation algorithm
### General
- docstrings
- then generate real docs
- Find better corpus for comparison manuscripts
- More composable system for corpus
- Option to generate subplots
- More options for dealing with spaces and newlines in entropy, freq, co-occurrence analysis
- STA alphabet support
- character and word pair attraction symmetry scalar
- Support for recognizing single glyphs in non-basic EVA
- Measure of Naibbe encoding ambiguity
- Voynich font alongside transliteration for plots
- basic testing suite
- support for bitrans rulefiles
- encoding scorer (ambiguity, reconstruction, simplicity, match to VMS)