Metadata-Version: 2.1
Name: epstein-files
Version: 1.9.2
Summary: Tools for working with the Jeffrey Epstein documents released in November 2025.
Home-page: https://michelcrypt4d4mus.github.io/epstein_text_messages/
License: GPL-3.0-or-later
Keywords: Epstein,Jeffrey Epstein
Author: Michel de Cryptadamus
Requires-Python: >=3.11,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: cairosvg (>=2.8.2,<3.0.0)
Requires-Dist: datefinder (>=0.7.3,<0.8.0)
Requires-Dist: inflection (>=0.5.1,<0.6.0)
Requires-Dist: pdfalyzer[extract] (>=1.19.6,<2.0.0)
Requires-Dist: python-dateutil (>=2.9.0.post0,<3.0.0)
Requires-Dist: python-dotenv (>=1.2.1,<2.0.0)
Requires-Dist: requests (>=2.32.5,<3.0.0)
Requires-Dist: rich (>=14.2.0,<15.0.0)
Requires-Dist: rich-argparse-plus (>=0.3.1.4,<0.4.0.0)
Project-URL: Emails, https://michelcrypt4d4mus.github.io/epstein_text_messages/all_emails_epstein_files_nov_2025.html
Project-URL: Metadata, https://michelcrypt4d4mus.github.io/epstein_text_messages/file_metadata_epstein_files_nov_2025.json
Project-URL: Repository, https://github.com/michelcrypt4d4mus/epstein_text_messages
Project-URL: TextMessages, https://michelcrypt4d4mus.github.io/epstein_text_messages
Project-URL: WordCounts, https://michelcrypt4d4mus.github.io/epstein_text_messages/communication_word_count_epstein_files_nov_2025.html
Description-Content-Type: text/markdown

# Color Highlighted Epstein Emails and Text Messages

![joi](https://github.com/michelcrypt4d4mus/epstein_text_messages/raw/master/docs/joi_ito_gavin_is_clever_epstein_funds_bitcoin_dev_team.png)

* The various views of The Epstein Files generated by this code can be seen [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/).
* [I Made Epstein's Text Messages Great Again (And You Should Read Them)](https://cryptadamus.substack.com/p/i-made-epsteins-text-messages-great) (a post about this project).
* [A Rather Alarming Epstein / Russia / Israel Crypto Timeline](https://cryptadamus.substack.com/p/the-epsteincrypto-timeline-is-alarming) (a post about various alarming things that is based on this collection of documents)
* [Maybe The Russian Bots Were Jeffrey Epstein This Whole Time](https://cryptadamus.substack.com/p/maybe-the-russian-bots-were-jeffrey) (another post about the hackers in Dubai Epstein hired to do social media work during the 2016 election)

## Usage
#### Installation
The easiest way to install is `poetry install` (run from a checkout of this repo). `pip install epstein-files` should also work, though `pipx install epstein-files` is usually better.

Then there are two options for the data:
1. To work with the data set included in this repo, copy the pickled data file into place: `cp ./the_epstein_files.pkl.gz ./the_epstein_files.local.pkl.gz`
1. To parse your own files:
   1. You need a local copy of the OCR text files from the House Oversight document release in a directory like `/path/to/epstein/ocr_txt_files`. You can download those OCR text files from [the Congressional Google Drive folder](https://drive.google.com/drive/folders/1ldncvdqIf6miiskDp_EDuGSDAaI_fJx8) (make sure you grab both the `001/` and `002/` folders).
   1. (Optional) If you want to work with the documents released by the DOJ on January 30th, 2026, you'll also need to download some of the PDF files from [the DOJ site](https://www.justice.gov/epstein/doj-disclosures) (they're in the "Epstein Files Transparency Act" section). You don't need them all, just the ones you want to look at and make ASCII art with. You will need to get the OCR text out of them somehow, though. I use [pdfalyzer](https://github.com/michelcrypt4d4mus/pdfalyzer).
   **IMPORTANT** if


#### Command Line Tools
Before running any of the tools, set the `EPSTEIN_DOCS_DIR` environment variable to the path of the folder of files you just downloaded. You can either create a `.env` file modeled on [`.env.example`](./.env.example) (which will set it permanently) or set it inline when you run a command:

```bash
EPSTEIN_DOCS_DIR=path/to/source_data/ epstein_generate --help
```

To work with the January 2026 DOJ documents you'll also need to set the `EPSTEIN_DOJ_TXTS_20260130_DIR` env var to point at folders full of OCR-extracted texts from the raw DOJ PDFs. If you have the PDFs but not the text files, there's [a script](scripts/extract_doj_pdfs.py) that can help you take care of that (it launches [pdfalyzer](https://github.com/michelcrypt4d4mus/pdfalyzer) on the PDFs it finds in the hierarchy).

```bash
EPSTEIN_DOCS_DIR=path/to/source_data/ EPSTEIN_DOJ_TXTS_20260130_DIR=/path/to/doj/files/ epstein_generate --help
```
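
Before running the tools it can help to confirm that both variables point at real directories. The following is a minimal sketch that assumes only the two variable names above. The `check_env_dirs` helper is hypothetical and not part of the package.

```python
import os
from pathlib import Path

# Variable names taken from the instructions above.
ENV_VARS = ["EPSTEIN_DOCS_DIR", "EPSTEIN_DOJ_TXTS_20260130_DIR"]


def check_env_dirs(environ=os.environ) -> dict:
    """Return a status for each variable: 'unset', 'not a directory', or 'ok'."""
    statuses = {}

    for var in ENV_VARS:
        value = environ.get(var)

        if not value:
            statuses[var] = "unset"
        elif not Path(value).expanduser().is_dir():
            statuses[var] = "not a directory"
        else:
            statuses[var] = "ok"

    return statuses


if __name__ == "__main__":
    for var, status in check_env_dirs().items():
        print(f"{var}: {status}")
```

This only inspects the process environment, so a value that lives only in a `.env` file will be reported as unset here.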

**NOTE**: In order to get the generated links to the DOJ site and [Jmail](https://jmail.world) to work correctly you will need to sort the PDFs into the same datasets they are found in on the DOJ's website. You should have folders like this:

```
└── source_data/
    ├── DataSet 1
    ├── DataSet 2
    ├── DataSet 3
    ├── DataSet 4
    ├── DataSet 5
    ├── DataSet 6
    ├── DataSet 7
    ├── DataSet 8
    ├── DataSet 9
    ├── DataSet 10
    ├── DataSet 11
    └── DataSet 12
```

Within the `DataSet N` folders the PDFs can be sorted however you want (the folders will be recursively scanned for the pattern `**/*.pdf`).
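
To sanity check the layout before a long run, something like the sketch below counts the PDFs under each `DataSet N` folder. It assumes the folder names exactly match the tree above. `count_pdfs_by_dataset` is a hypothetical helper, not something the package provides.

```python
import re
from pathlib import Path

# Folder names as shown in the tree above: "DataSet 1" ... "DataSet 12".
DATASET_DIR_PATTERN = re.compile(r"^DataSet (\d+)$")


def count_pdfs_by_dataset(source_dir) -> dict:
    """Map each dataset number to the count of PDFs found anywhere beneath its folder."""
    counts = {}

    for child in Path(source_dir).iterdir():
        match = DATASET_DIR_PATTERN.match(child.name)

        if child.is_dir() and match:
            # rglob("*.pdf") behaves like glob("**/*.pdf"): it recurses into subfolders.
            counts[int(match.group(1))] = sum(1 for _ in child.rglob("*.pdf"))

    # Sort numerically so "DataSet 10" comes after "DataSet 9".
    return dict(sorted(counts.items()))
```

A dataset that is missing from the result, or that shows a count of zero, is a hint that some PDFs were sorted into the wrong place.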


## Doing Things
All the tools that come with the package require `EPSTEIN_DOCS_DIR` to be set. These are the available tools:

```bash
# Generate color highlighted texts/emails/other files
epstein_generate

# Search for a string:
epstein_grep Bannon
# Or a regex:
epstein_grep '\bSteve\s*Bannon|Jeffrey\s*Epstein\b'

# Show a file with color highlighting of keywords:
epstein_show 030999
# Show both the highlighted and raw versions of the file:
epstein_show --raw 030999
# The full filename is also accepted:
epstein_show HOUSE_OVERSIGHT_030999

# Count words used by Epstein and Bannon
epstein_show --output-word-count --name 'Jeffrey Epstein' --name 'Steve Bannon'

# Diff two epstein files after all the cleanup (stripping BOMs, matching newline chars, etc):
epstein_diff 030999 020442
```
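
One thing to keep in mind about the regex example above: alternation (`|`) binds more loosely than `\b`, so each word boundary applies to only one side. The sketch below shows this with Python's standard `re` module. It is an illustration of the pattern itself and makes no claim about how `epstein_grep` compiles it (for instance, whether it ignores case).

```python
import re

# Same pattern as the epstein_grep example above.
PATTERN = re.compile(r"\bSteve\s*Bannon|Jeffrey\s*Epstein\b")

samples = [
    "Steve Bannon",       # matches the first alternative
    "SteveBannon",        # \s* also allows zero whitespace
    "Jeffrey  Epstein",   # \s* allows runs of whitespace
    "xJeffrey Epstein",   # still matches: the leading \b belongs to the first alternative only
    "Steve Bannonesque",  # still matches: the trailing \b belongs to the second alternative only
    "steve bannon",       # no match in this sketch because it is case sensitive
]

matches = {text: bool(PATTERN.search(text)) for text in samples}

for text, matched in matches.items():
    print(f"{text!r}: {matched}")
```

If you want both boundaries to apply to both names, group the alternatives: `\b(?:Steve\s*Bannon|Jeffrey\s*Epstein)\b`.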

The first time you run anything it will take a few minutes to fix all the janky OCR text, attribute the redacted emails, etc. After that things will be quick.

The commands used to build the various sites that are deployed on GitHub Pages can be found in [`deploy.sh`](./deploy.sh).

Run `epstein_generate --help` for command line option assistance.

**Optional:** There are a handful of emails that I extracted from the legal filings they were contained in. If you want to include these files in your local analysis you'll need to copy those files from the repo into your local document directory. Something like:

```bash
cp ./emails_extracted_from_legal_filings/*.txt "$EPSTEIN_DOCS_DIR"
```


#### As A Library
```python
from epstein_files.epstein_files import EpsteinFiles


def do_stuff(item):
    """Placeholder: replace with your own processing."""
    print(item)


epstein_files = EpsteinFiles.get_files()

# All files
for document in epstein_files.documents:
    do_stuff(document)

# Emails
for email in epstein_files.emails:
    do_stuff(email)

# iMessage Logs
for imessage_log in epstein_files.imessage_logs:
    do_stuff(imessage_log)

# Other Files
for file in epstein_files.other_files:
    do_stuff(file)
```
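
As a quick first use of the library, a sketch like the one below counts the items in each collection. It relies only on the four attribute names used in the loops above and assumes each one is iterable. `count_by_collection` is a hypothetical helper, not part of the package.

```python
# Attribute names taken from the loops above.
COLLECTION_NAMES = ["documents", "emails", "imessage_logs", "other_files"]


def count_by_collection(epstein_files) -> dict:
    """Count the items in each collection by iterating over it once."""
    return {
        name: sum(1 for _ in getattr(epstein_files, name))
        for name in COLLECTION_NAMES
    }


# Intended usage (needs the package installed and the data in place):
#   from epstein_files.epstein_files import EpsteinFiles
#   print(count_by_collection(EpsteinFiles.get_files()))
```

Counting by iteration rather than `len()` keeps the sketch working even if a collection turns out to be a generator instead of a list.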

# Everyone Who Sent or Received an Email in the November Document Dump
![emails](https://github.com/michelcrypt4d4mus/epstein_text_messages/raw/master/docs/emailers_info_table.png)


# TODO List
See [TODO.md](TODO.md).

