Metadata-Version: 2.1
Name: pzip
Version: 0.9.0
Summary: Cryptographically secure file compression.
Home-page: https://github.com/imsweb/pzip
Author: Dan Watson
Author-email: watsond@imsweb.com
License: MIT
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Security
Classifier: Topic :: System :: Archiving :: Compression
Description-Content-Type: text/markdown
Requires-Dist: cryptography
Requires-Dist: tqdm

# PZip

PZip is an encrypted file format (with optional compression), a command-line tool, and a Python file-like interface.

## Command Line Usage

For a full list of options, run `pzip -h`. Basic usage is summarized below:

```
pzip --key keyfile sensitive_data.csv
pzip --key keyfile sensitive_data.csv.pz
```

## Python Usage

```python
import os
from pzip import PZip

key = os.urandom(32)
with PZip("myfile.pz", PZip.Mode.ENCRYPT, key) as f:
    f.write(b"sensitive data")

with PZip("myfile.pz", PZip.Mode.DECRYPT, key) as f:
    print(f.read())
```

## Encryption

PZip uses AES-GCM with 128-, 192-, or 256-bit (default) keys. Keys are derived using PBKDF2-SHA256 with a configurable
iteration count (currently 200,000) and a random salt per file. A random 128-bit nonce (GCM IV) is generated by default
for each file, but may also be supplied via the Python interface for systems that can more strongly guarantee
uniqueness. The key size, iteration count, salt, nonce/IV, and GCM authentication tag are stored in the PZip file
header. Additionally, the 128-bit nonce is prepended to the file contents when encrypting as a way to fail fast when
doing streaming decryption. The decrypted plaintext will still be authenticated via the tag at the end, but a fail-fast
mechanism is important when dealing with large files.
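
The key derivation described above can be sketched with the standard library alone. This is a minimal illustration, not PZip's actual code: `derive_key` is a hypothetical helper, `hashlib.pbkdf2_hmac` stands in for whatever PBKDF2 implementation PZip uses internally, and the salt and nonce lengths follow the sizes given in the file format.

```python
import hashlib
import os


def derive_key(password: bytes, salt: bytes, iterations: int = 200_000, key_size: int = 32) -> bytes:
    """Derive an AES key with PBKDF2-SHA256 (hypothetical helper, not part of PZip's API)."""
    if key_size not in (16, 24, 32):
        raise ValueError("key_size must be 16, 24, or 32 bytes")
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=key_size)


# A fresh random salt and 128-bit nonce per file, as the text describes.
salt = os.urandom(16)
nonce = os.urandom(16)
key = derive_key(b"example passphrase", salt)
```

Because the salt and iteration count are stored in the header, a reader holding the same password can repeat this derivation and arrive at the same key.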

## Compression

PZip optionally compresses data using gzip at the default compression level. Nothing about the file format precludes
adding an option in the future to allow configuration of the compression level, or even the compression algorithm.
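
As a rough sketch of how optional compression might pair with the header's flags field, the helpers below compress data with Python's `gzip` module at its default level and record that in bit 0. The names `prepare_payload`, `restore_payload`, and `FLAG_COMPRESSED` are hypothetical, and applying compression before encryption is an assumption rather than something stated here.

```python
import gzip
from typing import Tuple

FLAG_COMPRESSED = 1 << 0  # bit 0 of the header flags field (name is illustrative)


def prepare_payload(data: bytes, compress: bool = True) -> Tuple[int, bytes]:
    """Return (flags, payload) for data that is about to be encrypted."""
    if compress:
        # No explicit level is passed, so gzip's default applies.
        return FLAG_COMPRESSED, gzip.compress(data)
    return 0, data


def restore_payload(flags: int, payload: bytes) -> bytes:
    """Undo prepare_payload after decryption, based on the flags bitfield."""
    if flags & FLAG_COMPRESSED:
        return gzip.decompress(payload)
    return payload


flags, payload = prepare_payload(b"sensitive data" * 100)
original = restore_payload(flags, payload)
```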

## File Format

The PZip file format consists of a 68-byte header followed by the encrypted file data, the first 16 bytes of which are
a repeat of the nonce. The header is big-endian (network byte order), with the following fields and sizes:

  * File identification (magic), 4 bytes - `PZIP`
  * File format version, 1 byte - currently `\x01`
  * Flags, 2 bytes (unsigned short bitfield) - currently only bit 0 is used; it is set when the file data is gzip-compressed
  * AES key size (in bytes), 1 byte - must be 16, 24, or 32
  * Plaintext size, 8 bytes (unsigned long long) - unencrypted/decompressed file size
  * PBKDF2 iterations, 4 bytes (unsigned int/long)
  * PBKDF2 salt, 16 bytes
  * GCM nonce/IV, 16 bytes
  * GCM authentication tag, 16 bytes
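
The field list above maps directly onto a `struct` format string. The following is a minimal sketch of packing and parsing the header under that layout; `Header`, `parse_header`, and `FLAG_COMPRESSED` are illustrative names, not part of PZip's API.

```python
import struct
from typing import NamedTuple

# ">" selects big-endian with no padding: 4 + 1 + 2 + 1 + 8 + 4 + 16 + 16 + 16 = 68 bytes.
HEADER = struct.Struct(">4sBHBQI16s16s16s")
FLAG_COMPRESSED = 1 << 0


class Header(NamedTuple):
    magic: bytes
    version: int
    flags: int
    key_size: int
    plaintext_size: int
    iterations: int
    salt: bytes
    nonce: bytes
    tag: bytes


def parse_header(data: bytes) -> Header:
    """Unpack and sanity-check the first 68 bytes of a PZip file."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    header = Header(*HEADER.unpack(data[:HEADER.size]))
    if header.magic != b"PZIP":
        raise ValueError("not a PZip file")
    if header.version != 1:
        raise ValueError("unsupported file format version")
    if header.key_size not in (16, 24, 32):
        raise ValueError("invalid AES key size")
    return header


# Build a header with placeholder salt, nonce, and tag values, then parse it back.
sample = HEADER.pack(b"PZIP", 1, FLAG_COMPRESSED, 32, 14, 200_000, bytes(16), bytes(16), bytes(16))
parsed = parse_header(sample)
```

Validating the magic, version, and key size up front lets a reader reject a malformed file before attempting any key derivation or decryption.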


