Metadata-Version: 2.4
Name: pytde
Version: 0.1.1
Summary: Python parser for legacy Tableau Data Extract (TDE) files
Author-email: Ron Reiter <ron@ronreiter.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/ronreiter/pytde
Project-URL: Repository, https://github.com/ronreiter/pytde
Project-URL: Issues, https://github.com/ronreiter/pytde/issues
Keywords: tableau,tde,data-extract,pandas,dataframe
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.0.0
Requires-Dist: numpy>=1.18.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Dynamic: license-file

# pytde

A Python parser for legacy Tableau Data Extract (TDE) files.

## Overview

`pytde` reads TDE files (the legacy Tableau data format used before Hyper) and converts them to pandas DataFrames. This gives you access to data in older Tableau extracts without Tableau Desktop or Tableau Server.

## Installation

```bash
pip install pytde
```

Or install from source:

```bash
git clone https://github.com/ronreiter/pytde.git
cd pytde
pip install -e .
```

## Quick Start

```python
from pytde import read_tde

# Read a TDE file
tables = read_tde('data.tde')

# Get the Extract table as a DataFrame
df = tables['Extract']

# Work with the data
print(df.head())
print(df.describe())
```

## API Reference

### `read_tde(file_path: str) -> dict[str, pd.DataFrame]`

Read a TDE file and return its contents as pandas DataFrames.

```python
from pytde import read_tde

tables = read_tde('sales_data.tde')
df = tables['Extract']
```
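
The examples above index the returned dictionary with `'Extract'` directly. The following is a minimal sketch of a lookup helper that reports the available table names when the requested one is missing. `get_table` is a hypothetical helper, not part of `pytde`, and it works on any dictionary of tables.

```python
from typing import Mapping, TypeVar

T = TypeVar("T")


def get_table(tables: Mapping[str, T], name: str = "Extract") -> T:
    """Return tables[name], or raise a KeyError listing the table names found.

    Hypothetical helper (not part of pytde); `tables` is expected to be the
    dictionary returned by read_tde().
    """
    try:
        return tables[name]
    except KeyError:
        available = ", ".join(sorted(tables)) or "(none)"
        raise KeyError(
            f"table {name!r} not found; available tables: {available}"
        ) from None
```

With `pytde` installed, it could be used as `df = get_table(read_tde('sales_data.tde'))`.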

### `read_tde_metadata(file_path: str) -> dict`

Read TDE file metadata without fully parsing the data.

```python
from pytde import read_tde_metadata

metadata = read_tde_metadata('sales_data.tde')
print(metadata['columns'])       # List of column names
print(metadata['column_types'])  # Column data types
print(metadata['file_size'])     # File size in bytes
print(metadata['format_version'])  # TDE format version
```
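
As a small illustration of working with the metadata dictionary, the sketch below formats the four keys shown above into a short text summary. `summarize_metadata` is a hypothetical helper, and it assumes `column_types` is a dictionary keyed by column name, as the CLI output in this README suggests.

```python
from typing import Any, Mapping


def summarize_metadata(metadata: Mapping[str, Any]) -> str:
    """Build a short, human-readable summary of a TDE metadata dictionary.

    Hypothetical helper (not part of pytde). Assumes the keys documented
    above and that 'column_types' maps column name -> type name.
    """
    lines = [
        f"Format version: {metadata['format_version']}",
        f"File size: {metadata['file_size']} bytes",
    ]
    column_types = metadata["column_types"]
    for name in metadata["columns"]:
        # Columns without a recorded type are reported as 'unknown'.
        lines.append(f"  {name}: {column_types.get(name, 'unknown')}")
    return "\n".join(lines)
```

With `pytde` installed, `print(summarize_metadata(read_tde_metadata('sales_data.tde')))` would print one line per column.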

### `TDEParser` class

For more control over parsing, use the `TDEParser` class directly:

```python
from pytde import TDEParser

parser = TDEParser('data.tde')
tables = parser.parse()
metadata = parser.get_metadata()

# Access internal state
print(parser.xml_metadata)  # Embedded XML schema
print(parser.column_entries)  # Column index entries
```

## Command Line Interface

`pytde` includes a CLI tool for quick inspection of TDE files:

```bash
pytde data.tde
```

Output:
```
Parsing TDE file: data.tde
------------------------------------------------------------
Format version: 2
File size: 51306 bytes
Columns found: ['Region', 'Sales', 'Sales Person']
Column types: {'Region': 'string', 'Sales': 'double', 'Sales Person': 'string'}

============================================================
Table: Extract
Shape: (43, 3)
Columns: ['Region', 'Sales', 'Sales Person']
...
```

## Supported Data Types

| TDE Type | Python/Pandas Type |
|----------|-------------------|
| string   | object (str)      |
| double   | float64           |
| integer  | int64             |
| date     | object*           |
| datetime | object*           |
| boolean  | bool              |

*Date/datetime support is limited in the current version.
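
The table above can be expressed as a lookup, which is handy for checking that a parsed DataFrame has the dtypes you expect. This is a sketch only: `TDE_TO_PANDAS_DTYPE` and `expected_dtypes` are hypothetical names, and falling back to `object` for unlisted type names is an assumption, not documented `pytde` behavior.

```python
from typing import Dict, Mapping

# Mirror of the "Supported Data Types" table (TDE type name -> pandas dtype).
TDE_TO_PANDAS_DTYPE: Dict[str, str] = {
    "string": "object",
    "double": "float64",
    "integer": "int64",
    "date": "object",
    "datetime": "object",
    "boolean": "bool",
}


def expected_dtypes(column_types: Mapping[str, str]) -> Dict[str, str]:
    """Map each column to the pandas dtype the table above predicts.

    Type names that are not in the table fall back to 'object'
    (an assumption made for this sketch).
    """
    return {
        column: TDE_TO_PANDAS_DTYPE.get(tde_type, "object")
        for column, tde_type in column_types.items()
    }
```

The result can be compared against `df.dtypes.astype(str).to_dict()` for a DataFrame returned by `read_tde`.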

## TDE File Format

TDE files are binary files that store columnar data optimized for Tableau's data engine. Key features:

- Little-endian byte ordering
- Block-based structure with markers (`f0ca1278` for data, `f1ca1278` for index)
- Dictionary encoding for string columns
- Embedded XML metadata for schema definitions

For a detailed format specification, see [TDE.MD](TDE.MD).
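
To make the block markers concrete, here is a minimal sketch that scans a byte buffer for the two marker values listed above. It assumes the hex strings give the bytes in the order they appear in the file (`f0 ca 12 78`), which this README does not state explicitly. `find_block_markers` is a hypothetical helper and does not reflect how `TDEParser` locates blocks internally.

```python
from typing import List, Tuple

# Assumption: the hex strings are the literal on-disk byte sequences.
DATA_MARKER = bytes.fromhex("f0ca1278")
INDEX_MARKER = bytes.fromhex("f1ca1278")


def find_block_markers(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, kind) pairs for every marker occurrence, sorted by offset."""
    hits: List[Tuple[int, str]] = []
    for kind, marker in (("data", DATA_MARKER), ("index", INDEX_MARKER)):
        pos = data.find(marker)
        while pos != -1:
            hits.append((pos, kind))
            pos = data.find(marker, pos + 1)
    return sorted(hits)


# Synthetic buffer: 2 padding bytes, a data marker, 4 payload bytes, an index marker.
sample = b"\x00\x00" + DATA_MARKER + b"abcd" + INDEX_MARKER + b"\x01"
print(find_block_markers(sample))  # [(2, 'data'), (10, 'index')]
```

A raw byte search like this can report false positives if the same four bytes happen to occur inside column data, so treat the offsets as candidates rather than confirmed block boundaries.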

## Limitations

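- **Index decoding**: String column row-to-value mapping uses a fallback distribution when exact indices cannot be decoded, so string values in affected columns may not line up with their original rows
- **Date/time columns**: Limited support for date and datetime types; these columns are returned with `object` dtype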
- **Compression**: Some compressed TDE files may not be fully supported
- **Large files**: Memory usage scales with file size (entire file is loaded into memory)
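
To illustrate why the index decoding limitation matters, the sketch below shows dictionary encoding in its generic form: a column is stored as a list of distinct values plus one index per row. It is not the TDE on-disk layout, and both function names are hypothetical. Without the per-row indices, only the set of distinct values is known, not which row holds which value.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def encode_dictionary_column(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Split a column into (distinct values, per-row indices into them)."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    indices: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def decode_dictionary_column(
    dictionary: Sequence[str], indices: Sequence[int]
) -> List[Optional[str]]:
    """Rebuild the column; indices outside the dictionary become None."""
    size = len(dictionary)
    return [dictionary[i] if 0 <= i < size else None for i in indices]


dictionary, indices = encode_dictionary_column(["East", "West", "East", "North"])
print(dictionary)  # ['East', 'West', 'North']
print(indices)     # [0, 1, 0, 2]
```

Decoding with the correct indices reproduces the original column exactly; any other assignment of dictionary values to rows is a guess.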

## Development

### Setup

```bash
git clone https://github.com/ronreiter/pytde.git
cd pytde
pip install -e ".[dev]"
```

### Running Tests

```bash
pytest
```

### Running Tests with Coverage

```bash
pytest --cov=pytde --cov-report=html
```

## License

MIT License - see [LICENSE](LICENSE) for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Acknowledgments

This parser was developed through reverse engineering of the TDE file format. Special thanks to the open source community for their work on understanding proprietary file formats.
