Metadata-Version: 2.1
Name: dftxt
Version: 1.0.0
Summary: Human-friendly, VCS-friendly file format for Python Pandas and Polars DataFrames.
Home-page: https://github.com/rocketboosters/dftxt
License: Apache Version 2.0
Author: Scott Ernst
Author-email: swernst@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: File Formats
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Provides-Extra: all
Provides-Extra: pandas
Provides-Extra: polars
Requires-Dist: pandas (>=2.0.0) ; extra == "pandas" or extra == "all"
Requires-Dist: polars (>=0.20.5,<0.21.0) ; extra == "polars" or extra == "all"
Requires-Dist: pyarrow (>=14.0.2,<15.0.0) ; extra == "pandas" or extra == "all"
Requires-Dist: pytz (>=2020.1)
Project-URL: Documentation, https://github.com/rocketboosters/dftxt
Project-URL: Repository, https://github.com/rocketboosters/dftxt
Description-Content-Type: text/markdown

# dftxt

A Python library for a simple DataFrame text file format that makes it easier to
specify Pandas and Polars DataFrames in human-readable text. It is intended for
testing and for cases where the source data is small and managed by hand. For
example, here's what the format looks like:

```
Name      Planet    Numeral  Mean Radius (km)  Discovery Year  Discoverer
          &&cat              &&float           &&Int
Moon      Earth     I        1738              None            None
Phobos    Mars      I        11.267            1877            Hall
Deimos    Mars      II       6.2               1877            Hall
Io        Jupiter   I        1821              1610            Galileo
Europa    Jupiter   II       1560              1610            Galileo
Ganymede  Jupiter   III      2634              1610            Galileo
Callisto  Jupiter   IV       2410              1610            Galileo
Amalthea  Jupiter   V        83.5              1892            Barnard
Himalia   Jupiter   VI       69.8              1904            Perrine
Mimas     Saturn    I        198.2             1789            Herschel
```

This is a fixed-width file format in which two or more spaces between column names
define the width of each column.
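
To make the column-width rule concrete, here is a minimal sketch of how column boundaries could be derived from a header line and then used to slice a data row. It is not the library's actual parser: the helper names `column_spans` and `split_row` are hypothetical, and the sketch assumes each column runs from the start of its name to the start of the next name.

```python
import re
from typing import Optional


def column_spans(header: str) -> list[tuple[str, int, Optional[int]]]:
    """Return (name, start, end) per column; the last column is open-ended."""
    # A column name is a run of non-space text whose words are joined by
    # single spaces, so two or more spaces always end the name.
    matches = list(re.finditer(r"\S+(?: \S+)*", header))
    spans = []
    for i, match in enumerate(matches):
        # Assumption: a column extends up to where the next column name starts.
        end = matches[i + 1].start() if i + 1 < len(matches) else None
        spans.append((match.group(), match.start(), end))
    return spans


def split_row(row: str, spans: list[tuple[str, int, Optional[int]]]) -> dict[str, str]:
    """Slice a data row at the header-defined offsets and trim the padding."""
    return {name: row[start:end].strip() for name, start, end in spans}


header = "Name      Planet    Numeral  Mean Radius (km)  Discovery Year  Discoverer"
row = "Amalthea  Jupiter   V        83.5              1892            Barnard"

spans = column_spans(header)
record = split_row(row, spans)
print(record)
```

Note that `Mean Radius (km)` survives as a single column name because its words are separated by only one space.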

The benefits of the format are:

## 1. Preserves DataFrame Structure

Most importantly, this format retains the information needed to reload a DataFrame
exactly as it was specified in the file. This includes data types, column ordering,
and indexing (Pandas only, as Polars has no index). In testing, it should be possible
to call `pandas.testing.assert_frame_equal()` or `polars.testing.assert_frame_equal()`
on a DataFrame loaded from a file without any transformation.

For example,

```
sku           price_usd  originally_released_on  product_name
&&int_index   &decimal   &date
109456        119.99     2023-07-09              Fancy Socks
450213        24.49      2020-11-12              Simple Socks
90210         299.99     1998-03-28              LA Heartthrob Socks
```
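
As a rough illustration of why the type modifiers matter, the sketch below converts the raw text of the first row above into typed Python values. The `CONVERTERS` mapping and the `convert` helper are hypothetical and only assume what the modifier names suggest (an integer index, a decimal, and a date); the library's real conversion logic may differ.

```python
import datetime
from decimal import Decimal

# Hypothetical mapping from modifier name to a converter. The modifier names
# come from the example above, but the target Python types are assumptions.
CONVERTERS = {
    "int_index": int,
    "decimal": Decimal,
    "date": datetime.date.fromisoformat,
    "": str,  # columns without a modifier stay as text in this sketch
}


def convert(raw: dict[str, str], modifiers: dict[str, str]) -> dict[str, object]:
    """Apply the converter selected by each column's modifier, if any."""
    return {
        name: CONVERTERS[modifiers.get(name, "")](value)
        for name, value in raw.items()
    }


raw = {
    "sku": "109456",
    "price_usd": "119.99",
    "originally_released_on": "2023-07-09",
    "product_name": "Fancy Socks",
}
modifiers = {
    "sku": "int_index",
    "price_usd": "decimal",
    "originally_released_on": "date",
}

typed = convert(raw, modifiers)
print(typed)
```

Without the type information stored in the file, `price_usd` would most likely be inferred as a float and the release date as plain text, so a frame comparison in a test would need extra casting first.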

## 2. Human Friendly

The format is easy for humans to read and modify, and it requires few or no
machine-oriented characters. Whitespace is the delimiter (specifically, two or more
spaces between column names), which also aligns columns for easy readability. As a
result, quoting and escaping are rarely needed.
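
The point about quoting can be seen by comparing against CSV. The sketch below uses only the standard library and an illustrative value containing a comma (not taken from the examples above). It shows that CSV must quote the value, while a line padded to fixed widths holds it unchanged. The widths are arbitrary choices for the demonstration.

```python
import csv
import io

# Illustrative row; the last value contains a comma and single spaces.
values = ["90210", "299.99", "Socks, LA Heartthrob"]

# CSV uses the comma as its delimiter, so the writer has to quote the value.
buffer = io.StringIO()
csv.writer(buffer).writerow(values)
csv_line = buffer.getvalue().strip()

# A fixed-width line pads every column but the last to a chosen width,
# so commas and single spaces need no quoting or escaping.
widths = [8, 8]
fixed_line = "".join(v.ljust(w) for v, w in zip(values, widths)) + values[-1]

print(csv_line)
print(fixed_line)
```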

## 3. Diff/Code Review Friendly

TODO

