Metadata-Version: 2.3
Name: df_cereal
Version: 0.0.1
Summary: df_cereal - playing with dataframe serialization
Project-URL: Homepage, https://github.com/paddymul/df_cereal
Author: Paddy Mullen
License: Copyright (c) 2019 Bloomberg
        All rights reserved.
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
License-File: LICENSE.txt
Keywords: IPython,Jupyter,Widgets,pandas
Classifier: Framework :: Jupyter
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.8
Requires-Dist: graphlib-backport>=1.0.0
Requires-Dist: ipywidgets<9,>=7.6.0
Provides-Extra: build
Requires-Dist: build; extra == 'build'
Requires-Dist: twine; extra == 'build'
Provides-Extra: docs
Requires-Dist: graphviz>=0.20.1; extra == 'docs'
Requires-Dist: sphinx>=1.5; extra == 'docs'
Provides-Extra: jupyterlab
Requires-Dist: jupyterlab>=3.6.0; extra == 'jupyterlab'
Provides-Extra: notebook
Requires-Dist: notebook>=7.0.0; extra == 'notebook'
Provides-Extra: polars
Requires-Dist: polars>=0.20.7; extra == 'polars'
Provides-Extra: test
Requires-Dist: hypothesis>=6.88.1; extra == 'test'
Requires-Dist: nbval>=0.9; extra == 'test'
Requires-Dist: pandas>=1.3.5; extra == 'test'
Requires-Dist: polars>=0.19.4; extra == 'test'
Requires-Dist: pyarrow; extra == 'test'
Requires-Dist: pydantic>=2.5.2; extra == 'test'
Requires-Dist: pytest-cov>=3; extra == 'test'
Requires-Dist: pytest>=6; extra == 'test'
Description-Content-Type: text/markdown

# DF_Cereal - Serialization testing ground

This is a stripped-down repo for testing different methods of dataframe serialization. It aims to be a reference implementation for serializing dataframes with pyarrow.


Dataframe serialization is hard, and it is a common source of performance regressions. Arrow seems to be the way forward for dataframe libraries and for dataframe serialization. This project is meant to be a collaborative reference for library authors who want to do high-performance serialization.


## Planned features include

* A repo that demonstrates different ways to serialize dataframes, with MVP implementations that are easy to adapt
* Benchmarks for different serialization techniques
* Tests for all of this
* Examples of more complex dataframe constructs and how they appear in JS: multi-indexes, timestamps, structures
* Simple documentation that is easy to follow




## Notes

This repo is built on top of a stripped-down [buckaroo](https://github.com/paddymul/buckaroo) repo. Some buckaroo artifacts might pop up here and there.

## Development installation

For a development installation:

```bash
git clone https://github.com/paddymul/df_cereal.git
cd df_cereal
# Build against JupyterLab 3.6.5; JupyterLab 4.0 has different JS typings that conflict.
# The resulting installable still works in JupyterLab 4.
pip install build twine pytest sphinx polars mypy jupyterlab==3.6.5 pandas-stubs
pip install -ve .
```

Enabling development install for Jupyter notebook:


Enabling development install for JupyterLab:

```bash
jupyter labextension develop . --overwrite
```

Note for developers: the `--symlink` argument on Linux or macOS allows one to modify the JavaScript code in place. This feature is not available on Windows.

### Developing the JS side

There is a series of component examples in [examples/ex](./examples/ex).



Instructions:

```bash
npm install
npm run dev
```


## Contributions

We :heart: contributions.

Have you had a good experience with this project? Why not share some love and contribute code, or just let us know about any issues you had with it?

We welcome issue reports [here](../../issues); be sure to choose the proper issue template for your issue, so that we can be sure you're providing the necessary information.

