Metadata-Version: 2.1
Name: pandas-streaming
Version: 0.5.1
Summary: Array (and numpy) API for ONNX
Home-page: https://github.com/sdpython/pandas-streaming
Author: Xavier Dupré
Author-email: xavier.dupre@gmail.com
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: C
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development
Classifier: Topic :: Scientific/Engineering
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
License-File: LICENSE.txt
Requires-Dist: pandas

pandas-streaming: streaming API over pandas
===========================================

.. image:: https://ci.appveyor.com/api/projects/status/4te066r8ne1ymmhy?svg=true
    :target: https://ci.appveyor.com/project/sdpython/pandas-streaming
    :alt: Build Status Windows

.. image:: https://dev.azure.com/xavierdupre3/pandas_streaming/_apis/build/status/sdpython.pandas_streaming
    :target: https://dev.azure.com/xavierdupre3/pandas_streaming/

.. image:: https://badge.fury.io/py/pandas_streaming.svg
    :target: http://badge.fury.io/py/pandas_streaming

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
    :alt: MIT License
    :target: https://opensource.org/license/MIT/

.. image:: https://codecov.io/gh/sdpython/pandas-streaming/branch/main/graph/badge.svg?token=0caHX1rhr8 
    :target: https://codecov.io/gh/sdpython/pandas-streaming

.. image:: http://img.shields.io/github/issues/sdpython/pandas_streaming.png
    :alt: GitHub Issues
    :target: https://github.com/sdpython/pandas_streaming/issues

.. image:: https://pepy.tech/badge/pandas_streaming/month
    :target: https://pepy.tech/project/pandas_streaming/month
    :alt: Downloads

.. image:: https://img.shields.io/github/forks/sdpython/pandas_streaming.svg
    :target: https://github.com/sdpython/pandas_streaming/
    :alt: Forks

.. image:: https://img.shields.io/github/stars/sdpython/pandas_streaming.svg
    :target: https://github.com/sdpython/pandas_streaming/
    :alt: Stars

.. image:: https://img.shields.io/github/repo-size/sdpython/pandas_streaming
    :target: https://github.com/sdpython/pandas_streaming/
    :alt: size

`pandas-streaming <https://sdpython.github.io/doc/pandas-streaming/dev/>`_
aims at processing big files with `pandas <https://pandas.pydata.org/>`_,
too big to hold in memory, too small to be parallelized with a significant gain.
The module replicates a subset of *pandas* API
and implements other functionalities for machine learning.

.. code-block:: python

    from pandas_streaming.df import StreamingDataFrame
    sdf = StreamingDataFrame.read_csv("filename", sep="\t", encoding="utf-8")

    for df in sdf:
        # process this chunk of data
        # df is a dataframe
        print(df)

The module can also stream an existing dataframe.

.. code-block:: python

    import pandas
    df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                           dict(cf=1, cint=1, cstr="1"),
                           dict(cf=3, cint=3, cstr="3")])

    from pandas_streaming.df import StreamingDataFrame
    sdf = StreamingDataFrame.read_df(df)

    for df in sdf:
        # process this chunk of data
        # df is a dataframe
        print(df)

It contains other helpers to split datasets into
train and test with some weird constraints.
