Metadata-Version: 2.1
Name: vaex
Version: 3.0.0
Summary: Out-of-Core DataFrames to visualize and explore big tabular datasets
Home-page: https://www.github.com/maartenbreddels/vaex
Author: Maarten A. Breddels
Author-email: maartenbreddels@gmail.com
License: MIT
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: vaex-core (<3,>=2.0.0)
Requires-Dist: vaex-viz (<0.5,>=0.4.0)
Requires-Dist: vaex-server (<0.4,>=0.3.0)
Requires-Dist: vaex-hdf5 (<0.7,>=0.6.0)
Requires-Dist: vaex-astro (<0.8,>=0.7.0)
Requires-Dist: vaex-arrow (<0.6,>=0.5.0)
Requires-Dist: vaex-jupyter (<0.6,>=0.5.0)
Requires-Dist: vaex-ml (<0.10,>=0.9.0)


[![Documentation](https://readthedocs.org/projects/vaex/badge/?version=latest)](https://docs.vaex.io)

# What is Vaex?

Vaex is a high-performance Python library for lazy **Out-of-Core DataFrames**
(similar to Pandas), to visualize and explore big tabular datasets. It
calculates *statistics* such as mean, sum, count and standard deviation on an
*N-dimensional grid* at more than **a billion** (`10^9`) samples/rows **per
second**. Visualization is done using **histograms**, **density plots** and **3D
volume rendering**, allowing interactive exploration of big data. Vaex uses
memory mapping, a zero memory copy policy and lazy computations for best
performance (no memory wasted).
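
"Statistics on an N-dimensional grid" means computing a value such as count or mean separately for each cell of a regular grid, in a single pass over the rows. In Vaex this is done by passing `binby`, `limits` and `shape` to statistic methods such as `df.count` or `df.mean`. The pure-Python sketch below only illustrates the idea in 2D. It is not Vaex's implementation, and `grid_stats` is a hypothetical helper name.

```python
import math

def grid_stats(xs, ys, values, xlim, ylim, shape):
    """Hypothetical helper: count and mean of `values` per cell of a 2D grid, in one pass."""
    nx, ny = shape
    (xmin, xmax), (ymin, ymax) = xlim, ylim
    counts = [[0] * ny for _ in range(nx)]
    sums = [[0.0] * ny for _ in range(nx)]
    for x, y, v in zip(xs, ys, values):
        # Rows outside the grid limits are ignored
        if not (xmin <= x < xmax and ymin <= y < ymax):
            continue
        # Map the coordinates to cell indices (clamped against float rounding)
        i = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        j = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        counts[i][j] += 1
        sums[i][j] += v
    # Empty cells get NaN as their mean
    means = [[s / c if c else math.nan for s, c in zip(srow, crow)]
             for srow, crow in zip(sums, counts)]
    return counts, means

counts, means = grid_stats([0.1, 0.2, 0.9], [0.1, 0.1, 0.9], [1.0, 3.0, 5.0],
                           xlim=(0.0, 1.0), ylim=(0.0, 1.0), shape=(2, 2))
print(counts)  # [[2, 0], [0, 1]]
```

Each row touches exactly one cell, so the cost is linear in the number of rows and the memory needed is proportional to the grid size, not the data size.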

# Key features
## Instant opening of huge data files (memory mapping)
[HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) and [Apache Arrow](https://arrow.apache.org/) files are supported.

![opening1a](https://user-images.githubusercontent.com/1765949/82818563-31c1e200-9e9f-11ea-9ee0-0a8c1994cdc9.png)


![opening1b](https://user-images.githubusercontent.com/1765949/82820352-49e73080-9ea2-11ea-9153-d73aa399d329.png)

[Read the documentation on how to efficiently convert your data](https://docs.vaex.io/en/latest/example_io.html) from CSV files, Pandas DataFrames, or other sources.


Lazy streaming from S3 is also supported, in combination with memory mapping.

![opening1c](https://user-images.githubusercontent.com/1765949/82820516-a21e3280-9ea2-11ea-948b-07df26c4b5d3.png)
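
Memory mapping is why opening is instant. The file is mapped into the process's address space, and the operating system loads a page from disk only when it is first touched. Opening an HDF5 or Arrow file with `vaex.open(path)` relies on this mechanism. The standard-library sketch below shows the idea with a raw binary column. The file layout and the helpers `write_column` and `open_column` are hypothetical and much simpler than HDF5 or Arrow.

```python
import array
import mmap
import os
import tempfile

def write_column(path, values):
    """Hypothetical helper: store a column as raw float64 values (native byte order)."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)

def open_column(path):
    """Memory-map a float64 column. No column data is read from disk here."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file (the file must not be empty)
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # A memoryview gives zero-copy, array-like access to the mapped bytes
    return memoryview(mm).cast("d")

path = os.path.join(tempfile.mkdtemp(), "x.bin")
write_column(path, [float(i) for i in range(1_000_000)])
x = open_column(path)        # only the mapping is set up
print(len(x), x[123_456])    # prints: 1000000 123456.0
```

Reading `x[123_456]` makes the OS page in only the few kilobytes around that row. The rest of the 8 MB file stays on disk until it is needed.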


## Expression system
Don't waste memory or time on feature engineering: we (lazily) transform your data when needed.


![expression](https://user-images.githubusercontent.com/1765949/82818733-70f03300-9e9f-11ea-80b0-ab28e7950b5c.png)
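
In Vaex, an assignment such as `df['r'] = np.sqrt(df.x**2 + df.y**2)` creates a *virtual column*. Only the expression is stored, and values are computed when something asks for them. The toy class below is a hypothetical stand-in that mimics this behaviour in plain Python. `LazyFrame`, `add_virtual` and `evaluate` are illustrative names, not Vaex API. For simplicity, virtual columns here may only refer to real columns.

```python
import math

class LazyFrame:
    """Toy, hypothetical stand-in for a DataFrame that supports virtual columns."""

    def __init__(self, **columns):
        self.columns = dict(columns)  # real data (in Vaex this would live on disk)
        self.virtual = {}             # name -> function of one row (a dict)

    def add_virtual(self, name, func):
        # Only the recipe is stored; no memory is allocated for the new column
        self.virtual[name] = func

    def __len__(self):
        return len(next(iter(self.columns.values())))

    def evaluate(self, name, start=0, stop=None):
        """Return rows [start, stop) of a real or virtual column."""
        stop = len(self) if stop is None else stop
        if name in self.columns:
            return list(self.columns[name][start:stop])
        func = self.virtual[name]
        # Values are computed only for the requested rows, at request time
        return [func({k: col[i] for k, col in self.columns.items()})
                for i in range(start, stop)]

df = LazyFrame(x=[3.0, 6.0, 5.0], y=[4.0, 8.0, 12.0])
df.add_virtual("r", lambda row: math.sqrt(row["x"] ** 2 + row["y"] ** 2))
print(df.evaluate("r"))  # [5.0, 10.0, 13.0]
```

Adding many derived features therefore costs almost nothing until they are used, and evaluating a slice computes only that slice.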



## Out-of-core DataFrame
Filtering and evaluating expressions will not waste memory by making copies; the data is kept untouched on disk and is streamed only when needed. This lets you postpone the moment you need a cluster.


![occ-animated](https://user-images.githubusercontent.com/1765949/82821111-c6c6da00-9ea3-11ea-9f9e-498de8133cc2.gif)
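
In Vaex, `df[df.x > 4]` returns a filtered DataFrame without copying the column data. The filter is applied while the data is streamed through. The sketch below shows the streaming half of that idea: a filtered mean computed chunk by chunk, so no filtered copy of the column is ever built. `filtered_mean` is a hypothetical helper, and the chunk size is arbitrary.

```python
def filtered_mean(column, predicate, chunk_size=100_000):
    """Mean of the values where `predicate` holds, streaming over `column` in chunks.

    `column` can be any sliceable sequence, e.g. a memory-mapped memoryview,
    whose slices are zero-copy views.
    """
    total = 0.0
    count = 0
    for start in range(0, len(column), chunk_size):
        chunk = column[start:start + chunk_size]  # only this chunk is touched
        for value in chunk:
            if predicate(value):
                total += value
                count += 1
    return total / count if count else float("nan")

print(filtered_mean(list(range(10)), lambda v: v > 4, chunk_size=3))  # 7.0
```

Peak memory is bounded by the chunk size rather than the dataset size, which is what allows a laptop to handle data larger than its RAM.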

## Fast groupby / aggregations
Vaex implements parallelized, high-performance `groupby` operations, which are especially fast when grouping on categorical columns (>1 billion rows/second).


![groupby](https://user-images.githubusercontent.com/1765949/82818807-97ae6980-9e9f-11ea-8820-41dd4441057a.png)
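
Grouping in Vaex is done with `df.groupby(by=..., agg=...)` (see the documentation for the exact arguments). One reason categorical keys are cheap to group on is that each key is already a small integer code. A row can then be added straight into an array slot, with no hashing. The plain-Python sketch below illustrates that general technique. It is not Vaex's parallel implementation, and `encode` and `groupby_sum_categorical` are hypothetical names.

```python
def encode(keys):
    """Map arbitrary keys to dense integer codes 0..n-1 (a one-off step, like making a category)."""
    labels = sorted(set(keys))
    index = {label: code for code, label in enumerate(labels)}
    return [index[k] for k in keys], labels

def groupby_sum_categorical(codes, values, n_categories):
    """Sum and count `values` per group, where group keys are integer codes."""
    sums = [0.0] * n_categories
    counts = [0] * n_categories
    for code, value in zip(codes, values):
        # The code is the slot index: no hash-table lookup per row
        sums[code] += value
        counts[code] += 1
    return sums, counts

codes, labels = encode(["b", "a", "b", "c"])
sums, counts = groupby_sum_categorical(codes, [1.0, 2.0, 3.0, 4.0], len(labels))
print(labels, sums, counts)  # ['a', 'b', 'c'] [2.0, 4.0, 4.0] [1, 2, 1]
```

Once the codes exist, the per-row work is just two array updates, and the arrays can be split across threads and summed at the end.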


## Fast and efficient join
Vaex doesn't copy/materialize the 'right' table when joining, saving gigabytes of memory. With subsecond joining on a billion rows, it's pretty fast!

![join](https://user-images.githubusercontent.com/1765949/82818840-a268fe80-9e9f-11ea-8ba2-6a6d52c4af88.png)
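
A join in Vaex looks like `df.join(other, on='key', how='left')`. Not materializing the right table can be sketched as follows. Only an index array is built, recording which right-hand row matches each left-hand row. The right table's columns are then read through that index when they are actually needed. This is a simplified, hypothetical illustration of a left join, not Vaex's code. `join_indices` and `JoinedColumn` are made-up names, and duplicate right-hand keys keep only their first match.

```python
def join_indices(left_keys, right_keys):
    """For each left row, the row number of the matching right row (or None)."""
    lookup = {}
    for row, key in enumerate(right_keys):
        lookup.setdefault(key, row)  # keep the first match for duplicate keys
    return [lookup.get(key) for key in left_keys]

class JoinedColumn:
    """Hypothetical lazy view of a right-table column, aligned to the left rows."""

    def __init__(self, right_column, indices):
        self.right_column = right_column  # not copied
        self.indices = indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        j = self.indices[i]
        return None if j is None else self.right_column[j]

idx = join_indices(["a", "c", "b"], ["b", "a"])
price = JoinedColumn([10, 20], idx)
print([price[i] for i in range(len(price))])  # [20, None, 10]
```

The memory cost is one integer per left row, however many columns the right table has.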

## More features

 * Remote DataFrames (documentation coming soon)
 * Integration into [Jupyter and Voila for interactive notebooks and dashboards](https://vaex.readthedocs.io/en/latest/tutorial_jupyter.html)
 * [Machine Learning without (explicit) pipelines](https://vaex.readthedocs.io/en/latest/tutorial_ml.html)


# Learn how to use Vaex efficiently
 * [Follow our tutorials](https://docs.vaex.io/en/latest/tutorials.html)
 * Watch our most recent talks:
   * [PyData London 2019](https://www.youtube.com/watch?v=2Tt0i823-ec)
   * [SciPy 2019](https://www.youtube.com/watch?v=ELtjRdPT8is)
 * Contact us for training or enterprise support at https://vaex.io/


