Metadata-Version: 2.1
Name: nptab
Version: 3.0.0
Summary: SQL atop numpy arrays represented as tables. Tables logic forked from github.com/BastiaanBergman/nptab
Home-page: https://github.com/javadba/nptab
Author: Stephen Boesch
Author-email: javadba@gmail.com
License: MIT
Project-URL: Source Code, https://github.com/javadba/nptab
Keywords: numpy sql table
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/x-rst
Requires-Dist: numpy
Requires-Dist: tabulate

About Nptab
============

Lightweight, intuitive and fast data-tables.

*Nptab* data-tables are tables with columns and column names, rows and row
numbers. Indexing and slicing your data works the same way as indexing and
slicing numpy arrays. The only real difference is that each column can have
its own data type.
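
As a rough illustration of why per-column data types matter, the sketch below
uses plain numpy only (it does not call Nptab). The variable names are
illustrative. A single numpy array coerces mixed values to one type, while
separate column arrays each keep their own:

```python
import numpy as np

# Three columns of equal length, each held in its own numpy array.
names = np.array(["John", "Joe", "Jane"])
heights = np.array([1.82, 1.65, 2.15])
married = np.array([False, False, True])

# Each column keeps its own dtype kind: unicode, float, boolean.
kinds = [col.dtype.kind for col in (names, heights, married)]
print(kinds)  # ['U', 'f', 'b']

# Mixing a string and a float in one array coerces both to strings.
mixed = np.array(["John", 1.82])
print(mixed.dtype.kind)  # U
```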


Design objectives
-----------------

I got frustrated with pandas: its complicated slicing syntax (.loc, .ix,
.iloc, etc.), its enforced index column and the Series objects I get when I
want a numpy array. With Nptab I created the simplified pandas I need for many
of my data jobs, focusing only on simple slicing of multi-datatype tables and
basic table tools.

* Intuitive simple slicing.

* Using numpy machinery, for best performance, integration with other tools
  and future support.

* Store data by column numpy arrays (column store).

* No particular index column, all columns can be used as the index, the choice
  is up to the user.

* Fundamental necessities for sorting, grouping, joining and appending tables.
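
The column-store idea behind these objectives can be sketched with plain numpy
arrays. This is not the Nptab API, only an illustration under the assumption
that every column is an equally long array. Any column can then act as the
index for sorting or selecting rows:

```python
import numpy as np

# Two columns of the same table, stored as separate arrays.
names = np.array(["John", "Joe", "Jane"])
heights = np.array([1.82, 1.65, 2.15])

# Sort the whole table by the Height column: compute the row order once,
# then apply it to every column.
order = np.argsort(heights)
names_sorted = names[order]
heights_sorted = heights[order]
print(names_sorted.tolist())    # ['Joe', 'John', 'Jane']

# Select rows by a condition on one column and read another column.
tall = names[heights > 1.7]
print(tall.tolist())            # ['John', 'Jane']
```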


Install
========

::

    pip install nptab

Quickstart
===========

init
----

To set up an Nptab:

>>> from nptab import Nptab
>>> nptab = Nptab([ ["John", "Joe", "Jane"],
...                [1.82,1.65,2.15],
...                [False,False,True]], columns = ["Name", "Height", "Married"])
>>> nptab
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
3 rows ['<U4', '<f8', '|b1']

Alternatively, Nptabs can be set up from dictionaries, numpy arrays, pandas
DataFrames, or no data at all. Database connectors usually return data as a list
of records; the module provides a convenience function to transpose this into a
list of columns.
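
The name of that convenience function is not given here, so the sketch below
shows the same transposition in plain Python instead. The ``records`` data is
illustrative, and the result is a list of columns in the layout that the
``Nptab`` initializer accepts in the example above:

```python
# Rows as a database cursor might return them (illustrative data).
records = [
    ("John", 1.82, False),
    ("Joe", 1.65, False),
    ("Jane", 2.15, True),
]

# zip(*records) regroups the i-th field of every record into column i.
columns = [list(col) for col in zip(*records)]
print(columns)
# [['John', 'Joe', 'Jane'], [1.82, 1.65, 2.15], [False, False, True]]
```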

slice
-----

Slicing can be done the numpy way, always returning Nptab objects:

>>> nptab[1:3,[0,2]]
 Name   |   Married
--------+-----------
 Joe    |         0
 Jane   |         1
2 rows ['<U4', '|b1']

Slices will always return a Nptab except in three distinct cases, when:

1. explicitly one column is requested, a numpy array is returned:

>>> nptab[1:3,'Name']       # doctest: +SKIP
array(['Joe', 'Jane'],
      dtype='<U4')

2. explicitly one row is requested, a tuple is returned:

>>> nptab[0,:]
('John', 1.82, False)

3. explicitly one element is requested:

>>> nptab[0,'Name']
'John'

In general, slicing is intuitive and does not deviate from what you would
expect from numpy, with the one addition that columns can be referred to by
name as well as by number.
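
For comparison, this is how a plain two-dimensional numpy array reduces
dimensions under the same kinds of index. It is a sketch of the numpy
behaviour that the cases above mirror, not Nptab code. The difference is that
a single Nptab row comes back as a tuple, as shown above:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

block = a[1:3, [0, 2]]   # row slice + column list -> 2-D result
column = a[1:3, 1]       # single column index -> 1-D array
row = a[0, :]            # single row index -> 1-D array
element = a[0, 1]        # row and column index -> scalar

print(block.shape, column.shape, row.shape, element)
# (2, 2) (2,) (3,) 1
```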

set
----

Setting elements works the same as slicing:

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab[0,"Name"] = "Jos"
>>> nptab
 Name   |   Height |   Married
--------+----------+-----------
 Jos    |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
3 rows ['<U4', '<f8', '|b1']

The value is expected to have the same datatype that a slice at the same
position would return.

Adding a column works the same as setting elements; just give it a new name:

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab['new'] = [1,2,3]
>>> nptab
 Name   |   Height |   Married |   new
--------+----------+-----------+-------
 John   |     1.82 |         0 |     1
 Joe    |     1.65 |         0 |     2
 Jane   |     2.15 |         1 |     3
3 rows ['<U4', '<f8', '|b1', '<i8']

Or set the whole column to the same value:

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab['new'] = 13
>>> nptab
 Name   |   Height |   Married |   new
--------+----------+-----------+-------
 John   |     1.82 |         0 |    13
 Joe    |     1.65 |         0 |    13
 Jane   |     2.15 |         1 |    13
3 rows ['<U4', '<f8', '|b1', '<i8']

Just like in numpy, slices are not actual copies of the data; they are
references to it.
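
The numpy behaviour referred to here can be sketched as follows, using a plain
numpy array rather than an Nptab. Note that numpy itself returns copies for
index lists and boolean masks; whether every kind of Nptab slice is a
reference is not stated here, so take an explicit copy when in doubt:

```python
import numpy as np

heights = np.array([1.82, 1.65, 2.15])

view = heights[1:]                      # basic slice shares memory
print(np.shares_memory(view, heights))  # True

view[0] = 1.70                          # writes through to the original
print(heights[1])                       # 1.7

copy = heights[1:].copy()               # explicit copy is independent
copy[0] = 0.0
print(heights[1])                       # 1.7
```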

append Nptab and row
---------------------

Nptabs can be appended to other Nptabs:

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab += nptab
>>> nptab
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
6 rows ['<U4', '<f8', '|b1']

Or append a row as a dictionary:

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab.row_append({'Height':1.81, 'Name':"Jack", 'Married':True})
>>> nptab
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
 Jack   |     1.81 |         1
4 rows ['<U4', '<f8', '|b1']


instance properties
--------------------

Your data is simply stored as a list of numpy arrays and can be accessed or
manipulated like that (just don't make a mess):

>>> nptab = Nptab({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> nptab.columns
['Name', 'Height', 'Married']
>>> nptab.data        # doctest: +SKIP
[array(['John', 'Joe', 'Jane'],
      dtype='<U4'), array([ 1.82,  1.65,  2.15]), array([False, False,  True], dtype=bool)]

Further, the basic means to assess the size of your data:

>>> nptab.shape
(3, 3)
>>> len(nptab)
3

pandas
-------

For interfacing with the popular datatable framework, going back and forth
is easy:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':range(3),'b':range(10,13)})
>>> df
   a   b
0  0  10
1  1  11
2  2  12

To make an Nptab from a DataFrame, just supply it to the initializer:

>>> nptab = Nptab(df)
>>> nptab
   a |   b
-----+-----
   0 |  10
   1 |  11
   2 |  12
3 rows ['<i8', '<i8']

The dict property of Nptab provides a way to make a DataFrame from a Nptab:

>>> df = pd.DataFrame(nptab.dict)
>>> df
   a   b
0  0  10
1  1  11
2  2  12


Dependencies
============

* numpy
* tabulate (optional, recommended)
* pandas (optional, for converting back and forth to DataFrames)

Tested on:
----------

* Python 3.8.2;  numpy 1.18.1


Contributing to Nptab
=====================
Nptab is perfect already, no more contributions needed. Just kidding!

See the repository for filing issues and proposing enhancements.

 - pytest ::

    cd nptab/test
    conda activate py38
    pytest

 - pylint ::

    cd nptab/
    ./pylint.sh

 - doctest ::

    cd nptab/docs
    make doctest

 - sphinx ::

    cd nptab/docs
    make html

 - setuptools/pypi ::

    python setup.py sdist bdist_wheel
    twine upload dist/nptab-*

Contributors
============

* Stephen Boesch [javadba@gmail.com]
* For the original `tabel` logic: Bastiaan Bergman [Bastiaan.Bergman@gmail.com].



