Metadata-Version: 2.1
Name: spectrify
Version: 2.0.0
Summary: Tools for working with Redshift Spectrum.
Home-page: https://github.com/hellonarrativ/spectrify
Author: The Narrativ Company, Inc.
Author-email: engineering@narrativ.com
License: MIT license
Keywords: spectrify
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: boto3
Requires-Dist: ciso8601
Requires-Dist: Click
Requires-Dist: future
Requires-Dist: pandas
Requires-Dist: psycopg2
Requires-Dist: pyarrow (>=0.9.0)
Requires-Dist: python-dateutil (<2.7.0,>=2.1)
Requires-Dist: s3fs
Requires-Dist: sqlalchemy
Requires-Dist: sqlalchemy-redshift (>=0.7.1)
Requires-Dist: unicodecsv

=========
Spectrify
=========


.. image:: https://img.shields.io/pypi/v/spectrify.svg
    :target: https://pypi.python.org/pypi/spectrify

.. image:: https://img.shields.io/travis/hellonarrativ/spectrify.svg
    :target: https://travis-ci.org/hellonarrativ/spectrify

.. image:: https://readthedocs.org/projects/spectrify/badge/?version=latest
    :target: https://spectrify.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status


A simple yet powerful tool to move your data from Redshift to Redshift Spectrum.


* Free software: MIT license
* Documentation: https://spectrify.readthedocs.io.


Features
--------

One-liners to:

* Export a Redshift table to S3 (CSV)
* Convert exported CSVs to Parquet files in parallel
* Create the Spectrum table on your Redshift cluster
* **Perform all 3 steps in sequence**, essentially "copying" a Redshift table to Spectrum in one command.

S3 credentials are resolved by boto3. See http://boto3.readthedocs.io/en/latest/guide/configuration.html for the supported configuration methods.

Redshift credentials are supplied via environment variables, command-line parameters, or an interactive prompt.
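
For the Python API, one way to keep credentials out of source code is to build the SQLAlchemy connection URL from environment variables. The following is a minimal sketch, not part of Spectrify itself: the helper name ``redshift_url`` and the ``REDSHIFT_*`` variable names are assumptions made for illustration and are not necessarily the names the ``spectrify`` command reads. It targets Python 3 and uses the ``redshift+psycopg2`` dialect provided by ``sqlalchemy-redshift``.

.. code-block:: python

    import os
    from urllib.parse import quote_plus


    def redshift_url(env=None):
        """Build a SQLAlchemy URL for Redshift from environment variables.

        The REDSHIFT_* names are hypothetical, chosen for this sketch only.
        """
        env = os.environ if env is None else env
        # Percent-encode user and password so characters like '@' or '/'
        # cannot break the URL structure.
        user = quote_plus(env["REDSHIFT_USER"])
        password = quote_plus(env["REDSHIFT_PASSWORD"])
        host = env["REDSHIFT_HOST"]
        port = env.get("REDSHIFT_PORT", "5439")  # 5439 is Redshift's default port
        db = env["REDSHIFT_DB"]
        return "redshift+psycopg2://{}:{}@{}:{}/{}".format(
            user, password, host, port, db
        )

The resulting string can be passed to ``sqlalchemy.create_engine`` to obtain an engine such as the ``sa_engine`` used in the Python examples.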

Install
--------

.. code-block:: bash

    $ pip install spectrify


Command-line Usage
------------------

Export Redshift table ``my_table`` to a folder of CSV files on S3:

.. code-block:: bash

    $ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb export my_table \
        's3://example-bucket/my_table'

Convert exported CSVs to Parquet:

.. code-block:: bash

    $ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb convert my_table \
        's3://example-bucket/my_table'

Create Spectrum table from S3 folder:

.. code-block:: bash

    $ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb create_table \
        's3://example-bucket/my_table' my_table my_spectrum_table

Transform Redshift table by performing all 3 steps in sequence:

.. code-block:: bash

    $ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb transform my_table \
        's3://example-bucket/my_table'
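
When the individual steps need to run from a script (for example, to retry only a failed step), the commands above can be assembled programmatically. This is a rough sketch: ``step_commands`` is a hypothetical helper, and it uses only the sub-commands, options, and argument order shown in the examples above.

.. code-block:: python

    import shlex

    # Connection options shared by every step (values from the examples above).
    BASE = [
        "spectrify",
        "--host=example-url.redshift.aws.com",
        "--user=myuser",
        "--db=mydb",
    ]


    def step_commands(table, s3_path, spectrum_table):
        """Return the argument lists for export, convert, and create_table."""
        return [
            BASE + ["export", table, s3_path],
            BASE + ["convert", table, s3_path],
            BASE + ["create_table", s3_path, table, spectrum_table],
        ]


    commands = step_commands(
        "my_table", "s3://example-bucket/my_table", "my_spectrum_table"
    )
    for cmd in commands:
        # Print a shell-safe rendering of each command. To execute a step,
        # pass the list to subprocess.run(cmd, check=True) instead.
        print(" ".join(shlex.quote(arg) for arg in cmd))

Keeping each command as a list rather than a single string avoids shell-quoting problems when table names or S3 paths contain unusual characters.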


Python Usage
------------

Export to S3:

.. code-block:: python


    from spectrify.export import RedshiftDataExporter
    RedshiftDataExporter(sa_engine, s3_config).export_to_csv('my_table')

Convert exported CSVs to Parquet:

.. code-block:: python

    from spectrify.convert import ConcurrentManifestConverter
    from spectrify.utils.schema import SqlAlchemySchemaReader
    sa_table = SqlAlchemySchemaReader(sa_engine).get_table_schema('my_table')
    ConcurrentManifestConverter(sa_table, s3_config).convert_manifest()

Create Spectrum table from S3 Parquet folder:

.. code-block:: python

    from spectrify.create import SpectrumTableCreator
    from spectrify.utils.schema import SqlAlchemySchemaReader
    sa_table = SqlAlchemySchemaReader(sa_engine).get_table_schema('my_table')
    SpectrumTableCreator(sa_engine, dest_schema, dest_table_name, sa_table, s3_config).create()

Transform Redshift table by performing all 3 steps in sequence:

.. code-block:: python

    from spectrify.transform import TableTransformer
    transformer = TableTransformer(sa_engine, 'my_table', s3_config, dest_schema, dest_table_name)
    transformer.transform()

Contribute
----------
Contributions always welcome! Read our guide on contributing here: http://spectrify.readthedocs.io/en/latest/contributing.html

License
-------
MIT License. Copyright (c) 2017, The Narrativ Company, Inc.


=======
History
=======

2.0.0 (2019-03-09)
------------------

* Default to 256MB files
* Flag for Unicode support on Python 2.7 (performance implications)
* Drop support for Python 3.4
* Support for additional CSV format parameters
* Support for REAL data type


1.0.1 (2018-07-12)
------------------

* Loosen version requirement for PyArrow
* Add example script
* Update documentation


1.0.0 (2018-04-20)
------------------

* Move functionality into classes to make customizing behavior easier
* Add support for DATE columns
* Add support for DECIMAL/NUMERIC columns
* Upgrade to pyarrow v0.9.0


0.4.1 (2018-03-25)
------------------

* Fix exception when source table is not in schema public


0.4.0 (2018-02-25)
------------------

* Upgrade to pyarrow v0.8.0
* Verify Redshift column types are supported before attempting conversion
* Bugfix: Properly clean up multiprocessing.pool resource


0.3.0 (2017-10-30)
------------------

* Support 16- and 32-bit integers
* Packaging updates


0.2.1 (2017-09-27)
------------------

* Fix Readme


0.2.0 (2017-09-27)
------------------

* First release on PyPI.


0.1.0 (2017-09-13)
------------------

* Didn't even make it to PyPI.


