Metadata-Version: 2.1
Name: pg2avro
Version: 0.2
Summary: Utility generating avro files from postgres.
Home-page: https://github.com/kiwicom/pg2avro
Author: Milan Lukac
Author-email: milan.lukac@kiwi.com
License: MIT
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Environment :: Plugins
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: sqlalchemy (>=1.2)
Requires-Dist: psycopg2 (>=2.7)

# pg2avro

Postgres to Avro generator.

## Features

- Generate Avro schema from column definition.
- Generate row data in a format suitable for Avro serialization.

# Usage

## Generating schema

Method: `pg2avro.get_avro_schema`

```
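from sqlalchemy import Column as SqlAlchemyColumn
from sqlalchemy.dialects.postgresql import ARRAY, TEXT

from pg2avro import get_avro_schema

get_avro_schema(
    "mytable",  # table name
    "public",  # namespace
    [
        # Dictionary mode
        {
            "name": "column_name_1",
            "type": "int2",
            "nullable": False,
        },
        # SQLAlchemy mode
        SqlAlchemyColumn(ARRAY(TEXT), name="column_name_2"),
        ...
    ]
)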
```

The schema generator needs the following information:
- table name
- namespace (`schema` in SQL, `dataset` in BigQuery, etc.)
- columns - iterable of columns, each element with:
    - name
    - type - `_` prefix is used to indicate array types
    - nullable (optional, `True` assumed if not provided)
- column mapping - optional `ColumnMapping` object with column mappings (see below for more info).
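
The column conventions above (the `_` prefix for array types and `nullable` defaulting to `True`) can be illustrated with a small standalone sketch. The helpers `split_array_type` and `normalize_column` are hypothetical names used only for illustration; they are not part of pg2avro and do not claim to mirror its internals.

```python
from typing import Any, Dict, Tuple


def split_array_type(pg_type: str) -> Tuple[str, bool]:
    """Return (base type, is_array) for a type name using the `_` array prefix."""
    if pg_type.startswith("_"):
        return pg_type[1:], True
    return pg_type, False


def normalize_column(column: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: make the array flag and nullable default explicit."""
    base_type, is_array = split_array_type(column["type"])
    return {
        "name": column["name"],
        "type": base_type,
        "is_array": is_array,
        # `nullable` is optional; True is assumed when it is not provided.
        "nullable": column.get("nullable", True),
    }


columns = [
    {"name": "id", "type": "int2", "nullable": False},
    {"name": "tags", "type": "_text"},  # array of text, nullable by default
]
normalized = [normalize_column(column) for column in columns]
```

Here `tags` is treated as a nullable array of `text`, while `id` stays a non-nullable scalar `int2`.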

Column data can be passed in multiple formats.

### Supported column formats

- Dictionary with required keys and data
- SQLAlchemy `Column` object
- Any object with compatible attributes and required data
- Dictionary or object with required data, but without compatible attributes/keys, supplied with ColumnMapping.

Note: the `ColumnMapping` mode supports **generating a schema from raw Postgres data**: a field such as `udt_name` can be mapped to the column type.
```
# CustomColumn stands for any user-defined object exposing these attributes.
columns = [
    CustomColumn(name="column_name", udt_name="int2", is_nullable=False),
]

get_avro_schema(
    table_name,
    namespace,
    columns,
    ColumnMapping(name="name", type="udt_name", nullable="is_nullable"),
)
```
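
To make the idea of a column mapping concrete, the following standalone sketch shows one way attribute or key names could be translated to the expected `name`, `type` and `nullable` fields. `CustomColumn`, `read_field` and `apply_mapping` are hypothetical stand-ins defined here for illustration, and the plain dictionary `mapping` only imitates the role of `ColumnMapping`; none of this is pg2avro's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class CustomColumn:
    """Hypothetical column object whose attribute names differ from the defaults."""
    name: str
    udt_name: str
    is_nullable: bool


def read_field(column: Any, key: str) -> Any:
    """Read a field from either a dictionary or an arbitrary object."""
    if isinstance(column, dict):
        return column[key]
    return getattr(column, key)


def apply_mapping(column: Any, mapping: Dict[str, str]) -> Dict[str, Any]:
    """Translate source fields to the expected name/type/nullable keys."""
    return {target: read_field(column, source) for target, source in mapping.items()}


# Plays the same role as ColumnMapping(name=..., type=..., nullable=...).
mapping = {"name": "name", "type": "udt_name", "nullable": "is_nullable"}

resolved = [
    apply_mapping(CustomColumn(name="column_name", udt_name="int2", is_nullable=False), mapping),
    apply_mapping({"name": "other", "udt_name": "_text", "is_nullable": True}, mapping),
]
```

Both the object and the dictionary end up in the same dictionary shape that the "Dictionary mode" example uses.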

## Generating rows data

Method: `pg2avro.get_avro_row_dict`

This method takes a row and the schema the row should conform to, and returns the row in a form ready for Avro serialization.

### Supported row formats

- Dictionary with keys corresponding to schema field names
- Object with attributes corresponding to schema field names (works the same as a dictionary with corresponding keys)
- Tuple with data in the same order as fields specified in schema

```
from pg2avro import get_avro_row_dict, get_avro_schema

columns = [
    {"name": "name", "type": "varchar", "nullable": False},
    {"name": "number", "type": "float4", "nullable": False},
]
schema = get_avro_schema(table_name, namespace, columns)
rows = [
    {"name": "John", "number": 1.0},  # dictionary
    RowObject(name="Jack", number=2.0),  # any object with matching attributes
    ("Jim", 3.0),  # tuple in schema field order
]
data = [get_avro_row_dict(row, schema) for row in rows]
```
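
The three row formats can be pictured as different routes to the same dictionary keyed by schema field names. The sketch below is a standalone illustration under that assumption: `RowObject` and `row_to_dict` are hypothetical names, and the code covers only the alignment of values with field names, not any value conversion `get_avro_row_dict` may perform.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class RowObject:
    """Hypothetical row object with attributes matching the schema fields."""
    name: str
    number: float


def row_to_dict(row: Any, field_names: List[str]) -> Dict[str, Any]:
    """Align a dict, tuple or object row with the given field names."""
    if isinstance(row, dict):
        return {field: row[field] for field in field_names}
    if isinstance(row, tuple):
        # Tuples carry no names, so their order must match the schema fields.
        if len(row) != len(field_names):
            raise ValueError("tuple length does not match the number of fields")
        return dict(zip(field_names, row))
    return {field: getattr(row, field) for field in field_names}


field_names = ["name", "number"]
rows = [
    {"name": "John", "number": 1.0},
    RowObject(name="Jack", number=2.0),
    ("Jim", 3.0),
]
normalized = [row_to_dict(row, field_names) for row in rows]
```

Tuples are the only format where ordering matters, which is why the sketch checks their length against the field list.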

