Metadata-Version: 2.1
Name: contentaiextractor
Version: 1.1.0
Summary: makes it easier to implement a ContentAI extractor
Home-page: http://github.com/turnercode/contentai-extractor-runtime-python
Author: John Ritsema
Author-email: john.ritsema@turner.com
License: apache-2
Keywords: contentai extractor computervision cv ai ml
Platform: UNKNOWN
Description-Content-Type: text/markdown

# contentai-extractor-runtime-python

This is a Python package for implementing a custom extractor that runs on the ContentAI platform.

https://pypi.org/project/contentaiextractor/

1. [Usage](#Usage)
1. [API Documentation](#API-Documentation)
1. [Dependencies](#Dependencies)
1. [Develop](#Develop)
1. [Release](#Release)
1. [Changes](#Changes)

# Usage

```sh
pip install contentaiextractor
```

```python
import contentaiextractor as contentai

# download content locally
content_path = contentai.download_content()

# access metadata that was supplied when running a job
# contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
inputData = contentai.metadata()["input"]

# get output from another extractor
csv = contentai.get("extractor", "data.csv")
json = contentai.get_json("extractor", "data.json")

# extract some data
outputData = []
outputData.append({"frameNumber": 1})

# output data from this extractor (a list, so it is written as json)
contentai.set_json("output", outputData)
```

# API Documentation

<a name="contentaiextractor.ContentAIError"></a>
## ContentAIError Objects

```python
class ContentAIError(Exception)
```

represents a contentai error


## Fields

- `extractor_name` - name of the extractor being run
- `job_id` - current job id
- `content_url` - URL of the content the extractor is run against
- `content_path` - local path where the extractor can access the content
- `result_path` - local path where the extractor should write the results
- `running_in_contentai` - boolean that is `True` when the extractor runs on the ContentAI platform; useful for branching when testing code locally
- `metadata_json` - raw metadata string for the active extractor run, or `None` if not set (see also the parsed [metadata()](#metadata))
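
The fields above can be read like ordinary attributes. The sketch below is a minimal illustration, not part of the package: `describe_run` is a hypothetical helper, and a `SimpleNamespace` with made-up values stands in for the imported module so the snippet runs without the ContentAI platform.

```python
from types import SimpleNamespace


def describe_run(runtime):
    """Build a one-line summary from the fields listed above.

    `runtime` is any object exposing those attributes, for example the
    imported module (`import contentaiextractor as contentai`).
    """
    where = "ContentAI" if runtime.running_in_contentai else "local"
    return (
        f"[{where}] {runtime.extractor_name} "
        f"job={runtime.job_id} content={runtime.content_url}"
    )


# Stand-in for the real module; every value here is invented for the example.
fake = SimpleNamespace(
    extractor_name="my_extractor",
    job_id="job-123",
    content_url="s3://bucket/video.mp4",
    running_in_contentai=False,
)

print(describe_run(fake))
```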

## Functions

<a name="contentaiextractor.download_content"></a>
#### download\_content

```python
download_content()
```

download content to work with locally

returns local path where content is written

<a name="contentaiextractor.metadata"></a>
#### metadata

```python
metadata()
```

returns a dict containing input metadata

example:

access metadata that was supplied when running a job
```sh
contentai run s3://bucket/video.mp4 -d '{ "input": "value" }'
```
```python
input_data = contentai.metadata()["input"]
```
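
To show how the raw `metadata_json` string relates to the parsed dict, here is a minimal sketch using only the standard library. `parse_metadata` is a hypothetical helper written for this example; the package's own `metadata()` may handle missing or invalid input differently.

```python
import json


def parse_metadata(raw):
    """Parse a raw metadata string into a dict.

    Assumption for this sketch: missing or empty metadata becomes {}.
    """
    if raw is None or raw.strip() == "":
        return {}
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("metadata must be a JSON object")
    return parsed


# Same payload as the `-d` argument in the example above.
input_data = parse_metadata('{ "input": "value" }')["input"]
print(input_data)
```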

<a name="contentaiextractor.extractors"></a>
#### extractors

```python
extractors()
```

get list of all extractors executed against this content url

returns a list of strings
```json
[
    "extractor1",
    "extractor2"
]
```

example:
```python
# get all data from all extractors
for extractor in contentai.extractors():
    for key in contentai.keys(extractor):
        data = contentai.get(extractor, key)
```
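
The same loop can be wrapped in a helper that collects everything into one nested dict. This is a sketch under assumptions: `collect_outputs` and `FakeRuntime` are hypothetical names, and the fake class only mimics the `extractors()`, `keys()` and `get()` calls documented here so the example runs locally.

```python
def collect_outputs(runtime):
    """Return {extractor_name: {key: data}} for every extractor and key."""
    results = {}
    for extractor in runtime.extractors():
        results[extractor] = {
            key: runtime.get(extractor, key) for key in runtime.keys(extractor)
        }
    return results


class FakeRuntime:
    """In-memory stand-in for the contentaiextractor module (not the real API)."""

    def __init__(self, store):
        self._store = store

    def extractors(self):
        return list(self._store)

    def keys(self, extractor_name):
        return list(self._store[extractor_name])

    def get(self, extractor_name, key):
        return self._store[extractor_name][key]


# Invented sample data for the example.
runtime = FakeRuntime({
    "extractor1": {"data.csv": "a,b\n1,2\n"},
    "extractor2": {"data.txt": "hello"},
})
outputs = collect_outputs(runtime)
print(sorted(outputs))
```

With the real package, passing the imported module itself (`collect_outputs(contentai)`) should work the same way, assuming the functions behave as documented above.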


<a name="contentaiextractor.keys"></a>
#### keys

```python
keys(extractor_name)
```

get a list of keys for specified extractor

returns a list of keys
```json
[
  "data.json",
  "data.csv",
  "data.txt"
]
```

example:
```python
keys = contentai.keys("azure_videoindexer")
for key in keys:
    data = contentai.get("azure_videoindexer", key)
```

<a name="contentaiextractor.get"></a>
#### get

```python
get(extractor_name, key)
```

get the contents of a particular key

example:
```python
# get another extractor's output
data = contentai.get("some_extractor", "output.csv")
```

<a name="contentaiextractor.get_json"></a>
#### get\_json

```python
get_json(extractor_name, key)
```

get the json contents of a particular key

example:
```python
# get another extractor's output
data = contentai.get_json("some_extractor", "data.json")
```

<a name="contentaiextractor.get_bytes"></a>
#### get\_bytes

```python
get_bytes(extractor_name, key)
```

get the contents of a particular key in raw bytes

example:
```python
# get another extractor's output
data = contentai.get_bytes("some_extractor", "output.bin")
```
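
The three getters hand back the same stored value in different shapes. The sketch below illustrates how those shapes relate, using a made-up payload. It assumes the stored value is UTF-8 encoded JSON; `decode_views` is a hypothetical helper, and the package's actual decoding is not documented here.

```python
import json


def decode_views(raw):
    """Return the (bytes, text, parsed) views of one stored value.

    Assumes `raw` holds UTF-8 encoded JSON.
    """
    text = raw.decode("utf-8")
    return raw, text, json.loads(text)


# Invented payload standing in for another extractor's output.
as_bytes, as_text, as_json = decode_views(b'[{"frameNumber": 1}]')
print(type(as_bytes).__name__, type(as_text).__name__, type(as_json).__name__)
```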

<a name="contentaiextractor.set"></a>
#### set

```python
set(key, value)
```

set results data for this extractor

can be called multiple times with different keys

value is a string

example:
```python
contentai.set("output", "hello world")
```

<a name="contentaiextractor.set_json"></a>
#### set\_json

```python
set_json(key, value)
```

set results data for this extractor

can be called multiple times with different keys

value can be anything

example:
```python
data = {}
data["foo"] = "bar"
contentai.set_json("output", data)
```

<a name="contentaiextractor.set_bytes"></a>
#### set\_bytes

```python
set_bytes(key, value)
```

set results data for this extractor

can be called multiple times with different keys

value is bytes

example:
```python
# read the file in binary mode and close it when done
with open("some-file", "rb") as some_file:
    contentai.set_bytes("output", some_file.read())
```
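
Since `set`, `set_json` and `set_bytes` each expect a different value type, a small dispatcher can choose between them. This is a sketch only: `pick_setter` is a hypothetical helper that returns the name of the function to call, following the value types documented above.

```python
def pick_setter(value):
    """Return the name of the setter that matches the value's type."""
    if isinstance(value, (bytes, bytearray)):
        return "set_bytes"
    if isinstance(value, str):
        return "set"
    # Anything else (dict, list, number, ...) is written as JSON.
    return "set_json"


for sample in ("hello world", {"foo": "bar"}, b"\x00\x01"):
    print(type(sample).__name__, "->", pick_setter(sample))
```

With the real module, `getattr(contentai, pick_setter(value))(key, value)` would then call the matching function.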

<a name="contentaiextractor.save_results"></a>
#### save\_results

```python
save_results()
```

save results immediately, instead of waiting until process exits

<a name="contentaiextractor.parse_content_url"></a>
#### parse\_content\_url

```python
parse_content_url()
```

extract details from content url

returns

- `source_bucket_name` - the s3 bucket name derived from content_url
- `source_bucket_key` - the s3 bucket key derived from content_url
- `source_bucket_region` - the s3 bucket region derived from content_url

the following `content url` formats are supported:

- Simple (CLI) Format - `s3://{bucket}/{key}`
- Virtual Hosted Format - `https://{bucket}.s3.amazonaws.com/{key}`
- Virtual Hosted Format with Region - `https://{bucket}.s3.{region}.amazonaws.com/{key}`
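
To make the three formats concrete, here is a minimal sketch of how such URLs could be split using only the standard library. It is not the package's implementation: `split_content_url` is a hypothetical function, and returning `None` for the region when the URL does not carry one is an assumption of this example.

```python
import re
from urllib.parse import urlparse

# Virtual-hosted S3 host names, with an optional region segment.
_S3_HOST = re.compile(
    r"^(?P<bucket>.+)\.s3(?:\.(?P<region>[a-z0-9-]+))?\.amazonaws\.com$"
)


def split_content_url(content_url):
    """Return (bucket, key, region) for the three formats listed above."""
    parsed = urlparse(content_url)
    key = parsed.path.lstrip("/")
    if parsed.scheme == "s3":
        # Simple (CLI) format: the bucket is the host part, no region given.
        return parsed.netloc, key, None
    if parsed.scheme == "https":
        match = _S3_HOST.match(parsed.netloc)
        if match:
            return match.group("bucket"), key, match.group("region")
    raise ValueError(f"unsupported content url: {content_url}")


for url in (
    "s3://bucket/video.mp4",
    "https://bucket.s3.amazonaws.com/video.mp4",
    "https://bucket.s3.us-west-2.amazonaws.com/video.mp4",
):
    print(split_content_url(url))
```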

# Dependencies

```
pip install -r requirements.txt
```

# Develop

```
 Choose a make command to run

  build    build package
  deploy   upload package to pypi
  docs     generates api docs in markdown
```

# Release

To publish a new release to pypi, increment the version number in `setup.py`, tag the commit and push it.


# Changes

- 1.1.0
  - add `extractors()`

- 1.0.4
  - updated changelog

- 1.0.3
  - fixes issue where `EXTRACTOR_METADATA` envvar was inadvertently required

- 1.0.2
  - add safety to setting retrieval on local runs
  - documentation updates 

- 1.0.1
  - api docs for publish to pypi

- 1.0.0
  - initial release


