Metadata-Version: 2.1
Name: nifi-api
Version: 0.0.4
Summary: Scripts for the NIFI API
Home-page: https://github.com/kikejimenez/nifi_rest/tree/main/
Author: kikejimenez
Author-email: physieira@gmail.com
License: Apache Software License 2.0
Keywords: nifi cloudera requests python
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: requests (>=2.25.1)

# NIFI REST API
> Automates NiFi dataflows


Dataflows can be fully automated using the NiFi REST API. This project taps and un-taps a dataflow by tracking its incoming and outgoing flowfiles.

More specifically, it turns on an *initial processor* and records the flowfiles it generates by inspecting an *incoming connection*. Next, a *middle processor* is turned on. Once all the recorded flowfiles have arrived in an *outgoing connection*, a *final processor* is turned on. Before finishing, the tool returns the pipeline to its initial state.
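
The sequence above can be sketched in plain Python with the NiFi calls abstracted away. This is a minimal illustration, not the library's implementation: `run_tap_untap`, `set_running`, `queued` and `wait` are hypothetical names, and the loop assumes the middle processor emits one flowfile per incoming flowfile, so the outgoing count eventually reaches the recorded one (otherwise it would poll forever).

```python
from typing import Callable, Dict


def run_tap_untap(
    set_running: Callable[[str, bool], None],  # hypothetical: start/stop a processor by Id
    queued: Callable[[str], int],              # hypothetical: flowfiles queued in a connection
    ids: Dict[str, str],                       # flat mapping: role (e.g. "in_processor") -> Id
    wait: Callable[[], None],                  # pause between checks
) -> int:
    """Sketch of the tap/untap sequence; returns the number of flowfiles tracked."""
    # Keep the final processor stopped and let the initial processor produce flowfiles
    set_running(ids["out_processor"], False)
    set_running(ids["in_processor"], True)
    wait()
    # Stop the source and record how many flowfiles reached the incoming connection
    set_running(ids["in_processor"], False)
    expected = queued(ids["in_connection"])
    # Start the middle processor and poll until the outgoing connection catches up
    set_running(ids["middle_processor"], True)
    while queued(ids["out_connection"]) < expected:
        wait()
    # Stop the middle processor and let the final processor drain the queue
    set_running(ids["middle_processor"], False)
    set_running(ids["out_processor"], True)
    return expected
```

Passing the NiFi calls in as functions keeps the control flow testable with fakes, independent of any cluster.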

## What is automated?

The following example shows the dataflow actions that the **nifi_api** library automates.

*1*. Consider the initial state of the dataflow:

![Initial state](nbs/images/1_dataflow.png "initial state")

*2*. Turn on the "Initial" processor and turn off the "Final" processor:

![Middle state](nbs/images/2_dataflow.png "middle state")

*3*. Turn off the "Initial" processor and record the flowfiles in the "Initial" connection, then turn on the "Middle" processor:

![Final state](nbs/images/3_dataflow.png "final state")

*4*. Record the flowfiles in the "Final" connection. When they match the flowfiles recorded in the "Initial" connection, turn off the "Middle" processor and turn on the "Final" processor.

*5*. The flowfiles in the "Final" connection get consumed by the "Final" processor and the initial state of the dataflow is recovered.
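
Each step in the list above comes down to two kinds of NiFi REST calls: changing a processor's run status and reading how many flowfiles are queued in a connection. The sketch below only builds the request and parses the response, so it runs without a cluster. It is based on the public NiFi REST API (`PUT /processors/{id}/run-status` and `GET /flow/connections/{id}/status`); it is an assumption that **nifi_api** uses these same endpoints, and `run_status_request` and `queued_flowfiles` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Tuple


def run_status_request(base_url: str, processor_id: str, version: int,
                       running: bool) -> Tuple[str, str, str]:
    """Hypothetical helper: return (method, url, json_body) to start/stop a processor."""
    url = f"{base_url.rstrip('/')}/processors/{processor_id}/run-status"
    body = {
        # NiFi uses optimistic locking: send the processor's current revision,
        # as returned by GET {base_url}/processors/{processor_id}
        "revision": {"version": version},
        "state": "RUNNING" if running else "STOPPED",
    }
    return "PUT", url, json.dumps(body)


def queued_flowfiles(status: Dict[str, Any]) -> int:
    """Hypothetical helper: read the queue size from GET /flow/connections/{id}/status."""
    return status["connectionStatus"]["aggregateSnapshot"]["flowFilesQueued"]
```

With `requests`, the returned tuple maps onto `requests.request(method, url, data=body, headers={"Content-Type": "application/json"}, auth=(user, password))`.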

Here is a recording of the NiFi UI while the tool runs on this dataflow:

![image.gif](nbs/images/automated_dataflow.gif "animation")

## Environment

The NiFi cluster used for testing runs in the *Cloudera Public Cloud* and requires basic authentication credentials. Access to the cluster is configured through the following environment variables:

      - CLOUDERA_USER=user
      - CLOUDERA_PASS=password
      - CLOUDERA_CLUSTER=https://<url_clustername>.cloudera.site/<clustername>
      - CLOUDERA_NIFI_REST=/cdp-proxy-api/nifi-app/nifi-api/
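
As a sketch of how these variables fit together, the hypothetical helper `nifi_settings` below checks that all four are set and joins `CLOUDERA_CLUSTER` and `CLOUDERA_NIFI_REST` into the base URL of the REST API. Joining them this way is an assumption based on the example values above, not a description of how **nifi_api** reads its configuration.

```python
import os
from typing import Mapping, Optional, Tuple

REQUIRED = ("CLOUDERA_USER", "CLOUDERA_PASS", "CLOUDERA_CLUSTER", "CLOUDERA_NIFI_REST")


def nifi_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Tuple[str, str]]:
    """Hypothetical helper: return (base_url, (user, password)) from the environment."""
    if env is None:
        env = os.environ
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    # Assumed layout: <cluster URL> + <NiFi REST path>, with exactly one slash between
    base_url = env["CLOUDERA_CLUSTER"].rstrip("/") + "/" + env["CLOUDERA_NIFI_REST"].strip("/")
    return base_url, (env["CLOUDERA_USER"], env["CLOUDERA_PASS"])
```

The returned credential pair can be passed to `requests` as `auth=`, which `requests` treats as HTTP basic authentication.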

## Install

**nifi_api** is available on PyPI and can be installed by running:

`pip install nifi-api`

## How to use

Consider the template `Test_API.json` in the root folder of the repository; it is the template used in the *What is automated?* section.

Write a dictionary with the NiFi Ids of the processors and connections (found in *view configuration -> settings -> Id*):

```python
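from nifi_api.environment import DataFlowIds

# Component Ids copied from the NiFi UI
ids = {
    # connection where the flowfiles from the initial processor are recorded
    "in_connection": {
        "Id": "cc549c6e-0177-1000-ffff-ffffb5d2aba2",
        "name": "First"
    },
    # connection where the flowfiles leaving the middle processor are recorded
    "out_connection": {
        "Id": "51ab3b24-084f-1309-0000-00001946f2c7",
        "name": "Final"
    },
    # processor that generates the flowfiles
    "in_processor": {
        "Id": "36c62ad6-d606-3b04-9743-d77b6249608c",
        "name": "First"
    },
    # processor started after the incoming flowfiles are recorded
    "middle_processor": {
        "Id": "cc54862f-0177-1000-ffff-ffffe7325a20",
        "name": "Middle"
    },
    # processor that consumes the flowfiles at the end
    "out_processor": {
        "Id": "51ab3b1e-084f-1309-a135-aa0100d7186b",
        "name": "Final"
    },
}
data_ids = DataFlowIds(ids)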
```
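
Before handing the dictionary to `DataFlowIds`, it can help to catch copy-paste mistakes early. The hypothetical `check_ids` below assumes the five keys used above are all required and that every component Id is a UUID, as NiFi Ids are; whether `DataFlowIds` performs any such validation itself is not documented here.

```python
import uuid
from typing import Dict

# Assumed to be the full set of roles, based on the example dictionary
REQUIRED_KEYS = ("in_connection", "out_connection", "in_processor",
                 "middle_processor", "out_processor")


def check_ids(ids: Dict[str, Dict[str, str]]) -> None:
    """Hypothetical helper: raise if a role is missing or an Id is malformed."""
    for key in REQUIRED_KEYS:
        if key not in ids:
            raise KeyError(f"missing entry: {key}")
        component_id = str(ids[key].get("Id", ""))
        try:
            uuid.UUID(component_id)  # NiFi component Ids are UUID strings
        except ValueError:
            raise ValueError(f"{key}: {component_id!r} is not a valid Id") from None
```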

Instantiate and run:

```python
from nifi_api.dataflow import DataFlow
dataflow = DataFlow(
    dataflow_ids=data_ids,
    delay_seconds_after_start=5,
    delay_seconds_between_checks=5,
)
```

```python
dataflow.run()
```
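
The README does not say what `run()` does if the outgoing count never reaches the recorded one. One way to bound that wait is a polling helper with a timeout. In the hypothetical `wait_until` below, `delay_seconds` plays the role we assume `delay_seconds_between_checks` plays, and `sleep` is injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], delay_seconds: float,
               timeout_seconds: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical helper: poll `condition` until it holds or the timeout passes.

    Returns True if the condition became true, False on timeout.
    """
    waited = 0.0
    while not condition():
        if waited >= timeout_seconds:
            return False  # give up instead of polling forever
        sleep(delay_seconds)
        waited += delay_seconds  # approximate: ignores time spent in condition()
    return True
```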

    pipeline watching has started..
    Pipeline watching has finished ...



