Metadata-Version: 2.1
Name: fakefill
Version: 1.0.0
Summary: Fast & Fake Backfill Airflow DAGs Status
Home-page: https://github.com/benbenbang/airflow_fakefill
License: MIT
Keywords: airflow,fakefill,backfill,fast,success,fill,migration,database
Author: Ben CHEN
Author-email: bn@benbenbang.io
Requires-Python: >=3.7,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: click (>=7.1.0,<8.0.0)
Requires-Dist: loguru (>=0.5.2,<0.6.0)
Requires-Dist: pyyaml (>=5.3.1,<6.0.0)
Project-URL: Repository, https://github.com/benbenbang/airflow_fakefill
Description-Content-Type: text/markdown

# Airflow Fakefill Marker



While migrating to a Kubernetes-hosted Airflow with a different backend, I needed a way to fill in the run history of thousands of dags, starting from each dag's start date. I couldn't find a tool for this on GitHub, so I wrote this simple one to make the process faster and easier by marking dags as `success`. I hope it helps others too.



## Installation

### Method 1

```bash
$ pip install fakefill
```

### Method 2

```bash
$ pip install git+https://git@github.com/benbenbang/airflow_fakefill.git
```

### Method 3

```bash
$ git clone git@github.com:benbenbang/airflow_fakefill.git
$ cd airflow_fakefill
$ pip install .
```



## Usage

```bash
$ fakefill
```

It takes one of two required arguments and seven optional arguments. You can also define them in a yaml file and pass it to the cli.

- Options

    - Required [1 / 2]:

        > - dag_id [-d][choose one]: can be a real dag id or "all" to fill all the dags
        > - config_path [-cp][choose one]: path to the config yaml

    - Optional:
        >- start_date [-sd]: starting date, defaults to 365 days ago
        >- maximum_day [-md]: maximum number of days to fill per dag, range: [1, 180]
        >- maximum_unit [-mu]: maximum number of units to fill per dag, range: [1, 43200]
        >- ignore [-i]: still proceed with the auto fill even if the dag ran recently
        >- pause_only [-p]: pass true to fill only dags which are paused
        >- confirm [-y]: pass true to bypass the prompt if dag_id is all
        >- traceback [-v]: pass to print out Airflow database errors


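As a rough illustration of the defaults and ranges listed above, the following Python sketch resolves a missing start date to 365 days back and rejects values outside the documented ranges. The function and constant names are hypothetical and are not taken from fakefill's source; only the numbers come from the option list.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# Values copied from the option list; all names here are hypothetical
# and are not part of fakefill's actual API.
MAX_DAY_RANGE = (1, 180)
MAX_UNIT_RANGE = (1, 43200)
DEFAULT_LOOKBACK_DAYS = 365


def resolve_start_date(start_date: Optional[date] = None,
                       today: Optional[date] = None) -> date:
    """Return the given start date, or 365 days before `today` if omitted."""
    today = today or date.today()
    return start_date or today - timedelta(days=DEFAULT_LOOKBACK_DAYS)


def check_range(name: str, value: int, bounds: Tuple[int, int]) -> int:
    """Return `value` unchanged if it lies within the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be in [{low}, {high}], got {value}")
    return value
```

For example, `check_range("maximum_day", 30, MAX_DAY_RANGE)` returns `30`, while a value of `365` raises a `ValueError`.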

## Examples

Fill all dags for the past 30 days without a prompt, and only fill dags whose status is paused

```bash
$ fakefill -d all -p -md 30 -y
```



Run fakefill for dag id == `dag_a` with the default number of fakefill days == 365

```bash
$ fakefill -d dag_a
```



Run fakefill with a config yaml

```bash
$ fakefill -cp config.yml
```

The yaml file needs to define two top-level keys: `dags` and `settings`. The `dags` section must be a `list`, while the `settings` section must be a `dict`.

Sample:

```yaml
dags:
  - dag_a
  - dag_b
  - dag_c

settings:
  start_date: 2019-01-01
  maximum: "365"
  traceback: false
  confirm: true
  pause_only: true

```
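To make the expected shape concrete, here is a minimal Python sketch that checks a parsed config against the rule above (`dags` is a list, `settings` is a dict). The `validate_config` function is hypothetical and not fakefill's own code. The `sample` dict is what PyYAML's `yaml.safe_load` would be expected to return for the file above, written out by hand so the sketch runs without extra dependencies.

```python
from datetime import date


def validate_config(config: dict) -> dict:
    """Hypothetical check: `dags` must be a list and `settings` a dict."""
    if not isinstance(config, dict):
        raise TypeError("config must be a mapping with `dags` and `settings`")
    dags = config.get("dags")
    settings = config.get("settings")
    if not isinstance(dags, list):
        raise TypeError("`dags` must be a list of dag ids")
    if not isinstance(settings, dict):
        raise TypeError("`settings` must be a dict")
    # Return copies so the caller's parsed config is left untouched.
    return {"dags": [str(d) for d in dags], "settings": dict(settings)}


# The sample file as parsed YAML: the unquoted date becomes a
# `datetime.date`, while the quoted "365" stays a string.
sample = {
    "dags": ["dag_a", "dag_b", "dag_c"],
    "settings": {
        "start_date": date(2019, 1, 1),
        "maximum": "365",
        "traceback": False,
        "confirm": True,
        "pause_only": True,
    },
}
validated = validate_config(sample)
```

Note that a quoted value such as `"365"` arrives as a string, so any code consuming it would need to convert it before doing arithmetic.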

