Metadata-Version: 2.1
Name: featsql
Version: 0.1.1
Summary: Create features with sql
Home-page: https://github.com/ravennaro/featsql
Author: Ravenna Oliveria
Author-email: ravenna.rro@gmail.com
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastcore
Requires-Dist: pandas
Provides-Extra: dev
Requires-Dist: ipykernel ; extra == 'dev'
Requires-Dist: nbdev ; extra == 'dev'

# featsql

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

<span style="color: blue;">(EN)</span> There are three kinds of
functions that can be used to create features from numerical,
categorical and aggregated data. They are:

- For the SQLite database:

  - `sqlite_create_query_num()`: Numerical variables such as mean, sum,
    etc., within the user-provided window of n months.

  - `sqlite_create_query_cat()`: Categorical variables such as mean,
    sum, etc., within the user-provided window of n months.

  - `sqlite_create_query_agregada()`: Numerical variables grouped by a
    specific categorical variable value with mean, sum, etc., within the
    user-provided window of n months.

- For the MySQL database:

  - `mysql_create_query_num()`: Numerical variables such as mean, sum,
    etc., within the user-provided window of n months.

  - `mysql_create_query_cat()`: Categorical variables such as mean, sum,
    etc., within the user-provided window of n months.

  - `mysql_create_query_agregada()`: Numerical variables grouped by a
    specific categorical variable value with mean, sum, etc., within the
    user-provided window of n months.

- For the Snowflake database:

  - `snow_create_query_num()`: Numerical variables such as mean, sum,
    etc., within the user-provided window of n months.

  - `snow_create_query_cat()`: Categorical variables such as mean, sum,
    etc., within the user-provided window of n months.

  - `snow_create_query_agregada()`: Numerical variables grouped by a
    specific categorical variable value with mean, sum, etc., within the
    user-provided window of n months.

Next, we have a detailed example for creating variables in SQLite and
MySQL databases, along with an example output for the Snowflake
database.

<span style="color: green;">(PT)</span> Existem três tipos de funções
que podem ser usadas para criar variáveis a partir de dados numéricos,
categóricos e agregados. São elas: - Para o banco SQLite:

    + ``sqlite_create_query_num()``: variáveis numéricas como média, soma e etc na janela de n meses fornecida pelo usuário.
    + ``sqlite_create_query_cat()``: variáveis categóricas como média, soma e etc na janela de n meses fornecida pelo usuário.
    + ``sqlite_create_query_agregada()``: variáveis numéricas agrupadas por uma valor específico de variável categórica com a média, soma e etc na janela de n meses fornecida pelo usuário.

- Para o banco MySQL:

  - `mysql_create_query_num()`: variáveis numéricas como média, soma e
    etc na janela de n meses fornecida pelo usuário.
  - `mysql_create_query_cat()`: variáveis categóricas como média, soma e
    etc na janela de n meses fornecida pelo usuário.
  - `mysql_create_query_agregada()`: variáveis numéricas agrupadas por
    uma valor específico de variável categórica com a média, soma e etc
    na janela de n meses fornecida pelo usuário.

- Para o banco snowflake:

  - `snow_create_query_num()`: variáveis numéricas como média, soma e
    etc na janela de n meses fornecida pelo usuário.

  - `snow_create_query_cat()`: variáveis categóricas como média, soma e
    etc na janela de n meses fornecida pelo usuário.

  - `snow_create_query_agregada()`: variáveis numéricas agrupadas por
    uma valor específico de variável categórica com a média, soma e etc
    na janela de n meses fornecida pelo usuário.

A seguir, temos um exemplo detalhado para a criação de variáveis no
banco sqlite e mysql. Além de um exemplo de output para o banco do tipo
snowflake.

## Install

<span style="color: blue;">(EN)</span> To install, simply use the
command line:

``` sh
pip install featsql
```

<span style="color: green;">(PT)</span> Para instalar, simplesmente use
o comando:

``` sh
pip install featsql
```

## Imports

``` python
from featsql.featsqlite import *
from featsql.featmysql import *
from featsql.featsnow import *
```

``` python
import pandas as pd
```

``` python
import mysql.connector
import sqlite3
pd.set_option('display.max_columns', None)
```

# Exemples

## 1. SQLITE

### Setting up the engine (Configurando a engine)

``` python
url_db = "../../data/mydatabase.db" 
conn = sqlite3.connect(url_db)
cursor = conn.cursor()
```

### Initial view of the public (Visão inicial do público)

<span style="color: blue;">(EN)</span> First, let’s observe the format
of the spine table.

<span style="color: green;">(PT)</span> Primeiro vamos observar o
formato da tabela spine

``` python
df_spine = pd.read_sql("SELECT * FROM tb_spine", conn)
df_spine.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | Target |
|-----|-----|------------|--------|
| 0   | 4   | 2023-02-01 | 0      |
| 1   | 5   | 2023-02-01 | 0      |
| 2   | 6   | 2023-02-01 | 0      |
| 3   | 7   | 2023-02-01 | 0      |
| 4   | 10  | 2023-02-01 | 1      |

</div>

### Initial view of the variables table (Visão inicial da tabela de variáveis)

<span style="color: blue;">(EN)</span> The table tb_feat contains 4
variables, two numeric and two categorical. Note that there are more
unique IDs and dates available in this than in the spine table, as is
often the case.

<span style="color: green;">(PT)</span> A tabela tb_feat contém 4
variáveis, duas sendo numéricas e duas categórica. Perceba que existem
mais ID’s únicos e datas disponíveis nessa tabela do que na tabela
spine, caso que ocorre no dia a dia.

``` python
df_data = pd.read_sql("SELECT * FROM tb_feat", conn)
df_data.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA      | FEAT_NUM1 | FEAT_NUM2 | FEAT_CAT1 | FEAT_CAT2 |
|-----|-----|------------|-----------|-----------|-----------|-----------|
| 0   | 1   | 2023-01-01 | -19       | -52       | A         | B         |
| 1   | 2   | 2023-01-01 | -7        | -33       | A         | B         |
| 2   | 3   | 2023-01-01 | 6         | 91        | C         | C         |
| 3   | 4   | 2023-01-01 | 74        | 52        | B         | A         |
| 4   | 5   | 2023-01-01 | 79        | 77        | B         | B         |

</div>

### Creation of numerical variables (Criação de variáveis numéricas)

<span style="color: blue;">(EN)</span> The function
sqlite_create_query_num() creates a query text for generating variables
with the sum, minimum, maximum, and average operations on the variables
listed in feat_num_lista and within the time window listed in
list_window.

<span style="color: green;">(PT)</span> A função
sqlite_create_query_num() cria um texto com a query para a criação de
variáveis com as operações soma, mínimo, máximo e média das variáveis
listadas em feat_num_lista e com a janela de tempo listada em
list_window.

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
query_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)
```

    Complete query creation with no saved file.

``` python
df_num_sqlite = pd.read_sql(query_final_num_sqlite, conn)
df_num_sqlite.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |
|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| 0   | 4   | 2023-02-01 | 74.0             | 74.0             | 74.0             | 74.0             | 52.0             | 52.0             | 52.0             | 52.0             | 74               | 74               | 74               | 74.0             | 52               | 52               | 52               | 52.0             | 74               | 74               | 74               | 74.0             | 52               | 52               | 52               | 52.0             |
| 1   | 5   | 2023-02-01 | 79.0             | 79.0             | 79.0             | 79.0             | 77.0             | 77.0             | 77.0             | 77.0             | 79               | 79               | 79               | 79.0             | 77               | 77               | 77               | 77.0             | 79               | 79               | 79               | 79.0             | 77               | 77               | 77               | 77.0             |
| 2   | 6   | 2023-02-01 | -13.0            | -13.0            | -13.0            | -13.0            | -45.0            | -45.0            | -45.0            | -45.0            | -13              | -13              | -13              | -13.0            | -45              | -45              | -45              | -45.0            | -13              | -13              | -13              | -13.0            | -45              | -45              | -45              | -45.0            |
| 3   | 7   | 2023-02-01 | -44.0            | -44.0            | -44.0            | -44.0            | 0.0              | 0.0              | 0.0              | 0.0              | -44              | -44              | -44              | -44.0            | 0                | 0                | 0                | 0.0              | -44              | -44              | -44              | -44.0            | 0                | 0                | 0                | 0.0              |
| 4   | 10  | 2023-02-01 | 11.0             | 11.0             | 11.0             | 11.0             | -39.0            | -39.0            | -39.0            | -39.0            | 11               | 11               | 11               | 11.0             | -39              | -39              | -39              | -39.0            | 11               | 11               | 11               | 11.0             | -39              | -39              | -39              | -39.0            |

</div>

<span style="color: blue;">(EN)</span> We can use the `nome_arquivo`
parameter with a name in the format ‘table.sql’ to save the creation
query in a file in the same folder where it is executed.

<span style="color: green;">(PT)</span> Podemos utilizar o parâmetro
`nome_arquivo` com um nome no formato “table.sql” para salvar query de
criação em um arquivo na mesma pasta que é executado.

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
file_name = 'table.sql'

query_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name)
```

    Complete query creation with table.sql saved file

<span style="color: blue;">(EN)</span> Besides saving the query to a
file, we can create a table directly in the database using the
`[`sqlite_create_query_num`](https://ravennaro.github.io/featsql/creation_sqlite.html#sqlite_create_query_num)`
function with the creation parameters. Here is an example:

<span style="color: green;">(PT)</span> Além de salvar a query em um
arquivo, podemos criar uma tabela diretamente no banco de dados a partir
da função
`[`sqlite_create_query_num`](https://ravennaro.github.io/featsql/creation_sqlite.html#sqlite_create_query_num)`
com os parâmetros de criação. A seguir um exemplo:

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=conn)
```

    Complete query creation with table.sql saved file

<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:

<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:

``` python
query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, conn)
df.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |
|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| 0   | 4   | 2023-02-01 | 74.0             | 74.0             | 74.0             | 74.0             | 52.0             | 52.0             | 52.0             | 52.0             | 74.0             | 74.0             | 74.0             | 74.0             | 52.0             | 52.0             | 52.0             | 52.0             | 74.0             | 74.0             | 74.0             | 74.0             | 52.0             | 52.0             | 52.0             | 52.0             |
| 1   | 5   | 2023-02-01 | 79.0             | 79.0             | 79.0             | 79.0             | 77.0             | 77.0             | 77.0             | 77.0             | 79.0             | 79.0             | 79.0             | 79.0             | 77.0             | 77.0             | 77.0             | 77.0             | 79.0             | 79.0             | 79.0             | 79.0             | 77.0             | 77.0             | 77.0             | 77.0             |
| 2   | 6   | 2023-02-01 | -13.0            | -13.0            | -13.0            | -13.0            | -45.0            | -45.0            | -45.0            | -45.0            | -13.0            | -13.0            | -13.0            | -13.0            | -45.0            | -45.0            | -45.0            | -45.0            | -13.0            | -13.0            | -13.0            | -13.0            | -45.0            | -45.0            | -45.0            | -45.0            |
| 3   | 7   | 2023-02-01 | -44.0            | -44.0            | -44.0            | -44.0            | 0.0              | 0.0              | 0.0              | 0.0              | -44.0            | -44.0            | -44.0            | -44.0            | 0.0              | 0.0              | 0.0              | 0.0              | -44.0            | -44.0            | -44.0            | -44.0            | 0.0              | 0.0              | 0.0              | 0.0              |
| 4   | 10  | 2023-02-01 | 11.0             | 11.0             | 11.0             | 11.0             | -39.0            | -39.0            | -39.0            | -39.0            | 11.0             | 11.0             | 11.0             | 11.0             | -39.0            | -39.0            | -39.0            | -39.0            | 11.0             | 11.0             | 11.0             | 11.0             | -39.0            | -39.0            | -39.0            | -39.0            |

</div>

### Creation of categorical variables (Criação de variáveis categóricas)

<span style="color: blue;">(EN)</span> The function
`sqlite_create_query_cat()` generates a query text for creating
variables with the mode of each variable listed in feat_num_lista within
the time window provided in list_window.

<span style="color: green;">(PT)</span> A função
`sqlite_create_query_cat()` cria um texto com a query para a criação de
variáveis com a moda de cada uma das variáveis listadas em
feat_num_lista na janela de tempo fornecida em list_window.

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']
list_window = [1, 3, 6]

query_final_cat_sqlite = sqlite_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)
```

    Complete query creation with no saved file.

``` python
df_cat_sqlite_sqlite = pd.read_sql(query_final_cat_sqlite, conn)
df_cat_sqlite_sqlite.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M | FEAT_CAT1_MODA_6M | FEAT_CAT2_MODA_6M |
|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|
| 0   | 4   | 2023-02-01 | B                 | A                 | B                 | A                 | B                 | A                 |
| 1   | 5   | 2023-02-01 | B                 | B                 | B                 | B                 | B                 | B                 |
| 2   | 6   | 2023-02-01 | C                 | B                 | C                 | B                 | C                 | B                 |
| 3   | 7   | 2023-02-01 | B                 | C                 | B                 | C                 | B                 | C                 |
| 4   | 10  | 2023-02-01 | A                 | A                 | A                 | A                 | A                 | A                 |

</div>

<span style="color: blue;">(EN)</span> The function
`sqlite_create_query_cat()` also allows saving a file with the created
query and creating the table directly in the database. Here’s an
example:

<span style="color: green;">(PT)</span> A função
`sqlite_create_query_cat()` também permite salvar um arquivo com a query
criada e criar a tabela diretamente no banco de dados. Segue um exemplo:

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']
list_window = [1, 3, 6]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query_final_cat_sqlite = sqlite_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=conn)
```

    Complete query creation with table.sql saved file

<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:

<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:

``` python
query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, conn)
df.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M | FEAT_CAT1_MODA_6M | FEAT_CAT2_MODA_6M |
|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|
| 0   | 4   | 2023-02-01 | B                 | A                 | B                 | A                 | B                 | A                 |
| 1   | 5   | 2023-02-01 | B                 | B                 | B                 | B                 | B                 | B                 |
| 2   | 6   | 2023-02-01 | C                 | B                 | C                 | B                 | C                 | B                 |
| 3   | 7   | 2023-02-01 | B                 | C                 | B                 | C                 | B                 | C                 |
| 4   | 10  | 2023-02-01 | A                 | A                 | A                 | A                 | A                 | A                 |

</div>

### Creation of aggregated variables (Criação de variáveis agregadas)

<span style="color: blue;">(EN)</span> We can create features by
aggregating on a specific value of a categorical variable and
calculating sum, avg, min, and max operations for user-defined numerical
variables. For example, for the variable FEAT_CAT1 having a value of A
or B, we calculate sum, avg, min, and max operations for the variables
FEAT_NUM1 and FEAT_NUM2. Here’s a use case:

<span style="color: green;">(PT)</span> Podemos criar variáveis
agregando por valor específico de variável categórica e calculando as
operações de sum, avg, min e max para as variáveis númericas definidas
pelo usuário. Por exemplo, para a variável `FEAT_CAT1` tendo valor igual
a `A` ou `B`, calculamos as operações de sum, avg, min e max para as
variáveis `FEAT_NUM1` e `FEAT_NUM2`. Segue um caso de uso:

``` python
tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
list_aggregator_value = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'

query = sqlite_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, list_aggregator_value)
```

    Complete query creation with no saved file.

``` python
df_sqlite_agregada = pd.read_sql(query, conn)
df_sqlite_agregada.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |
|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| 0   | 4   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | 74.0                         | 74.0                         | 74.0                         | 74.0                         | 52.0                         | 52.0                         | 52.0                         | 52.0                         | 74.0                         | 74.0                         | 74.0                         | 74.0                         | 52.0                         | 52.0                         | 52.0                         | 52.0                         |
| 1   | 5   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | 79.0                         | 79.0                         | 79.0                         | 79.0                         | 77.0                         | 77.0                         | 77.0                         | 77.0                         | 79.0                         | 79.0                         | 79.0                         | 79.0                         | 77.0                         | 77.0                         | 77.0                         | 77.0                         |
| 2   | 6   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 3   | 7   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | -44.0                        | -44.0                        | -44.0                        | -44.0                        | 0.0                          | 0.0                          | 0.0                          | 0.0                          | -44.0                        | -44.0                        | -44.0                        | -44.0                        | 0.0                          | 0.0                          | 0.0                          | 0.0                          |
| 4   | 10  | 2023-02-01 | 11.0                         | 11.0                         | 11.0                         | 11.0                         | -39.0                        | -39.0                        | -39.0                        | -39.0                        | 11.0                         | 11.0                         | 11.0                         | 11.0                         | -39.0                        | -39.0                        | -39.0                        | -39.0                        | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |

</div>

<span style="color: blue;">(EN)</span> Here’s an example of saving a
file with the created query and directly creating the table in the
database:

<span style="color: green;">(PT)</span> Segue um exemplo para salvar um
arquivo com a query criada e criar a tabela diretamente no banco de
dados:

``` python
tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
list_aggregator_value = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
status = True
file_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query = sqlite_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, list_aggregator_value, file_name=file_name, status=status, table_name=table_name, conn=conn)
```

    Complete query creation with table.sql saved file

<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:

<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:

``` python
query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, conn)
df.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |
|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| 0   | 4   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | 74.0                         | 74.0                         | 74.0                         | 74.0                         | 52.0                         | 52.0                         | 52.0                         | 52.0                         | 74.0                         | 74.0                         | 74.0                         | 74.0                         | 52.0                         | 52.0                         | 52.0                         | 52.0                         |
| 1   | 5   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | 79.0                         | 79.0                         | 79.0                         | 79.0                         | 77.0                         | 77.0                         | 77.0                         | 77.0                         | 79.0                         | 79.0                         | 79.0                         | 79.0                         | 77.0                         | 77.0                         | 77.0                         | 77.0                         |
| 2   | 6   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 3   | 7   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | -44.0                        | -44.0                        | -44.0                        | -44.0                        | 0.0                          | 0.0                          | 0.0                          | 0.0                          | -44.0                        | -44.0                        | -44.0                        | -44.0                        | 0.0                          | 0.0                          | 0.0                          | 0.0                          |
| 4   | 10  | 2023-02-01 | 11.0                         | 11.0                         | 11.0                         | 11.0                         | -39.0                        | -39.0                        | -39.0                        | -39.0                        | 11.0                         | 11.0                         | 11.0                         | 11.0                         | -39.0                        | -39.0                        | -39.0                        | -39.0                        | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |

</div>

## 2. MySQL

### Setting up the connection (Configurando a conexão)

``` python
host = "localhost"
user = "sqluser"
password = "password"
database = "mydatabase"

# Conectar ao MySQL
connection = mysql.connector.connect(
    host=host,
    user=user,
    password=password,
    database=database
)
```

### Initial view of the public (Visão inicial do público)

<span style="color: blue;">(EN)</span> First, let’s observe the format
of the spine table.

<span style="color: green;">(PT)</span> Primeiro vamos observar o
formato da tabela spine

``` python
df_spine = pd.read_sql("SELECT * FROM tb_spine", connection)
df_spine.head()
```

    /tmp/ipykernel_31659/1385309256.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
      df_spine = pd.read_sql("SELECT * FROM tb_spine", connection)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | Target |
|-----|-----|------------|--------|
| 0   | 4   | 2023-02-01 | 1      |
| 1   | 5   | 2023-02-01 | 0      |
| 2   | 6   | 2023-02-01 | 0      |
| 3   | 7   | 2023-02-01 | 0      |
| 4   | 10  | 2023-02-01 | 0      |

</div>

### Initial view of the variables table (Visão inicial da tabela de variáveis)

<span style="color: blue;">(EN)</span> The table tb_feat contains 4
variables, two numeric and two categorical. Note that there are more
unique IDs and dates available in this than in the spine table, as is
often the case.

<span style="color: green;">(PT)</span> A tabela tb_feat contém 4
variáveis, duas sendo numéricas e duas categórica. Perceba que existem
mais ID’s únicos e datas disponíveis nessa tabela do que na tabela
spine, caso que ocorre no dia a dia.

``` python
df_data = pd.read_sql("SELECT * FROM tb_feat", connection)
df_data.head()
```

    /tmp/ipykernel_31659/548342132.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
      df_data = pd.read_sql("SELECT * FROM tb_feat", connection)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA      | FEAT_NUM1 | FEAT_NUM2 | FEAT_CAT1 | FEAT_CAT2 |
|-----|-----|------------|-----------|-----------|-----------|-----------|
| 0   | 1   | 2023-01-01 | 73        | 23        | B         | B         |
| 1   | 3   | 2023-01-01 | 15        | 1         | B         | B         |
| 2   | 5   | 2023-01-01 | 75        | 71        | A         | A         |
| 3   | 7   | 2023-01-01 | 73        | 82        | B         | C         |
| 4   | 9   | 2023-01-01 | 61        | 8         | C         | B         |

</div>

### Creation of numerical variables (Criação de variáveis numéricas)

<span style="color: blue;">(EN)</span> The function
`mysql_create_query_num()` creates a query text for generating variables
with the sum, minimum, maximum, and average operations on the variables
listed in feat_num_lista and within the time window listed in
list_window.

<span style="color: green;">(PT)</span> A função
`mysql_create_query_num()` cria um texto com a query para a criação de
variáveis com as operações soma, mínimo, máximo e média das variáveis
listadas em feat_num_lista e com a janela de tempo listada em
list_window.

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
query_final_num_mysql = mysql_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra)
```

    Complete query creation with no saved file.

``` python
df_num_mysql = pd.read_sql(query_final_num_mysql, connection)
df_num_mysql.head()
```

    /tmp/ipykernel_31659/3114881597.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
      df_num_mysql = pd.read_sql(query_final_num_mysql, connection)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |
|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| 0   | 4   | 2023-02-01 | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              |
| 1   | 5   | 2023-02-01 | 75.0             | 75.0             | 75.0             | 75.0             | 71.0             | 71.0             | 71.0             | 71.0             | 75.0             | 75.0             | 75.0             | 75.0             | 71.0             | 71.0             | 71.0             | 71.0             | 75.0             | 75.0             | 75.0             | 75.0             | 71.0             | 71.0             | 71.0             | 71.0             |
| 2   | 6   | 2023-02-01 | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              |
| 3   | 7   | 2023-02-01 | 73.0             | 73.0             | 73.0             | 73.0             | 82.0             | 82.0             | 82.0             | 82.0             | 73.0             | 73.0             | 73.0             | 73.0             | 82.0             | 82.0             | 82.0             | 82.0             | 73.0             | 73.0             | 73.0             | 73.0             | 82.0             | 82.0             | 82.0             | 82.0             |
| 4   | 10  | 2023-02-01 | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              |

</div>

<span style="color: blue;">(EN)</span> We can use the `file_name`
parameter with a name in the format ‘table.sql’ to save the creation
query in a file in the same folder where it is executed.

<span style="color: green;">(PT)</span> Podemos utilizar o parâmetro
`file_name` com um nome no formato “table.sql” para salvar query de
criação em um arquivo na mesma pasta que é executado.

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query_final_num_mysql = mysql_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=connection)
```

    ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ']
    -- Create table with the name nome_da_tabela_criada
    -- Criar a tabela nome_da_' at line 1

<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:

<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:

``` python
query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, connection)
df.head()
```

    /tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
      df = pd.read_sql(query, connection)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | FEAT_NUM1_SUM_1M | FEAT_NUM1_MIN_1M | FEAT_NUM1_MAX_1M | FEAT_NUM1_AVG_1M | FEAT_NUM2_SUM_1M | FEAT_NUM2_MIN_1M | FEAT_NUM2_MAX_1M | FEAT_NUM2_AVG_1M | FEAT_NUM1_SUM_2M | FEAT_NUM1_MIN_2M | FEAT_NUM1_MAX_2M | FEAT_NUM1_AVG_2M | FEAT_NUM2_SUM_2M | FEAT_NUM2_MIN_2M | FEAT_NUM2_MAX_2M | FEAT_NUM2_AVG_2M | FEAT_NUM1_SUM_3M | FEAT_NUM1_MIN_3M | FEAT_NUM1_MAX_3M | FEAT_NUM1_AVG_3M | FEAT_NUM2_SUM_3M | FEAT_NUM2_MIN_3M | FEAT_NUM2_MAX_3M | FEAT_NUM2_AVG_3M |
|-----|-----|------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| 0   | 4   | 2023-02-01 | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              |
| 1   | 5   | 2023-02-01 | 75.0             | 75.0             | 75.0             | 75.0             | 71.0             | 71.0             | 71.0             | 71.0             | 75.0             | 75.0             | 75.0             | 75.0             | 71.0             | 71.0             | 71.0             | 71.0             | 75.0             | 75.0             | 75.0             | 75.0             | 71.0             | 71.0             | 71.0             | 71.0             |
| 2   | 6   | 2023-02-01 | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              |
| 3   | 7   | 2023-02-01 | 73.0             | 73.0             | 73.0             | 73.0             | 82.0             | 82.0             | 82.0             | 82.0             | 73.0             | 73.0             | 73.0             | 73.0             | 82.0             | 82.0             | 82.0             | 82.0             | 73.0             | 73.0             | 73.0             | 73.0             | 82.0             | 82.0             | 82.0             | 82.0             |
| 4   | 10  | 2023-02-01 | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              | NaN              |

</div>

### Creation of categorical variables (Criação de variáveis categóricas)

<span style="color: blue;">(EN)</span> The function
`mysql_create_query_cat()` generates a query text for creating variables
with the mode of each variable listed in feat_num_lista within the time
window provided in list_window.

<span style="color: green;">(PT)</span> A função
`mysql_create_query_cat()` cria um texto com a query para a criação de
variáveis com a moda de cada uma das variáveis listadas em
feat_num_lista na janela de tempo fornecida em list_window.

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1','FEAT_CAT2']
list_window = [1,2,3]
query_final_cat_mysql = mysql_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)
```

    Complete query creation with no saved file.

``` python
df_cat_sqlite_mysql = pd.read_sql(query_final_cat_mysql, connection)
df_cat_sqlite_mysql.head()
```

    /tmp/ipykernel_1661/827211967.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
      df_cat_sqlite_mysql = pd.read_sql(query_final_cat_mysql, connection)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_2M | FEAT_CAT2_MODA_2M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M |
|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|
| 0   | 4   | 2023-02-01 | None              | None              | None              | None              | None              | None              |
| 1   | 5   | 2023-02-01 | A                 | A                 | A                 | A                 | A                 | A                 |
| 2   | 6   | 2023-02-01 | None              | None              | None              | None              | None              | None              |
| 3   | 7   | 2023-02-01 | B                 | C                 | B                 | C                 | B                 | C                 |
| 4   | 10  | 2023-02-01 | None              | None              | None              | None              | None              | None              |

</div>

<span style="color: blue;">(EN)</span> The function
`sqlite_create_query_cat()` also allows saving a file with the created
query and creating the table directly in the database. Here’s an
example:

<span style="color: green;">(PT)</span> A função
`sqlite_create_query_cat()` também permite salvar um arquivo com a query
criada e criar a tabela diretamente no banco de dados. Segue um exemplo:

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']
list_window = [1, 3, 6]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query_final_cat_mysql = mysql_create_query_cat(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=connection)
```

    Complete query creation with table.sql saved file

<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:

<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:

``` python
query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, connection)
df.head()
```

    /tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
      df = pd.read_sql(query, connection)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | FEAT_CAT1_MODA_1M | FEAT_CAT2_MODA_1M | FEAT_CAT1_MODA_3M | FEAT_CAT2_MODA_3M | FEAT_CAT1_MODA_6M | FEAT_CAT2_MODA_6M |
|-----|-----|------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|
| 0   | 4   | 2023-02-01 | None              | None              | None              | None              | None              | None              |
| 1   | 5   | 2023-02-01 | A                 | A                 | A                 | A                 | A                 | A                 |
| 2   | 6   | 2023-02-01 | None              | None              | None              | None              | None              | None              |
| 3   | 7   | 2023-02-01 | B                 | C                 | B                 | C                 | B                 | C                 |
| 4   | 10  | 2023-02-01 | None              | None              | None              | None              | None              | None              |

</div>

### Creation of aggregated variables (Criação de variáveis agregadas)

<span style="color: blue;">(EN)</span> We can create features by
aggregating on a specific value of a categorical variable and
calculating sum, avg, min, and max operations for user-defined numerical
variables. For example, for the variable FEAT_CAT1 having a value of A
or B, we calculate sum, avg, min, and max operations for the variables
FEAT_NUM1 and FEAT_NUM2. Here’s a use case:

<span style="color: green;">(PT)</span> Podemos criar variáveis
agregando por valor específico de variável categórica e calculando as
operações de sum, avg, min e max para as variáveis númericas definidas
pelo usuário. Por exemplo, para a variável `FEAT_CAT1` tendo valor igual
a `A` ou `B`, calculamos as operações de sum, avg, min e max para as
variáveis `FEAT_NUM1` e `FEAT_NUM2`. Segue um caso de uso:

``` python
tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'

query = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)
```

    Complete query creation with no saved file.

``` python
df_mysql_agregada = pd.read_sql(query, connection)
df_mysql_agregada.head()
```

    /tmp/ipykernel_1661/789764728.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
      df_mysql_agregada = pd.read_sql(query, connection)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |
|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| 0   | 4   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 1   | 5   | 2023-02-01 | 75.0                         | 75.0                         | 75.0                         | 75.0                         | 71.0                         | 71.0                         | 71.0                         | 71.0                         | 75.0                         | 75.0                         | 75.0                         | 75.0                         | 71.0                         | 71.0                         | 71.0                         | 71.0                         | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 2   | 6   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 3   | 7   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | 73.0                         | 73.0                         | 73.0                         | 73.0                         | 82.0                         | 82.0                         | 82.0                         | 82.0                         | 73.0                         | 73.0                         | 73.0                         | 73.0                         | 82.0                         | 82.0                         | 82.0                         | 82.0                         |
| 4   | 10  | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |

</div>

``` python
tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT2'
lista_valor_agragador = ['B', 'C']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'

query = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)
```

    Complete query creation with no saved file.

``` python
df_mysql_agregada = pd.read_sql(query, connection)
df_mysql_agregada.head()
```

    /tmp/ipykernel_1661/789764728.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
      df_mysql_agregada = pd.read_sql(query, connection)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | SUM_FEAT_NUM1_FEAT_CAT2_B_3M | MAX_FEAT_NUM1_FEAT_CAT2_B_3M | MIN_FEAT_NUM1_FEAT_CAT2_B_3M | AVG_FEAT_NUM1_FEAT_CAT2_B_3M | SUM_FEAT_NUM2_FEAT_CAT2_B_3M | MAX_FEAT_NUM2_FEAT_CAT2_B_3M | MIN_FEAT_NUM2_FEAT_CAT2_B_3M | AVG_FEAT_NUM2_FEAT_CAT2_B_3M | SUM_FEAT_NUM1_FEAT_CAT2_B_6M | MAX_FEAT_NUM1_FEAT_CAT2_B_6M | MIN_FEAT_NUM1_FEAT_CAT2_B_6M | AVG_FEAT_NUM1_FEAT_CAT2_B_6M | SUM_FEAT_NUM2_FEAT_CAT2_B_6M | MAX_FEAT_NUM2_FEAT_CAT2_B_6M | MIN_FEAT_NUM2_FEAT_CAT2_B_6M | AVG_FEAT_NUM2_FEAT_CAT2_B_6M | SUM_FEAT_NUM1_FEAT_CAT2_C_3M | MAX_FEAT_NUM1_FEAT_CAT2_C_3M | MIN_FEAT_NUM1_FEAT_CAT2_C_3M | AVG_FEAT_NUM1_FEAT_CAT2_C_3M | SUM_FEAT_NUM2_FEAT_CAT2_C_3M | MAX_FEAT_NUM2_FEAT_CAT2_C_3M | MIN_FEAT_NUM2_FEAT_CAT2_C_3M | AVG_FEAT_NUM2_FEAT_CAT2_C_3M | SUM_FEAT_NUM1_FEAT_CAT2_C_6M | MAX_FEAT_NUM1_FEAT_CAT2_C_6M | MIN_FEAT_NUM1_FEAT_CAT2_C_6M | AVG_FEAT_NUM1_FEAT_CAT2_C_6M | SUM_FEAT_NUM2_FEAT_CAT2_C_6M | MAX_FEAT_NUM2_FEAT_CAT2_C_6M | MIN_FEAT_NUM2_FEAT_CAT2_C_6M | AVG_FEAT_NUM2_FEAT_CAT2_C_6M |
|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| 0   | 4   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 1   | 5   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 2   | 6   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 3   | 7   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | 73.0                         | 73.0                         | 73.0                         | 73.0                         | 82.0                         | 82.0                         | 82.0                         | 82.0                         | 73.0                         | 73.0                         | 73.0                         | 73.0                         | 82.0                         | 82.0                         | 82.0                         | 82.0                         |
| 4   | 10  | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |

</div>

<span style="color: blue;">(EN)</span> Here’s an example of saving a
file with the created query and directly creating the table in the
database:

<span style="color: green;">(PT)</span> Segue um exemplo para salvar um
arquivo com a query criada e criar a tabela diretamente no banco de
dados:

``` python
tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador, file_name=file_name, status=status, table_name=table_name, conn=connection)
```

    Complete query creation with table.sql saved file

<span style="color: blue;">(EN)</span> To verify the creation, let’s
check the database table using the name of the created table:

<span style="color: green;">(PT)</span> Para conferir a criação, vamos
consultar o banco de dados com o nome da tabela que foi criada:

``` python
query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, connection)
df.head()
```

    /tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
      df = pd.read_sql(query, connection)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | ID  | SAFRA_REF  | SUM_FEAT_NUM1_FEAT_CAT1_A_3M | MAX_FEAT_NUM1_FEAT_CAT1_A_3M | MIN_FEAT_NUM1_FEAT_CAT1_A_3M | AVG_FEAT_NUM1_FEAT_CAT1_A_3M | SUM_FEAT_NUM2_FEAT_CAT1_A_3M | MAX_FEAT_NUM2_FEAT_CAT1_A_3M | MIN_FEAT_NUM2_FEAT_CAT1_A_3M | AVG_FEAT_NUM2_FEAT_CAT1_A_3M | SUM_FEAT_NUM1_FEAT_CAT1_A_6M | MAX_FEAT_NUM1_FEAT_CAT1_A_6M | MIN_FEAT_NUM1_FEAT_CAT1_A_6M | AVG_FEAT_NUM1_FEAT_CAT1_A_6M | SUM_FEAT_NUM2_FEAT_CAT1_A_6M | MAX_FEAT_NUM2_FEAT_CAT1_A_6M | MIN_FEAT_NUM2_FEAT_CAT1_A_6M | AVG_FEAT_NUM2_FEAT_CAT1_A_6M | SUM_FEAT_NUM1_FEAT_CAT1_B_3M | MAX_FEAT_NUM1_FEAT_CAT1_B_3M | MIN_FEAT_NUM1_FEAT_CAT1_B_3M | AVG_FEAT_NUM1_FEAT_CAT1_B_3M | SUM_FEAT_NUM2_FEAT_CAT1_B_3M | MAX_FEAT_NUM2_FEAT_CAT1_B_3M | MIN_FEAT_NUM2_FEAT_CAT1_B_3M | AVG_FEAT_NUM2_FEAT_CAT1_B_3M | SUM_FEAT_NUM1_FEAT_CAT1_B_6M | MAX_FEAT_NUM1_FEAT_CAT1_B_6M | MIN_FEAT_NUM1_FEAT_CAT1_B_6M | AVG_FEAT_NUM1_FEAT_CAT1_B_6M | SUM_FEAT_NUM2_FEAT_CAT1_B_6M | MAX_FEAT_NUM2_FEAT_CAT1_B_6M | MIN_FEAT_NUM2_FEAT_CAT1_B_6M | AVG_FEAT_NUM2_FEAT_CAT1_B_6M |
|-----|-----|------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| 0   | 4   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 1   | 5   | 2023-02-01 | 75.0                         | 75.0                         | 75.0                         | 75.0                         | 71.0                         | 71.0                         | 71.0                         | 71.0                         | 75.0                         | 75.0                         | 75.0                         | 75.0                         | 71.0                         | 71.0                         | 71.0                         | 71.0                         | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 2   | 6   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |
| 3   | 7   | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | 73.0                         | 73.0                         | 73.0                         | 73.0                         | 82.0                         | 82.0                         | 82.0                         | 82.0                         | 73.0                         | 73.0                         | 73.0                         | 73.0                         | 82.0                         | 82.0                         | 82.0                         | 82.0                         |
| 4   | 10  | 2023-02-01 | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          | NaN                          |

</div>

## 3. Snowflake

### Criação de variáveis numéricas

<span style="color: blue;">(EN)</span> The function
`snow_create_query_num()` creates a query text for generating variables
with the sum, minimum, maximum, and average operations on the variables
listed in feat_num_lista and within the time window listed in
list_window.

<span style="color: green;">(PT)</span> A função
`snow_create_query_num()` cria um texto com a query para a criação de
variáveis com as operações soma, mínimo, máximo e média das variáveis
listadas em feat_num_lista e com a janela de tempo listada em
list_window.

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]

query_final_num_snow = snow_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra)
```

    Complete query creation with no saved file.

Podemos salvar a consulta da query em um arquivo também:

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
file_name = 'table.sql'

query_final_num_snow = snow_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra, file_name)
```

    Complete query creation with table.sql saved file

### Criação de variáveis categóricas

<span style="color: blue;">(EN)</span> The function
`mysql_create_query_cat()` generates a query text for creating variables
with the mode of each variable listed in feat_num_lista within the time
window provided in list_window.

<span style="color: green;">(PT)</span> A função
`mysql_create_query_cat()` cria um texto com a query para a criação de
variáveis com a moda de cada uma das variáveis listadas em
feat_num_lista na janela de tempo fornecida em list_window.

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1','FEAT_CAT2']
list_window = [1,2,3]

query_final_cat_snow = snow_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)
```

    Complete query creation with no saved file.

``` python
tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1','FEAT_CAT2']
list_window = [1,2,3]
file_name = 'table.sql'

query_final_cat_snow = snow_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name)
```

    Complete query creation with table.sql saved file

### Criação de variáveis agregadas

<span style="color: blue;">(EN)</span> We can create features by
aggregating on a specific value of a categorical variable and
calculating sum, avg, min, and max operations for user-defined numerical
variables. For example, for the variable FEAT_CAT1 having a value of A
or B, we calculate sum, avg, min, and max operations for the variables
FEAT_NUM1 and FEAT_NUM2. Here’s a use case:

<span style="color: green;">(PT)</span> Podemos criar variáveis
agregando por valor específico de variável categórica e calculando as
operações de sum, avg, min e max para as variáveis númericas definidas
pelo usuário. Por exemplo, para a variável `FEAT_CAT1` tendo valor igual
a `A` ou `B`, calculamos as operações de sum, avg, min e max para as
variáveis `FEAT_NUM1` e `FEAT_NUM2`. Segue um caso de uso:

``` python
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['B', 'C']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
janelas = [1, 2, 3]
tb_publico = 'tb_spine'


query_final_cat_agre_snow = snow_create_query_agregada(tb_publico, tb_feat, janelas,  lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)
```

    Complete query creation with no saved file.

<span style="color: blue;">(EN)</span> We can also save the query to a
file:

<span style="color: green;">(PT)</span> Podemos também salvar a consulta
em um arquivo:

``` python
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['B', 'C']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
janelas = [1, 2, 3]
tb_publico = 'tb_spine'
file_name = 'table.sql'

query_final_cat_agre_snow = snow_create_query_agregada(tb_publico, tb_feat, janelas,  lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador, file_name)
```

    Complete query creation with table.sql saved file


