Metadata-Version: 2.1
Name: trufl
Version: 0.0.2
Summary: Optimize adaptive sampling
Home-page: https://github.com/franckalbinet/trufl
Author: Floris Abrams, Franck Albinet
Author-email: franckalbinet@gmail.com
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastcore
Requires-Dist: geopandas
Requires-Dist: rasterio
Requires-Dist: h3==4.0.0b5
Requires-Dist: contextily
Requires-Dist: pysal==24.1
Requires-Dist: gstools
Requires-Dist: ipywidgets
Provides-Extra: dev

# Trufl


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

**Trufl** was initiated in the context of the [IAEA (International
Atomic Energy Agency)](https://www.iaea.org) Coordinated Research
Project titled [“Monitoring and Predicting Radionuclide Uptake and
Dynamics for Optimizing Remediation of Radioactive Contamination in
Agriculture”](https://www.iaea.org/newscenter/news/new-crp-monitoring-and-predicting-radionuclide-uptake-and-dynamics-for-optimizing-remediation-of-radioactive-contamination-in-agriculture-crp-d15019).

While **Trufl** was originally developed to address the remediation of
farmland affected by nuclear accidents, its approach and algorithms are
**applicable to a wide range of application domains**. This includes
managing **legacy contaminants or monitoring any phenomenon that
requires consideration of multiple decision criteria**, potentially
involving a large set of data.

This package leverages the work done by [Floris
Abrams](https://www.linkedin.com/in/floris-abrams-59080a15a) in the
context of his PhD at [KU Leuven](https://www.kuleuven.be) and [Franck
Albinet](https://www.linkedin.com/in/franckalbinet), International
Consultant in Geospatial Data Science and currently PhD researcher in AI
applied to nuclear remediation at KU Leuven.

## Install

`pip install trufl`

## Getting started

### Create a vector grid from a given raster

``` python
fname_raster = '../files/ground-truth-01-4326-simulated.tif'
gdf_grid = gridder(fname_raster, nrows=10, ncols=10)
```

``` python
gdf_grid.head()
```

<div>

<div>

|        | geometry                                          |
|--------|---------------------------------------------------|
| loc_id |                                                   |
| 0      | POLYGON ((-1.20830 43.26950, -1.20830 43.26042... |
| 1      | POLYGON ((-1.20830 43.27858, -1.20830 43.26950... |
| 2      | POLYGON ((-1.20830 43.28766, -1.20830 43.27858... |
| 3      | POLYGON ((-1.20830 43.29673, -1.20830 43.28766... |
| 4      | POLYGON ((-1.20830 43.30581, -1.20830 43.29673... |

</div>

</div>

``` python
gdf_grid.boundary.plot(color='black', lw=0.5);
```

![](index_files/figure-commonmark/cell-4-output-1.png)
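To make the gridding step more concrete, here is a minimal pure-Python sketch of how a bounding box can be split into `nrows` × `ncols` cells. It is not the `gridder` implementation: the helper name `grid_cells` is hypothetical, and the id ordering (ids increasing along y within a column) is only an assumption inferred from the polygons printed above.

``` python
def grid_cells(bounds, nrows, ncols):
    """Split a bounding box into nrows x ncols rectangular cells.

    bounds is (minx, miny, maxx, maxy). Returns a dict mapping an
    illustrative loc_id to the cell's (minx, miny, maxx, maxy).
    """
    minx, miny, maxx, maxy = bounds
    width = (maxx - minx) / ncols
    height = (maxy - miny) / nrows
    cells = {}
    loc_id = 0
    for col in range(ncols):        # assumed order: column by column
        for row in range(nrows):    # ids increase with y inside a column
            x0 = minx + col * width
            y0 = miny + row * height
            cells[loc_id] = (x0, y0, x0 + width, y0 + height)
            loc_id += 1
    return cells


cells = grid_cells((0.0, 0.0, 10.0, 20.0), nrows=4, ncols=2)
print(len(cells))  # 8
print(cells[0])    # (0.0, 0.0, 5.0, 5.0)
print(cells[1])    # (0.0, 5.0, 5.0, 10.0)
```

With a real raster, the bounding box would come from the raster's extent rather than from hard-coded numbers.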

### Random sampling in areas of interest

Generate a random set of points given:

- a geodataframe of polygons of interest (in this example, just a grid
  with `loc_id`s);
- the number of measurements to take in each subarea (`loc_id`), which
  we simulate here by generating random numbers.

``` python
import numpy as np

sampler = Sampler(gdf_grid)
n = np.random.randint(1, high=10, size=len(gdf_grid), dtype=int)
sample_locs = sampler.sample(n, method='uniform')

print(sample_locs.head())
sample_locs.plot(markersize=2, color='red');
```

                                                     geometry
    loc_id                                                   
    0       MULTIPOINT ((-1.22251 43.26756), (-1.22194 43....
    1       MULTIPOINT ((-1.22303 43.27756), (-1.22251 43....
    2       MULTIPOINT ((-1.22194 43.28433), (-1.22170 43....
    3       MULTIPOINT ((-1.22204 43.29152), (-1.22199 43....
    4       MULTIPOINT ((-1.22174 43.30568), (-1.21978 43....

![](index_files/figure-commonmark/cell-5-output-2.png)
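The idea behind uniform sampling can be sketched without any geospatial dependency. The snippet below assumes rectangular cells given as `(minx, miny, maxx, maxy)`. The helper `sample_uniform` and the `n_per_cell` mapping are hypothetical names used for illustration, not part of the `Sampler` API.

``` python
import random


def sample_uniform(cell, n, rng):
    """Draw n points uniformly at random inside a rectangular cell."""
    minx, miny, maxx, maxy = cell
    return [(rng.uniform(minx, maxx), rng.uniform(miny, maxy)) for _ in range(n)]


rng = random.Random(42)  # seeded so that the example is reproducible
cells = {0: (0.0, 0.0, 1.0, 1.0), 1: (1.0, 0.0, 2.0, 1.0)}
n_per_cell = {0: 3, 1: 5}  # illustrative number of measurements per loc_id

samples = {loc_id: sample_uniform(cell, n_per_cell[loc_id], rng)
           for loc_id, cell in cells.items()}
print({loc_id: len(pts) for loc_id, pts in samples.items()})  # {0: 3, 1: 5}
```

This only works for rectangles. Arbitrary polygons would need something like rejection sampling, where points are drawn in the bounding box and kept only if they fall inside the polygon.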

### Emulating data collection

With random sampling locations defined, data collectors would normally
go to the field to take measurements. In our case, we “emulate” this
process by “extracting” measurements from the provided raster file.

We will emulate data collection from the raster shown below:

``` python
import matplotlib.pyplot as plt
import rasterio

with rasterio.open(fname_raster) as src:
    plt.axis('off')
    plt.imshow(src.read(1))
```

![](index_files/figure-commonmark/cell-6-output-1.png)

“Measuring” the variable of interest from a given raster:

``` python
dc_emulator = DataCollector(fname_raster)
samples_t0 = dc_emulator.collect(sample_locs)

print(samples_t0.head())
ax = samples_t0.plot(column='value', s=2, legend=True)
gdf_grid.boundary.plot(color='black', ax=ax);
```

                             geometry     value
    loc_id                                     
    0       POINT (-1.22251 43.26756)  0.107457
    0       POINT (-1.22194 43.26461)  0.140862
    0       POINT (-1.22131 43.26484)  0.145688
    0       POINT (-1.22111 43.26411)  0.144795
    0       POINT (-1.21696 43.26822)  0.132611

![](index_files/figure-commonmark/cell-7-output-2.png)
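Conceptually, “extracting” a measurement means finding the pixel that contains a point and reading its value. The sketch below illustrates this on a tiny in-memory raster, assuming a north-up grid with a known top-left origin and pixel size. The function `value_at` is a hypothetical helper, and whether `DataCollector` proceeds this way internally is not confirmed here.

``` python
def value_at(raster, origin, pixel_size, x, y):
    """Return the raster value of the pixel containing point (x, y).

    raster     : list of rows, row 0 being the northernmost one
    origin     : (x, y) of the raster's top-left corner
    pixel_size : (width, height) of one pixel, both positive
    """
    col = int((x - origin[0]) // pixel_size[0])
    row = int((origin[1] - y) // pixel_size[1])
    if not (0 <= row < len(raster) and 0 <= col < len(raster[0])):
        return None  # the point falls outside the raster
    return raster[row][col]


raster = [[0.1, 0.2],
          [0.3, 0.4]]
origin, pixel_size = (0.0, 2.0), (1.0, 1.0)

print(value_at(raster, origin, pixel_size, 0.5, 1.5))  # 0.1 (top-left pixel)
print(value_at(raster, origin, pixel_size, 1.5, 0.5))  # 0.4 (bottom-right pixel)
print(value_at(raster, origin, pixel_size, 5.0, 5.0))  # None (outside)
```

For real files, rasterio datasets expose a `sample` method that performs this kind of lookup for a sequence of coordinates.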

### Getting current state

``` python
state = State(samples_t0, gdf_grid, cbs=[
    MaxCB(), MinCB(), StdCB(), CountCB(), MoranICB(k=5), PriorCB(fname_raster)
])

# You have to call the instance
state_t0 = state(); state_t0
```

<div>

<div>

|        | Max      | Min      | Standard Deviation | Count | Moran.I  | Prior    |
|--------|----------|----------|--------------------|-------|----------|----------|
| loc_id |          |          |                    |       |          |          |
| 0      | 0.145688 | 0.054230 | 0.029788           | 9     | 0.786626 | 0.102492 |
| 1      | 0.156101 | 0.000000 | 0.055564           | 6     | 0.348230 | 0.125727 |
| 2      | 0.171939 | 0.153706 | 0.005947           | 6     | 0.200736 | 0.161802 |
| 3      | 0.221299 | 0.161790 | 0.022979           | 8     | 0.882324 | 0.184432 |
| 4      | 0.209163 | 0.175360 | 0.012759           | 6     | 0.756250 | 0.201405 |
| ...    | ...      | ...      | ...                | ...   | ...      | ...      |
| 95     | 0.881591 | 0.806614 | 0.021775           | 8     | 0.498415 | 0.803670 |
| 96     | 0.833478 | 0.753105 | 0.026137           | 8     | 0.789527 | 0.763408 |
| 97     | 0.708564 | 0.668151 | 0.017366           | 4     | NaN      | 0.727797 |
| 98     | 0.706323 | 0.674502 | 0.010833           | 8     | 0.818699 | 0.646002 |
| 99     | 0.709104 | 0.674233 | 0.010549           | 8     | 0.804634 | 0.655185 |

<p>100 rows × 6 columns</p>
</div>

</div>
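The simpler columns of this state table are per-`loc_id` summary statistics. As a rough sketch of what such an aggregation looks like, the snippet below groups `(loc_id, value)` pairs with the standard library only. The function `summarise` is hypothetical, and the choice of the population standard deviation is an assumption: `StdCB` may use a different estimator.

``` python
from collections import defaultdict
from statistics import pstdev


def summarise(samples):
    """Group (loc_id, value) pairs and compute simple statistics per loc_id."""
    groups = defaultdict(list)
    for loc_id, value in samples:
        groups[loc_id].append(value)
    return {
        loc_id: {
            "Max": max(vals),
            "Min": min(vals),
            "Standard Deviation": pstdev(vals),  # population std: an assumption
            "Count": len(vals),
        }
        for loc_id, vals in groups.items()
    }


samples = [(0, 1.0), (0, 3.0), (1, 2.0), (1, 2.0), (1, 5.0)]
summary = summarise(samples)
print(summary[0])
# {'Max': 3.0, 'Min': 1.0, 'Standard Deviation': 1.0, 'Count': 2}
```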

## Build the ranking of polygons based on several criteria

### Criteria

- MaxCB()
- MinCB()
- StdCB()
- CountCB()
- MoranICB(k=5) – gives 2 values (value, p-value)
- PriorCB
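Of these criteria, Moran's I is the least self-explanatory. The sketch below computes the global Moran's I statistic with row-standardised k-nearest-neighbour weights, in pure Python. It is an illustration under stated assumptions, not the `MoranICB` implementation: the function names are hypothetical, the p-value (typically obtained by permutation) is omitted, and returning NaN when a cell has too few points for `k` neighbours is a guess at the behaviour.

``` python
import math


def knn_weights(points, k):
    """Row-standardised weights: each point's k nearest neighbours get 1/k."""
    weights = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.hypot(p[0] - points[j][0], p[1] - points[j][1]),
        )
        weights.append({j: 1.0 / k for j in others[:k]})
    return weights


def morans_i(points, values, k):
    """Global Moran's I; NaN when there are too few points for k neighbours."""
    n = len(values)
    if n <= k:
        return math.nan
    mean = sum(values) / n
    z = [v - mean for v in values]
    denom = sum(zi * zi for zi in z)
    if denom == 0:
        return math.nan  # constant values: the statistic is undefined
    w = knn_weights(points, k)
    s0 = sum(sum(row.values()) for row in w)  # equals n after row-standardisation
    num = sum(wij * z[i] * z[j] for i, row in enumerate(w) for j, wij in row.items())
    return (n / s0) * num / denom


points = [(float(i), 0.0) for i in range(6)]
smooth = morans_i(points, [1, 2, 3, 4, 5, 6], k=2)       # neighbours look alike
alternating = morans_i(points, [1, 0, 1, 0, 1, 0], k=2)  # neighbours differ
print(round(smooth, 3), round(alternating, 3))  # 0.571 -0.667
print(morans_i(points[:2], [1, 2], k=2))        # nan
```

Values close to 1 indicate that nearby samples are similar, while negative values indicate that neighbouring samples tend to differ.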

### Criteria type

- Benefit (high values → high score → high rank → prioritized sampling
  needed)
- Cost (high values → low score → low rank → less sampling needed)

Type assigned to each criterion:

- MaxCB() – Benefit
- MinCB() – ???
- StdCB() – Benefit
- CountCB() – Cost (low count – higher priority because more samples
  are needed)
- MoranICB(k=5) – Cost (high value – highly correlated – less need for
  sampling ??)
- PriorCB – Benefit
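The benefit/cost distinction only changes the direction in which values are ordered. A minimal sketch, assuming rank 1 means highest sampling priority (the helper `rank_criterion` and the sample values are hypothetical):

``` python
def rank_criterion(values, is_benefit):
    """Rank loc_ids for one criterion; rank 1 = highest sampling priority.

    values maps loc_id -> criterion value. For a benefit criterion high
    values rank first; for a cost criterion low values rank first.
    Ties keep the insertion order of the input mapping.
    """
    ordered = sorted(values, key=values.get, reverse=is_benefit)
    return {loc_id: rank for rank, loc_id in enumerate(ordered, start=1)}


std = {0: 0.030, 1: 0.056, 2: 0.006}  # StdCB-like values, treated as benefit
count = {0: 9, 1: 6, 2: 6}            # CountCB-like values, treated as cost

print(rank_criterion(std, is_benefit=True))     # {1: 1, 0: 2, 2: 3}
print(rank_criterion(count, is_benefit=False))  # {1: 1, 2: 2, 0: 3}
```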

### MCDM techniques

- CP – low values indicate a good alternative
- TOPSIS – high values indicate a good alternative

Note: all scores are converted to ranks to account for these
differences.
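As an illustration of the second technique, here is a compact textbook TOPSIS sketch followed by the score-to-rank conversion. It assumes vector normalisation, non-zero columns, and alternatives that are not all identical. The names `topsis` and `to_ranks` are hypothetical, and the normalisation or weighting used by `Optimizer` may differ. CP scoring is not reproduced here.

``` python
import math


def topsis(matrix, weights, is_benefit):
    """Return the TOPSIS closeness score of each alternative (row)."""
    n_crit = len(weights)
    # 1. Vector-normalise each column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # 2. Ideal and anti-ideal values depend on the criterion direction.
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, is_benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, is_benefit)]
    # 3. Closeness = distance to anti-ideal / (sum of both distances).
    scores = []
    for row in v:
        d_pos = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, ideal)))
        d_neg = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, anti)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores


def to_ranks(scores, higher_is_better):
    """Convert scores to ranks (1 = best) so that methods become comparable."""
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=higher_is_better)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


# Rows are alternatives (e.g. loc_ids), columns are Max, Min, Std-like criteria.
matrix = [[0.9, 0.8, 0.05], [0.2, 0.1, 0.01], [0.5, 0.4, 0.03]]
scores = topsis(matrix, weights=[0.3, 0.3, 0.4], is_benefit=[True, True, True])
print(to_ranks(scores, higher_is_better=True))            # [1, 3, 2]
print(to_ranks([0.1, 0.7, 0.3], higher_is_better=False))  # [1, 3, 2]
```

The last line shows why ranks are useful: a method where low scores are good and one where high scores are good can end up on the same scale.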

``` python
benefit_criteria = [True, True, True]
state = State(samples_t0, gdf_grid, cbs=[MaxCB(), MinCB(), StdCB()])
```

``` python
optimizer = Optimizer(state=state())
df = optimizer.rank(is_benefit_x=benefit_criteria, w_vector=[0.3, 0.3, 0.4],
                    n_method=None, c_method=None, w_method=None, s_method="CP")

df.head()
```

<div>

<div>

|        | rank |
|--------|------|
| loc_id |      |
| 83     | 1    |
| 91     | 2    |
| 92     | 3    |
| 84     | 4    |
| 93     | 5    |

</div>

</div>

