Metadata-Version: 2.1
Name: ftm-geocode
Version: 0.0.1
Summary: Bath parse and geocode addresses from followthemoney entities
Home-page: https://github.com/simonwoerpel/ftm-geocode
Author: Simon Wörpel
Author-email: simon.woerpel@medienrevolte.de
License: MIT
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.10
Description-Content-Type: text/markdown
License-File: LICENSE

# ftm-geocoder

Batch parse and geocode addresses from
[followthemoney entities](https://followthemoney.readthedocs.io/en/latest/).
Simply geocoding just address strings works as well, of course.

There are as well some parsing / normalization helpers.

## Features
- Parse/normalize addresses via [libpostal](https://github.com/openvenues/libpostal)
- Geocoding via [geopy](https://geopy.readthedocs.io/en/stable/)
- Cache geocoding results in a sql database (using [dataset](https://github.com/pudo/dataset))
- Optional fallback geocoders when preferred geocoder doesn't match
- Create, update and merge [`Address`](https://followthemoney.readthedocs.io/en/latest/model.html#address) entities for ftm data

## Usage

### command line

The command line interface is designed for piping input / output streams, but
for each command a `-i <input_file>` and `-o <output_file>` can be used as well.

Geocode an input stream of ftm entities with nominatim and google maps as
fallback (geocoders are tried in the given order):

    cat entitis.ftm.ijson | ftmgeo geocode -g nominatim -g google > entities_geocoded.ftm.ijson

This looks for the [address prop](https://followthemoney.readthedocs.io/en/latest/types.html#type-address)
on input entities and creates address entities with reference to the input
entities. The output contains all entities from the input stream plus newly
created addresses.

If an input entity is itself an [Address entity](https://followthemoney.readthedocs.io/en/latest/model.html#address),
it will be geocoded as well and their props (country, city, ...) will be merged
with the geocoder result.

During the process, addresses are parsed and normalized and looked up in the
address cache database before actual geocoding. After geocoding, new addresses
are added to the cache database.

Geocoders can be set via `GEOCODERS` and default to `nominatim`

### geocoding just address strings

**csv format (for all csv input streams)**
first column `address`, optional second column `country` (name or code) and
third `language` for postal context

To ftm address entities:

    cat addresses.csv | ftmgeo geocode --input-format=csv > addresses.ftm.ijson

To csv:

    cat addresses.csv | ftmgeo geocode --input-format=csv --output-format=csv > addresses.csv

### formatting / normalization

    cat addresses.txt | ftmgeo format-line > clean_addresses.csv

Generate address entities (without geocoding):

    cat entities.ftm.ijson | ftmgeo map > entities.ftm.ijson
    cat addresses.csv | ftmgeo map --input-format=csv > addresses.ftm.ijson

### database cache

During geocoding, addresses are first looked up in the local cache, and new
geocoding results are added.

Ignore cache during geocoding (new results are still written to it):

    ftmgeo geocode --no-cache ...

Export cache:

    ftmgeo cache iterate > geocoded_addresses.ftm.ijsonl
    ftmgeo cache iterate --output-format=csv > geocoded_addresses.csv

Populate cache:

*csv input:*
`address_id,canonical_id,original_line,result_line,country,lat,lon,geocoder,geocoder_place_id`

optional field: `geocoder_raw` - json of geocoder response

    cat geocoded_addresses.csv | ftmgeo cache populate


## Configuration

**geocoders**

Default geocoders: env var `GEOCODERS`
They are used in the given order

Make sure to configure the geocoders as needed for `geopy` (endpoints, api keys, ...):

`export FTMGEO_<GEOCODERNAME>_<SETTING>=...`

**Persistent cache**

The cache database is set via `FTM_STORE_URI` (so it is the same as the ftm store, if any)

## Installation

Required external is [libpostal](https://github.com/openvenues/pypostal), see installation instructions there.

Once `libpostal` is installed on your system, you can install:

    pip install ftm-geocoder

## Testing

    make install
    make test
