Metadata-Version: 2.1
Name: icoscp_core
Version: 0.2.2
Summary: icoscp_core
Keywords: environment,research,infrastructure,data access
Author-email: Oleg Mirzov <oleg.mirzov@nateko.lu.se>
Maintainer-email: Klara Broman <klara.broman@nateko.lu.se>, Jonathan Schenk <jonathan.schenk@nateko.lu.se>
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Requires-Dist: dacite==1.8.1
Requires-Dist: numpy>=1.23
Requires-Dist: requests==2.31
Project-URL: Source, https://github.com/ICOS-Carbon-Portal/data/tree/master/src/main/python/icoscp_core

# icoscp_core

A foundational ICOS Carbon Portal core products Python library for metadata and data access, designed to work with multiple data repositories that use the ICOS Carbon Portal core server software stack to host and serve their data. At the moment, three repositories are supported: [ICOS](https://data.icos-cp.eu/portal/), [SITES](https://data.fieldsites.se/portal/), and [ICOS Cities](https://citydata.icos-cp.eu/portal/).

## Design goals

- good alignment with the server APIs
- offer basic functionality, but in a robust way, and without sacrifices in performance
- avoid unnecessary dependencies (only depend on `numpy` and a small library `dacite`), but aim for good integration with `pandas`
- provide a solid foundation for future versions of [icoscp](https://pypi.org/project/icoscp/)&mdash;an ICOS-specific meta- and data access library developed by the Elaborated Products team
- extensive use of type annotations and Python data classes, to safeguard against preventable bugs, both in the library itself and in the tools and apps written on top of it; a goal is to satisfy the typechecker in strict mode
- usage of autogenerated data classes produced from Scala back end code representing various metadata entities (e.g. data objects, stations) and their parts
- simultaneous support of three cross-cutting concerns:
	- multiple repositories (ICOS, SITES, ICOS Cities)
	- multiple ways of authentication
	- data access through the HTTP API (on an arbitrary machine) and through the file system (on a Jupyter notebook with "backdoor" data access); in the latter case the library is responsible for reporting the data usage event.

## Getting started

The library is available on PyPI and can be installed with `pip`:
```Bash
$ pip install icoscp_core
```

**The code examples below are usually provided for ICOS. For the other repositories (SITES or ICOS Cities), use `icoscp_core.sites` or `icoscp_core.cities`, respectively, instead of `icoscp_core.icos` in the import directives.**
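
When the repository is chosen at run time (for example from a configuration value), the same substitution can be made programmatically. The following is a minimal sketch and not part of the library: the helper names `repo_module_name` and `load_repo` are hypothetical, and it assumes only the three module names mentioned above.

```Python
from importlib import import_module

# module name suffixes taken from the note above
SUPPORTED_REPOS = ("icos", "sites", "cities")

def repo_module_name(repo: str) -> str:
    """Return the name of the icoscp_core module for a repository key."""
    if repo not in SUPPORTED_REPOS:
        raise ValueError(f"unknown repository {repo!r}, expected one of {SUPPORTED_REPOS}")
    return f"icoscp_core.{repo}"

def load_repo(repo: str):
    """Import the repository-specific module (requires icoscp_core to be installed)."""
    return import_module(repo_module_name(repo))

# usage (hypothetical): meta = load_repo("sites").meta
```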

## Authentication

Metadata access does not require authentication, and is achieved by a simple import:
```Python
from icoscp_core.icos import meta
```
When using the library on an accordingly configured Jupyter notebook service hosted by the ICOS Carbon Portal (https://exploretest.icos-cp.eu/ at the time of this writing), authentication is not required for certain kinds of data access (specifically methods `get_columns_as_arrays` and `batch_get_columns_as_arrays`).

Authentication can be initialized in a number of ways.

### Credentials and token cache file (default)

This approach should only be used on machines the developer trusts.

A username/password account with the respective authentication service (links for: [ICOS](https://cpauth.icos-cp.eu/), [SITES](https://auth.fieldsites.se/), [ICOS Cities](https://cityauth.icos-cp.eu/)) is required for this. An obfuscated (not readable by humans) password is stored in a file on the local machine in a default user-specific folder. To initialize this file, run the following code interactively (this only needs to be done once for every machine):

```Python
from icoscp_core.icos import auth

auth.init_config_file()
```

After the initialization step is done, access to the metadata and data services is achieved by a simple import:
```Python
from icoscp_core.icos import meta, data
```

As an alternative, the developer may choose to use a specific file to store the credentials and token cache. In this scenario, the `data` service needs to be initialized as follows:

```Python
from icoscp_core.icos import bootstrap
auth, meta, data = bootstrap.fromPasswordFile("<desired path to the file>")

# the next line needs to be run interactively (only once per file)
auth.init_config_file()
```

### Static authentication token (prototyping)

This option is good for testing, on a public machine or in general. Its only disadvantage is that the tokens have a limited period of validity (100000 seconds, less than 28 hours), but this is precisely what makes it acceptable to include them directly in the Python source code.
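
To get a feeling for what the validity period means in practice, the arithmetic can be sketched with the standard library alone. This is only an illustration: the 100000-second figure is copied from the paragraph above, while the helper name `token_expiry` and the example timestamp are hypothetical.

```Python
from datetime import datetime, timedelta, timezone

# validity period quoted above; hard-coded here as an assumption of this sketch
TOKEN_VALIDITY = timedelta(seconds=100_000)

def token_expiry(obtained_at: datetime) -> datetime:
    """Estimate when a token obtained at `obtained_at` stops being valid."""
    return obtained_at + TOKEN_VALIDITY

# hypothetical moment at which the token was copied from the "My Account" page
obtained = datetime(2023, 10, 7, 10, 0, tzinfo=timezone.utc)

print(TOKEN_VALIDITY)          # 1 day, 3:46:40
print(token_expiry(obtained))  # 2023-10-08 13:46:40+00:00
```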

The token can be obtained from the "My Account" page (links for: [ICOS](https://cpauth.icos-cp.eu/), [SITES](https://auth.fieldsites.se/), [ICOS Cities](https://cityauth.icos-cp.eu/)), which can be accessed by logging in using one of the supported authentication mechanisms (username/password, university sign-in, OAuth sign-in). After this, the bootstrapping can be done as follows:

```Python
from icoscp_core.icos import bootstrap
cookie_token = 'cpauthToken=WzE2OTY2NzQ5OD...'
meta, data = bootstrap.fromCookieToken(cookie_token)
```

### Explicit credentials (advanced option)

The user may choose to use their own mechanism of providing the credentials to initialize the authentication. This should be considered as an advanced option. **(Please do not put your password as clear text in your Python code!)** This can be achieved as follows:

```Python
from icoscp_core.icos import bootstrap
meta, data = bootstrap.fromCredentials(username_variable, password_containing_variable)
```
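
One possible way of keeping the password out of the source code is to read the credentials from environment variables, falling back to an interactive prompt. The sketch below is an assumption-laden illustration rather than a recommended mechanism: the variable names `ICOSCP_USERNAME` and `ICOSCP_PASSWORD` and both helper functions are hypothetical; only `bootstrap.fromCredentials` comes from the example above.

```Python
import os
from getpass import getpass

def read_credentials(env=os.environ) -> tuple[str, str]:
    """Read the user name from `env`, and the password from `env` or a prompt.

    ICOSCP_USERNAME and ICOSCP_PASSWORD are hypothetical names chosen for this sketch.
    """
    username = env.get("ICOSCP_USERNAME")
    if not username:
        raise RuntimeError("ICOSCP_USERNAME is not set")
    # prompt interactively (without echo) if the password is not in the environment
    password = env.get("ICOSCP_PASSWORD") or getpass(f"Password for {username}: ")
    return username, password

def bootstrap_from_env():
    # imported here so that read_credentials can be used without the library installed
    from icoscp_core.icos import bootstrap
    username, password = read_credentials()
    return bootstrap.fromCredentials(username, password)

# usage: meta, data = bootstrap_from_env()
```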

---

## Metadata access

```Python
from icoscp_core.icos import meta, ATMO_STATION
from icoscp_core.metaclient import TimeFilter, SizeFilter, SamplingHeightFilter

# fetches the list of known data types, including metadata associated with them
all_datatypes = meta.list_datatypes()

# data types with structured data access
previewable_datatypes = [dt for dt in all_datatypes if dt.has_data_access]

# fetch lists of stations
icos_stations = meta.list_stations()
atmo_stations = meta.list_stations(ATMO_STATION)
all_known_stations = meta.list_stations(False)

# list data objects; a contrived, complicated example to demonstrate the possibilities
filtered_atc_co2 = meta.list_data_objects(
	datatype = [
		"http://meta.icos-cp.eu/resources/cpmeta/atcCo2L2DataObject",
		"http://meta.icos-cp.eu/resources/cpmeta/atcCo2NrtGrowingDataObject"
	],
	station = "http://meta.icos-cp.eu/resources/stations/AS_GAT",
	filters = [
		TimeFilter("submTime", ">", "2023-07-01T12:00:00Z"),
		TimeFilter("submTime", "<", "2023-07-10T12:00:00Z"),
		SizeFilter(">", 50000),
		SamplingHeightFilter("=", 216)
	],
	include_deprecated = True,
	order_by = "fileName",
	limit = 50
)

# get detailed metadata for a data object
dobj_uri = 'https://meta.icos-cp.eu/objects/BbEO5i3rDLhS_vR-eNNLjp3Q'
dobj_detailed_meta = meta.get_dobj_meta(dobj_uri)
```

Detailed help on the available metadata access methods can be obtained by calling `help(meta)`.
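
The `TimeFilter` entries in the example above take timestamps as strings of the form `2023-07-01T12:00:00Z`. When the time window is computed rather than typed by hand, such strings can be produced with the standard library. This is a minimal sketch: the helper `to_utc_timestamp` is hypothetical and not part of the library, and whether the filters accept other value types is not covered here.

```Python
from datetime import datetime, timedelta, timezone

def to_utc_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime like '2023-07-01T12:00:00Z'."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# the same nine-day window as in the example above
start = datetime(2023, 7, 1, 12, 0, tzinfo=timezone.utc)
stop = start + timedelta(days=9)

print(to_utc_timestamp(start))  # 2023-07-01T12:00:00Z
print(to_utc_timestamp(stop))   # 2023-07-10T12:00:00Z

# the strings can then be used as in the example above, e.g.
# TimeFilter("submTime", ">", to_utc_timestamp(start))
```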

---

## Data access
To fetch data (after having located interesting data objects in the previous step):

```Python
from icoscp_core.icos import data
import pandas as pd

# save the original data object contents to a folder on your machine
filename = data.save_to_folder(dobj_uri, '/myhome/icosdata/')

# get CSV representation of all previewable columns, parse it with pandas
csv_stream = data.get_csv_byte_stream(dobj_uri)
df = pd.read_csv(csv_stream)

# get dataset columns as typed arrays, ready to be imported into pandas
dobj_arrays = data.get_columns_as_arrays(dobj_detailed_meta)
df = pd.DataFrame(dobj_arrays)

# efficiently batch-fetch multiple data objects
multi_dobjs = data.batch_get_columns_as_arrays(filtered_atc_co2)
multi_df = ( (dobj, pd.DataFrame(arrs)) for dobj, arrs in multi_dobjs)
```


Downloading the original object is possible for all data objects. Structured data access, however, is limited to data objects whose data types' `has_data_access` property equals `True`.

