Metadata-Version: 2.1
Name: noko
Version: 0.3.0
Summary: Log first, ask questions later.
Author-email: "K.R. Zentner" <krzentner@gmail.com>
License: MIT License
        
        Copyright (c) 2023 K.R. Zentner
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/krzentner/noko
Project-URL: Bug Reports, https://github.com/krzentner/noko/issues
Project-URL: Source, https://github.com/krzentner/noko
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: System :: Distributed Computing
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: git
Requires-Dist: GitPython; extra == "git"
Provides-Extra: wandb
Requires-Dist: wandb; extra == "wandb"
Provides-Extra: torch
Requires-Dist: torch; extra == "torch"
Provides-Extra: pyarrow
Requires-Dist: pyarrow; extra == "pyarrow"
Provides-Extra: tensorboardx
Requires-Dist: tensorboardX; extra == "tensorboardx"
Provides-Extra: dev
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Provides-Extra: recommended
Requires-Dist: GitPython; extra == "recommended"
Requires-Dist: pyarrow; extra == "recommended"
Requires-Dist: tensorboardX; extra == "recommended"

# noko

[\[api reference\]](https://krzentner.github.io/noko/)

Log first, ask questions later.

`noko` is a numerical logging library designed for logging as many variables at once on as many timescales as possible.

The prototypical use case for `noko` is the following:

```python
loss.backward()
noko.log_row("grad_step", locals(), level=noko.TRACE)
optimizer.step()
```

Which would produce a log file "grad_step.csv", with information on all local variables at every gradient step.

## State of the Library

noko is quite useful already, but I'm continuing to refine the API, add new features as I find need of them, and fix bugs as I encounter them.

If you use the library at this time, you should expect API churn,
so depend on a specific version.

The current recommended install method is:

```
pip install "git+https://github.com/krzentner/noko.git@v0.1.1#egg=noko[recommended]"
```


## Key Idea

Most numerical logging libraries expect you to log one key at a time.
`noko` instead expects you to log one "row" at a time, into a named "table". In the above example, the table is named "grad_step", and the row is all local variables.

`noko` will then go through all values in the row, and determine how to summarize them.
For example, it will summarize a pytorch neural network using the min, max, mean, and std of each parameter, as well as those values for the gradient of each parameter (if available).

Once summarized, `noko` outputs that row to all `noko` backends that have a log level at least as sensitive as the provided level.

## Backends

Always available:
  - newline delimited json (.ndjson)
  - comma separated values (.csv)
  - pprint (usually only used for stdout)

Requires optional dependencies:
  - tensorboard (will use torch.utils.tensorboard or tensorboardX or tf.summary)
  - parquet (requires pyarrow)

## Row Summarization

`noko` runs a preprocessing step on each provided row before logging it.
This pre-process is typically *lossy*, and summarizes large input arrays into a few scalars.
This is necessary to be able to log all local variables, and is an unavoidable trade-off of `noko`'s design.
If you want to be able to exactly recover a value passed to `noko`, format it to a `str` or `bytes` first, and use the csv, ndjson, or parquet backends.

For built-in datatypes (dictionaries, lists, tuples), preprocessing runs recursively.
For commonly used numerical libraries (numpy, pytorch), there are adapter modules to summarize common datatypes.

If you want your type to be flattened, you can define a custom summarizer. The standard way to do this is to use `declare_summarizer`, which can be used without modifying the source for the particular type being summarized.

Example of summarizing a custom type "MyPoint":

```python
from noko import declare_summarizer, summarize, ScalarTypes

@declare_summarizer(MyPoint):
def summarize_mypoint(point, prefix: str, dst: dict[str, ScalarTypes]):
    # Include x and y fields directly
    dst[f"{prefix}.x"] = point.x
    dst[f"{prefix}.y"] = point.y
    # Recursively summarize metadata
    summarize(point.metadata, f"{prefix}.metadata", dst)
```

Sometimes you may have local variables that don't make sense to log in `noko` (e.g. long lists).
In that case the recommended pattern is:

```python
row = locals()
del row["my_long_list"]
noko.log_row("my_locals", row)
```

## Extra Utilities

`noko` has some other useful tricks for reproducibility which can be accessed via calling `noko.init_extra()`.

These include:
  - wandb sync and config logging (if `wandb` is installed).
  - automatically setting the global seed values for most
    libraries if present in the config
  - automatically creating a git commit of the current code on
    a separate branch `noko-checkpoints`, and adding diffs
    using that checkpoint to the log directory (requires `GitPython` to be installed).
