Metadata-Version: 2.1
Name: spelunk
Version: 0.1.1
Summary: Package with helpful object recursion utils
Home-page: https://github.com/tomarken/spelunk
License: MIT
Author: Spencer Tomarken
Author-email: stomarken@gmail.com
Requires-Python: >=3.9,<3.10
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Project-URL: Repository, https://github.com/tomarken/spelunk
Description-Content-Type: text/markdown

# spelunk
`spelunk` is a module containing tools for recursively exploring Python objects. Here are a few examples.

### 1. Printing an object's tree


Ex:
  ```python
from spelunk import print_obj_tree
  
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print_obj_tree(root_obj=obj)

# ROOT -> {'key': [1, ...]}
# ROOT['key'] -> [1, ...]
# ROOT['key'][0] -> 1
# ROOT['key'][1] -> (2.0,)
# ROOT['key'][1][0] -> 2.0
# ROOT['key'][2] -> {3}
# ROOT['key'][2]{id=4431022448} -> 3
# ROOT['key'][3] -> frozenset({4})
# ROOT['key'][3]{id=4431022480} -> 4
# ROOT['key'][4] -> {'subkey': [(1,)]}
# ROOT['key'][4]['subkey'] -> [(1,)]
# ROOT['key'][4]['subkey'][0] -> (1,)
# ROOT['key'][4]['subkey'][0][0] -> 1
  ```
* The root object is referred to as `ROOT`. 
* Attributes are denoted with `ROOT.attr`.
* Keys from mappings are denoted with `ROOT['key']`.
* Indices from sequences are denoted with `ROOT[idx]`.
* Elements of sets and frozensets are indicated by their id in memory with `ROOT{id=10012}`. 
* Elements of a `ValuesView` are indicated by their id in memory with `ROOT{ValuesView_id=10012}`. (These are not common.)
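
To make the notation concrete, here is a minimal sketch of how such path strings could be built. It is not `spelunk`'s implementation. The generator `iter_paths` is hypothetical, handles only mappings, sequences and sets (no attributes, no memoization), and prints full reprs rather than the abbreviated ones `print_obj_tree` shows.

```python
from collections.abc import Mapping, Sequence, Set
from typing import Any, Iterator, Tuple


def iter_paths(obj: Any, path: str = "ROOT") -> Iterator[Tuple[str, Any]]:
    """Yield (path, element) pairs in the ROOT[...] / ROOT{id=...} notation."""
    yield path, obj
    if isinstance(obj, (str, bytes)):
        return  # treat strings as leaves instead of splitting them into characters
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            yield from iter_paths(value, f"{path}[{key!r}]")
    elif isinstance(obj, Sequence):
        for idx, value in enumerate(obj):
            yield from iter_paths(value, f"{path}[{idx}]")
    elif isinstance(obj, Set):  # set and frozenset: unordered, so address by id
        for value in obj:
            yield from iter_paths(value, f"{path}{{id={id(value)}}}")


obj = {'key': [1, (2.0,), {3}, frozenset((4,))]}
for p, elem in iter_paths(obj):
    print(p, '->', elem)
```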

The notations above are chained together recursively. For example, the path
`ROOT['key'][2]` indicates that the corresponding object `{3}` is reached with
`root_obj['key'][2]`. Sets are harder because they are unordered and cannot be indexed, so their elements are
addressed by id. To access `4` via `ROOT['key'][3]{id=4431022480}`, we iterate through `root_obj['key'][3]` until we find a
matching id (the id itself differs from run to run):
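```python
root_obj = obj  # the object passed to print_obj_tree above
target_id = 4431022480  # copied from the print_obj_tree output; differs on every run

for elem in root_obj['key'][3]:
    if id(elem) == target_id:
        break
else:
    raise LookupError(f'no element with id {target_id}')

print(elem)
# 4
```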
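
The same idea extends to a whole path. The sketch below uses a hypothetical helper `follow` (not part of `spelunk`) that applies a list of steps: a plain key or index for mappings and sequences, and an `('id', n)` tuple for set elements.

```python
from typing import Any, Iterable


def follow(root_obj: Any, steps: Iterable[Any]) -> Any:
    """Apply each step in turn; for sets, a step is an ('id', n) tuple."""
    current = root_obj
    for step in steps:
        if isinstance(current, (set, frozenset)):
            _, target_id = step
            # Sets cannot be indexed, so scan for the element with that id.
            for elem in current:
                if id(elem) == target_id:
                    current = elem
                    break
            else:
                raise LookupError(f"no element with id {target_id}")
        else:
            current = current[step]  # mapping key or sequence index
    return current


obj = {'key': [1, (2.0,), {3}, frozenset((4,))]}
four = next(iter(obj['key'][3]))  # grab the element only to learn its id
print(follow(obj, ['key', 3, ('id', id(four))]))  # 4
```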

Fortunately, `spelunk` provides additional tools for getting references to and manipulating elements of `root_obj`
without this tedious addressing and iteration (see below).


Before moving on, note that you can also filter by element and/or by path by supplying the
callables `element_test` and `path_test`, which decide whether an element or path is of interest
(by default, both always return `True`). `element_test` receives the element itself and returns a `bool`.
`path_test` receives the most recent component of the current path, either a string (attribute names, mapping keys)
or an integer (sequence indices, memory ids of set elements), and returns a `bool`.
For example, at `root_obj['key']` with path `ROOT['key']`, `'key'` is passed to `path_test`
and `[1, (2.0,), ...]` is passed to `element_test`.

  ```python
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print_obj_tree(root_obj=obj, element_test=lambda x: isinstance(x, float))

# ROOT['key'][1][0] -> 2.0
  ```
  ```python
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print_obj_tree(root_obj=obj, path_test=lambda x: x == 'subkey')

# ROOT['key'][4]['subkey'] -> [(1,)]
  ```
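
One way to picture how the two predicates interact is the sketch below. `filtered` is a hypothetical generator, not the library's code: it applies `element_test` to each element and `path_test` to the last path component, and it assumes the root's component is the string `'ROOT'` (the README does not say what `spelunk` passes for the root). With the two filters above, it yields the same paths as the outputs shown.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Callable, Iterator, Tuple


def filtered(
    obj: Any,
    element_test: Callable[[Any], bool] = lambda x: True,
    path_test: Callable[[Any], bool] = lambda x: True,
    path: str = "ROOT",
    component: Any = "ROOT",  # assumption: what the root passes to path_test
) -> Iterator[Tuple[str, Any]]:
    """Yield (path, element) pairs for which both predicates return True."""
    if element_test(obj) and path_test(component):
        yield path, obj
    if isinstance(obj, (str, bytes)):
        return  # strings are leaves
    if isinstance(obj, Mapping):
        children = [(f"{path}[{k!r}]", k, v) for k, v in obj.items()]
    elif isinstance(obj, Sequence):
        children = [(f"{path}[{i}]", i, v) for i, v in enumerate(obj)]
    elif isinstance(obj, (set, frozenset)):
        children = [(f"{path}{{id={id(v)}}}", id(v), v) for v in obj]
    else:
        return
    # Children are always explored, even when the parent itself was filtered out.
    for child_path, child_component, child in children:
        yield from filtered(child, element_test, path_test, child_path, child_component)


obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
print(list(filtered(obj, element_test=lambda x: isinstance(x, float))))
# [("ROOT['key'][1][0]", 2.0)]
print(list(filtered(obj, path_test=lambda x: x == 'subkey')))
# [("ROOT['key'][4]['subkey']", [(1,)])]
```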

### 2. Getting the values and paths of objects
To get a dictionary of objects filtered by element/path and keyed by full path string, use `get_elements`:
```python
from spelunk import get_elements
  
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
get_elements(root_obj=obj, element_test=lambda x: isinstance(x, frozenset))

# {"ROOT['key'][3]": frozenset({4})}

get_elements(root_obj=obj, element_test=lambda x: isinstance(x, dict))
# {
#   'ROOT':           {'key': [1, (2.0,), {3}, frozenset({4}), {'subkey': [(1,)]}]}, 
#   "ROOT['key'][4]": {'subkey': [(1,)]}
# }
```

### 3. Overwriting elements 
To overwrite elements use `overwrite_elements`:
```python
from spelunk import overwrite_elements

obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
overwrite_elements(
    root_obj=obj, 
    overwrite_value=None, 
    element_test=lambda x: isinstance(x, tuple)
)
print(obj)

# {'key': [1, None, {3}, frozenset({4}), {'subkey': [None]}]}
```
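Overwriting fails if it requires mutating an immutable container. In the following example, some of the `int`
elements live inside a tuple (`(1,)`) and a frozenset (`frozenset({4})`).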


Ex: 
```python
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
overwrite_elements(
    root_obj=obj, 
    overwrite_value=None, 
    element_test=lambda x: isinstance(x, int)
)
print(obj)

# Failed to overwrite [(<Address.MUTABLE_MAPPING_KEY: 'MutableMappingKey'>, ...
# Exception: Cannot overwrite immutable collections.
# Traceback (most recent call last):
# ...
# TypeError: Cannot overwrite immutable collections.
```
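
The failure follows from how Python's containers behave: lists, dicts and sets can be modified in place, while tuples and frozensets cannot. A hypothetical check based on the `collections.abc` mutable interfaces (an assumption about how such a check could look, not `spelunk`'s code) illustrates the distinction:

```python
from collections.abc import MutableMapping, MutableSequence, MutableSet


def can_overwrite_in(container) -> bool:
    """Return True if elements of `container` can be replaced in place."""
    return isinstance(container, (MutableMapping, MutableSequence, MutableSet))


for container in ([1], {'k': 1}, {1}, (1,), frozenset((4,))):
    print(type(container).__name__, can_overwrite_in(container))
# prints: list True / dict True / set True / tuple False / frozenset False

t = (1,)
try:
    t[0] = None
except TypeError as exc:
    print('TypeError:', exc)  # 'tuple' object does not support item assignment
```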
Error messages can be silenced with `silent=True`, and exceptions can be suppressed with
`raise_on_exception=False`. Elements inside mutable containers are still overwritten:
```python
obj = {'key': [1, (2.0,), {3}, frozenset((4,)), {'subkey': [(1,)]}]}
overwrite_elements(
    root_obj=obj, 
    overwrite_value=None, 
    element_test=lambda x: isinstance(x, int),
    silent=True,
    raise_on_exception=False
)
print(obj)

# {'key': [None, (2.0,), {None}, frozenset({4}), {'subkey': [(1,)]}]}
```

### 4. Hot swapping
If you need to temporarily overwrite parts of an object with replacement values and then restore the
originals, use the context manager `hot_swap`.
For example, suppose an object contains thread locks and you want a deep copy that you can manipulate while
preserving the original. `deepcopy` fails on the original object because thread locks cannot be pickled.
With `hot_swap`, you can temporarily replace the locks with something copyable, perform the deep copy, and then
restore the original locks.

```python
from spelunk import hot_swap
from _thread import LockType
from threading import Lock
from copy import deepcopy

lock_0 = Lock()
lock_1 = Lock()
obj = {'key': [1, lock_0, {3}, frozenset((4,)), {'subkey': [(1,)]}], 'other_lock': lock_1}

print(obj)
# {
#   'key': [1, <unlocked _thread.lock object at 0x104a7b870>, {3}, frozenset({4}), {'subkey': [(1,)]}], 
#   'other_lock': <unlocked _thread.lock object at 0x104a7b840>
# }

obj_deepcopy = deepcopy(obj)
# Traceback (most recent call last):
# ...
# TypeError: cannot pickle '_thread.lock' object

with hot_swap(root_obj=obj, overwrite_value='lock', element_test=lambda x: isinstance(x, LockType)):
    obj_deepcopy = deepcopy(obj)

print(obj_deepcopy)
# {'key': [1, 'lock', {3}, frozenset({4}), {'subkey': [(1,)]}], 'other_lock': 'lock'}

print(obj)
# {
#   'key': [1, <unlocked _thread.lock object at 0x104a7b870>, {3}, frozenset({4}), {'subkey': [(1,)]}], 
#   'other_lock': <unlocked _thread.lock object at 0x104a7b840>
# }
```
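
Conceptually, a hot swap records every replaced value together with its location and restores them in a `finally` block, so the originals come back even if the body raises. The sketch below shows that pattern with a hypothetical `swap_values` context manager. Unlike `hot_swap`, it only descends into dicts and lists and has no cycle protection.

```python
from contextlib import contextmanager
from copy import deepcopy
from threading import Lock


def _collect(obj, predicate, found):
    """Record (container, key, value) for matching values inside dicts and lists."""
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        items = list(enumerate(obj))
    else:
        return  # tuples, sets, etc. are not descended into in this sketch
    for key, value in items:
        if predicate(value):
            found.append((obj, key, value))
        else:
            _collect(value, predicate, found)


@contextmanager
def swap_values(root_obj, predicate, replacement):
    """Temporarily replace matching values, restoring them on exit."""
    found = []
    _collect(root_obj, predicate, found)
    try:
        for container, key, _ in found:
            container[key] = replacement
        yield root_obj
    finally:
        # Runs even if the with-block raised, so the originals always come back.
        for container, key, original in found:
            container[key] = original


lock = Lock()
obj = {'key': [1, lock], 'other_lock': lock}
with swap_values(obj, lambda x: isinstance(x, type(lock)), 'lock'):
    obj_copy = deepcopy(obj)

print(obj_copy)  # {'key': [1, 'lock'], 'other_lock': 'lock'}
print(obj['key'][1] is lock)  # True
```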

If performing a `hot_swap` on `root_obj` would require mutating an immutable collection, an exception is raised
before any modifications are made (including otherwise legal ones), so `root_obj` is left unchanged.
By default, `hot_swap` also raises an exception before attempting to swap an element of a mutable set, because
this cannot be done reliably. For example, swapping every `int` for `None` turns `{1, 2, 3, None}` into `{None}`,
and it is then ambiguous which elements of the new set should be restored. If you know that a set swap is safe,
you can enable it with the flag `allow_mutable_set_mutations`. For instance, the set `{1}` can be safely swapped
to `{None}` and restored because its cardinality is unchanged.
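
The set ambiguity is easy to reproduce in plain Python. The helper `swap_in_set` below is hypothetical and only illustrates why the cardinality matters:

```python
def swap_in_set(s, predicate, replacement):
    """Return a copy of `s` with every matching element replaced."""
    return {replacement if predicate(x) else x for x in s}


def is_int(x):
    return isinstance(x, int)


original = {1, 2, 3, None}
swapped = swap_in_set(original, is_int, None)
print(swapped)  # {None}: 4 elements collapsed into 1, so restoring is ambiguous

single = {1}
single_swapped = swap_in_set(single, is_int, None)
print(single_swapped, len(single) == len(single_swapped))  # {None} True
```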

## Details
### `__slots__`
`spelunk` fully supports objects that define `__slots__` (including objects that also have a `__dict__`). For each
object that is neither an ignored type nor an instance of a `Collection`, `spelunk` walks the object's MRO and
checks each class for a `__slots__` definition, so that slots declared by parent classes are captured as well.
These attributes are combined with the contents of the instance's `obj.__dict__`. Note that
although `spelunk` looks up `__slots__` (a class attribute) to find attribute names, it does not explore `__slots__` itself,
because it is a class attribute rather than an instance attribute.

This changes if a class `cls` is passed as `root_obj`. In that case, `cls.__dict__` contains all of the class's
methods and class attributes (including `__slots__` and its contents). Slot contents are never inherited from
parent classes here: for any class `cls` without a custom metaclass, `cls.__class__` is `type`,
and `type.__mro__` is `(<class 'type'>, <class 'object'>)`. Neither `type` nor `object` defines `__slots__`.


Ex:
```python
from spelunk import print_obj_tree

class A:
    important = "important"
    __slots__ = '__dict__', 'val'
    def __init__(self, val):
        self.val = val
        self.other = 'other'

print_obj_tree(A(1))
# ROOT -> <__main__.A object at 0x10a3dcdc0>
# ROOT.other -> 'other'
# ROOT.__dict__ -> {'other': 'other'}
# ROOT.__dict__['other'] -> 'other'
# ROOT.val -> 1
# ...
```
We can see that both the contents of `__slots__` (which contains `__dict__`) and the `__dict__` attributes are captured, but the
class attribute `important` is not. The class itself, however, can be inspected:
```python
print_obj_tree(A)
# ROOT -> <class '__main__.A'>
# ROOT.__module__ -> '__main__'
# ROOT.important -> 'important'
# ROOT.__slots__ -> ('__dict__', ...)
# ROOT.__slots__[0] -> '__dict__'
# ROOT.__slots__[1] -> 'val'
# ...
```
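
The MRO lookup described above can be sketched as follows. `collect_slot_names` is a hypothetical helper, not `spelunk`'s code. It also handles a `__slots__` given as a single string, and it shows why a class passed as the root yields no inherited slot names.

```python
from typing import Any, List


def collect_slot_names(obj: Any) -> List[str]:
    """Gather the __slots__ entries declared anywhere in type(obj).__mro__."""
    names: List[str] = []
    for cls in type(obj).__mro__:
        slots = cls.__dict__.get('__slots__', ())
        if isinstance(slots, str):  # __slots__ = 'x' declares a single slot
            slots = (slots,)
        for name in slots:
            if name not in names:
                names.append(name)
    return names


class A:
    __slots__ = '__dict__', 'val'


class B(A):
    __slots__ = 'extra'


print(collect_slot_names(B()))  # ['extra', '__dict__', 'val']
print(type(B).__mro__)  # (<class 'type'>, <class 'object'>)
print(collect_slot_names(B))  # []: neither type nor object defines __slots__
```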

### Memoization
`spelunk` memoizes searches by caching previously seen objects in a dictionary. It does not print new paths for
objects that occupy the same place in memory as an object it has already visited. This matters for speed and also
prevents infinite recursion on self-referencing objects. There is one important exception. In CPython, some objects
are shared no matter how they are created (for example, small integers and interned strings), so unrelated paths can
lead to the very same object. To be conservative, instances of `(Number, str, ByteString)` are never cached,
so every path to them is recorded.
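
The sketch below illustrates id-based memoization with a hypothetical `walk` generator over lists only. For simplicity it excludes `(Number, str, bytes, bytearray)` from the cache instead of `ByteString`, since `collections.abc.ByteString` is deprecated in newer Python versions. It shows that a self-referencing list terminates, that a shared sub-list is reported only once, and that repeated small integers still each get a path.

```python
from numbers import Number

# Types excluded from memoization in this sketch (spelunk itself lists ByteString).
UNCACHED = (Number, str, bytes, bytearray)


def walk(obj, path="ROOT", seen=None):
    """Yield path strings for lists, skipping objects already visited by id."""
    if seen is None:
        seen = set()
    if not isinstance(obj, UNCACHED):
        if id(obj) in seen:
            return  # already explored: no duplicate paths, no infinite loops
        seen.add(id(obj))
    yield path
    if isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from walk(item, f"{path}[{i}]", seen)


cycle = [1, 2]
cycle.append(cycle)  # the list now contains itself
print(list(walk(cycle)))  # ['ROOT', 'ROOT[0]', 'ROOT[1]']

inner = [0]
print(list(walk([inner, inner])))  # ['ROOT', 'ROOT[0]', 'ROOT[0][0]']

shared = [7, 7]
print(shared[0] is shared[1])  # True in CPython: both entries are one int object
print(list(walk(shared)))  # ['ROOT', 'ROOT[0]', 'ROOT[1]']
```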

### Ignored Collections
`spelunk` intentionally does not descend into `Collection`s that are instances of `(str, ByteString)`. This prevents string-like objects from being broken down character by character, which is usually not the desired behavior.
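
A minimal sketch of that check, using a hypothetical `is_explorable` predicate that substitutes `(str, bytes, bytearray)` for `(str, ByteString)`. A `str` is itself a `Collection`, and each of its characters is again a one-character `str`, so without this exclusion a walker would split strings apart and, since strings are not memoized, could keep recursing into the same single character.

```python
from collections.abc import Collection


def is_explorable(obj) -> bool:
    """True for collections worth descending into; str and bytes-like objects are leaves."""
    return isinstance(obj, Collection) and not isinstance(obj, (str, bytes, bytearray))


print(isinstance('abc', Collection))  # True: a str is a Collection of characters
print(list('a'))  # ['a']: each character is itself a one-character str
print(is_explorable('abc'), is_explorable(['a', 'b', 'c']))  # False True
```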

## Installation
If you prefer using `pyenv` and `Poetry` (or have no preference), the `Makefile` provides installation support. Make sure `conda` is fully deactivated (not even `base` should be active) and that `pyenv` is not running a shell.
1. Run `make install-python` to install `pyenv` (if not present) and then use `pyenv` to install the required version of Python.
2. Run `make install-poetry` to install `Poetry` if not already present. 
3. Run `make install-repo` to create a virtual environment `spelunk` stored in `spelunk/.venv` and use `Poetry` to install all dependencies.
4. To use the environment simply run `source .venv/bin/activate`.
5. To deactivate simply run `deactivate`.

If you have a different package management system:
1. Create a virtual environment.
2. Either install using `Poetry` or use external tools to convert `poetry.lock` to a `requirements.txt` and `pip install`.


## Developing
For contributors, kindly use the `Makefile` to perform formatting, linting, and unit testing locally.
1. Run `make style-check` to dry-run `black` formatting changes.
2. Run `make format` to format with `black`.
3. Run `make lint` to lint with `flake8`.
4. Run `make unit-test` to run `pytest` and check the coverage report. 

