scitex_io

scitex-io: Universal scientific data I/O with plugin registry.

Supports 30+ formats out of the box. Register custom handlers via:

from scitex_io import register_saver, register_loader

@register_saver(".myformat")
def save_myformat(obj, path, **kw): ...

@register_loader(".myformat")
def load_myformat(path, **kw): ...
scitex_io.register_saver(ext, fn=None, *, builtin=False)

Register a save handler for a file extension.

Can be used as a decorator or called directly:

@register_saver(".json")
def my_json_saver(obj, path, **kwargs): ...

register_saver(".json", my_json_saver)
Parameters:
  • ext (str) – File extension (e.g., “.json”, “json” — dot is optional).

  • fn (Callable, optional) – Handler function (obj, path, **kwargs) -> None. If None, returns a decorator.

  • builtin (bool) – If True, registers as built-in (lower priority). User registrations always override built-ins.

scitex_io.register_loader(ext, fn=None, *, builtin=False)

Register a load handler for a file extension.

Same API as register_saver().

Parameters:
  • ext (str) – File extension (e.g., “.json”, “json” — dot is optional).

  • fn (Callable, optional) – Handler function (path, **kwargs) -> Any. If None, returns a decorator.

  • builtin (bool) – If True, registers as built-in (lower priority).

scitex_io.get_saver(ext)

Look up a save handler. User overrides take priority.

Return type:

Optional[Callable]

scitex_io.get_loader(ext)

Look up a load handler. User overrides take priority.

Return type:

Optional[Callable]

scitex_io.list_formats()

List all registered formats.

Returns:

A dict of the form {"save": {"builtin": [...], "user": [...]}, "load": {"builtin": [...], "user": [...]}}

Return type:

dict

scitex_io.unregister_saver(ext)

Remove a user-registered saver. Returns True if found.

Return type:

bool

scitex_io.unregister_loader(ext)

Remove a user-registered loader. Returns True if found.

Return type:

bool
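
The lookup rules above (user handlers shadow built-ins, unregister_saver()/unregister_loader() only remove user entries, and list_formats() reports both tiers) can be modeled with a hypothetical `MiniRegistry` class. It is a standalone sketch rather than the library's code; sorting the extension lists is an assumption.

```python
from typing import Callable, Dict, List, Optional


class MiniRegistry:
    """Hypothetical two-tier handler table: user entries shadow built-ins."""

    def __init__(self) -> None:
        self.builtin: Dict[str, Callable] = {}
        self.user: Dict[str, Callable] = {}

    @staticmethod
    def _key(ext: str) -> str:
        return ext if ext.startswith(".") else "." + ext

    def register(self, ext: str, fn: Callable, *, builtin: bool = False) -> None:
        (self.builtin if builtin else self.user)[self._key(ext)] = fn

    def get(self, ext: str) -> Optional[Callable]:
        key = self._key(ext)
        if key in self.user:  # user overrides take priority
            return self.user[key]
        return self.builtin.get(key)  # None if nothing is registered

    def unregister(self, ext: str) -> bool:
        # Only user registrations are removed; built-ins stay in place.
        return self.user.pop(self._key(ext), None) is not None

    def listing(self) -> Dict[str, List[str]]:
        return {"builtin": sorted(self.builtin), "user": sorted(self.user)}


savers = MiniRegistry()
savers.register(".json", lambda obj, path, **kw: "builtin", builtin=True)
savers.register("json", lambda obj, path, **kw: "user")
print(savers.get(".json")(None, None))  # user
print(savers.unregister(".json"))       # True
print(savers.get("json")(None, None))   # builtin: the built-in is visible again
print(savers.unregister(".json"))       # False: only user entries are removable

# Same layout as list_formats(), with an empty loader table for illustration.
formats = {"save": savers.listing(), "load": MiniRegistry().listing()}
```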

scitex_io.save(obj, specified_path, makedirs=True, verbose=True, symlink_from_cwd=False, symlink_to=None, dry_run=False, no_csv=False, use_caller_path=False, **kwargs)

Save an object to a file with the specified format.

Parameters:
  • obj (Any) – The object to be saved.

  • specified_path (Union[str, Path]) – The file name or path where the object should be saved.

  • makedirs (bool, optional) – If True, create the directory path if it does not exist. Default is True.

  • verbose (bool, optional) – If True, print a message upon successful saving. Default is True.

  • symlink_from_cwd (bool, optional) – If True, create a symlink from the current working directory. Default is False.

  • symlink_to (Union[str, Path], optional) – If specified, create a symlink at this path pointing to the saved file.

  • dry_run (bool, optional) – If True, simulate the saving process without writing files. Default is False.

  • no_csv (bool, optional) – If True, skip CSV export for image saves. Default is False.

  • use_caller_path (bool, optional) – If True, skip internal library frames for path detection. Default is False.

  • **kwargs – Additional keyword arguments to pass to the underlying save function.

Returns:

Path to saved file on success, False on error.

Return type:

Path or None
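
Extension-based dispatch in save() can be pictured as: choose a handler from the path suffix, create parent directories if makedirs is set, and skip writing when dry_run is set. The hypothetical `mini_save` and `SAVERS` below sketch only that core. They omit symlinks, verbose messages, and CSV export, and returning None for an unsupported extension is an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Union


def _save_json(obj: Any, path: Path, **kwargs) -> None:
    path.write_text(json.dumps(obj, **kwargs))


def _save_txt(obj: Any, path: Path, **kwargs) -> None:
    path.write_text(str(obj))


# Hypothetical handler table keyed by file suffix.
SAVERS: Dict[str, Callable[..., None]] = {".json": _save_json, ".txt": _save_txt}


def mini_save(obj: Any, specified_path: Union[str, Path], makedirs: bool = True,
              dry_run: bool = False, **kwargs) -> Optional[Path]:
    path = Path(specified_path)
    handler = SAVERS.get(path.suffix.lower())
    if handler is None:
        return None  # unsupported extension
    if dry_run:
        return path  # report the target without touching the filesystem
    if makedirs:
        path.parent.mkdir(parents=True, exist_ok=True)
    handler(obj, path, **kwargs)  # extra kwargs go to the format handler
    return path


# Usage: "nested/" does not exist yet; makedirs=True creates it.
out_dir = Path(tempfile.mkdtemp())
saved = mini_save({"a": 1}, out_dir / "nested" / "result.json", indent=2)
```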

scitex_io.load(lpath, ext=None, show=False, verbose=False, cache=True, **kwargs)

Load data from various file formats.

This function supports loading data from multiple file formats with optional caching.

Parameters:
  • lpath (Union[str, Path]) – The path to the file to be loaded. Can be a string or pathlib.Path object.

  • ext (str, optional) – File extension to use for loading. If None, automatically detects from filename. Useful for files without extensions (e.g., UUID-named files). Examples: ‘pdf’, ‘json’, ‘csv’

  • show (bool, optional) – If True, display additional information during loading. Default is False.

  • verbose (bool, optional) – If True, print verbose output during loading. Default is False.

  • cache (bool, optional) – If True, enable caching for faster repeated loads. Default is True.

  • **kwargs (dict) – Additional keyword arguments to be passed to the specific loading function.

Returns:

The loaded data object, which can be of various types depending on the input file format.

Return type:

object

Raises:
  • ValueError – If the file extension is not supported.

  • FileNotFoundError – If the specified file does not exist.

Supported extensions:
  • Data formats – .csv, .tsv, .xls, .xlsx, .xlsm, .xlsb, .json, .yaml, .yml

  • Scientific – .npy, .npz, .mat, .hdf5, .con

  • ML/DL – .pth, .pt, .cbm, .joblib, .pkl

  • Documents – .txt, .log, .event, .md, .docx, .pdf, .xml

  • Images – .jpg, .png, .tiff, .tif

  • EEG data – .vhdr, .vmrk, .edf, .bdf, .gdf, .cnt, .egi, .eeg, .set

  • Database – .db

Examples

>>> data = load('data.csv')
>>> image = load('image.png')
>>> model = load('model.pth')
>>> # Load file without extension (e.g., UUID PDF)
>>> pdf = load('f2694ccb-1b6f-4994-add8-5111fd4d52f1', ext='pdf')
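
The interplay of the ext override and cache=True can be sketched like this: an explicit ext wins over the filename suffix, and a cached object is reused only while the file's modification time is unchanged. `LOADERS`, `_CACHE`, and `mini_load` are hypothetical, and invalidating on mtime is an assumed cache key, not documented scitex_io behavior.

```python
import json
import os
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Tuple, Union

# Hypothetical loader table keyed by suffix.
LOADERS: Dict[str, Callable[[Path], Any]] = {
    ".json": lambda p: json.loads(p.read_text()),
    ".txt": lambda p: p.read_text(),
}
# Hypothetical cache: resolved path -> (modification time when loaded, object).
_CACHE: Dict[str, Tuple[int, Any]] = {}


def mini_load(lpath: Union[str, Path], ext: Optional[str] = None, cache: bool = True) -> Any:
    path = Path(lpath)
    if not path.exists():
        raise FileNotFoundError(path)
    # An explicit ext ("pdf" or ".pdf") overrides suffix detection.
    suffix = ("." + ext.lstrip(".")) if ext else path.suffix
    loader = LOADERS.get(suffix.lower())
    if loader is None:
        raise ValueError(f"Unsupported extension: {suffix!r}")
    key = str(path.resolve())
    mtime = os.stat(path).st_mtime_ns
    if cache and key in _CACHE and _CACHE[key][0] == mtime:
        return _CACHE[key][1]  # file unchanged since the last load: reuse it
    data = loader(path)
    if cache:
        _CACHE[key] = (mtime, data)
    return data
```
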
scitex_io.load_configs(IS_DEBUG=None, show=False, verbose=False, config_dir=None)

Load YAML configuration files from specified directory.

Parameters:
  • IS_DEBUG (bool, optional) – Debug mode flag. If None, the value is read from IS_DEBUG.yaml.

  • show (bool) – Show configuration changes

  • verbose (bool) – Print detailed information

  • config_dir (Union[str, Path], optional) – Directory containing configuration files. Can be a string or pathlib.Path object. Defaults to “./config” if None

Returns:

Merged configuration dictionary

Return type:

DotDict
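
The DotDict return type is a dictionary whose keys can also be read as attributes. The hypothetical `MiniDotDict` and `merge_configs` below sketch that idea, with plain dicts standing in for parsed YAML files so that no YAML parser is needed. Letting later files override earlier ones on duplicate keys is an assumption.

```python
from typing import Any, Dict, Iterable


class MiniDotDict(dict):
    """A dict whose keys can also be read as attributes."""

    def __getattr__(self, name: str) -> Any:
        try:
            value = self[name]
        except KeyError as err:
            raise AttributeError(name) from err
        # Wrap nested dicts so chained access (cfg.A.B) also works.
        return MiniDotDict(value) if isinstance(value, dict) else value


def merge_configs(parsed_files: Iterable[Dict[str, Any]]) -> MiniDotDict:
    merged: Dict[str, Any] = {}
    for cfg in parsed_files:
        merged.update(cfg)  # assumption: later files win on duplicate keys
    return MiniDotDict(merged)


# Plain dicts stand in for two parsed YAML files from config_dir.
CONFIG = merge_configs([
    {"DATA_DIR": "./data"},
    {"MODEL": {"n_layers": 4, "lr": 1e-3}},
])
print(CONFIG.DATA_DIR)        # ./data
print(CONFIG.MODEL.n_layers)  # 4
```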

scitex_io.glob(expression, parse=False, ensure_one=False)

Perform a glob operation with natural sorting and extended pattern support.

This function extends the standard glob functionality by adding natural sorting and support for curly brace expansion in the glob pattern.
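
Natural sorting and curly-brace expansion can both be built from the standard library. This sketch shows the two ideas rather than scitex_io's code. The hypothetical `expand_braces` expands only comma-separated, non-nested groups, leaving placeholders such as {id} intact, and `natural_key` compares digit runs as integers so that file2 sorts before file10.

```python
import glob as std_glob
import re
from typing import List


def natural_key(s: str) -> list:
    # re.split with a capture group alternates text and digit runs,
    # e.g. "file10.txt" -> ["file", "10", ".txt"]; odd positions are digits.
    return [int(tok) if i % 2 else tok.lower() for i, tok in enumerate(re.split(r"(\d+)", s))]


def expand_braces(pattern: str) -> List[str]:
    # Expand the first "{a,b,...}" group; groups without a comma (e.g. "{id}") are skipped.
    m = re.search(r"\{([^{}]*,[^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[: m.start()], pattern[m.end():]
    expanded: List[str] = []
    for option in m.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))  # recurse for later groups
    return expanded


def natural_glob(pattern: str) -> List[str]:
    matches = set()
    for sub_pattern in expand_braces(pattern):
        matches.update(std_glob.glob(sub_pattern))
    return sorted(matches, key=natural_key)
```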

Parameters:
  • expression (Union[str, Path]) – The glob pattern to match against file paths. Can be a string or pathlib.Path object. Supports standard glob syntax and curly brace expansion (e.g., ‘dir/{a,b}/*.txt’).

  • parse (bool, optional) – If True, also extract the values of {name} placeholders from each matched path. Default is False.

  • ensure_one (bool, optional) – If True, require exactly one match (an AssertionError is raised otherwise). Default is False.

Returns:

If parse=False, a naturally sorted list of file paths. If parse=True, a tuple (paths, parsed), where parsed holds one dict of placeholder values per path.

Return type:

Union[List[str], Tuple[List[str], List[dict]]]

Examples:

>>> glob('data/*.txt')
['data/file1.txt', 'data/file2.txt', 'data/file10.txt']
>>> glob('data/{a,b}/*.txt')
['data/a/file1.txt', 'data/a/file2.txt', 'data/b/file1.txt']
>>> paths, parsed = glob('data/subj_{id}/run_{run}.txt', parse=True)
>>> paths
['data/subj_001/run_01.txt', 'data/subj_001/run_02.txt']
>>> parsed
[{'id': '001', 'run': '01'}, {'id': '001', 'run': '02'}]
>>> # Raises AssertionError because more than one file matches
>>> paths, parsed = glob('data/subj_{id}/run_{run}.txt', parse=True, ensure_one=True)
Traceback (most recent call last):
  ...
AssertionError
scitex_io.parse_glob(expression, ensure_one=False)

Convenience function for glob with parsing enabled.
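
Parsing can be understood as turning each {name} placeholder into * for globbing and into a named regex group for extraction. The helpers below are a hypothetical sketch of that idea, not scitex_io's code, and they assume each placeholder matches within a single path segment (no /).

```python
import re
from typing import Dict, List, Tuple


def placeholder_to_glob(expression: str) -> str:
    # "data/subj_{id}/run_{run}.txt" -> "data/subj_*/run_*.txt"
    return re.sub(r"\{(\w+)\}", "*", expression)


def placeholder_to_regex(expression: str) -> "re.Pattern[str]":
    # re.split keeps captured names at odd positions; even positions are literal text.
    parts = re.split(r"\{(\w+)\}", expression)
    body = "".join(
        f"(?P<{tok}>[^/]+)" if i % 2 else re.escape(tok) for i, tok in enumerate(parts)
    )
    return re.compile(body + r"\Z")


def parse_paths(expression: str, paths: List[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    pattern = placeholder_to_regex(expression)
    kept: List[str] = []
    parsed: List[Dict[str, str]] = []
    for path in paths:
        m = pattern.match(path)
        if m:  # paths that do not fit the template are dropped
            kept.append(path)
            parsed.append(m.groupdict())
    return kept, parsed


paths, parsed = parse_paths(
    "data/subj_{id}/run_{run}.txt",
    ["data/subj_001/run_01.txt", "data/subj_001/run_02.txt"],
)
```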

Parameters:
  • expression (Union[str, Path]) – The glob pattern to match against file paths. Can be a string or pathlib.Path object.

  • ensure_one (bool, optional) – Ensure exactly one match is found. Default is False.

Returns:

Matched paths and parsed results.

Return type:

Tuple[List[str], List[dict]]

Examples:

>>> paths, parsed = parse_glob('data/subj_{id}/run_{run}.txt')
>>> paths
['data/subj_001/run_01.txt', 'data/subj_001/run_02.txt']
>>> parsed
[{'id': '001', 'run': '01'}, {'id': '001', 'run': '02'}]
>>> # Raises AssertionError because more than one file matches
>>> paths, parsed = parse_glob('data/subj_{id}/run_{run}.txt', ensure_one=True)
Traceback (most recent call last):
  ...
AssertionError
scitex_io.reload(module_or_func, verbose=False)

Reload a module or the module containing a given function.

This function attempts to reload a module directly if a module is passed, or reloads the module containing the function if a function is passed. This is useful during development to reflect changes without restarting the Python interpreter.

Parameters:
  • module_or_func (module or function) – The module to reload, or a function whose containing module should be reloaded.

  • verbose (bool, optional) – If True, print additional information during the reload process. Default is False.

Returns:

None

Raises:
  • Exception – If the module cannot be found or if an error occurs during the reload process.

Notes:

  • Reloading modules can have unexpected side effects, especially for modules that maintain state or have complex imports. Use with caution.

  • This function modifies sys.modules, which affects the global state of the Python interpreter.

Examples:

>>> import my_module
>>> reload(my_module)
>>> from my_module import my_function
>>> reload(my_function)
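
The module-or-function dispatch maps onto the standard `importlib.reload` plus `inspect.getmodule`. Below is a minimal sketch under the hypothetical name `reload_module_or_func`; unlike the documented function, it returns the reloaded module for convenience.

```python
import importlib
import inspect
import types
from typing import Callable, Union


def reload_module_or_func(module_or_func: Union[types.ModuleType, Callable],
                          verbose: bool = False) -> types.ModuleType:
    if isinstance(module_or_func, types.ModuleType):
        module = module_or_func
    else:
        # For a function, reload the module it was defined in.
        module = inspect.getmodule(module_or_func)
        if module is None:
            raise ImportError(f"Cannot find the module of {module_or_func!r}")
    reloaded = importlib.reload(module)
    if verbose:
        print(f"Reloaded {reloaded.__name__}")
    return reloaded
```

As with any `importlib.reload`, names bound earlier with `from my_module import my_function` still refer to the old function object, so re-import them after reloading.
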
scitex_io.flush(sys=<module 'sys' (built-in)>)

Flushes the system’s stdout and stderr, and syncs the file system. This ensures all pending write operations are completed.
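
A standard-library approximation of this behavior might look as follows. `flush_all` is a hypothetical name, not part of scitex_io; `os.sync()` exists only on Unix, so the call is guarded.

```python
import os
import sys


def flush_all() -> None:
    # Push Python-level buffers of the standard streams to the OS.
    sys.stdout.flush()
    sys.stderr.flush()
    # Ask the OS to write cached file data to disk (Unix only).
    if hasattr(os, "sync"):
        os.sync()
```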

scitex_io.cache(id, *args)

Store or fetch data using a pickle file.

This function provides a simple caching mechanism for storing and retrieving Python objects. It uses pickle to serialize the data and stores it in a file with a unique identifier. If the data is already cached, it can be retrieved without recomputation.

Parameters:
  • id (str) – A unique identifier for the cache file.

  • *args (str) – Variable names to be cached or loaded.

Returns:

A tuple of cached values corresponding to the input variable names.

Return type:

tuple

Raises:
  • ValueError – If the cache file is not found and not all variables are defined.

Example:

>>> import scitex_io
>>> import numpy as np
>>>
>>> # Variables to cache
>>> var1 = "x"
>>> var2 = 1
>>> var3 = np.ones(10)
>>>
>>> # Saving (all variables are defined)
>>> var1, var2, var3 = scitex_io.cache("my_id", "var1", "var2", "var3")
>>> print(var1, var2, var3)
>>>
>>> # Loading when not all variables are defined and the id exists
>>> del var1, var2, var3
>>> var1, var2, var3 = scitex_io.cache("my_id", "var1", "var2", "var3")
>>> print(var1, var2, var3)
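
The store-or-fetch logic can be sketched with `pickle`. The documented function appears to read the named variables from the calling scope; to stay self-contained, the hypothetical `mini_cache` takes an explicit `namespace` dict and a `cache_dir` instead, and the `<cache_id>.pkl` file naming is an assumption.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, Tuple


def mini_cache(cache_dir: Path, cache_id: str, namespace: Dict[str, Any],
               *names: str) -> Tuple[Any, ...]:
    cache_file = Path(cache_dir) / f"{cache_id}.pkl"
    if all(name in namespace for name in names):
        # Every variable is defined: store them under this id and pass them through.
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        with open(cache_file, "wb") as f:
            pickle.dump({name: namespace[name] for name in names}, f)
        return tuple(namespace[name] for name in names)
    if cache_file.exists():
        # Some variables are missing: fetch all requested names from the pickle.
        with open(cache_file, "rb") as f:
            stored = pickle.load(f)
        return tuple(stored[name] for name in names)
    raise ValueError("Cache file not found and not all variables are defined.")


cache_dir = Path(tempfile.mkdtemp())
var1, var2 = mini_cache(cache_dir, "my_id", {"var1": "x", "var2": 1}, "var1", "var2")
var1, var2 = mini_cache(cache_dir, "my_id", {}, "var1", "var2")  # loaded from disk
```
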
class scitex_io.H5Explorer(filepath, mode='r')

Bases: object

Interactive HDF5 file explorer.

This class provides convenient methods to explore HDF5 files, inspect their structure, and load data.

Example

>>> explorer = H5Explorer('data.h5')
>>> explorer.explore()  # Display file structure
>>> data = explorer.load('group1/dataset1')  # Load specific dataset
>>> explorer.close()
__enter__()

Context manager entry.

__exit__(exc_type, exc_val, exc_tb)

Context manager exit.

__init__(filepath, mode='r')

Initialize H5Explorer.

Parameters:
  • filepath (str) – Path to HDF5 file

  • mode (str) – File opening mode (‘r’ for read, ‘r+’ for read/write)

close()

Close the HDF5 file.

explore(path='/', max_depth=None)

Explore HDF5 file structure interactively.

Return type:

None

find(pattern, path='/')

Find items matching pattern.

Parameters:
  • pattern (str) – Pattern to search for in item names

  • path (str) – Starting path for search

Return type:

List[str]

Returns:

List of paths matching the pattern

get(path)

Alias for load() method for compatibility.

Parameters:

path (str) – Path to dataset or group in HDF5 file

Return type:

Any

Returns:

Data from the specified path

get_dtype(path)

Get dtype of a dataset.

Parameters:

path (str) – Path to dataset

Return type:

Optional[dtype]

Returns:

NumPy dtype, or None if the item is not a dataset

get_info(path='/')

Get information about an item.

Parameters:

path (str) – Path to item in HDF5 file

Return type:

Dict[str, Any]

Returns:

Dictionary with item information

get_shape(path)

Get shape of a dataset.

Parameters:

path (str) – Path to dataset

Return type:

Optional[tuple]

Returns:

Shape tuple or None if not a dataset

keys(path='/')

Get keys at specified path.

Parameters:

path (str) – Path in HDF5 file

Return type:

List[str]

Returns:

List of keys at the specified path

load(path)

Load data from specified path.

Parameters:

path (str) – Path to dataset or group in HDF5 file

Return type:

Any

Returns:

Data from the specified path

show(path='/', max_depth=None, indent='', _current_depth=0)

Display HDF5 file structure.

Parameters:
  • path (str) – Starting path in HDF5 file

  • max_depth (Optional[int]) – Maximum depth to explore (None for unlimited)

  • indent (str) – Indentation string (used internally)

  • _current_depth (int) – Current depth (used internally)

Return type:

None

scitex_io.explore_h5(filepath)

Explore HDF5 file structure.

Parameters:

filepath (str) – Path to HDF5 file

Return type:

None

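scitex_io.has_h5_key(h5_path, key, max_retries=3, action_on_corrupted='delete')

Check whether key exists in the HDF5 file at h5_path. This robust variant handles corrupted files and lock conflicts; max_retries bounds the retry attempts, and action_on_corrupted sets how a corrupted file is treated (default 'delete').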

class scitex_io.ZarrExplorer(storepath, mode='r')

Bases: object

Interactive Zarr store explorer.

__init__(storepath, mode='r')
explore(path='/', max_depth=None)

Explore Zarr store structure.

has_key(path)

Check whether a key exists, without the file-locking issues that can affect HDF5 lookups.

Return type:

bool

keys(path='/')

Get keys at specified path.

Return type:

List[str]

load(path)

Load data from specified path.

Return type:

Any

show(path='/', max_depth=None, indent='', _current_depth=0)

Display Zarr store structure.

scitex_io.explore_zarr(storepath)

Explore Zarr store structure.

Return type:

None

scitex_io.has_zarr_key(zarr_path, key)

Check whether key exists in the Zarr store at zarr_path, without the file-locking issues that can affect HDF5 lookups.

Return type:

bool

scitex_io.get_cache_info()

Get cache statistics and configuration.

Returns:

Cache information including stats and config

Return type:

Dict[str, Any]

scitex_io.configure_cache(enabled=None, max_size=None, verbose=None)

Configure cache settings.

Parameters:
  • enabled (Optional[bool]) – Enable or disable caching

  • max_size (Optional[int]) – Maximum number of files to cache

  • verbose (Optional[bool]) – Enable verbose logging

Return type:

None

scitex_io.clear_load_cache()

Clear all cached data.

Return type:

None
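
get_cache_info(), configure_cache(), and clear_load_cache() together describe a bounded cache that can be toggled, resized, inspected, and cleared. A hypothetical `LoadCache` class with least-recently-used eviction makes those operations concrete; the eviction policy and the keys returned by `info()` are assumptions, not what get_cache_info() actually reports.

```python
from collections import OrderedDict
from typing import Any, Dict, Hashable, Optional


class LoadCache:
    def __init__(self, max_size: int = 32, enabled: bool = True) -> None:
        self.max_size = max_size
        self.enabled = enabled
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def configure(self, enabled: Optional[bool] = None, max_size: Optional[int] = None) -> None:
        # None leaves a setting unchanged (assumed to mirror configure_cache()).
        if enabled is not None:
            self.enabled = enabled
        if max_size is not None:
            self.max_size = max_size
            self._evict()

    def get(self, key: Hashable) -> Optional[Any]:
        if self.enabled and key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key: Hashable, value: Any) -> None:
        if not self.enabled:
            return
        self._data[key] = value
        self._data.move_to_end(key)
        self._evict()

    def _evict(self) -> None:
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used entry

    def clear(self) -> None:
        self._data.clear()

    def info(self) -> Dict[str, Any]:
        return {"enabled": self.enabled, "max_size": self.max_size,
                "size": len(self._data), "hits": self.hits, "misses": self.misses}
```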

scitex_io.save_image(obj, spath, **kwargs)
scitex_io.save_text(obj, spath)

Save text content to a file.

Parameters:
  • obj (str) – The text content to save.

  • spath (str) – Path where the text file will be saved.

Return type:

None

scitex_io.save_mp4(fig, spath_mp4)
scitex_io.save_listed_dfs_as_csv(listed_dfs, spath_csv, indi_suffix=None, overwrite=False, verbose=False)

Save a list of DataFrames vertically into a single CSV file.

Parameters:
  • listed_dfs – [df1, df2, df3, …, dfN]. The DataFrames are written one below another, in list order.

  • spath_csv – Path of the output CSV file (e.g., /hoge/fuga/foo.csv).

  • indi_suffix (optional) – Labels for the individual DataFrames. For the i-th DataFrame, ‘{}’.format(indi_suffix[i]) is written to the top-left cell of its block in the output CSV. If None, only ‘{}’.format(i) is written.

scitex_io.save_listed_scalars_as_csv(listed_scalars, spath_csv, column_name='_', indi_suffix=None, round=3, overwrite=False, verbose=False)

Put the scalars into a DataFrame and save it as CSV.

scitex_io.save_optuna_study_as_csv_and_pngs(study, sdir)
scitex_io.json2md(obj, level=1)
scitex_io.migrate_h5_to_zarr(h5_path, zarr_path=None, compressor='zstd', chunks=True, overwrite=False, show_progress=True, validate=True)

Migrate HDF5 file to Zarr format.

Parameters:
  • h5_path (str or Path) – Path to input HDF5 file

  • zarr_path (str or Path, optional) – Path for output Zarr store. If None, uses h5_path with .zarr extension

  • compressor (str or compressor object, optional) – Compression to use: ‘zstd’, ‘lz4’, ‘gzip’, ‘blosc’, or None

  • chunks (bool or tuple, optional) – Chunking strategy. True for auto, False for no chunks, or specific shape

  • overwrite (bool, optional) – Whether to overwrite existing Zarr store

  • show_progress (bool, optional) – Whether to show migration progress

  • validate (bool, optional) – Whether to validate the migration by comparing shapes

Returns:

Path to created Zarr store

Return type:

str

scitex_io.migrate_h5_to_zarr_batch(h5_paths, output_dir=None, compressor='zstd', chunks=True, overwrite=False, parallel=False, n_workers=None)

Migrate multiple HDF5 files to Zarr format.

Parameters:
  • h5_paths (list of str or Path) – List of HDF5 files to migrate

  • output_dir (str or Path, optional) – Directory for output Zarr stores

  • compressor (str or compressor object, optional) – Compression to use

  • chunks (bool or tuple, optional) – Chunking strategy

  • overwrite (bool, optional) – Whether to overwrite existing Zarr stores

  • parallel (bool, optional) – Whether to process files in parallel

  • n_workers (int, optional) – Number of parallel workers

Returns:

Paths to created Zarr stores

Return type:

list of str
