Quickstart
Save and Load
The format is auto-detected from the file extension:
from scitex_io import save, load
# DataFrames
import pandas as pd
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
save(df, "data.csv")
loaded = load("data.csv")
# NumPy arrays
import numpy as np
save(np.array([1, 2, 3]), "data.npy")
# Dictionaries
save({"key": "value"}, "config.yaml")
save({"nested": [1, 2]}, "data.json")
# Any Python object
save({"complex": object()}, "data.pkl")
One function saves and one loads; 30+ formats work the same way.
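To make the extension-based detection concrete, here is a minimal sketch of how such a dispatch could work using only the standard library. It is not scitex_io's implementation: the names `sketch_save`, `sketch_load`, `_HANDLERS`, and `roundtrip_demo` are hypothetical, and only two formats are wired up.

```python
import json
import pickle
import tempfile
from pathlib import Path


# Hypothetical helpers for illustration; these are not scitex_io internals.
def _save_json(obj, path):
    with open(path, "w") as f:
        json.dump(obj, f)


def _load_json(path):
    with open(path) as f:
        return json.load(f)


def _save_pickle(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def _load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Map each file extension to a (saver, loader) pair.
_HANDLERS = {
    ".json": (_save_json, _load_json),
    ".pkl": (_save_pickle, _load_pickle),
}


def _lookup(path):
    """Pick the handler pair from the file extension (case-insensitive)."""
    ext = Path(path).suffix.lower()
    if ext not in _HANDLERS:
        raise ValueError(f"Unsupported extension: {ext!r}")
    return _HANDLERS[ext]


def sketch_save(obj, path):
    _lookup(path)[0](obj, path)


def sketch_load(path):
    return _lookup(path)[1](path)


def roundtrip_demo():
    """Save and reload one object per format inside a temporary directory."""
    samples = [("data.json", {"nested": [1, 2]}), ("data.pkl", {"key": (1, 2)})]
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, obj in samples:
            target = Path(tmp) / name
            sketch_save(obj, target)
            results[name] = sketch_load(target)
    return results
```

Calling `roundtrip_demo()` returns each object as it was read back from disk. The caller never names a format; the suffix alone selects the handler, which is the behavior the examples above rely on.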
List Available Formats
from scitex_io import list_formats
formats = list_formats()
print(f"Save: {len(formats['save']['builtin'])} built-in formats")
print(f"Load: {len(formats['load']['builtin'])} built-in formats")
Save: 24 built-in formats
Load: 29 built-in formats
Custom Format Registration
Register handlers for any file extension:
from scitex_io import register_saver, register_loader, save, load
@register_saver(".tsv3")
def save_tsv3(obj, path, **kwargs):
    """Save with 3-space-separated values."""
    with open(path, "w") as f:
        for row in obj:
            # Join fields with exactly three spaces
            f.write("   ".join(str(v) for v in row) + "\n")

@register_loader(".tsv3")
def load_tsv3(path, **kwargs):
    """Load 3-space-separated values."""
    with open(path) as f:
        # Split on the same three-space delimiter; values come back as strings
        return [line.strip().split("   ") for line in f]
# Now .tsv3 works like any built-in format
save([[1, 2], [3, 4]], "data.tsv3")
assert load("data.tsv3") == [["1", "2"], ["3", "4"]]
Note
User-registered handlers take priority over built-in ones for the same extension. This lets you override default behavior without modifying the library.
Caching
Repeated loads are cached automatically:
from scitex_io import load, get_cache_info, clear_load_cache
data1 = load("large_file.hdf5") # reads from disk
data2 = load("large_file.hdf5") # returns cached copy (instant)
info = get_cache_info()
print(f"Cache hits: {info['hits']}, misses: {info['misses']}")
clear_load_cache() # free memory
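The hit and miss counters are easier to reason about with a small model of a load cache. The following is a sketch under stated assumptions, not scitex_io's implementation: it keys entries by absolute path, treats a changed modification time as a miss, and uses the hypothetical names `cached_load_json`, `cache_info`, `clear_cache`, and `cache_demo`.

```python
import json
import os
import tempfile

# Hypothetical cache for illustration: absolute path -> (mtime_ns, data).
_cache = {}
_stats = {"hits": 0, "misses": 0}


def cached_load_json(path):
    """Load a JSON file, reusing the cached result while the file is unchanged."""
    key = os.path.abspath(path)
    mtime = os.stat(key).st_mtime_ns
    entry = _cache.get(key)
    if entry is not None and entry[0] == mtime:
        _stats["hits"] += 1
        return entry[1]
    # Not cached yet, or the file changed on disk: read it again.
    _stats["misses"] += 1
    with open(key) as f:
        data = json.load(f)
    _cache[key] = (mtime, data)
    return data


def cache_info():
    """Return a snapshot of the counters."""
    return dict(_stats)


def clear_cache():
    """Drop all cached data and reset the counters."""
    _cache.clear()
    _stats.update(hits=0, misses=0)


def cache_demo():
    """Load the same file twice and report whether the second call was a hit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.json")
        with open(path, "w") as f:
            json.dump({"x": [1, 2, 3]}, f)
        clear_cache()
        first = cached_load_json(path)
        second = cached_load_json(path)
        return first is second, cache_info()
```

In this sketch the second call returns the very same object as the first, so mutating it would also change what later callers see. A cache that hands out copies avoids that at the cost of extra memory and time.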