# import numpy as np
# import anndata as ad
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from ez_zarr import hcs_wrappersUsing ez_zarr to explore Fractal outputs
Goal
The aim of ez_zarr is to provide easy, high-level access to high content screening microscopy data, stored as OME-Zarr filesets according to the NGFF specifications with additional metadata fields, for example the ones generated by the Fractal platform.
The goal is that users can write simple scripts working with plates, wells and fields of view, without having to understand how these are represented within an OME-Zarr fileset.
Installation
ez_zarr is hosted on GitHub at github.com/fmicompbio/ez_zarr and the documentation of can be found at fmicompbio.github.io/ez_zarr.
To install it, activate a conda environment, or create a new one (using python 3.9 or newer), and install ez_zarr from GitHub using:
#| eval: False
pip install git+ssh://git@github.com/fmicompbio/ez_zarr.git
Quickstart
Here are some examples of how you can use ez_zarr:
Load packages
Open OME-Zarr file set
We open a single OME-Zarr directory, typically representing a single plate:
zarr_directory = './231202-ExM-d0-A04-rbtKDEL.zarr'
plate = hcs_wrappers.FractalZarr(zarr_directory, name = 'ExM-d0')
plateFractalZarr ExM-d0
path: ./231202-ExM-d0-A04-rbtKDEL.zarr
n_wells: 1
n_channels: 4 (DAPI, 488, A03_C03, FFFF00)
n_pyramid_levels: 5
pyramid_zyx_scalefactor: [1. 2. 2.]
full_resolution_zyx_spacing: [30.0, 0.325, 0.325]
segmentations:
tables (measurements): FOV_ROI_table, well_ROI_table
plate represents an acquisition of 1 wells with 4 channels.
There are 5 pyramid_levels available (0 to 4), meaning that in addition to the full resolution data (level 0), we have four more levels that provide the data in 2-fold lower resolutions (see pyramid_zyx_scalefactor), for example for faster plotting.
This OME-Zarr fileset does not contain any segmentations, but some tables, such as FOV_ROI_table, which contains the coordinates of the fields of view.
Get information from plate
You can obtain information on the content available in the fileset:
# path to the OME-Zarr fileset
plate.get_path()'./231202-ExM-d0-A04-rbtKDEL.zarr'
# available channels
plate.get_channels()[{'active': True,
'coefficient': 1,
'color': '00FFFF',
'inverted': False,
'label': 'DAPI',
'wavelength_id': 'A01_C01',
'window': {'end': 6000, 'max': 65535, 'min': 0, 'start': 110}},
{'active': True,
'coefficient': 1,
'color': '008000',
'inverted': False,
'label': '488',
'wavelength_id': 'A02_C02'},
{'active': True,
'coefficient': 1,
'color': 'FF0000',
'inverted': False,
'label': 'A03_C03',
'wavelength_id': 'A03_C03'},
{'active': True,
'coefficient': 1,
'color': 'FFFF00',
'inverted': False,
'label': 'FFFF00',
'wavelength_id': 'A04_C04'}]
# available wells
plate.get_wells()
plate.get_wells(simplify=True)['A04']
# available tables
plate.get_table_names()['FOV_ROI_table', 'well_ROI_table']
# zyx pixel spacing in micrometers for pyramid levels
plate.level_zyx_spacing[[30.0, 0.325, 0.325],
[30.0, 0.65, 0.65],
[30.0, 1.3, 1.3],
[30.0, 2.6, 2.6],
[30.0, 5.2, 5.2]]
Extract a table from the OME-Zarr fileset
The tables are stored within the fileset as describe in the Fractal documentation.
As mentioned the goal of ez_zarr is to abstract the internal structure and make it simple to obtain these tables, if necessary accumulated over many wells, as a pandas.DataFrame :
df = plate.get_table(table_name='FOV_ROI_table')
df| well | x_micrometer | y_micrometer | z_micrometer | len_x_micrometer | len_y_micrometer | len_z_micrometer | x_micrometer_original | y_micrometer_original | |
|---|---|---|---|---|---|---|---|---|---|
| FieldIndex | |||||||||
| FOV_6 | A04 | 0.0 | 0.000000 | 0.0 | 650.0 | 650.0 | 120.0 | 4928.600098 | -1653.300049 |
| FOV_7 | A04 | 650.0 | 0.000000 | 0.0 | 650.0 | 650.0 | 120.0 | 5578.600098 | -1653.300049 |
| FOV_10 | A04 | 0.0 | 650.000061 | 0.0 | 650.0 | 650.0 | 120.0 | 4928.600098 | -1003.299988 |
| FOV_11 | A04 | 650.0 | 650.000061 | 0.0 | 650.0 | 650.0 | 120.0 | 5578.600098 | -1003.299988 |
or anndata.AnnData object:
ann = plate.get_table(table_name='FOV_ROI_table')
ann| well | x_micrometer | y_micrometer | z_micrometer | len_x_micrometer | len_y_micrometer | len_z_micrometer | x_micrometer_original | y_micrometer_original | |
|---|---|---|---|---|---|---|---|---|---|
| FieldIndex | |||||||||
| FOV_6 | A04 | 0.0 | 0.000000 | 0.0 | 650.0 | 650.0 | 120.0 | 4928.600098 | -1653.300049 |
| FOV_7 | A04 | 650.0 | 0.000000 | 0.0 | 650.0 | 650.0 | 120.0 | 5578.600098 | -1653.300049 |
| FOV_10 | A04 | 0.0 | 650.000061 | 0.0 | 650.0 | 650.0 | 120.0 | 4928.600098 | -1003.299988 |
| FOV_11 | A04 | 650.0 | 650.000061 | 0.0 | 650.0 | 650.0 | 120.0 | 5578.600098 | -1003.299988 |
Get an overview of a full well
get_image_ROI() extracts a rectangular region of interest (ROI) from a well image by coordinates. If no coordinates or other arguments are given, it returns the whole well at the lowest available resolution:
img = plate.get_image_ROI()
print(img.shape) # (ch, z, y, x)
with plt.style.context('dark_background'):
fig = plt.figure(figsize=(4, 4))
fig.set_dpi(100)
plt.imshow(img[0,2], cmap='gray', vmin=100, vmax=200)
plt.title(plate.name + ' A04')
plt.show()
plt.close()(4, 4, 250, 250)
Get salient regions at higher resolution
get_image_grid_ROIs() allows you to split the well image into a rectangular grid of num_x-by-num_y regions, and automatically select the num_select top regions based on a given property (sample_method).
Here, we pick three regions with the highest signal in channel zero (DAPI):
coord_list, img_list = plate.get_image_grid_ROIs(well = 'A04',
pyramid_level=2,
num_x=10, num_y=10, num_select=3,
sample_method='sum',
channel=0)
coord_list # (y_start, y_end, x_start, x_end)
len(img_list)
with plt.style.context('dark_background'):
fig = plt.figure(figsize=(9,3))
fig.set_dpi(100)
for i in range(len(img_list)):
plt.subplot(1, 3, i + 1)
plt.imshow(img_list[i][0,0], cmap='gray', vmin=100, vmax=200)
plt.axis('off')
plt.title('A04 ' + str(coord_list[i]))
plt.show()
plt.close()Zoom in
Alternatively, we can zoom in by coordinates:
img = plate.get_image_ROI(well='A04',
upper_left_yx=(90, 50),
lower_right_yx=(230, 180))
print(img.shape)
with plt.style.context('dark_background'):
fig = plt.figure(figsize=(4, 4))
fig.set_dpi(100)
plt.imshow(img[0,0], cmap='gray', vmin=100, vmax=200)
plt.title(plate.name + ' A04')
plt.show()
plt.close()(4, 4, 141, 131)
The above automatically uses the lowest resolution (highest pyramid level) version of the data, and the pixel coordinates refer to that space.
If we want to get the same image at a higher resolution (pyramid_level=2), we need to convert them using the convert_pixel_to_pixel() function:
# convert coordinates from lowest-resolution (None, here: 4) to
# highest-resolution (0) pyramid level
# remark: coordinates are (z,y,x)
new_upper_left = plate.convert_pixel_to_pixel(zyx = (0, 90, 50),
pyramid_level_from = None,
pyramid_level_to=2)
new_lower_right = plate.convert_pixel_to_pixel(zyx = (0, 230, 180),
pyramid_level_from = None,
pyramid_level_to=2)
# extract image
img = plate.get_image_ROI(well='A04',
upper_left_yx=new_upper_left[1:],
lower_right_yx=new_lower_right[1:],
pyramid_level=2)
print(img.shape)
with plt.style.context('dark_background'):
fig = plt.figure(figsize=(4, 4))
fig.set_dpi(100)
plt.imshow(img[0,0], cmap='gray', vmin=100, vmax=200)
plt.title(plate.name + ' A04')
plt.show()
plt.close()(4, 4, 561, 521)
Coordinate conversions
As mentioned the pixel coordinates depend on the pyramid_level, but as the scaling factor between pyramid levels and also the pixel sizes in micrometers are known, it’s easy to convert between them:
zyx = (2, 230, 180) # highest pyramid level (4)
print('starting coordinates (pyramid_level=4): ' + str(zyx))
# convert from pixel to micrometers
zyx_um = plate.convert_pixel_to_micrometer(zyx, pyramid_level=4)
print('in micrometer: ' + str(zyx_um))
# convert from micrometers to pixels
zyx_px = plate.convert_micrometer_to_pixel(zyx_um, pyramid_level=4)
print('back in pixel: ' + str(zyx_px))
# convert pixels between pyramid levels
zyx_px_0 = plate.convert_pixel_to_pixel(zyx_px, pyramid_level_from=4, pyramid_level_to=0)
print('in pixel (pyramid_level=0): ' + str(zyx_px_0))starting coordinates (pyramid_level=4): (2, 230, 180)
in micrometer: (60.0, 1196.0, 936.0)
back in pixel: (2, 230, 180)
in pixel (pyramid_level=0): (2, 3680, 2880)
This makes it quite easy to add a scalebar to an image:
# calculate the pixel length of a scalebar
scalebar_micrometer = 100
scalebar_pixels = plate.convert_micrometer_to_pixel(zyx = (0, 0, scalebar_micrometer),
pyramid_level=2)[2]
print(str(scalebar_micrometer) + ' µm correspond to ' + str(scalebar_pixels) + ' pixel in x')
with plt.style.context('dark_background'):
fig = plt.figure(figsize=(4, 4))
fig.set_dpi(100)
plt.imshow(img[0,0], cmap='gray', vmin=100, vmax=200)
plt.title(plate.name + ' A04')
# add scalebar
# patches.Rectangle((x, y), width, height, edgecolor, facecolor)
rect = patches.Rectangle((30, 30), scalebar_pixels, 6, edgecolor='white', facecolor='white')
ax = plt.gca() # get current axes
ax.add_patch(rect) # add the patch to the Axes
# plt.text(x, y, 'Your text', color, ha='center')
plt.text(30 + scalebar_pixels / 2, 45, str(scalebar_micrometer) + ' µm', color='white', ha='center', va='top')
# show image
plt.show()
plt.close()100 µm correspond to 77 pixel in x
Calculate on the image data
To save memory, images are stored on disk and only loaded when needed (for implementation details see the dask Array documentation).
This can be demonstrated by type(img). To force loading of the data, you can call .compute():
print(type(img)) # note that the image is an 'on-disk' array
print(type(img.compute())) # ... and can be loaded into memory using .compute()<class 'dask.array.core.Array'>
<class 'numpy.ndarray'>
In general, you can use dask arrays like numpy arrays. For example, we can threshold the image:
# threshold
img_mask = img[0,0]>140
# ... and compute to numpy array
img_mask = img_mask.compute()
# plot
with plt.style.context('dark_background'):
fig = plt.figure(figsize=(12, 6))
fig.set_dpi(100)
# ... channel for segmentation mask
plt.subplot(1, 4, 1)
plt.title('Zoom-in image (DAPI)')
plt.imshow(img[0,0], vmin=100, vmax=200, cmap='gray')
plt.axis('off')
# ... threshold mask
plt.subplot(1, 4, 2)
plt.title('Mask (threshold on DAPI)')
# define a custom colormap for segmentation
my_cmap = plt.cm.colors.ListedColormap(['black', 'red'])
plt.imshow(img_mask, cmap=my_cmap)
plt.axis('off')
plt.show()
plt.close()This is of course not a recommended way to segment this image, but rather meant to demonstrate that the dask arrays can be used just like numpy arrays.
Session info
import session_info
session_info.show()Click to view session information
----- dask 2024.1.0 ez_zarr 0.1.1 matplotlib 3.8.2 pandas 1.5.3 session_info 1.0.0 -----
Click to view modules imported as dependencies
PIL 10.2.0 anndata 0.10.4 anyio NA arrow 1.3.0 asciitree NA asttokens NA attr 23.2.0 attrs 23.2.0 babel 2.14.0 certifi 2023.11.17 cffi 1.16.0 charset_normalizer 3.3.2 click 8.1.7 cloudpickle 3.0.0 colorama 0.4.6 comm 0.2.1 cycler 0.12.1 cython_runtime NA dateutil 2.8.2 debugpy 1.8.0 decorator 5.1.1 defusedxml 0.7.1 distributed 2024.1.0 exceptiongroup 1.2.0 executing 2.0.1 fasteners 0.19 fastjsonschema NA fqdn NA fsspec 2023.12.2 gmpy2 2.1.2 h5py 3.10.0 idna 3.6 importlib_metadata NA importlib_resources NA ipykernel 6.29.0 isoduration NA jedi 0.19.1 jinja2 3.1.2 json5 NA jsonpointer 2.4 jsonschema 4.21.1 jsonschema_specifications NA jupyter_events 0.9.0 jupyter_server 2.12.5 jupyterlab_server 2.25.2 kiwisolver 1.4.5 locket NA markupsafe 2.1.3 matplotlib_inline 0.1.6 mpl_toolkits NA mpmath 1.3.0 msgpack 1.0.7 natsort 8.4.0 nbformat 5.9.2 numcodecs 0.12.1 numpy 1.26.3 overrides NA packaging 23.2 parso 0.8.3 pexpect 4.9.0 platformdirs 4.1.0 prometheus_client NA prompt_toolkit 3.0.43 psutil 5.9.8 ptyprocess 0.7.0 pure_eval 0.2.2 pydev_ipython NA pydevconsole NA pydevd 2.9.5 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.17.2 pyparsing 3.1.1 pythonjsonlogger NA pytz 2023.3.post1 referencing NA requests 2.31.0 rfc3339_validator 0.1.4 rfc3986_validator 0.1.1 rpds NA scipy 1.12.0 send2trash NA six 1.16.0 sniffio 1.3.0 sortedcontainers 2.4.0 stack_data 0.6.3 sympy 1.12 tblib 3.0.0 tlz 0.12.1 toolz 0.12.1 torch 2.1.2 torchgen NA tornado 6.4 tqdm 4.66.1 traitlets 5.14.1 typing_extensions NA uri_template NA urllib3 1.26.18 wcwidth 0.2.13 webcolors 1.13 websocket 1.7.0 yaml 6.0.1 zarr 2.16.1 zict 3.0.0 zipp NA zmq 25.1.2 zoneinfo NA
----- IPython 8.18.1 jupyter_client 8.6.0 jupyter_core 5.7.1 jupyterlab 4.0.11 notebook 7.0.7 ----- Python 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0] Linux-3.10.0-1160.76.1.el7.x86_64-x86_64-with-glibc2.17 ----- Session information updated at 2024-01-25 09:49