Metadata-Version: 2.4
Name: turing_segment
Version: 0.3.7
Summary: BioTuring cell segmentation inference framework
Author: BioTuring
License: **Proprietary Software License Agreement**
        
        TERMS AND CONDITIONS FOR USE
        
        1. Definitions.
        
            “BioTuring” shall mean BioTuring Inc. and its affiliates.
        
            "License" shall mean the terms and conditions for use, reproduction, and
            distribution as defined in this document.
        
            "Licensor" shall mean “BioTuring” or the entity granting the License.
        
            "Licensee" shall mean the individual or entity that receives the Software.
        
            "Software" shall mean the proprietary software provided under this License.
        
            "Derivative Works" shall mean any modifications, enhancements, or
            adaptations of the Software.
        
            "Object Code" shall mean the compiled, binary, executable version of the
            Software.
        
        2. Grant of License.
        
            Subject to the terms and conditions of this License, Licensor grants
            Licensee a non-exclusive, non-transferable, non-sublicensable, and revocable
            license to use the Software solely for internal/academic and non-profit
            purposes. This license does not include any rights to distribute, modify, or
            create Derivative Works of the Software.
        
            The Software may incorporate third party software which may be subject to
            additional terms and conditions. Any use, or reproduction, of the Software
            shall comply with any additional terms and conditions applicable to such
            third party software.
        
        3. Restrictions. Licensee shall not:
        
             a. Distribute, sublicense, or otherwise provide access to the Software to
             any third party.
        
             b. Modify, reverse-engineer, decompile, or disassemble the Software.
        
             c. Use the Software for any commercial purpose direct or indirect,
             including for any research and/or development purpose by a for-profit
             entity or on behalf of a for-profit entity, without a separate written
             agreement with Licensor. Please direct any inquiries concerning uses under
             this Section to [support@bioturing.com](mailto:support@bioturing.com).
        
             d. Create any Derivative Works without explicit written permission from
             Licensor.
        
        4. The copyright of the original Cellpose 2.0 software as described in
        [https://github.com/MouseLand/cellpose/blob/master/LICENSE](https://github.com/MouseLand/cellpose/blob/master/LICENSE).
        
        5. Any publication of results obtained with the Software shall acknowledge it’s
        use by an appropriate citation attribution to BioTuring and any other Licensor.
        
        6. Confidentiality.
        
            The Software contains confidential and proprietary information of the
            Licensor. Licensee agrees not to disclose, share, or otherwise make
            available the Software or any related documentation to any third party
            without the express written consent of Licensor.
        
        7. Attribution and Marks.
        
            This License does not grant permission to use the names, trade names,
            trademarks, service marks, or product names of any Licensor, except as required
            by Section 5 and for reasonable and customary use in describing the origin of
            the Software reproducing the content of the NOTICE file.
        
        8. Termination.
        
            Licensor reserves the right to terminate this License immediately if
            Licensee fails to comply with any of the terms and conditions set forth in
            this License. Upon termination, Licensee shall cease all use of the
            Software and destroy all copies in their possession.
        
        9. You are not required to accept this License, since you have not signed it.
        However, nothing else grants you permission to use, or reproduce the Software.
        These actions are prohibited by law if you do not accept this License.
        Therefore, by using, or reproducing, the Software, you indicate your acceptance
        of this License to do so, and all its terms and conditions.
        
        10. Disclaimer of Warranty.
        
             THE SOFTWARE IS PROVIDED "AS IS," WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
             IMPLIED. LICENSOR DISCLAIMS ALL WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
             THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
             PURPOSE.
        
        11. Limitation of Liability
        
             IN NO EVENT SHALL LICENSOR BE LIABLE FOR ANY DAMAGES, INCLUDING BUT NOT
             LIMITED TO DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, ARISING
             OUT OF OR IN CONNECTION WITH THE USE OR INABILITY TO USE THE SOFTWARE.
        
        Please direct any inquiries concerning uses under this Section to
        [support@bioturing.com](mailto:support@bioturing.com).
        
Description-Content-Type: text/markdown
License-File: licenses/LICENSE
License-File: licenses/NOTICE
License-File: licenses/cellpose
License-File: licenses/numba
License-File: licenses/numpy
Requires-Dist: torch>=2
Requires-Dist: onnx
Requires-Dist: onnxruntime-gpu
Requires-Dist: tifffile
Requires-Dist: numpy<2
Requires-Dist: scipy
Requires-Dist: zarr
Requires-Dist: tqdm
Requires-Dist: pandas
Requires-Dist: geopandas>=1
Requires-Dist: opencv-python
Requires-Dist: numba
Requires-Dist: pyarrow
Requires-Dist: imagecodecs
Requires-Dist: requests
Requires-Dist: pyyaml
Requires-Dist: nvidia-ml-py
Requires-Dist: joblib
Requires-Dist: scikit-image
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: build; extra == "dev"
Dynamic: license-file

# Turing Segment: High-performance Cellpose Algorithm

***Turing Segment*** is a high-performance package for cell segmentation based on the popular [Cellpose](https://github.com/MouseLand/cellpose) algorithm. It is designed to provide lightning-fast performance while maintaining the accuracy and robustness of the original Cellpose model.

Explore our [blog post](https://bioturing.com/blog/turing-segment-a-high-performance-cellpose-algorithm-for-cell-segmentation/) for in-depth details.

## Features
- Built on top of the Cellpose framework, leveraging its proven segmentation capabilities.
- Our new post-processing algorithm is significantly faster than the original Cellpose algorithm, reducing computational overhead and enabling faster processing times.
- Support Tiled processing that is optimized for handling large images. This allows for segmentation of high-resolution or whole-slide images without running into memory constraints.
- Turing Segment is highly parallelized, leveraging both CPU and GPU resources to achieve accelerated processing speeds.
- The package is designed to be easy to use. It provides a simple CLI for quick integration into your image analysis workflows.

## Requirements
- NVIDIA GPU (recommend 40xx series, but it should work on lower series as well)
- CUDA Runtime 11.7 or later
- Python 3.8 or later
- PyTorch 2.0 or later
- geopandas 1.0 or later

## Installation

We recommend using conda to install Turing Segment. Run the following command to create a new conda environment:

```bash
conda create -n turing_segment python=3.10
conda activate turing_segment
```

Make sure your PyTorch version is compatible with the CUDA Runtime version and select the correct PyTorch version from the [PyTorch website](https://pytorch.org/get-started/locally/). For example, if you have CUDA 12.1, use the following command to install PyTorch:

```bash
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
```

Install Turing Segment using `pip`:

```bash
pip install -U turing_segment
```

## Usage

We provide three types of commands: 
- `infer`: segment the image
- `poly2mask`: convert polygons to zarr labeled mask
- `stitch`: Stitch 2D cell segmentation results into 3D cells

To see the full list of available options, use the `--help` flag.

```bash
turing_segment --help
```
---
### Inference

**1. Single image segmentation**
```bash
tuirng_segment infer --image-path /path/to/image --output-dir /path/to/output_dir
```    
By default, the model used is `cyto3`, if you want to use HE model, you must provide the `--model-type he2` argument or `--model-type he` for old HE version.

```bash
tuirng_segment infer --image-path /path/to/image --model-type he2 --output-dir /path/to/output_dir
```   

Output format:

```
/path/to/output_dir/
├── metadata.json
└── polygons.parquet
```

- `metadata.json` contains the information of the image, including the `image-shape`, `scale`, `model-type`, ... of the image. 
- `polygons.parquet` contains the polygons of the segmented cells in Shapely geometry format.

If the `--output-dir` is not provided, the output dir will be created with the name `{image_name}_{page}_{channels}_{model_type}`.

---
**2. Segmentation for multiple images**

To segment multiple images, replace the `--image-path` with `--image-dir` to specify the folder containing multiple images.

```bash
tuirng_segment infer --image-dir /path/to/image_dir --output-dir /path/to/output_dir
```
If the `--output-dir` is not provided, the output dir will be created with the name `{image_dir}_results`.

Output format:
```
/path/to/output_dir/
├── configs.yml
├── image_name1/
│   ├── metadata.json
│   └── polygons.parquet
├── image_name2/
│   ├── metadata.json
│   └── polygons.parquet
└── ...
```

`configs.yml` will be automatically generated to store the postprocess config for each image. We only support for storing the config for 3D stitching. If you want to run 3D stitching, you must modify `z_index` for each image name in the config file.

Example of `configs.yml`:
```yaml
stitch:
- image_name: image_name1
  z_index: null
- image_name: image_name2
  z_index: null
```

---
**3. Specify the `channels`**

<!-- To specify channels to segment, use the `--channels` flag. The channels are specified as a comma-separated list of channel indices where the first channel is for membrane and the second channel is for nucleus. If the nucleus channel is not specified, a zero channel is used. -->
To specify channels to segment, use the `--channels` flag. The channels are specified as a comma-separated list of channel indices. You must past the channel indices in the order of the model input. 
    
  - Fluorescence image: The first channel is for membrane and the second channel is for nucleus. If the nucleus channel is not specified, a zero channel is used.
    ```bash
    tuirng_segment infer --image-path /path/to/image --model-type cyto3 --channels 0,1
    ```
    
  - Hematoxylin and eosin (H&E) image: The first channel is for red, the second channel is for green, and the third channel is for blue (RGB).
    ```bash
    tuirng_segment infer --image-path /path/to/image --model-type he2 --channels 0,1,2
    ```
If the image has the channel in the last dimension, use the `--channel-last` flag.  

---
**4. Specify the `image-type`**

You can specify the image type explicitly using the `--image-type` flag. If the flag is not specified, the image type is inferred from the input file. Currently, `tiff`, `zarr` and `cv2` are supported image types.

```bash
turing_segment infer --image-path /path/to/image --image-type tiff
```

---
**5. Specify the `config-file`**

The tool also supports modifying some parameters of the segmentation process and post-processing. This can be done by using `--config-file` to specify a YAML configuration file containing the parameters:

```bash
turing_segment infer --image-path /path/to/image --config-file /path/to/config.yaml
```

A sample config file can be found from `configs/config.yaml`, the parameters from the file are also the default parameters for the segmentation process and post-processing:

```yaml
pipeline:
  infer_size: 1024              # Tile size (in pixels) to feed the model. The size is of the scaled tiles, not the original tiles from dividing the input image.
  overlap_margin: 32            # Margin (in pixels) to overlap between tiles during inference to reduce edge artifacts. Also of the scaled tiles.
  n_postprocess_processes: 32   # Number of parallel processes used for post-processing
  postprocess_queue_size: 128   # Size of the queue to store model output before post-processing
  n_merge_tile_processes: 16    # Number of parallel processes used to merge tiles into the final output

postprocess:
  niter: 200                    # Number of iterations for following the flow
  cellprob_threshold: 0         # Threshold for the probability of a pixel being part of a cell (in logit; 0 corresponds to 0.5 probability)
  flow_threshold: 0.4           # Threshold for the flow magnitude to filter out low-confidence regions
  min_size: 15                  # Minimum size (in pixels) for objects to be considered as cells
  resample: false               # Whether to resample the output to match the original image size when postprocessing (false to keep the inferred size). When scale > 1, set to true will improve postprocessing performance
```

---
**6. Checkpoint download** 

By default, the model checkpoints are downloaded if they are not present in the cache directory. If you want to use a custom checkpoint from cellpose training pipeline, you can specify the paths to the model and size model checkpoints using the `--model-path` and `--size-model-path` flags, respectively. You may skip specifying `--size-model-path` if a provided scale is specified by ``--scale``.

```bash
turing_segment infer \ 
--image-path /path/to/image \
--model-type <cyto_or_he> \
--model-path /path/to/model_checkpoint \
--size-model-path /path/to/size_model_checkpoint
```

---
### Poly2mask

We provide a command to convert the polygons to zarr labeled mask. The default behavior should be getting shape from `metadata.json` in the same folder with the polygons file. If the `metadata.json` is not found, we will default the mask shape as max of the polygons coordinates. If your polygons not generated from our segmentation, please make sure the input `polygon_dir` contains the `polygons.parquet`

```bash
turing_segment poly2mask /path/to/polygons --output-dir /path/to/output_dir
```

If the `--output-dir` is not provided, the output will be created in the same folder with the polygons file with the name `mask.zarr`.

---
### 3D Stitch

We provide a command to stitch 2D segmentation results to 3D segmentation results. With each polygon in the $i'th$ image will be matched with the polygon in the $(i+1)'th$ image with the highest IOU (intersection over union). If the highest iou is less than the threshold, the polygon will be considered as a new object. If `--iou-threshold` is not provided, we will default the iou threshold as 0.5.

```bash
turing_segment stitch /path/to/segmentation_results --output-dir /path/to/output_dir --iou-threshold 0.5 --num-process 8
```
The `/path/to/segmentation_results` is the folder contains all the folder of the segmentations. Each folder must contain the `polygons.parquet` and `metadata.json` files. The `metadata.json` file must contain the `z_index` field to indicate the z-index of the image.
 
If `--output-dir` is not provided, the output will be saved in the same folder of the segmentation results with the name `3d_polygons.parquet`.

Default the stitching algorithm will run in 8 processes. For disable multi-process (only run in main process), set `--num-process` to 0.

**For example of each command, please refer to the [EXAMPLE.md](./example/EXAMPLE.md) file.**

## Benchmark

1. Processing Time:
   - Turing Segment significantly outperforms the original Cellpose, especially for larger images.
   - For a 40,000 x 40,000 pixel image, Turing Segment is 294 times faster, reducing processing time from hours to less than 1 minute.

   ![Processing Time](static/performance.png)
   ![Processing Time Ratio](static/performance_ratio.png)

2. Memory Consumption:
   - Turing Segment uses considerably less memory than the original Cellpose.
   - For a 40,000 x 40,000 pixel image, Turing Segment consumes 23 times less memory.

   ![Memory Consumption](static/memory_usage.png)
   ![Memory Consumption Ratio](static/memory_ratio.png)

3. Accuracy:
   - Turing Segment maintains comparable accuracy to the original Cellpose algorithm.

   ![Accuracy](static/accuracy.png)

These improvements allow Turing Segment to process larger images more efficiently while maintaining accuracy.

## Feedback

If you encounter any issues or bugs while using Turing Segment, please let us know by submitting an issue on this GitHub repository.

For other feedback or support, you can reach out to our dedicated support team at [support@bioturing.com](mailto:support@bioturing.com).


