Metadata-Version: 2.1
Name: torchseg
Version: 0.0.1a1
Summary: TorchSeg: Semantic Segmentation models for PyTorch
Author-email: Isaac Corley <isaac.corley@utsa.edu>
Maintainer-email: Isaac Corley <isaac.corley@utsa.edu>
License: MIT License
        
        Copyright (c) 2024 Isaac Corley
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Keywords: pytorch,deep learning,machine learning,computer vision,semantic segmentation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: timm>=0.4.12
Requires-Dist: torch>=1.13
Requires-Dist: torchvision>=0.14
Requires-Dist: einops>=0.3
Requires-Dist: pretrainedmodels==0.7.4
Requires-Dist: efficientnet-pytorch==0.7.1
Provides-Extra: style
Requires-Dist: black[jupyter]>=21.8; extra == "style"
Requires-Dist: flake8>=3.8; extra == "style"
Requires-Dist: isort[colors]>=5.8; extra == "style"
Requires-Dist: pyupgrade>=2.8; extra == "style"
Provides-Extra: tests
Requires-Dist: pytest>=7.3; extra == "tests"
Requires-Dist: pytest-cov>=4; extra == "tests"
Provides-Extra: all
Requires-Dist: torchseg[style,tests]; extra == "all"

## TorchSeg

TorchSeg is an actively maintained and up-to-date fork of the Segmentation Models PyTorch (smp) library.

#### Features
The main features of this library are:

 - High-level API (just two lines to create a neural network)
 - 9 model architectures for binary and multiclass segmentation (including the legendary Unet)
 - 124 available encoders (and 500+ encoders from [timm](https://github.com/rwightman/pytorch-image-models))
 - All encoders have pre-trained weights for faster and better convergence
 - Popular losses for training routines

#### Example Usage

A segmentation model is just a PyTorch `nn.Module`, which can be created as easily as:

```python
import torchseg

model = torchseg.Unet(
    encoder_name="resnet34",        # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
    encoder_weights="imagenet",     # use `imagenet` pre-trained weights for encoder initialization
    in_channels=1,                  # model input channels (1 for gray-scale images, 3 for RGB, etc.)
    classes=3,                      # model output channels (number of classes in your dataset)
)
```

   - see [table](#architectures) with available model architectures
   - see [table](#encoders) with available encoders and their corresponding weights


### Models

#### Architectures

   - Unet [[paper](https://arxiv.org/abs/1505.04597)]
   - Unet++ [[paper](https://arxiv.org/pdf/1807.10165.pdf)]
   - MAnet [[paper](https://ieeexplore.ieee.org/abstract/document/9201310)]
   - Linknet [[paper](https://arxiv.org/abs/1707.03718)]
   - FPN [[paper](http://presentations.cocodataset.org/COCO17-Stuff-FAIR.pdf)]
   - PSPNet [[paper](https://arxiv.org/abs/1612.01105)]
   - PAN [[paper](https://arxiv.org/abs/1805.10180)]
   - DeepLabV3 [[paper](https://arxiv.org/abs/1706.05587)]
   - DeepLabV3+ [[paper](https://arxiv.org/abs/1802.02611)]

#### Encoders

The following encoders are supported in TorchSeg. Expand the relevant encoder family to see each encoder and its pre-trained weights (the `encoder_name` and `encoder_weights` parameters).

<details>
<summary style="margin-left: 25px;">ResNet</summary>
<div style="margin-left: 25px;">

| Encoder   |        Weights        | Params, M |
| --------- | :-------------------: | :-------: |
| resnet18  | imagenet / ssl / swsl |    11M    |
| resnet34  |       imagenet        |    21M    |
| resnet50  | imagenet / ssl / swsl |    23M    |
| resnet101 |       imagenet        |    42M    |
| resnet152 |       imagenet        |    58M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">ResNeXt</summary>
<div style="margin-left: 25px;">

| Encoder           |              Weights              | Params, M |
| ----------------- | :-------------------------------: | :-------: |
| resnext50_32x4d   |       imagenet / ssl / swsl       |    22M    |
| resnext101_32x4d  |            ssl / swsl             |    42M    |
| resnext101_32x8d  | imagenet / instagram / ssl / swsl |    86M    |
| resnext101_32x16d |      instagram / ssl / swsl       |   191M    |
| resnext101_32x32d |             instagram             |   466M    |
| resnext101_32x48d |             instagram             |   826M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">ResNeSt</summary>
<div style="margin-left: 25px;">

| Encoder                 | Weights  | Params, M |
| ----------------------- | :------: | :-------: |
| timm-resnest14d         | imagenet |    8M     |
| timm-resnest26d         | imagenet |    15M    |
| timm-resnest50d         | imagenet |    25M    |
| timm-resnest101e        | imagenet |    46M    |
| timm-resnest200e        | imagenet |    68M    |
| timm-resnest269e        | imagenet |   108M    |
| timm-resnest50d_4s2x40d | imagenet |    28M    |
| timm-resnest50d_1s4x24d | imagenet |    23M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">Res2Ne(X)t</summary>
<div style="margin-left: 25px;">

| Encoder                | Weights  | Params, M |
| ---------------------- | :------: | :-------: |
| timm-res2net50_26w_4s  | imagenet |    23M    |
| timm-res2net101_26w_4s | imagenet |    43M    |
| timm-res2net50_26w_6s  | imagenet |    35M    |
| timm-res2net50_26w_8s  | imagenet |    46M    |
| timm-res2net50_48w_2s  | imagenet |    23M    |
| timm-res2net50_14w_8s  | imagenet |    23M    |
| timm-res2next50        | imagenet |    22M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">RegNet(x/y)</summary>
<div style="margin-left: 25px;">

| Encoder          | Weights  | Params, M |
| ---------------- | :------: | :-------: |
| timm-regnetx_002 | imagenet |    2M     |
| timm-regnetx_004 | imagenet |    4M     |
| timm-regnetx_006 | imagenet |    5M     |
| timm-regnetx_008 | imagenet |    6M     |
| timm-regnetx_016 | imagenet |    8M     |
| timm-regnetx_032 | imagenet |    14M    |
| timm-regnetx_040 | imagenet |    20M    |
| timm-regnetx_064 | imagenet |    24M    |
| timm-regnetx_080 | imagenet |    37M    |
| timm-regnetx_120 | imagenet |    43M    |
| timm-regnetx_160 | imagenet |    52M    |
| timm-regnetx_320 | imagenet |   105M    |
| timm-regnety_002 | imagenet |    2M     |
| timm-regnety_004 | imagenet |    3M     |
| timm-regnety_006 | imagenet |    5M     |
| timm-regnety_008 | imagenet |    5M     |
| timm-regnety_016 | imagenet |    10M    |
| timm-regnety_032 | imagenet |    17M    |
| timm-regnety_040 | imagenet |    19M    |
| timm-regnety_064 | imagenet |    29M    |
| timm-regnety_080 | imagenet |    37M    |
| timm-regnety_120 | imagenet |    49M    |
| timm-regnety_160 | imagenet |    80M    |
| timm-regnety_320 | imagenet |   141M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">GERNet</summary>
<div style="margin-left: 25px;">

| Encoder       | Weights  | Params, M |
| ------------- | :------: | :-------: |
| timm-gernet_s | imagenet |    6M     |
| timm-gernet_m | imagenet |    18M    |
| timm-gernet_l | imagenet |    28M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">SE-Net</summary>
<div style="margin-left: 25px;">

| Encoder             | Weights  | Params, M |
| ------------------- | :------: | :-------: |
| senet154            | imagenet |   113M    |
| se_resnet50         | imagenet |    26M    |
| se_resnet101        | imagenet |    47M    |
| se_resnet152        | imagenet |    64M    |
| se_resnext50_32x4d  | imagenet |    25M    |
| se_resnext101_32x4d | imagenet |    46M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">SK-ResNe(X)t</summary>
<div style="margin-left: 25px;">

| Encoder                | Weights  | Params, M |
| ---------------------- | :------: | :-------: |
| timm-skresnet18        | imagenet |    11M    |
| timm-skresnet34        | imagenet |    21M    |
| timm-skresnext50_32x4d | imagenet |    25M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">DenseNet</summary>
<div style="margin-left: 25px;">

| Encoder     | Weights  | Params, M |
| ----------- | :------: | :-------: |
| densenet121 | imagenet |    6M     |
| densenet169 | imagenet |    12M    |
| densenet201 | imagenet |    18M    |
| densenet161 | imagenet |    26M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">Inception</summary>
<div style="margin-left: 25px;">

| Encoder           |             Weights             | Params, M |
| ----------------- | :-----------------------------: | :-------: |
| inceptionresnetv2 | imagenet /  imagenet+background |    54M    |
| inceptionv4       | imagenet /  imagenet+background |    41M    |
| xception          |            imagenet             |    22M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">EfficientNet</summary>
<div style="margin-left: 25px;">

| Encoder                 |              Weights               | Params, M |
| ----------------------- | :--------------------------------: | :-------: |
| efficientnet-b0         |              imagenet              |    4M     |
| efficientnet-b1         |              imagenet              |    6M     |
| efficientnet-b2         |              imagenet              |    7M     |
| efficientnet-b3         |              imagenet              |    10M    |
| efficientnet-b4         |              imagenet              |    17M    |
| efficientnet-b5         |              imagenet              |    28M    |
| efficientnet-b6         |              imagenet              |    40M    |
| efficientnet-b7         |              imagenet              |    63M    |
| timm-efficientnet-b0    | imagenet / advprop / noisy-student |    4M     |
| timm-efficientnet-b1    | imagenet / advprop / noisy-student |    6M     |
| timm-efficientnet-b2    | imagenet / advprop / noisy-student |    7M     |
| timm-efficientnet-b3    | imagenet / advprop / noisy-student |    10M    |
| timm-efficientnet-b4    | imagenet / advprop / noisy-student |    17M    |
| timm-efficientnet-b5    | imagenet / advprop / noisy-student |    28M    |
| timm-efficientnet-b6    | imagenet / advprop / noisy-student |    40M    |
| timm-efficientnet-b7    | imagenet / advprop / noisy-student |    63M    |
| timm-efficientnet-b8    |         imagenet / advprop         |    84M    |
| timm-efficientnet-l2    |           noisy-student            |   474M    |
| timm-efficientnet-lite0 |              imagenet              |    4M     |
| timm-efficientnet-lite1 |              imagenet              |    5M     |
| timm-efficientnet-lite2 |              imagenet              |    6M     |
| timm-efficientnet-lite3 |              imagenet              |    8M     |
| timm-efficientnet-lite4 |              imagenet              |    13M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">MobileNet</summary>
<div style="margin-left: 25px;">

| Encoder                            | Weights  | Params, M |
| ---------------------------------- | :------: | :-------: |
| mobilenet_v2                       | imagenet |    2M     |
| timm-mobilenetv3_large_075         | imagenet |   1.78M   |
| timm-mobilenetv3_large_100         | imagenet |   2.97M   |
| timm-mobilenetv3_large_minimal_100 | imagenet |   1.41M   |
| timm-mobilenetv3_small_075         | imagenet |   0.57M   |
| timm-mobilenetv3_small_100         | imagenet |   0.93M   |
| timm-mobilenetv3_small_minimal_100 | imagenet |   0.43M   |

</div>
</details>

<details>
<summary style="margin-left: 25px;">DPN</summary>
<div style="margin-left: 25px;">

| Encoder |   Weights   | Params, M |
| ------- | :---------: | :-------: |
| dpn68   |  imagenet   |    11M    |
| dpn68b  | imagenet+5k |    11M    |
| dpn92   | imagenet+5k |    34M    |
| dpn98   |  imagenet   |    58M    |
| dpn107  | imagenet+5k |    84M    |
| dpn131  |  imagenet   |    76M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">VGG</summary>
<div style="margin-left: 25px;">

| Encoder  | Weights  | Params, M |
| -------- | :------: | :-------: |
| vgg11    | imagenet |    9M     |
| vgg11_bn | imagenet |    9M     |
| vgg13    | imagenet |    9M     |
| vgg13_bn | imagenet |    9M     |
| vgg16    | imagenet |    14M    |
| vgg16_bn | imagenet |    14M    |
| vgg19    | imagenet |    20M    |
| vgg19_bn | imagenet |    20M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">Mix Vision Transformer</summary>
<div style="margin-left: 25px;">

Backbone from SegFormer pretrained on ImageNet. It can be combined with other decoders from the package, such as Unet and FPN.

Limitations:  

   - encoder is **not** supported by Linknet, Unet++
   - encoder is supported by FPN only for encoder **depth = 5**

| Encoder | Weights  | Params, M |
| ------- | :------: | :-------: |
| mit_b0  | imagenet |    3M     |
| mit_b1  | imagenet |    13M    |
| mit_b2  | imagenet |    24M    |
| mit_b3  | imagenet |    44M    |
| mit_b4  | imagenet |    60M    |
| mit_b5  | imagenet |    81M    |

</div>
</details>

<details>
<summary style="margin-left: 25px;">MobileOne</summary>
<div style="margin-left: 25px;">

Apple's "sub-one-ms" backbone pretrained on ImageNet. It can be used with all decoders.

Note: In the official GitHub repo, the s0 variant has additional `num_conv_branches`, leading to more params than s1.

| Encoder      | Weights  | Params, M |
| ------------ | :------: | :-------: |
| mobileone_s0 | imagenet |   4.6M    |
| mobileone_s1 | imagenet |   4.0M    |
| mobileone_s2 | imagenet |   6.5M    |
| mobileone_s3 | imagenet |   8.8M    |
| mobileone_s4 | imagenet |   13.6M   |

</div>
</details>


\* `ssl`, `swsl` - semi-supervised and weakly-supervised learning on ImageNet ([repo](https://github.com/facebookresearch/semi-supervised-ImageNet1K-models)).

#### Timm Encoders

PyTorch Image Models (a.k.a. timm) provides many pretrained models and an interface that allows them to be used as encoders in torchseg. However, not all models are supported:

   - not all transformer models implement the ``features_only`` functionality that the encoder requires
   - some models have inappropriate strides

Total number of supported encoders: 549
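
The two limitations above can be checked roughly as in the sketch below. It relies on timm's `create_model(..., features_only=True)` and `feature_info.reduction()`. The helper names and the assumption that "appropriate" strides means cumulative reductions of 2, 4, 8, 16, 32 are illustrative and not taken from torchseg itself.

```python
def timm_reductions(model_name):
    """Cumulative stride of each feature stage reported by timm for `model_name`."""
    import timm  # imported lazily so the check below can run without timm installed

    # timm raises an error here if the model does not implement `features_only`.
    backbone = timm.create_model(model_name, features_only=True, pretrained=False)
    return backbone.feature_info.reduction()


def has_expected_reductions(reductions, depth=5):
    """True if stage i downsamples by 2**(i + 1), i.e. 2, 4, 8, ... (assumed requirement)."""
    return list(reductions) == [2 ** (i + 1) for i in range(depth)]


print(has_expected_reductions([2, 4, 8, 16, 32]))  # True
print(has_expected_reductions([4, 8, 16, 32]))     # False
```

With timm installed, `has_expected_reductions(timm_reductions(name))` gives a quick first check of whether a backbone is likely to fit.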

### Models API

   - `model.encoder` - pretrained backbone that extracts features at different spatial resolutions
   - `model.decoder` - depends on the model architecture (`Unet`/`Linknet`/`PSPNet`/`FPN`)
   - `model.segmentation_head` - last block, which produces the required number of mask channels (also includes optional upsampling and activation)
   - `model.classification_head` - optional block that creates a classification head on top of the encoder
   - `model.forward(x)` - sequentially passes `x` through the model's encoder, decoder and segmentation head (and classification head if specified)
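
To make the data flow concrete, here is a minimal sketch of a model with the same four parts. `ToySegmentationModel` and its layer sizes are hypothetical stand-ins, not torchseg classes. Feeding the deepest encoder feature map to the classification head is an assumption about how the head sits "on top of" the encoder.

```python
import torch
from torch import nn


class ToySegmentationModel(nn.Module):
    """Hypothetical stand-in mirroring the encoder/decoder/head layout."""

    def __init__(self, in_channels=3, classes=2, aux_classes=None):
        super().__init__()
        # Encoder: two stride-2 stages, each halving the spatial resolution.
        self.stage1 = nn.Conv2d(in_channels, 8, kernel_size=3, stride=2, padding=1)
        self.stage2 = nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1)
        # Decoder: upsample the deepest feature map back to the input resolution.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(16, 8, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.segmentation_head = nn.Conv2d(8, classes, kernel_size=1)
        self.classification_head = None
        if aux_classes is not None:
            self.classification_head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, aux_classes)
            )

    def encoder(self, x):
        f1 = torch.relu(self.stage1(x))
        f2 = torch.relu(self.stage2(f1))
        return [x, f1, f2]  # features ordered from shallowest to deepest

    def forward(self, x):
        features = self.encoder(x)
        decoded = self.decoder(features[-1])
        mask = self.segmentation_head(decoded)
        if self.classification_head is not None:
            label = self.classification_head(features[-1])
            return mask, label
        return mask


model = ToySegmentationModel(in_channels=3, classes=2, aux_classes=4)
mask, label = model(torch.ones([1, 3, 64, 64]))
print(mask.shape, label.shape)  # torch.Size([1, 2, 64, 64]) torch.Size([1, 4])
```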

##### Input channels
The input channels parameter lets you create models that process tensors with an arbitrary number of channels.
If you use pretrained weights from ImageNet, the weights of the first convolution are reused. For the
1-channel case, the new weight is the sum of the first convolution layer's weights over its input channels; otherwise the channels are
populated as `new_weight[:, i] = pretrained_weight[:, i % 3]` and then scaled with `new_weight * 3 / new_in_channels`.

```python
import torch
import torchseg
model = torchseg.FPN('resnet34', in_channels=1)
mask = model(torch.ones([1, 1, 64, 64]))
```
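
The weight-reuse rule for the first convolution can be sketched in plain PyTorch as follows. `adapt_first_conv_weight` is a hypothetical helper written for illustration, not a torchseg function, and it assumes a weight tensor of shape `(out_channels, in_channels, kH, kW)`.

```python
import torch


def adapt_first_conv_weight(pretrained_weight, new_in_channels):
    """Sketch of the reuse rule for a weight of shape (out, in, kH, kW)."""
    out_channels, pretrained_in, k_h, k_w = pretrained_weight.shape
    if new_in_channels == 1:
        # Single-channel input: sum the pretrained filters over the channel axis.
        return pretrained_weight.sum(dim=1, keepdim=True)
    new_weight = torch.empty(out_channels, new_in_channels, k_h, k_w)
    for i in range(new_in_channels):
        # Cycle through the pretrained channels: 0, 1, 2, 0, 1, ...
        new_weight[:, i] = pretrained_weight[:, i % pretrained_in]
    # Scale by 3 / new_in_channels when the pretrained weight has 3 channels.
    return new_weight * pretrained_in / new_in_channels


pretrained = torch.randn(64, 3, 7, 7)  # illustrative shape for a first convolution
print(adapt_first_conv_weight(pretrained, 1).shape)  # torch.Size([64, 1, 7, 7])
print(adapt_first_conv_weight(pretrained, 5).shape)  # torch.Size([64, 5, 7, 7])
```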

##### Auxiliary classification output

All models support the `aux_params` parameter, which defaults to `None`.
If `aux_params = None`, no auxiliary classification output is created; otherwise the
model produces not only a `mask` but also a `label` output with shape `(N, C)`.
The classification head consists of GlobalPooling->Dropout(optional)->Linear->Activation(optional) layers, which can be
configured by `aux_params` as follows:

```python
import torch
import torchseg

aux_params = dict(
    pooling='avg',             # one of 'avg', 'max'
    dropout=0.5,               # dropout ratio, default is None
    activation='sigmoid',      # activation function, default is None
    classes=4,                 # define number of output labels
)
model = torchseg.Unet('resnet34', classes=4, aux_params=aux_params)
x = torch.ones([1, 3, 64, 64])  # dummy RGB batch; replace with your own input
mask, label = model(x)
```
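
As a rough picture of what such a head looks like, the layer sequence can be assembled from standard `torch.nn` modules. `make_classification_head` is a hypothetical helper, and the 512-channel feature map is only an illustrative input, so treat this as a sketch and not the library's implementation.

```python
import torch
from torch import nn


def make_classification_head(in_channels, classes, pooling="avg", dropout=None, activation=None):
    """Hypothetical helper: GlobalPooling -> Dropout -> Linear -> Activation."""
    if pooling not in ("avg", "max"):
        raise ValueError("pooling must be 'avg' or 'max'")
    pool = nn.AdaptiveAvgPool2d(1) if pooling == "avg" else nn.AdaptiveMaxPool2d(1)
    layers = [pool, nn.Flatten()]
    if dropout is not None:
        layers.append(nn.Dropout(p=dropout))
    layers.append(nn.Linear(in_channels, classes))
    if activation == "sigmoid":  # only sigmoid is handled in this sketch
        layers.append(nn.Sigmoid())
    return nn.Sequential(*layers)


head = make_classification_head(in_channels=512, classes=4, dropout=0.5, activation="sigmoid")
features = torch.ones([2, 512, 8, 8])  # stand-in for the deepest encoder feature map
label = head(features)
print(label.shape)  # torch.Size([2, 4])
```

Global pooling collapses the spatial dimensions, which is why `label` has shape `(N, C)` regardless of the input image size.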

##### Depth

The depth parameter specifies the number of downsampling operations in the encoder, so you can make
your model lighter by specifying a smaller `encoder_depth`.

```python
model = torchseg.Unet('resnet34', encoder_depth=4)
```
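
To see what a smaller depth means for feature-map sizes, the arithmetic below assumes that every downsampling operation halves the height and width. `encoder_feature_sizes` is a hypothetical helper for illustration, and its default of 5 is likewise an assumption.

```python
def encoder_feature_sizes(height, width, encoder_depth=5):
    """Spatial size after each downsampling stage, assuming every stage halves H and W."""
    return [(height // 2**stage, width // 2**stage) for stage in range(encoder_depth + 1)]


print(encoder_feature_sizes(256, 256, encoder_depth=5)[-1])  # (8, 8)
print(encoder_feature_sizes(256, 256, encoder_depth=4)[-1])  # (16, 16)
```

Under this assumption, input sizes divisible by `2 ** encoder_depth` keep every stage's feature map at an exact integer size.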
