Metadata-Version: 2.1
Name: glasses
Version: 0.0.6
Summary: Compact, concise and customizable deep learning computer vision
Home-page: https://github.com/FrancescoSaverioZuppichini/glasses
Author: Francesco Saverio Zuppichini & Francesco Cicala
Author-email: francesco.zuppichini@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown


# Glasses 😎

![alt](https://github.com/FrancescoSaverioZuppichini/glasses/blob/develop/docs/_static/images/background.png?raw=true)

[![codecov](https://codecov.io/gh/FrancescoSaverioZuppichini/glasses/branch/develop/graph/badge.svg)](https://codecov.io/gh/FrancescoSaverioZuppichini/glasses)

Compact, concise and customizable 
deep learning computer vision library

**So far I have the [following](#pretrained-models) pretrained weights. I am working on porting more. They are hosted on GitHub if < 100MB and on AWS (thanks to Francis Ukpeh) if > 100MB.**

The documentation is [here](https://francescosaveriozuppichini.github.io/glasses/index.html)

## TL;DR

This library has

- human readable code, no *research code*
- common components shared across [models](#Models)
- [same APIs](#classification) for all models (you learn them once and they are always the same)
- clear and easy to use model customization (see [here](#block))
- [classification](#classification) and [segmentation](#segmentation) 
- emoji in the name ;)

Architectures implemented so far:

- [DeiT - Training data-efficient image transformers & distillation through attention](https://arxiv.org/pdf/2012.12877.pdf)
- [Vision Transformer - An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale](https://arxiv.org/pdf/2010.11929.pdf)
- [ResNeSt: Split-Attention Networks](https://arxiv.org/abs/2004.08955) 
- [AlexNet - ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)
- [DenseNet - Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993)
- [EfficientNet - EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946)
- [EfficientNetLite - Higher accuracy on vision models with EfficientNet-Lite](https://blog.tensorflow.org/2020/03/higher-accuracy-on-vision-models-with-efficientnet-lite.html)
- [FishNet - FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction](https://arxiv.org/abs/1901.03495)
- [MobileNet - MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/pdf/1801.04381.pdf)
- [RegNet - Designing Network Design Spaces](https://arxiv.org/abs/2003.13678)
- [ResNet - Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
- [ResNetD - Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/pdf/1812.01187.pdf)
- [ResNetXt - Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/pdf/1611.05431.pdf)
- [SEResNet - Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks](https://arxiv.org/abs/1803.02579)
- [VGG - Very Deep Convolutional Networks For Large-scale Image Recognition](https://arxiv.org/pdf/1409.1556.pdf)
- [WideResNet - Wide Residual Networks](https://arxiv.org/pdf/1605.07146.pdf)
- [FPN - Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144)
- [PFPN - Panoptic Feature Pyramid Networks](https://arxiv.org/pdf/1901.02446.pdf)
- [UNet - U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597)
- [Squeeze and Excitation - Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks](https://arxiv.org/abs/1803.02579)
- [ECA - ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks](https://arxiv.org/pdf/1910.03151.pdf)

## Installation

You can install `glasses` using pip by running

```
pip install git+https://github.com/FrancescoSaverioZuppichini/glasses
```

### Motivations

Almost all existing implementations of the most famous models are written with very bad coding practices, what today is called *research code*. I struggled to understand some implementations that, in the end, were just a few lines of code.

Most of them lack a global structure, repeat a lot of code, are not easily customizable, and are not tested. Since I do computer vision for a living, I needed a way to make my life easier.

## Getting started

The API is shared across **all** models!


```python
import torch
from glasses.models import AutoModel, AutoConfig
from torch import nn
# load one model
model = AutoModel.from_pretrained('resnet18')
cfg = AutoConfig.from_name('resnet18')
model.summary(device='cpu' ) # thanks to torchsummary
AutoModel.models() # 'resnet18', 'resnet26', 'resnet26d', 'resnet34', 'resnet50', ...
```

### Interpretability


```python
import requests
from PIL import Image
from io import BytesIO
from glasses.interpretability import GradCam, SaliencyMap
from torchvision.transforms import Normalize
r = requests.get('https://i.insider.com/5df126b679d7570ad2044f3e?width=700&format=jpeg&auto=webp')
im = Image.open(BytesIO(r.content))
# un normalize when done
postprocessing = Normalize(-cfg.mean / cfg.std, (1.0 / cfg.std))
# apply preprocessing
x =  cfg.transform(im).unsqueeze(0)
_ = model.interpret(x, using=GradCam(), postprocessing=postprocessing).show()
```

![alt](https://github.com/FrancescoSaverioZuppichini/glasses/blob/develop/docs/_static/images/grad_cam.png?raw=true)
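
The `postprocessing` transform above undoes the input normalization. `Normalize(mean, std)` computes `(x - mean) / std` per channel, so applying `Normalize(-mean / std, 1 / std)` to the result gives back `x`. Here is a minimal pure-Python check of that arithmetic on a single RGB pixel, using the ImageNet mean and std that appear in the `efficientnet_b1` transform in this README. The `normalize` helper is only for illustration and is not part of `glasses`.

```python
# Illustrative helper mirroring the per-channel formula of
# torchvision's Normalize: out = (x - mean) / std
def normalize(x, mean, std):
    return [(v - m) / s for v, m, s in zip(x, mean, std)]

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

pixel = [0.2, 0.5, 0.9]  # one RGB pixel with values in [0, 1]
normalized = normalize(pixel, mean, std)

# Inverse transform, same idea as Normalize(-mean / std, 1 / std)
inv_mean = [-m / s for m, s in zip(mean, std)]
inv_std = [1.0 / s for s in std]
restored = normalize(normalized, inv_mean, inv_std)

# restored matches the original pixel up to floating point error
print(restored)
```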

## Classification


```python
from glasses.models import ResNet
# change activation
ResNet.resnet18(activation = nn.SELU)
# change number of classes
ResNet.resnet18(n_classes=100)
# freeze only the convolution weights
model = ResNet.resnet18(pretrained=True)
model.freeze(who=model.encoder)
# get the last layer, useful to hook to it if you want to get the embedded vector
model.encoder.layers[-1]
# what about resnet with inverted residuals?
from glasses.models.classification.efficientnet import InvertedResidualBlock
ResNet.resnet18(block = InvertedResidualBlock)
```

## Segmentation


```python
from functools import partial
from glasses.models.segmentation.unet import UNet, UNetDecoder
# vanilla Unet
unet = UNet()
# let's change the encoder
unet = UNet.from_encoder(partial(AutoModel.from_name, 'efficientnet_b1'))
# mmm I want more layers in the decoder!
unet = UNet(decoder=partial(UNetDecoder, widths=[256, 128, 64, 32, 16]))
# maybe resnet was better
unet = UNet(encoder=lambda **kwargs: ResNet.resnet26(**kwargs).encoder)
# same API
unet.summary(input_shape=(1,224,224))
```

### More examples


```python
# change the classification head
model = ResNet.resnet18(pretrained=True)
my_head = nn.Sequential(
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten(),
    nn.Linear(model.encoder.widths[-1], 512),
    nn.Dropout(0.2),
    nn.ReLU(),
    nn.Linear(512, 1000))

model.head = my_head

x = torch.rand((1,3,224,224))
model(x).shape #torch.Size([1, 1000])
```

## Pretrained Models

**I am currently working on the pretrained models and the best way to make them available**

This is a list of all the pretrained models available so far. They are all trained on *ImageNet*.

I used a `batch_size=64` and a GTX 1080ti to evaluate the models.

| name                   |    top1 |    top5 |     time |   batch_size |
|:-----------------------|--------:|--------:|---------:|-------------:|
| efficientnet_b3        | 0.82034 | 0.9603  | 199.599  |           64 |
| regnety_032            | 0.81958 | 0.95964 | 136.518  |           64 |
| deit_small_patch16_224 | 0.81082 | 0.95316 | 132.868  |           64 |
| resnet50d              | 0.80492 | 0.95128 |  97.5827 |           64 |
| cse_resnet50           | 0.80292 | 0.95048 | 108.765  |           64 |
| efficientnet_b2        | 0.80126 | 0.95124 | 127.177  |           64 |
| resnext101_32x8d       | 0.7921  | 0.94556 | 290.38   |           64 |
| wide_resnet101_2       | 0.7891  | 0.94344 | 277.755  |           64 |
| wide_resnet50_2        | 0.78464 | 0.94064 | 201.634  |           64 |
| efficientnet_b1        | 0.7831  | 0.94096 |  98.7143 |           64 |
| resnet152              | 0.7825  | 0.93982 | 186.191  |           64 |
| regnetx_032            | 0.7792  | 0.93996 | 319.558  |           64 |
| resnext50_32x4d        | 0.77628 | 0.9368  | 114.325  |           64 |
| regnety_016            | 0.77604 | 0.93702 |  96.547  |           64 |
| efficientnet_b0        | 0.77332 | 0.93566 |  67.2147 |           64 |
| resnet101              | 0.77314 | 0.93556 | 134.148  |           64 |
| densenet161            | 0.77146 | 0.93602 | 239.388  |           64 |
| resnet34d              | 0.77118 | 0.93418 |  59.9938 |           64 |
| densenet201            | 0.76932 | 0.9339  | 158.514  |           64 |
| regnetx_016            | 0.76684 | 0.9328  |  91.7536 |           64 |
| resnet26d              | 0.766   | 0.93188 |  70.6453 |           64 |
| regnety_008            | 0.76238 | 0.93026 |  54.1286 |           64 |
| resnet50               | 0.76012 | 0.92934 |  89.7976 |           64 |
| densenet169            | 0.75628 | 0.9281  | 127.077  |           64 |
| resnet26               | 0.75394 | 0.92584 |  65.5801 |           64 |
| resnet34               | 0.75096 | 0.92246 |  56.8985 |           64 |
| regnety_006            | 0.75068 | 0.92474 |  55.5611 |           64 |
| regnetx_008            | 0.74788 | 0.92194 |  57.9559 |           64 |
| densenet121            | 0.74472 | 0.91974 | 104.13   |           64 |
| deit_tiny_patch16_224  | 0.7437  | 0.91898 |  66.662  |           64 |
| vgg19_bn               | 0.74216 | 0.91848 | 169.357  |           64 |
| regnety_004            | 0.73766 | 0.91638 |  68.4893 |           64 |
| regnetx_006            | 0.73682 | 0.91568 |  81.4703 |           64 |
| vgg16_bn               | 0.73476 | 0.91536 | 150.317  |           64 |
| vgg19                  | 0.7236  | 0.9085  | 155.851  |           64 |
| regnetx_004            | 0.72298 | 0.90644 |  58.0049 |           64 |
| vgg16                  | 0.71628 | 0.90368 | 135.398  |           64 |
| vgg13_bn               | 0.71618 | 0.9036  | 129.077  |           64 |
| vgg11_bn               | 0.70408 | 0.89724 |  86.9459 |           64 |
| vgg13                  | 0.69984 | 0.89306 | 116.052  |           64 |
| regnety_002            | 0.6998  | 0.89422 |  46.804  |           64 |
| resnet18               | 0.69644 | 0.88982 |  46.2029 |           64 |
| vgg11                  | 0.68872 | 0.88658 |  79.4136 |           64 |
| regnetx_002            | 0.68658 | 0.88244 |  45.9211 |           64 |
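
The `top1` and `top5` columns are top-k accuracies: a prediction counts as correct when the true label is among the k highest-scoring classes. A small pure-Python sketch of the metric follows, using made-up scores and `k=2` instead of `k=5` because the toy example has only four classes. `topk_accuracy` is an illustrative helper, not a `glasses` function.

```python
def topk_accuracy(scores, targets, k):
    """Fraction of samples whose target is among the k highest scores."""
    correct = 0
    for row, target in zip(scores, targets):
        # indices of the k largest scores for this sample
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += target in topk
    return correct / len(targets)

# three samples, four classes (toy numbers)
scores = [
    [0.10, 0.60, 0.20, 0.10],  # best guess: class 1
    [0.40, 0.10, 0.30, 0.20],  # best guess: class 0, second: class 2
    [0.30, 0.15, 0.05, 0.50],  # best guess: class 3, second: class 0
]
targets = [1, 2, 0]

top1 = topk_accuracy(scores, targets, k=1)  # only the first sample is right
top2 = topk_accuracy(scores, targets, k=2)  # all three samples are right
```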

Assuming you want to load `efficientnet_b1`:


```python
from glasses.models import EfficientNet, AutoModel, AutoConfig

# load it using AutoModel
model = AutoModel.from_pretrained('efficientnet_b1')
# or from its own class
model = EfficientNet.efficientnet_b1(pretrained=True)
# you may also need to get the correct transformation that must be applied on the input
cfg = AutoConfig.from_name('efficientnet_b1')
transform = cfg.transform
```

    INFO:root:Loaded efficientnet_b1 pretrained weights.
    INFO:root:Loaded efficientnet_b1 pretrained weights.


In this case, `transform` is 

```
Compose(
    Resize(size=240, interpolation=PIL.Image.BICUBIC)
    CenterCrop(size=(240, 240))
    ToTensor()
    Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
)
```

## Deep Customization

All models are composed of shareable parts:
- `Block`
- `Layer`
- `Encoder`
- `Head`
- `Decoder`

### Block

Each model has its own building block, named `*Block`. In each block, all the weights are in the `.block` field. This makes it very easy to customize one specific model.


```python
from glasses.models.classification.vgg import VGGBasicBlock
from glasses.models.classification.resnet import ResNetBasicBlock, ResNetBottleneckBlock, ResNetBasicPreActBlock, ResNetBottleneckPreActBlock
from glasses.models.classification.senet import SENetBasicBlock, SENetBottleneckBlock
from glasses.models.classification.resnetxt import ResNetXtBottleNeckBlock
from glasses.models.classification.densenet import DenseBottleNeckBlock
from glasses.models.classification.wide_resnet import WideResNetBottleNeckBlock
from glasses.models.classification.efficientnet import EfficientNetBasicBlock
```

For example, if we want to add Squeeze and Excitation to the resnet bottleneck block, we can simply subclass it:


```python
from glasses.nn.att import SpatialSE
from glasses.models.classification.resnet import ResNetBottleneckBlock

class SEResNetBottleneckBlock(ResNetBottleneckBlock):
    def __init__(self, in_features: int, out_features: int, squeeze: int = 16, *args, **kwargs):
        super().__init__(in_features, out_features, *args, **kwargs)
        # all the weights are in block, we want to apply se after the weights
        self.block.add_module('se', SpatialSE(out_features, reduction=squeeze))

SEResNetBottleneckBlock(32, 64)
```

Then, we can use the class methods to create new models that follow the existing architecture blueprint. For example, to create `se_resnet50`:


```python
ResNet.resnet50(block=SEResNetBottleneckBlock)
```

The cool thing is that each model has the same API. If I want to create a vgg13 with the `SEResNetBottleneckBlock`, I can just write:


```python
from glasses.models import VGG
model = VGG.vgg13(block=SEResNetBottleneckBlock)
model.summary()
```

Some models require additional block parameters. For example, `MobileNetV2` also requires an `expansion` parameter, so our `SEResNetBottleneckBlock` won't work with it.

### Layer

A `Layer` is a collection of blocks. It is used to stack multiple blocks together following some logic. For example, `ResNetLayer`:


```python
from glasses.models.classification.resnet import ResNetLayer

ResNetLayer(64, 128, depth=2)
```

### Encoder

The encoder is the part of the model that encodes the input into features, i.e. the convolution layers. It always has two very important parameters:

- **widths**: the width of each layer, i.e. how many features it has
- **depths**: the depth of each layer, i.e. how many blocks it has

For example, `ResNetEncoder` creates multiple `ResNetLayer`s based on the length of `widths` and `depths`. Let's see an example.


```python
from glasses.models.classification.resnet import ResNetEncoder
# 3 layers, with 32, 64, 128 features and 1, 2, 3 blocks each
ResNetEncoder(
    widths=[32,64,128],
    depths=[1,2,3])

```
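
To make the relation between `widths` and `depths` concrete, the sketch below lists, in plain Python, which layers such an encoder would stack. It assumes that the first layer maps a stem with `start_features` channels to `widths[0]` and that each following layer maps one width to the next. Both `layer_plan` and the default `start_features=64` are illustrative assumptions, not the library's code.

```python
def layer_plan(widths, depths, start_features=64):
    """Return one (in_features, out_features, n_blocks) tuple per layer."""
    if len(widths) != len(depths):
        raise ValueError("widths and depths must have the same length")
    # the input of each layer is the output of the previous one
    in_features = [start_features, *widths[:-1]]
    return list(zip(in_features, widths, depths))

plan = layer_plan(widths=[32, 64, 128], depths=[1, 2, 3])
for in_f, out_f, n_blocks in plan:
    print(f"{in_f:>3} -> {out_f:<3} with {n_blocks} block(s)")
```

Under these assumptions, the three layers hold six blocks in total, and only `widths` decides the number of channels flowing between layers.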

All encoders are subclasses of `Encoder`, which allows us to hook into specific stages to get the features. All you have to do is first call `.features` to notify the model that you want to receive the features, and then pass an input.


```python
enc = ResNetEncoder()
enc.features
enc(torch.randn((1,3,224,224)))
print([f.shape for f in enc.features])
```

**Remember**: each model always has an `.encoder` field. The encoder knows its number of output features, which you can access through `widths[-1]`:


```python
from glasses.models import ResNet

model = ResNet.resnet18()
model.encoder.widths[-1]
```

#### Features

Each encoder can return a list of features, accessible through the `.features` field. You need to call it once beforehand to notify the encoder that we also wish to store the features.


```python
from glasses.models.classification.resnet import ResNetEncoder

x = torch.randn(1,3,224,224)
enc = ResNetEncoder()
enc.features # call it once
enc(x)
features = enc.features # now we have all the features from each layer (stage)
[print(f.shape) for f in features]
# torch.Size([1, 64, 112, 112])
# torch.Size([1, 64, 56, 56])
# torch.Size([1, 128, 28, 28])
# torch.Size([1, 256, 14, 14])
```
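
The "call `.features` once, then run a forward pass" pattern can be pictured with a small pure-Python sketch. It is a toy model of the behaviour described above, not the `glasses` implementation: `ToyEncoder` is hypothetical, and plain functions stand in for the layers.

```python
class ToyEncoder:
    """Toy encoder that stores per-stage outputs only after `.features` is requested."""

    def __init__(self, stages):
        self.stages = stages      # list of callables applied in order
        self._recording = False   # becomes True on the first `.features` access
        self._features = []

    @property
    def features(self):
        # the first access switches recording on; later accesses return
        # whatever the last forward pass stored
        self._recording = True
        return self._features

    def __call__(self, x):
        self._features = []
        for stage in self.stages:
            x = stage(x)
            if self._recording:
                self._features.append(x)
        return x

enc = ToyEncoder([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3])
enc.features           # call it once to enable recording
out = enc(5)           # the stages produce 6, then 12, then 9
feats = enc.features   # one stored output per stage
```

Keeping recording off by default means a plain forward pass does not hold on to intermediate outputs unless someone asked for them.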

### Head

The head is the last part of the model. It usually performs the classification.


```python
from glasses.models.classification.resnet import ResNetHead


ResNetHead(512, n_classes=1000)
```

### Decoder

The decoder takes the last feature from the `.encoder` and decodes it. This is usually done in segmentation models, such as UNet.


```python
from glasses.models.segmentation.unet import UNetDecoder
x = torch.randn(1,3,224,224)
enc = ResNetEncoder()
enc.features # call it once
x = enc(x)
features = enc.features
# we need to tell the decoder the first feature size and the size of the lateral features
dec = UNetDecoder(start_features=enc.widths[-1],
                  lateral_widths=enc.features_widths[::-1])
out = dec(x, features[::-1])
out.shape
```

**This object-oriented structure allows most of the code to be reused across the models**

### Models

The models implemented so far:

😥 = I don't have enough GPU RAM 

| name                   | Parameters   | Size (MB)   |
|:-----------------------|:-------------|:------------|
| resnet18               | 11,689,512   | 44.59       |
| resnet26               | 15,995,176   | 61.02       |
| resnet26d              | 16,014,408   | 61.09       |
| resnet34               | 21,797,672   | 83.15       |
| resnet34d              | 21,816,904   | 83.22       |
| resnet50               | 25,557,032   | 97.49       |
| resnet50d              | 25,576,264   | 97.57       |
| resnet101              | 44,549,160   | 169.94      |
| resnet152              | 60,192,808   | 229.62      |
| resnet200              | 64,673,832   | 246.71      |
| se_resnet18            | 11,776,552   | 44.92       |
| se_resnet34            | 21,954,856   | 83.75       |
| se_resnet50            | 28,071,976   | 107.09      |
| se_resnet101           | 49,292,328   | 188.04      |
| se_resnet152           | 66,770,984   | 254.71      |
| cse_resnet18           | 11,778,592   | 44.93       |
| cse_resnet34           | 21,958,868   | 83.77       |
| cse_resnet50           | 28,088,024   | 107.15      |
| cse_resnet101          | 49,326,872   | 188.17      |
| cse_resnet152          | 66,821,848   | 254.91      |
| resnext50_32x4d        | 25,028,904   | 95.48       |
| resnext101_32x8d       | 88,791,336   | 338.71      |
| resnext101_32x16d      | 194,026,792  | 740.15      |
| resnext101_32x32d      | 468,530,472  | 1787.30     |
| resnext101_32x48d      | 828,411,176  | 3160.14     |
| regnetx_002            | 2,684,792    | 10.24       |
| regnetx_004            | 5,157,512    | 19.67       |
| regnetx_006            | 6,196,040    | 23.64       |
| regnetx_008            | 7,259,656    | 27.69       |
| regnetx_016            | 9,190,136    | 35.06       |
| regnetx_032            | 15,296,552   | 58.35       |
| regnety_002            | 3,162,996    | 12.07       |
| regnety_004            | 4,344,144    | 16.57       |
| regnety_006            | 6,055,160    | 23.10       |
| regnety_008            | 6,263,168    | 23.89       |
| regnety_016            | 11,202,430   | 42.73       |
| regnety_032            | 19,436,338   | 74.14       |
| resnest14d             | 10,611,688   | 40.48       |
| resnest26d             | 17,069,448   | 65.11       |
| resnest50d             | 27,483,240   | 104.84      |
| resnest50d_1s4x24d     | 25,677,000   | 97.95       |
| resnest50d_4s2x40d     | 30,417,592   | 116.03      |
| resnest101e            | 48,275,016   | 184.15      |
| resnest200e            | 70,201,544   | 267.80      |
| resnest269e            | 7,551,112    | 28.81       |
| wide_resnet50_2        | 68,883,240   | 262.77      |
| wide_resnet101_2       | 126,886,696  | 484.03      |
| densenet121            | 7,978,856    | 30.44       |
| densenet169            | 14,149,480   | 53.98       |
| densenet201            | 20,013,928   | 76.35       |
| densenet161            | 28,681,000   | 109.41      |
| fishnet99              | 16,630,312   | 63.44       |
| fishnet150             | 24,960,808   | 95.22       |
| vgg11                  | 132,863,336  | 506.83      |
| vgg13                  | 133,047,848  | 507.54      |
| vgg16                  | 138,357,544  | 527.79      |
| vgg19                  | 143,667,240  | 548.05      |
| vgg11_bn               | 132,868,840  | 506.85      |
| vgg13_bn               | 133,053,736  | 507.56      |
| vgg16_bn               | 138,365,992  | 527.82      |
| vgg19_bn               | 143,678,248  | 548.09      |
| efficientnet_b0        | 5,288,548    | 20.17       |
| efficientnet_b1        | 7,794,184    | 29.73       |
| efficientnet_b2        | 9,109,994    | 34.75       |
| efficientnet_b3        | 12,233,232   | 46.67       |
| efficientnet_b4        | 19,341,616   | 73.78       |
| efficientnet_b5        | 30,389,784   | 115.93      |
| efficientnet_b6        | 43,040,704   | 164.19      |
| efficientnet_b7        | 66,347,960   | 253.10      |
| efficientnet_b8        | 😥           | 😥          |
| efficientnet_l2        | 😥           | 😥          |
| efficientnet_lite0     | 4,652,008    | 17.75       |
| efficientnet_lite1     | 5,416,680    | 20.66       |
| efficientnet_lite2     | 6,092,072    | 23.24       |
| efficientnet_lite3     | 8,197,096    | 31.27       |
| efficientnet_lite4     | 13,006,568   | 49.62       |
| vit_small_patch16_224  | 48,602,344   | 185.40      |
| vit_base_patch16_224   | 86,415,592   | 329.65      |
| vit_base_patch16_384   | 86,415,592   | 329.65      |
| vit_base_patch32_384   | 88,185,064   | 336.40      |
| vit_huge_patch16_224   | 631,823,080  | 2410.21     |
| vit_huge_patch32_384   | 634,772,200  | 2421.46     |
| vit_large_patch16_224  | 304,123,880  | 1160.14     |
| vit_large_patch16_384  | 304,123,880  | 1160.14     |
| vit_large_patch32_384  | 306,483,176  | 1169.14     |
| deit_tiny_patch16_224  | 5,872,400    | 22.40       |
| deit_small_patch16_224 | 22,359,632   | 85.30       |
| deit_base_patch16_224  | 87,184,592   | 332.58      |
| mobilenetv2            | 3,504,872    | 13.37       |
| unet                   | 23,202,530   | 88.51       |
| deit_base_patch16_384  | 87,184,592   | 332.58      |
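
The `Size (MB)` column is consistent with storing every parameter as a 32-bit float, i.e. 4 bytes per parameter with 1 MB = 1024² bytes. This is inferred from the numbers in the table rather than stated by the library, and `size_mb` below is an illustrative helper.

```python
def size_mb(n_parameters, bytes_per_parameter=4):
    """Memory needed for the parameters, in MB (1 MB = 1024 ** 2 bytes)."""
    return n_parameters * bytes_per_parameter / 1024 ** 2

# parameter counts taken from the table above
resnet18_mb = size_mb(11_689_512)
resnet50_mb = size_mb(25_557_032)
print(f"resnet18: {resnet18_mb:.2f} MB")
print(f"resnet50: {resnet50_mb:.2f} MB")
```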

## Credits

Most of the weights were trained by other people and adapted to glasses. It is worth citing:

- [pytorch-image-models](https://github.com/rwightman/pytorch-image-models)
- [torchvision](https://github.com/pytorch/vision)



