Output of a UNet with EfficientNetB0 as the encoder (original image — link)

Vanilla UNet vs UNet with EfficientNetB0 as Encoder

Sahil Chachra

--

In this blog, I will compare the performance of the vanilla (basic/simple) UNet architecture with that of a UNet using EfficientNetB0 as its encoder.

Recently, I participated in a Kaggle competition in which a large number of submissions used EfficientNet architectures. I was surprised to see that almost every submission I viewed implemented an EfficientNet model. Later, I came across EfficientDet, an object detection architecture that uses EfficientNet as its backbone. So I thought, why not use EfficientNet as the encoder in UNet and try it out?

To learn about EfficientNet, you can refer to this link; for EfficientDet, this link.

To take a look at the UNet with EfficientNetB0 as encoder architecture, click here.

Vanilla UNet vs UNet with EfficientNetB0 as encoder

Coming to the experiment I performed: I trained two models, one using the simple UNet architecture and the other using EfficientNetB0 as the encoder in the UNet architecture. The dataset I used was the Person Segmentation dataset from Kaggle.

Both models were trained for 20 epochs on Google Colab. Why 20 epochs? Because I was performing transfer learning on EfficientNetB0-UNet, it reached quite good values on the performance metrics by epoch 20. Since this blog compares the two architectures, both had to be trained for the same number of epochs. The results are as follows.

The Loss Values (Training Loss and Validation Loss)

Left — Vanilla UNet Loss and Validation loss & Right — UNetEfficientNetB0 Loss and Validation loss

In the left graph (Vanilla UNet), training loss is a smooth curve whereas validation loss is unstable. In the right graph (UNet_EfficientNetB0), both the training and validation loss curves are (almost) smooth. The vanilla UNet shows lower loss during training but higher loss during validation, whereas the UNet_EfficientNetB0 model has nearly the same loss for both, suggesting the vanilla model is overfitting.

Recall values

Left — Comparison of Recall values during training & Right — Comparison of Recall values during validation.

Precision Values

Left — Comparison of Precision values during training & Right — Comparison of Precision values during validation.

For the recall and precision graphs, we can see that the vanilla UNet performs better during training, but on the validation set UNet_EfficientNetB0 performs better.
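For binary segmentation, recall and precision are computed pixel-wise from the predicted and ground-truth masks. A minimal pure-Python sketch (the tiny masks below are illustrative, not from the dataset):

```python
def precision_recall(pred, target):
    """Pixel-wise precision and recall for flat binary (0/1) masks."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, target))  # true positives
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, target))  # false positives
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, target))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

pred   = [1, 1, 0, 1, 0, 0]
target = [1, 0, 0, 1, 1, 0]
p, r = precision_recall(pred, target)
# tp=2, fp=1, fn=1, so precision = recall = 2/3 here
```

High training precision/recall with lower validation values, as seen for the vanilla UNet, is the classic signature of overfitting.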

Inference

After training both models on Google Colab, I tested them on my local machine. The results are as follows:

Original Images:-

Left Image Source — link & Right image source — link

Output:-

Left — UNet with EfficientNetB0 as encoder output & Right — Vanilla UNet output

From these inference results, we can see that UNet with EfficientNetB0 as the encoder gives better output than the vanilla UNet. I think playing around with the encoder part of UNet, say using EfficientNetB1 or B2, may result in an even better model.

In my next blog, I will compare the performance of EfficientNetB0 through B7 as encoders in UNet.

Connect with me on LinkedIn :-D. Thanks for reading my blog!


Written by Sahil Chachra

AI Engineer @ SparkCognition| Applied Deep Learning & Computer Vision | Nvidia Jetson AI Specialist
