Comparison with U-Net with VGG16 encoder
EfficientNet U-Net total parameters: 1,992,175
VGG16 U-Net total parameters: 29,652,055
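A quick sanity check of the "almost 15x" claim, using the two totals above:

```python
# Parameter totals quoted above; the ratio confirms the ~15x figure.
efficientnet_params = 1_992_175
vgg16_params = 29_652_055

ratio = vgg16_params / efficientnet_params
print(f"VGG16 U-Net is {ratio:.1f}x larger")  # ~14.9x, i.e. almost 15x
```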
The EfficientNet variant has almost 15x fewer parameters than its VGG16 counterpart, so some drop in accuracy is to be expected. In applications where that accuracy can be traded away, however, this model may be preferred for the reasons given above.
In applications where accuracy is vital, techniques such as quantization and mixed-precision training should instead be tested with the larger models. If their effect on the accuracy metrics is minimal, they improve memory usage and inference speed without requiring a change of encoder or architecture.
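To make the quantization idea concrete, here is a minimal pure-Python sketch of symmetric int8 post-training quantization of a weight tensor (flattened to a list). This is an illustration of the general technique, not the pipeline used for these models:

```python
# Illustrative sketch only: symmetric per-tensor int8 quantization.
# Each weight is stored in 1 byte instead of 4 (float32), a 4x memory
# saving, at the cost of a small rounding error per weight.

def quantize_int8(weights):
    """Map float weights to int8 values with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]          # toy weights, invented
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

If `max_err` stays negligible relative to the weight magnitudes, as it does here, the accuracy impact of quantizing that layer is usually small.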
The optical benefit
The EfficientNet encoder introduces several features absent from a plain CNN, yet most of the computational budget still goes to convolutions (the squeeze-and-excitation blocks, for example, add less than 1% to the computational cost). Although the convolutions are separable, which already improves efficiency, there are further gains to be had: the architecture still performs 3×3 and 5×5 convolutions in the depth-wise stage.
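The split between the depth-wise and point-wise stages can be seen in a rough multiply-accumulate (MAC) count for a single layer. The layer dimensions below are invented for illustration, not taken from the model:

```python
# Rough MAC counts for one conv layer (bias ignored), comparing a
# standard convolution with its depthwise-separable factorization.

def standard_conv_macs(h, w, c_in, c_out, k):
    return h * w * c_in * c_out * k * k

def separable_conv_macs(h, w, c_in, c_out, k):
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 conv mixes the channels
    return depthwise, pointwise

# Illustrative layer: 56x56 feature map, 64 -> 128 channels, 3x3 kernel.
dw, pw = separable_conv_macs(56, 56, 64, 128, 3)
full = standard_conv_macs(56, 56, 64, 128, 3)
# The separable form is several times cheaper in total, but the k x k
# spatial filtering is concentrated in the depthwise stage, while the
# pointwise stage is a pile of lightweight 1x1 products.
```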
A truly optimal configuration would pair our hardware with electronic GPUs, which could compute the lightweight 1×1 convolutions of the point-wise stages.
Results
The performance of this architecture can be improved by building a deeper U-Net (going to 32x spatial reduction instead of 16x), by revising the arbitrarily chosen channel dimensions of the decoder (e.g. with Bayesian optimization), and by tuning the hyper-parameters. Even with these optimizations, the model would likely remain far lighter than the VGG16 variant.
For the purposes of this article, however, we simply present the results of the model trained with Dice loss and a learning rate of 0.0001 using the Adam optimizer. Aggressive early stopping halted training at the 76th epoch, after 5 successive epochs without an improvement in validation mIoU.
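The stopping rule just described is easy to state in code. A minimal sketch, with an invented sequence of validation mIoU scores standing in for real training:

```python
# Early stopping as described above: halt after `patience` consecutive
# epochs with no improvement in validation mIoU.

def train_with_early_stopping(miou_per_epoch, patience=5):
    best, stale, stopped_at = float("-inf"), 0, None
    for epoch, miou in enumerate(miou_per_epoch, start=1):
        if miou > best:
            best, stale = miou, 0   # new best: reset the patience counter
        else:
            stale += 1
        if stale >= patience:
            stopped_at = epoch      # patience exhausted: stop training
            break
    return best, stopped_at

# Invented validation scores; the best (0.41) arrives at epoch 4 and is
# never beaten, so training stops 5 stale epochs later, at epoch 9.
scores = [0.30, 0.35, 0.40, 0.41, 0.40, 0.39, 0.41, 0.40, 0.38]
best, stopped_at = train_with_early_stopping(scores, patience=5)
```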
Key metrics: mIoU 45.5, pixel accuracy 78.1, average pixel accuracy 56.9.
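For readers unfamiliar with these three metrics, here is a hedged sketch of how they can be computed from a per-class confusion matrix `m`, where `m[i][j]` counts pixels of true class `i` predicted as class `j`. "Average pixel accuracy" is taken here to mean the mean of per-class accuracies, which is a common but not universal convention; the 3-class matrix is invented for illustration:

```python
# Segmentation metrics from a confusion matrix (illustrative sketch).

def segmentation_metrics(m):
    n = len(m)
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(n))
    pixel_acc = correct / total
    ious, per_class_acc = [], []
    for i in range(n):
        tp = m[i][i]
        fn = sum(m[i]) - tp                       # missed pixels of class i
        fp = sum(m[j][i] for j in range(n)) - tp  # pixels wrongly called i
        ious.append(tp / (tp + fp + fn))
        per_class_acc.append(tp / sum(m[i]))
    return sum(ious) / n, pixel_acc, sum(per_class_acc) / n

# Invented 3-class confusion matrix (every class present in the labels).
m = [[8, 1, 1],
     [2, 6, 2],
     [0, 1, 9]]
miou, pixel_acc, avg_pixel_acc = segmentation_metrics(m)
```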