There are many different approaches taken by architectures that perform semantic segmentation. However, most of these models have one thing in common: an encoder-decoder structure which utilises convolutional layers in both the feature-extraction and up-sampling stages.

The motivation for this article follows the same reasoning as our previous pieces on AI. Performing convolution operations in the *reciprocal domain* of the data reduces the computational complexity of these operations from quadratic *O(n²)* to linear *O(n)*, where *n* is the input size. The Fourier transform is a mathematical tool that allows us to convert data into its reciprocal form; as such, when paired with fast AI acceleration hardware such as a GPU or TPU, using the Fourier transform can notionally provide a dramatic reduction in the workload, effectively allowing more convolutions to be performed in the same time and using the same hardware.
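The property underlying this claim is the convolution theorem: convolution in the signal domain becomes pointwise multiplication in the Fourier domain. A minimal NumPy sketch (illustrative only, not Optalysys code) shows the equivalence, and that once both operands are transformed, the convolution itself is just *n* multiplications:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = rng.standard_normal(n)  # input signal
k = rng.standard_normal(n)  # convolution kernel

# Direct circular convolution: O(n^2) multiply-adds.
direct = np.array(
    [sum(x[j] * k[(i - j) % n] for j in range(n)) for i in range(n)]
)

# Fourier-domain route: transform both operands, multiply elementwise
# (O(n) operations), then transform back.
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

assert np.allclose(direct, via_fft)
```

The elementwise product is where the *O(n)* scaling comes from; everything else in the pipeline is the cost of the transforms themselves, which is exactly the point addressed next.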

However, while GPUs are already well suited to computing Fourier transforms, the electronic approach to this calculation (regardless of specific device architecture) is bound by a fundamental *O(n log n)* algorithmic time complexity. As a means of accelerating convolutions, the electronic Fourier transform would always be a bottleneck in the processing pipeline. This approach to acceleration therefore only works if you have a system that can somehow surpass the limits of the electronic FFT and deliver transformed data at a rate that can keep up with the linear complexity scaling of convolution in the Fourier domain.
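To make the scaling argument concrete, here is a rough operation-count sketch. The cost models are simplifying assumptions for illustration (a radix-2 FFT at ~*n* log₂ *n* operations, naive convolution at ~*n*² multiply-adds), not a benchmark of any real device:

```python
import math

# Compare rough operation counts for convolving an n-sample signal.
for n in [1024, 4096, 16384]:
    direct = n * n                             # naive convolution, O(n^2)
    fft_route = 2 * n * int(math.log2(n)) + n  # two electronic FFTs plus the O(n) pointwise multiply
    optical = n                                # transforms offloaded to optics: only the multiply remains
    print(f"n={n:6d}  direct={direct:12d}  electronic FFT route={fft_route:9d}  optical route={optical:6d}")
```

Even in this crude model, the electronic FFT route is dominated by the transform terms as *n* grows, which is why removing the transforms from the electronic workload changes the picture so substantially.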

When paired with a GPU, the *optical* approach to Fourier transform calculation can provide this capability; we’ve laid out the essentials of how this works in another article. But how much additional performance can this technique actually provide? This article aims to explore image segmentation networks in detail and indicate how, given the above reasoning, the structure of these networks can benefit very significantly from optical acceleration.

We see a future where optical hardware can work in tandem with a whole range of electronic processors, providing the end user with faster and more efficient computer vision processing solutions. Of course, the properties of the optical Fourier transform are only half of the story; there’s also a huge range of additional considerations regarding the digital-optical interface and the way an optical system interacts with an electronic host, but we’ll be covering these factors in an upcoming article.

For now, in this article we apply our existing hardware demonstrator system to tasks in semantic segmentation. This is an especially interesting use of AI for us, as semantic segmentation is often deployed in mobile and edge applications such as autonomous vehicles and drones, where the intensive processing needs of AI run up against constraints such as weight and power consumption. As our micro-scale optical system uses vastly less power per operation than an electronic equivalent, we see the combination of greater energy efficiency and fundamental advantages in convolution processing as an opportunity for a significant shift in the capabilities of autonomous systems.

### Optalysys Beta Program

In order to realise this ambition, we are interested in working with third parties to create next-generation AI and encryption systems leveraging the optical Fourier transform. Access to the Optalysys optical system is not yet open to everyone; for enquiries about our beta program for benchmarking and evaluation, please contact us via www.optalysys.com/beta-program or by email at info@optalysys.com.

### Article contents