Optalysys: What we do (And why we do it)

[12 minute read]

Conventional computing is rapidly approaching the physical limits of what can be achieved with silicon electronics. As the size of the transistors used to create complex logic-solving circuits shrinks ever further, gains in performance become outweighed by growing electrical power and thermal management requirements. At this point Moore’s law (an observation that the density of transistors on a chip doubles roughly every two years), which has reliably predicted processor development for over 50 years, breaks down.

The breakdown of Moore’s law coincides with an explosion in demand for data processing. There have been huge advances in the development of Graphics Processing Units (GPUs) and other specialist processors dedicated to performing a single task on many items of data simultaneously, with far greater speed and efficiency than a Central Processing Unit (CPU). However, the problem remains that these co-processors still face the same fundamental limitations as every other electronic system.

As a result, there has been a push towards radically different types of information processing using alternatives to silicon electronics. Quantum computing is a prime example, as are systems inspired by the function of the brain (so-called “Neuromorphic” systems) or indeed by other, even more exotic sources.

Optical computing, performing calculations with light, is one of these processing solutions. It is this alternative that we at Optalysys are developing.

What is Optical Computing?

Light has some remarkable properties. Famously, it acts as both a particle and a wave. Individual photons are discrete packets or “quanta” of energy, but they have no mass and travel as an oscillating perturbation of the electromagnetic field. This wave-like property allows light to interfere with itself, the waves either cancelling or reinforcing each other in different locations, creating “interference fringes”.

This effect can usually only be seen under certain conditions. The light has to be initially coherent, which means that all the photons start out sharing the same frequency and a consistent phase relationship. This is the kind of light produced by a laser. If the phase of any part of the light is altered, the interference pattern changes.

It is this ability of light to interfere with itself that forms the basis of computing with silicon photonics. By adjusting these properties so that we can represent and store information in beams of light and then allow them to interact, we can perform a calculation without using any transistors at all and overcome the limitations of Moore’s law.

The advent of artificial intelligence and machine learning techniques has spurred efforts across the computing world to make these tools work faster. A significant part of executing most machine learning models is performing multiplication operations on what are known as “matrices”, grids of numbers that have specific mathematical rules for (amongst other things) multiplication and addition. In a hardware context, these multiplications can be performed using elementary “Multiply and Accumulate” (MAC) operations.
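To make that concrete, here is a minimal Python sketch (with made-up matrices; real accelerators implement this in hardware, not loops) of how a matrix multiplication decomposes into individual MAC operations:

```python
import numpy as np

def matmul_mac(A, B):
    """Multiply two matrices using explicit multiply-and-accumulate steps."""
    rows, inner = A.shape
    _, cols = B.shape
    C = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += A[i, k] * B[k, j]  # one MAC: multiply, then accumulate
            C[i, j] = acc
    return C

A = np.random.rand(3, 4)
B = np.random.rand(4, 2)
assert np.allclose(matmul_mac(A, B), A @ B)  # agrees with numpy's matmul
```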

While the operations needed for machine learning are almost entirely multiplications and additions of two numbers, a great many of them need to be performed in a very short period of time. Silicon photonics offers a way to perform these operations at phenomenal speed, and most companies working in silicon photonics are pursuing a similar objective to this.

Optalysys are taking a different approach. Matrix multiplication is not the only powerful computing tool that can be accelerated by optical computing, and our hardware is designed to perform another very specific mathematical function. This function is not only of tremendous importance to scientists and engineers, but has great relevance to the digital world around us. 

If you’ve ever watched a show on Netflix, listened to a piece of digital music or just browsed the web, this function has been applied countless times in analysing, processing, and transmitting this information. 

This function is known as the Fourier Transform.

The Fourier Transform

The Fourier transform is fundamentally a tool for analysing repeating patterns. Given some piece of information, which could be an audio track, an image, or a machine-learning data-set, it allows us to unpick and isolate the simple wave-like harmonic patterns (sines and cosines) that, when added back together, return the original piece of information. For such a phenomenally powerful tool, it has a surprisingly simple equation:
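In its standard 1-dimensional form, writing the transform of $f(x)$ as $\hat{f}(\xi)$:

$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i \xi x}\, dx$$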

The 1-dimensional Fourier transform equation. Explained verbally, it tells us that we can represent our original function f(x) by continuously multiplying it by a vector of length 1 rotating around the origin of the complex plane at different frequencies, and then summing up the area under the curve this produces. The value of this summation as it changes with frequency is the Fourier transform of f(x).

Despite the deceptive simplicity, this mathematical function has applications across the entire spectrum of information and data processing. In some cases, the Fourier transform is directly useful, such as when we are dealing with signals that contain information that we want to extract. In other cases, it can massively simplify complex problems that would otherwise be intractable. To say that it is one of the most useful mathematical functions in history is not an overstatement. 

Applications of the Fourier Transform

A great example of the way that Fourier transforms can accelerate computing tasks lies in machine learning for image recognition, through the use of what is known as a Convolutional Neural Network or CNN.

In a digital system, images are represented as grids of numbers. The value of these numbers corresponds to the colour of each pixel that is to be displayed. A colour image is usually made from three of these grids, which represent the individual red, green and blue colour channels that make up the image. 
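As a minimal Python illustration (the values here are arbitrary):

```python
import numpy as np

# A tiny 4x4 colour image: three 4x4 grids of numbers stacked together,
# one grid per colour channel (red, green, blue).
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :, 0] = 255   # set every pixel's red channel to full intensity
image[1, 2, 2] = 128   # give one pixel some blue as well

red = image[:, :, 0]   # each channel on its own is just a grid of numbers
print(red.shape)       # (4, 4)
```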

However, a photograph of an object also contains significantly more information than is required to recognise the subject. Consider the following images of a mug. The image on the left is a colour photograph; the image on the right is a line drawing of the same mug.

The image on the right contains less information than the image on the left. There are no shadows, no colour, no background, and yet we can still recognise that we’re looking at two different pictures of the same mug.

This is because our brains contain neural structures that perform this task at incredible speed. You’re using them right now, as you read this sentence, to recognise letters and words.

What’s really impressive about the brain is that it can recognise objects regardless of their orientation. If we flip the image of the mug on its side, we can still recognise it as the same mug.

For a conventional (or “fully connected”) neural network, recognising a mug from a photograph would be an extremely difficult task. Even if the network could learn to recognise a mug, it would struggle with changes in orientation.

A fully connected neural network. Each element in a layer of the network can pass values forward to any element in the next layer. By changing how significant each connection is to the answer, the network can be taught to perform classification tasks. If we were to make a fully connected network to process a colour image, there would need to be millions of inputs (one per pixel), most of them corresponding to irrelevant information. This would drown out the relevant information, making it difficult for the network to accurately classify the subject of the image.

CNNs take their name from their use of a mathematical tool called convolution. In image recognition, a grid of numbers called a kernel (K) is “convolved” with a section of the grid of numbers that make up the input image (I). The kernel “slides” over the image, multiplying together the values of K and I that overlap.

This sliding operation produces a new grid of numbers made from the results of each convolution operation. This is how CNNs strip out the useful information from an image. By using an appropriate arrangement of numbers, if the kernel passes over a pattern in the picture that corresponds to the feature the kernel is designed to look for, the convolution of the kernel with this pattern will return a more significant value than if this feature is absent.
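Here is a minimal Python sketch of that sliding operation (no padding or stride; note that, as in most CNN frameworks, the kernel is applied without being flipped, which is strictly cross-correlation):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over an image, multiplying and summing the overlap."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(window * kernel)  # element-wise multiply, then sum
    return out

# A 3x3 horizontal-edge kernel of the Sobel type acting on a toy image.
sobel_h = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])
image = np.zeros((6, 6))
image[3:, :] = 1.0                    # dark top half, bright bottom half
print(convolve2d(image, sobel_h))     # non-zero responses only around the edge
```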

We’ll go into greater detail as to how CNNs work in a later article; for now, let’s return to our picture of a mug and see this in action. We’ll apply a simple edge-detection filter that uses kernels to strip out the excess information and returns a map of the visible edges in the picture.

An example of a simple kernel that detects edges as part of Sobel edge detection and the feature map it creates. The kernel shown covers an area of 5×5 pixels, and detects horizontal edges. If there is no edge in the kernel window, the arrangement of the numbers in the kernel ensures that the sum of the multiplications is small, and no edge is detected. The image on the right is the feature map of edges that is built up as the kernel moves over the image on the left.

We can see that there is still a large amount of excess information even after filtering, so a CNN often features what are known as “pooling” stages, in which the resolution of the feature maps is progressively reduced. Pooling strips out excess data and prevents a rapid explosion in the amount of information in the network as the number of output channels grows. This is then followed by additional feature-extraction stages which are capable of recognising more complex structures in an image.
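As an illustration, here is a sketch of one common variant, “max pooling”, which keeps only the strongest response in each small block (assuming a 2×2 pool on a feature map whose sides divide evenly by the pool size):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Downsample by keeping the largest value in each size x size block."""
    h, w = feature_map.shape
    fm = feature_map[:h - h % size, :w - w % size]  # crop to a multiple of size
    blocks = fm.reshape(fm.shape[0] // size, size, fm.shape[1] // size, size)
    return blocks.max(axis=(1, 3))

fm = np.arange(16.0).reshape(4, 4)
print(max_pool(fm))   # a 2x2 map: the maximum of each 2x2 block
```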

We can think of a CNN as a machine that crushes an image down until only the most relevant information remains, which can then be fed into a fully-connected network of the kind described above. The fully connected network is then capable of deciding what the subject of the image is without the need for millions of inputs, and the removal of irrelevant data makes it much more accurate at the task as well.

Unlike the edge detection kernels used above, which were developed by human researchers, the kernels used by a CNN are usually created by “training” the network. This is done by giving the network a huge data set of images from which the system can learn which patterns of numbers are good for picking out the details that are useful for categorising particular objects.

This is a very powerful technique, and can also be used for machine learning tasks beyond image recognition; Convolutional Neural Network techniques have been applied in solving significant problems in fundamental fields such as structural bioinformatics and applied quantum mechanics.

However, this is also computationally “expensive”. Let’s say we have a simple colour picture with 3 input channels corresponding to the individual red, green and blue colours that make up the image, and we want to get 4 output channels (one for each kernel that detects a different feature).

A diagram representation showing how 3 input channels I are used to produce 4 output channels O by using 4 different kernels K. Each kernel is intended to detect a different feature in an image.

Each output channel is a map of one particular set of features, so we need to combine the outputs from a kernel sweep of each colour channel. To produce these 4 output channels, we would have to perform 3 × 4 = 12 different convolutions, 4 per input channel. After this process is finished, we then perform the element-wise sum of the results of the convolutions across the input channels. In the case of the above example, the first output channel is defined by the equation below (where * denotes convolution):

$$O_1 = (I_1 * K_{1,1}) + (I_2 * K_{1,2}) + (I_3 * K_{1,3})$$
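Sketching that whole step in Python (reusing the convolve2d function from the earlier sketch; the channel counts and kernel values here are arbitrary):

```python
import numpy as np

def conv_layer(channels, kernels):
    """channels: list of 2D input grids. kernels[m][c]: the kernel applied to
    input channel c when building output channel m."""
    outputs = []
    for kernel_set in kernels:
        # One convolution per input channel, then an element-wise sum.
        maps = [convolve2d(ch, k) for ch, k in zip(channels, kernel_set)]
        outputs.append(np.sum(maps, axis=0))
    return outputs

# 3 input channels, 4 output channels: 3 x 4 = 12 separate convolutions.
channels = [np.random.rand(8, 8) for _ in range(3)]
kernels = [[np.random.rand(3, 3) for _ in range(3)] for _ in range(4)]
print(len(conv_layer(channels, kernels)))   # 4
```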

This requires quite a lot of multiplications just to generate one output channel! This leads to the idea of counting the number of operations required to perform a calculation. In computer science, this is often expressed through something called “Big-O notation”, which gives an overview of computational efficiency.

For example, if the number of required operations increases by 1 as you add 1 more input, the operation is said to grow or “scale” as O(n), where n is the total number of inputs. For the 2D convolution operation seen above, the number of operations scales as O(n²).

A graph showing how the number of operations increases as more data is added to algorithms with O(n) and O(n²) scaling. For a system where the time taken to execute operations in sequence is fixed (like a computer with a fixed clock speed), the increase in operations correlates directly with an increase in the time taken to calculate the answer.

This is a steep cost for calculation, and makes the standard convolution operation computationally “expensive”.

However, the Fourier transform gives us a much faster method. One of the properties of the Fourier transform is that the convolution of two grids of numbers can be performed by converting the kernel and the image from their representation in terms of position into a representation in terms of frequency (by taking the 2-dimensional Fourier transform of each), multiplying the two together, and then converting the result back into the spatial representation by performing the “inverse” Fourier transform on the grid of multiplied values. The equation for this is:
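$$I * K = \mathcal{F}^{-1}\big[\,\mathcal{F}[I]\cdot\mathcal{F}[K]\,\big]$$

where $\mathcal{F}$ denotes the 2-dimensional Fourier transform, $\mathcal{F}^{-1}$ its inverse, and * convolution; this relationship is known as the convolution theorem.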

This conversion from one representation to another is what the Fourier transform excels at. The result of applying the Fourier transform to an image is shown below:

An image of a mug (spatial representation), and the 2-dimensional map of harmonics that can be used to represent the image. The frequency map is generated using the 2-dimensional Fourier transform. The brighter regions correspond to those frequencies which contribute more information to the reconstruction of the image. By simply multiplying each point in the Fourier transform of the image with the corresponding point in the Fourier transform of a kernel and then taking the inverse Fourier transform, the convolution operation can be performed more efficiently.

This drastically reduces the number of operations required from O(n²) to O(n), which speeds things up considerably! This is just one of the many ways that using Fourier transforms can accelerate massive computing tasks, and convolution stages performed on Graphics Processing Units frequently make use of this property.
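In Python, this frequency-domain route looks something like this (a minimal numpy sketch, not Optalysys code; both grids are zero-padded to the full output size):

```python
import numpy as np
from numpy.fft import fft2, ifft2

def fft_convolve2d(image, kernel):
    """Convolve by multiplying in the frequency domain (convolution theorem)."""
    h = image.shape[0] + kernel.shape[0] - 1   # full linear-convolution size
    w = image.shape[1] + kernel.shape[1] - 1
    spectrum = fft2(image, s=(h, w)) * fft2(kernel, s=(h, w))
    return np.real(ifft2(spectrum))            # back to the spatial picture

image = np.random.rand(64, 64)
kernel = np.random.rand(3, 3)
result = fft_convolve2d(image, kernel)   # matches a direct "full" convolution
```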

But there is a catch. The usual algorithm for calculating a Fourier transform in a digital system is known as the Cooley-Tukey Fast Fourier Transform (FFT), which requires O(n log n) operations. Here’s a graph comparing O(n) and O(n²) to O(n log n).

This is a significant drawback, as we can see that O(n log n) also becomes very large very quickly. To make use of the O(n) time scaling provided by the convolution function, we first have to prepare the data using an O(n log n) operation! It’s still faster, but not as fast as we’d like.
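To put rough numbers on this (simple arithmetic, taking n as the number of inputs):

```python
import math

for n in (1_000, 1_000_000):
    print(f"n = {n:>9,}:  n = {n:,},  "
          f"n log n ~ {int(n * math.log2(n)):,},  "
          f"n^2 = {n * n:,}")
# At n = 1,000,000: roughly 10^6 vs 2x10^7 vs 10^12 operations.
```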

Optalysys have created a solution that overcomes not only this problem, but many others too.

Fourier Optics

The interference patterns produced by light reveal something fundamental and deeply beautiful about the universe, but they also have useful properties. Under specific circumstances, if we focus the interference patterns created by a light source that contains light of different phases and intensities, the pattern of intensity and phase that we see at the focal plane corresponds to the Fourier transform of the original light source. For our purposes, this means that light can perform the Fourier transform instantly, in O(1) time! 

An optical Fourier transform arrangement. Here, we take the Fourier transform of an image of the figure “4” by using a grid of individual points that emit laser light to create a single beam encoded with the original image. As this beam travels along the optical path, the individual light sources spread out and interfere with one another. Focusing this “diffracting” beam by placing a lens at the appropriate focal length produces the 2-dimensional Fourier transform of the original image at a detection plane. This entire process takes place at the speed of light. 

Here’s the same graph from before, but now with an O(1) operation added to it.

O(1) time is the green line at the bottom. It’s flat because the number of operations doesn’t increase as we add more data to the calculation. This is what makes the Fourier-optical approach so powerful; not only does it return the answer instantly, it does so regardless of the amount of information put into the system!

The way that light from the source spreads out and interferes with itself is actually doing the work for us in a massively parallel calculation without a single bit of the Fourier transform having to be done in electronics. Not only is this incredibly fast, but it is tremendously energy efficient compared to conventional electronic systems. 

In fact, for the purposes of the convolution operation described above, the entire process of convolution can be performed in a 4-f coherent optical processing system, in which two Fourier optical stages are linked by an optical multiplication stage that performs the multiplication of the transform of the kernel and image.

Previous implementations of this type of system were based on liquid crystal micro-displays that encoded information into the laser light, using bulk optical components and discrete camera sensors.

A bench-mounted 4-f optical system

Aligning the laser and lenses correctly is a difficult task, and the entire system is so sensitive to movement that it has to be assembled on a special vibration-isolating optics bench.

Optalysys’s previous developments have produced optical co-processor versions of such systems mounted to drive electronics that can be installed into PC motherboards.

The Optalysys FT:X 2000 optical co-processor, a compact Fourier-optical system.

However, by building the optical circuits on a single piece of silicon, the problems of size and vibration are eliminated. This is what Optalysys have done; we have designed and built an integrated silicon-photonic co-processor that uses the interference properties of light to perform the Fourier transform at a speed and power consumption that were previously impossible. In our architecture, we can perform a Multiply and Fourier Transform operation, or MFT, at incredible speed.

A section of the Optalysys silicon-photonic chip, shown with scale

Now, applications and uses of the Fourier transform that were previously locked away by computational effort are made possible. A new era of fast computing with light, powered by our technology, is dawning.

……..

As the optical computing revolution rolls on, we’ll be using this space to explore the underlying science and applications of this technology across a vast array of disciplines that extends from academia all the way through to the end-user. Our next articles on Medium, coming soon, will explore the fundamentals of the Fourier transform, the use of our system in cryptography, how optical systems can effectively work alongside conventional electronics, and where the future will take us. There is much more to come, and we hope to see you there.