The value of data is well recognised, but what does it look like to actually realise this value? Data is often bought and sold in such a way that you can put a nominal price tag on it. However, the value of data is only truly revealed when you can use it to make good decisions.
Whether you’re figuring out who to advertise to, deciding on capital allocation for business projects or attempting to find correlations between lifestyle factors and illness, the unifying aspect of data is that you need to know something about the problem that you’re targeting. This might be knowledge about the preferences of individuals, the performance of competitors, or how active certain people are.
Yet the data that is most useful to you is often in the hands of others. Data about your interests is frequently scattered across a fractured landscape of different services, organisations and authorities. This is why collaboration and data sharing between businesses is becoming an increasingly important part of the modern world.
At the same time, the most valuable data is often the data that can have serious implications in the wrong hands. At the far end of the spectrum of consequences, a failure to manage sensitive data could ruin lives or bring down companies. It could expose the most personal information imaginable to a world of bad actors. That’s why this data is almost always confidential; it needs to be kept out of the wrong hands and shared only with organisations and people that you trust.
However, when companies collaborate on data, there’s a tension between the benefits of collaboration and the risks to confidentiality.
Data has value because it relates to something. Remove that relationship in the interests of protection, and you remove the value. Preserve that relationship, and the risks associated with exposure run higher.
But what if there was a way of resolving this tension? What kind of a world would we see, and what would this mean for businesses and organisations?
As an example of just how powerful the ability to collaborate and share data is, consider the COVID-19 pandemic.
A frequently noted aspect of the vaccine development process is that it typically takes many years, yet the first COVID-19 vaccines were developed with incredible speed.
This speed was in part a consequence of an understandable acceleration in the regulatory processes that govern clinical trials. This in itself is no surprise; faced with collapsing economic output and social unrest, the severity of the pandemic made finding an answer a top priority for governments around the world.
However, the vaccine development push was also characterised by a comparatively unheard-of degree of cooperation between companies and other organisations. Writing on the subject in the journal Vaccine, Druedahl et al. identified 93 vaccine development partnerships (collaborations featuring more than two distinct organisations), of which 35 explicitly engaged in the sharing of knowledge and expertise.
This includes the partnership between BioNTech, Fosun Pharma and Pfizer that was responsible for what is by far the most widely produced and distributed vaccine.
What makes the Pfizer (and related Moderna) vaccines particularly interesting is the use of a novel approach to vaccine technology. In the case of the Pfizer-BioNTech collaboration, BioNTech provided novel mRNA vaccine candidates, while Pfizer provided the supporting infrastructure (including clinical trial, legal and manufacturing capabilities) that was essential to proving the capability of the vaccine and delivering the result at scale.
The development of the Pfizer vaccine is notable not only for the extent and speed of the collaboration, but also for the quantity of unique and valuable information that needed to be shared. This not only included the core biomedical understanding of how the vaccine functioned, but also the data proving that it worked.
However, in the earliest stages of the pandemic, BioNTech and Pfizer were even sharing information without a formal agreement in place.
This was in itself quite unusual. Companies often collaborate, but the process is typically lengthy and features extensive negotiations regarding aspects such as intellectual property, royalties, and of course the legal implications if confidentiality is violated.
Even then, only information that must be shared is exchanged.
The reason for this is as much about the security of information itself as it is about trusting the other party. Digital systems and telecommunications networks have made the transfer and duplication of information a trivial task. This is mostly a good thing, but the ease with which data can be shared has always been understood to carry enhanced risks too.
The readiness with which digital information can be copied, coupled with the difficulty of protecting complex modern computing systems, makes important information comparatively easy to steal.
As a result, the world is well understood to be a hazardous place for valuable data. Quite aside from the ongoing churn of breaches and thefts committed by criminal gangs, reporting on the early stages of the war in Ukraine has been dotted with recurrent concerns about the potential for Russia to lash out with state-sponsored cyberattacks.
While this particular threat has yet to materialise to the extent that was originally predicted, the global cybersecurity threat landscape is extremely broad, ranging from malicious individuals through to state actors with massive resources at their disposal.
And generally speaking, the more valuable the data, the more effort these threats will put into acquiring it. If the data you hold is valuable enough to come to the attention of the biggest threats, maintaining its security can be extremely hard. Early knowledge in the vaccine sector (especially of the outcome of the clinical trials) would have been incredibly valuable; PwC estimated that up to $110 billion of investor money was dependent on the outcome.
And so it is that another notable aspect of the story of the COVID-19 vaccine is the scale of the espionage efforts that were launched against researchers, including the Pfizer-BioNTech-Fosun Pharma work. Accusations of state-backed hacking targeting vaccine data have been levelled against China, Russia and North Korea, though all three have strenuously denied them.
However, while large states have considerable technical expertise at their disposal, state-sponsored hacking doesn’t necessarily entail sophistication. In the case of the attacks on COVID-19 research, Microsoft identified brute-force login attempts and social engineering methods as major attack vectors.
These seemingly crude methods nevertheless highlight the particular problem with collaboration. Even if you trust who you’re collaborating with, the more organisations that have access to your commercially sensitive data, the greater the attack surface and the less you can do about it. All it takes is one weak link in the chain and confidential information can be lost.
This is the principle that these attacks were leveraging. Why put time and effort into more creative or risky techniques when you can simply roll the dice over and over again? You only need to be lucky once; if enough external organisations are admitted into the circle of trust, the ability to control data access becomes more and more limited, and the odds of success climb dramatically. The speed and scope of the COVID-19 vaccine collaborations will have made ensuring confidentiality across the full chain of companies, suppliers and even individual academics especially difficult.
However, it needn’t be this way. Novel technologies offer us a new model of collaboration, one in which information can be shared and worked on while simultaneously shutting out existing attacks. Here’s why that’s revolutionary.
When it comes to medical research, the current balance is always one of confidentiality against utility. Even identifying potential participants for a clinical trial requires the ability to search over characteristics that can be intensely sensitive, such as age and ethnicity, or medical characteristics such as cardiac function and mental health.
This is some of the most sensitive data that can be held about a person, and on a global level the confidentiality of this data is often enforced by regulation, such as GDPR in the EU, or HIPAA and the HITECH Act in the US.
Trust is therefore critical to the process, yet the landscape of scientific research (and who carries it out) poses its own challenges. Most people would likely support the sharing of data for social benefit, but clinical development is often pursued for commercial gain, and the entanglement between public, private and third-sector organisations poses difficult questions about who can access this data, how it is transferred and handled, and the validity of its use.
At the same time, the value of being able to safely gather, share and use this information is beyond question. Data sharing platforms such as Vivli allow data from medical trials (including the Pfizer-BioNTech COVID-19 studies) to be shared, increasing the amount of scientific knowledge that can be extracted from one or more datasets. Meanwhile, companies such as Huma are leveraging mobile platforms to revolutionise the speed and efficiency with which clinical trials can be set up, run and evaluated.
The seemingly intractable problem here lies in severing the link between sensitive data and the individuals who generated it. Existing methods tend to rely on forms of data anonymization that strip away information which might be used to associate sensitive records with specific individuals. However, not only does this result in the loss of potentially useful data or accuracy (e.g. removing or aggregating fields such as address, or adding statistical noise to the dataset), but anonymization alone cannot guarantee that the process is irreversible, especially in an age of widely available computing power.
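The noise-addition trade-off mentioned above can be made concrete. The sketch below (illustrative only; the record layout and the epsilon value are hypothetical) uses the Laplace mechanism familiar from differential privacy: a count over sensitive records is released with calibrated random noise, so the aggregate remains useful while no individual’s presence can be confirmed from the output.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(records, predicate, epsilon: float) -> float:
    """Count matching records, then add noise calibrated to epsilon.

    A count query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon is sufficient.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical records: (age, has_condition)
records = [(34, True), (51, False), (47, True), (29, True), (62, False)]
result = noisy_count(records, lambda r: r[1], epsilon=0.5)
# The released figure is randomised around the true count (3), so no
# individual's inclusion can be inferred from it with certainty.
```

The smaller the epsilon, the stronger the privacy guarantee and the noisier (less accurate) the released statistic: exactly the value-versus-protection tension described above.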
Fully homomorphic encryption (FHE) offers a solution by enabling blind computation. This allows the generation of a statistical analysis (be that relatively simple, or a more complex process such as executing a deep learning model) without ever needing to work on the dataset in the clear. Because the underlying data is never decrypted, the individual data points that could be used to de-anonymize an individual can’t be seen at any point.
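Production FHE schemes (such as BGV, CKKS or TFHE) are mathematically involved, but the core idea of computing on data that is never decrypted can be shown with a much simpler relative: the Paillier cryptosystem, which is only additively homomorphic. The toy sketch below (deliberately tiny key sizes, for illustration only) sums encrypted patient readings without the summing party ever seeing a plaintext value.

```python
import math
import random

# Toy Paillier keypair with small primes (illustration only; real
# deployments use 2048-bit moduli, and FHE schemes go much further).
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

def add_encrypted(c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % n2

# An analyst can total patient readings without ever seeing them.
readings = [72, 85, 64]
ciphertexts = [encrypt(x) for x in readings]
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(total, c)
assert decrypt(total) == sum(readings)  # 221
```

Paillier supports only addition on ciphertexts; the promise of FHE is that arbitrary computation, including the multiplications needed for deep learning, can be carried out in the same blind fashion.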
FHE also supports models of secure multi-party computing, allowing organisations to share information and gain insight into data without revealing individual elements of data that could be confidential.
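A classical building block of secure multi-party computation, additive secret sharing, illustrates the multi-party principle in a few lines (this is a general MPC sketch, not the FHE-based protocol itself; the hospital scenario is hypothetical). Each party splits its private value into random shares, and only the combined total is ever reconstructed.

```python
import random

PRIME = 2**61 - 1  # field modulus; all arithmetic is done mod this prime

def share(secret: int, n_parties: int):
    """Split a value into n random shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reveal(shares):
    return sum(shares) % PRIME

# Three hospitals each hold a private patient count.
counts = [120, 75, 230]
# Each hospital splits its count and sends one share to each peer;
# any single share is just a uniformly random number.
all_shares = [share(c, 3) for c in counts]
# Party i locally sums the i-th share of every input...
partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]
# ...and only the combined total is ever reconstructed.
assert reveal(partial_sums) == sum(counts)  # 425
```

No single party (or single breached machine) holds enough information to recover any individual hospital’s figure, which is precisely the property that limits the “weak link in the chain” attacks described earlier.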
Under this model, companies can share data without ever exchanging information in the clear. Data once locked away in centralised silos can flow more freely, unlocking insights and ushering in a new era of information utility.
Indeed, FHE even has applications in mobile contact tracing that overcome some of the known problems with existing techniques. Through blind computation over the proximity of devices and the duration of their contact, users can be notified of their exposure through a process that never reveals any identifying data.
But why stop here? Medical research is an exceptionally valuable application, but the capabilities that FHE gives us are not limited to this field.
What if every organisation could now work with a high level of data sharing and still retain control over critical information?
In short, what if the incredible rate of progress made in developing the COVID-19 vaccine could be the new normal across every industry, and not an exception?
This would require a world in which FHE is available to every enterprise, not just those dealing with the most sensitive information. Achieving this vision means realising a world in which FHE is deployable at scale.
It’s worth noting that the notion of scale invoked here is quite flexible. At the top end, a world truly transformed by FHE is one in which everything that needs to be kept secret online is encrypted as a matter of course. However, on an industry-by-industry basis, there is still a significant amount of commercial value to be realised by enabling even a comparatively limited amount of secret computing.
What is undeniable is that the ability to realise the bigger vision is no longer a purely academic question. In the 13 years since the first demonstration of a fully homomorphic encryption scheme, an astonishing amount of progress has been made in making FHE faster, more efficient, and applicable to a greater range of problems. Work continues apace, with an emphasis now on making FHE more accessible to end-users and non-experts. The sole limiting factor to reaching the very top of the scale is now one of engineering: overcoming the speed problems inherent in executing FHE on existing computing architectures.
At Optalysys, we are solving this last challenge. By tackling a key mathematical operation in FHE using our optical computing technology, we can deliver enormous acceleration while minimising the inefficiency of existing hardware in performing FHE tasks. However, while the core optical technology we have developed is essential to a solution, we also understand that it isn’t the only piece of the puzzle. That’s why we’ve engineered a complete answer to the technical needs of FHE, one that not only brings a new method of computing to bear on the problem, but can be deployed at the scale needed to make these opportunities a reality.
Engineering this solution is not a trivial task. It involves translating a wide range of FHE schemes intended for different use-cases into a hardware architecture that is flexible and powerful enough to support a range of different FHE workflows.
This is the purpose of Enable, the product development we described in our last article.
Enable is designed from the ground up to provide universal acceleration for every contemporary FHE scheme.
This single solution is intended to allow FHE to be deployed at the scale required in the cloud, and alongside the development of our core optical chip technology it will be the focus of our efforts over the next 18 months.