Businesses, large and small, are embracing the Cloud. No matter which research company, vendor or expert you consult, everyone agrees that about four in every five enterprise workloads are either already in the Cloud or will be within the next few months. That means the best-practice security strategy you established just a year or two ago may no longer be fit for purpose.
You need to be ready for a variety of web-based threats such as zero-day exploits, brute force attacks, trojans, phishing, ransomware, Distributed Denial of Service (DDoS) and compromised credentials. The old approach of traditional firewalls and perhaps an intrusion detection system is no longer the answer.
The adversaries you’re defending against typically fall into four main groups. Knowing which of these groups are your most likely attackers and understanding what assets they are most likely to target will help you funnel your resources in the most efficient and effective way possible.
- Cybercriminals are focussed on one key target: money. By either stealing or encrypting data through theft or ransomware attacks, they look for ways to monetise their activities. Your defence against them is philosophically simple: hit their return on investment. By making yourself a more difficult target, you change the economics of the attack, making it harder to profit from their activity.
- Insider threats fall into two groups. You may have a malicious insider who is seeking to steal data for their own profit or other motivations. Or, more commonly, staff members leak information — either by not following procedures correctly or through accidental error. The challenge here is that these are typically trusted people who have been granted some level of access by you.
- Hacktivists are trying to promote their own ideology. For example, web site defacements are typically carried out to promote a political or fanatical religious agenda — like a form of high-tech graffiti.
- Nation-state attackers are countries or political regimes with significant resources and patience. They are trying to further their own political or military interests. More recently, some nation-state attacks have been financially motivated as poorer nations seek to steal money from individuals and businesses in wealthier countries.
Each of those different attackers has specific tools and techniques they favour, but there is significant overlap in how they work. For example, many attackers use phishing to try to steal login credentials. Ransomware can be used by hacktivists to generate funds to support their cause, or by criminals to make money.
The first layer in your defensive strategy should use cheaper and less targeted defences that can quickly and efficiently eliminate crude attack types. For example, filtering email before it reaches inboxes can eliminate many phishing attacks before they reach users. Honeypots can also be established to lure potential attackers to areas of your network containing data that looks attractive but is actually worthless; that lets you slow them down and detect them so you can take further action. You can also block access from known bad IP addresses and web sites, writes Anthony Caruana.
The second layer is more targeted. It involves numerous strategies that address the specific concerns you have. For example, setting application-specific policies on web application firewalls helps protect against attacks focussed on specific applications. Segmenting the network prevents an attacker from easily moving laterally from system to system should they get past your initial defences. You can throttle access to APIs (Application Programming Interfaces) so attackers can’t launch a DDoS attack. All of these solutions can be deployed on the Cloud.
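As one illustration of API throttling, gateways commonly enforce per-client rate limits with a token bucket: each client earns request "tokens" at a steady rate and may burst only up to the bucket's capacity. A minimal sketch (class and parameter names are illustrative, not taken from any particular product):

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: a client may make `rate`
    requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate=5.0, capacity=10.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start with a full bucket
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill tokens for the time elapsed since the last call
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                    # over the limit: reject or delay

bucket = TokenBucket(rate=5.0, capacity=10.0)
results = [bucket.allow() for _ in range(25)]   # a sudden burst of 25 calls
# roughly the first 10 calls pass; the rest of the burst is throttled
```

A flood of requests drains the bucket almost immediately, so a volumetric attack is capped at the steady refill rate while normal clients are unaffected.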
There is also a well-known conflict between privacy and security, particularly evident in security monitoring solutions. The rising number of cyber threats has increased organisations' interest in security, while the rising cost of building an effective IT security team and environment poses a significant challenge. These pressures have created a fast-growing trend: Managed Security Services (MSS). Customers often turn to MSS providers to alleviate the daily pressures they face related to information security. One of the most critical aspects of outsourcing security is privacy: security monitoring, and security services in general, require access to as much data as possible in order to provide an effective and reliable service.
Privacy and Security
In the last few years, data privacy has become a hot-button issue globally, with high-profile scandals and data leaks at prominent companies like Facebook and Equifax resulting in greater privacy awareness among both consumers and businesses. The primary reason companies collect so much data is that they can use it to look for patterns. These patterns power the algorithms that provide personalized experiences, from those annoying ads that follow you around the internet to insurance premiums calculated using exercise data. On top of that, companies often share this data with third parties that can analyze it or use it to improve customer experiences, which requires giving up control over data they own. Website cookies have historically been used to track web browsing via a piece of data inserted into your browser, but other techniques such as MAC address and account tracking can also reveal what you've been doing on the web.
It’s the insights from analysis that are the real value of data: many businesses don’t care about any single individual’s data, but about the insights they can glean from the aggregate. That’s why so many businesses claim to protect user privacy by anonymizing large datasets: they can still look for patterns while appeasing privacy concerns (though we know that most anonymized data is so distinct that it can easily be re-identified). But a growing desire to maintain a hold over data, combined with fear of regulation and public frustration, is leading companies to look for more ways to ensure that private data really does stay private.
AI and privacy needn’t be mutually exclusive. After a decade in the labs, homomorphic encryption (HE) is emerging as a leading way to help protect data privacy in machine learning (ML) and cloud computing. It’s a timely breakthrough: the volume of data used in ML is doubling yearly, while concern about data privacy and security is growing among industry, professionals and the public.
The term is derived from the Greek words for “same shape or structure.” In mathematics, homomorphic describes the transformation of one data set into another while preserving relationships between elements in both sets. Because the data in a homomorphic encryption scheme retains the same structure, identical mathematical operations — whether they are performed on encrypted or decrypted data — will yield equivalent results.
Homomorphic encryption allows complex mathematical operations to be performed on encrypted data without compromising the encryption: data is converted into ciphertext that can be analyzed and worked with as if it were still in its original form. Encryption typically happens where the sensitive data are first captured, for example, in a camera or edge device. Processing of encrypted data happens wherever the AI system needs to operate on sensitive data, typically in a data center. And finally, decryption happens only at the point where you need to reveal the results to a trusted party.
It means that if you do encryption in the right way, you can transform ordinary numbers into encrypted numbers, then do the same computations you would do with regular numbers. Whatever you do in this encrypted domain has the same shape as in the regular domain. When you bring your results back, you decrypt back to ordinary numbers, and you get the answer you wanted.
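This “same shape” property shows up even in textbook RSA, which is homomorphic with respect to multiplication: multiplying two ciphertexts produces an encryption of the product of the two plaintexts. A toy sketch with deliberately tiny, insecure parameters:

```python
# Textbook RSA with toy parameters -- illustrative only, NOT secure.
p, q = 61, 53
n = p * q                            # modulus, 3233
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 9
c = (enc(a) * enc(b)) % n    # multiply the two ciphertexts...
assert dec(c) == a * b        # ...and decryption yields the product, 63
```

Real homomorphic encryption schemes extend this idea to richer operations, but the principle is the same: computation in the encrypted domain mirrors computation in the clear.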
“It doesn’t have to be a zero-sum game,” says Casimir Wierzynski, senior director, office of the CTO, AI Products Group at Intel. HE allows AI computation on encrypted data, enabling data scientists and researchers to gain valuable insights without decrypting the underlying data or models. This is particularly important for sensitive medical, financial, and customer data. The technique itself has been around for more than 20 years as a theoretical construct. “The criticism has been, okay, you can operate on encrypted data, but it takes you a million times longer than using regular data. It was an academic curiosity. But in the last five years, and especially the last two years, there have been huge advances in the performance of these techniques. We’re not talking about a factor of a million anymore. It’s more like a factor of 10 to 100,” says Wierzynski.
The military and intelligence agencies are notoriously miserly when it comes to sharing information. With homomorphic encryption, an agency could perform an encrypted search on another agency’s encrypted data without revealing the subject of the search. Meanwhile, the agency with the data doesn’t have to decrypt it and expose it to the agency performing the search.
The military is also employing the cloud to exploit its many capabilities, including a rent-on-demand computing model; the potential for dramatic elasticity (launching large numbers of applications on short notice at runtime, and hosting huge amounts of data); tools for data mining in parallel at massive scale and for carrying out machine learning and other optimization tasks on huge data sets; tools for graphical knowledge discovery; and more. However, cloud computing poses many challenges for the military. The cloud has a security model strongly shaped by commercial security for web purchases using credit cards. Its consistency model is shaped by a desire to respond rapidly to queries using cached data, even if that data might be stale. And the list goes on: cloud users worry about the safety of data on data center disks, even if encrypted; about vendor lock-in; and about the danger that a critical system might be unavailable at a time of urgent need because of some form of third-party cloud outage.
Homomorphic Encryption for Data Privacy
Recent advances in machine learning applications have been driven by innovation in algorithm design, low-cost storage for large training datasets, and powerful neuromorphic computing. However, many useful training datasets can never be shared.
Consider a biomedical application where a large cohort of patient genomic data needs to be compared to identify previously unknown genetic markers of disease. Of course, we need to safeguard patient data privacy and security and, therefore, cannot openly share their genomic data within and between healthcare and/or research organizations.
Identifying such markers will require advanced analytic approaches, such as machine learning, and likely a substantially greater amount of data. Gaining access to large sets of patient genomic data for a particular disease is challenging due to the necessary legal agreements that need to be in place to obtain that data. Homomorphic encryption may address this challenge. If we homomorphically encrypt the DNA sequences of patients, we can then query homomorphically encrypted databases for genetic comparisons. We can then decrypt the final result and get the same answer as we would have gotten using unencrypted DNA sequences.
Cryptographer Shafi Goldwasser’s startup Duality’s technology could provide an actual solution to the data privacy problem by allowing companies to keep their data fully encrypted and still find patterns in it.
CEO and cofounder Alon Kaufman uses a simple metaphor to explain how it works. Imagine that you’ve put your data inside a box to protect it, he explains. You’re the only one who has the key. With homomorphic encryption, you can then give the box to someone else, and they can put their hands in with their eyes closed. That person can shuffle around the numbers inside without ever seeing them.
“It means the entity doing the math doesn’t ever see the data, doesn’t see the answers but can employ the computations,” Kaufman says. “That’s what companies want. They don’t want the raw data, they want to know the insights. They want to know if they should offer you this deal.”
Carson Sweet, founder and chief executive of Cloudpassage, which works on security for cloud services, says the technology will need considerable development to attract the interest of commercial cloud providers, but could solve significant problems. “You can push encrypted data into a cloud service today, but it can’t be indexed, searched, or operated on,” he says. Sweet says that the privacy and security issues associated with storing and processing medical records make this an area in which the technology could be deployed first. “Federal government and financial services are other areas where people are willing to accept a performance penalty to get better security,” says Sweet.
Homomorphic Encryption for Cloud Security
Cloud services are increasingly being used for every kind of computing, from entertainment to business software. Yet there are justifiable fears over security, as the attacks on Sony’s servers that exposed personal details from 100 million accounts demonstrated. Homomorphic encryption is expected to play an important part in cloud computing, allowing companies to store encrypted data in a public cloud and take advantage of the cloud provider’s analytic services.
Kristin Lauter, the Microsoft researcher who collaborated with colleagues Vinod Vaikuntanathan and Michael Naehrig, worked on the design of a prototype storage system based on the cryptographic technique known as homomorphic encryption. They say it would ensure that data could only escape in an encrypted form that would be nearly impossible for attackers to decode without possession of a user’s decryption key. “This proof of concept shows that we could build a medical service that calculates predictions or warnings based on data from a medical monitor tracking something like heart rate or blood sugar,” she says. “A person’s data would always remain encrypted, and that protects their privacy.”
Here is a very simple example of how a homomorphic encryption scheme might work in cloud computing: Business XYZ has a very important data set (VIDS) that consists of the numbers 5 and 10. To encrypt the data set, Business XYZ multiplies each element in the set by 2, creating a new set whose members are 10 and 20. Business XYZ sends the encrypted VIDS set to the cloud for safe storage. A few months later, the government contacts Business XYZ and requests the sum of VIDS elements. Business XYZ is very busy, so it asks the cloud provider to perform the operation. The cloud provider, who only has access to the encrypted data set, finds the sum of 10 + 20 and returns the answer 30. Business XYZ decrypts the cloud provider’s reply and provides the government with the decrypted answer, 15.
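The multiply-by-2 scheme is only a teaching device; anyone who sees a plaintext/ciphertext pair can recover the key. A real additively homomorphic scheme such as Paillier supports the same workflow securely: the cloud multiplies the two ciphertexts, which corresponds to adding the hidden plaintexts, and only the key holder can decrypt the sum. A compact sketch with deliberately tiny, insecure primes:

```python
import math
import random

# Paillier cryptosystem with toy primes -- illustrative only, NOT secure.
p, q = 251, 257
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)      # private: Carmichael function of n
mu = pow(lam, -1, n)              # private: valid because we fix g = n + 1

def encrypt(m):
    r = random.randrange(2, n)    # fresh randomness for every ciphertext
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    # with g = n + 1, g^m mod n^2 simplifies to 1 + n*m
    return ((1 + n * m) * pow(r, n, n2)) % n2

def decrypt(c):
    u = pow(c, lam, n2)
    return ((u - 1) // n * mu) % n

c1, c2 = encrypt(5), encrypt(10)   # Business XYZ encrypts VIDS
c_sum = (c1 * c2) % n2             # the cloud "adds" without seeing 5 or 10
assert decrypt(c_sum) == 15        # only the key holder learns the sum
```

Paillier also lets the cloud multiply an encrypted value by a plaintext constant (raising a ciphertext to the k-th power modulo n² yields an encryption of k times the value), which is enough for weighted sums and simple statistics.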
Mobile applications can also benefit from homomorphic encryption. Instead of an application bringing encrypted data into a phone, decrypting it and performing tasks with it, operations could be performed on the encrypted data in the cloud. Not only would that be more secure, but it would also help improve the performance of a device’s resources by transferring compute tasks to the cloud.
Homomorphic Encryption Technology
A partial solution to computing with encrypted data came out of work done in the 1970s, according to David Archer, principal investigator at Galois. But this “somewhat homomorphic” scheme allowed computations for only a small number of operations before too much “noise” was introduced, which prevented decryption of the data.
Only in 2009 did Craig Gentry of IBM publish a mathematical proof showing fully homomorphic encryption was possible. He described a fully homomorphic scheme that allowed complicated processing even though the data was encrypted and users couldn’t see it. “However,” Archer said, processing with FHE even five years ago “was still around 12 orders of magnitude slower than computing with the unencrypted data. A [Defense Advanced Research Projects Agency] program called PROCEED (PROgramming Computation on EncryptEd Data) reduced that to just six orders of magnitude, but that’s still thousands of times slower than computing in the clear.”
Lauter and colleagues implemented only the most efficient parts of a fully homomorphic encryption system. As a result, they’ve produced a system dubbed “somewhat” homomorphic that can only perform some calculations, but is speedy enough to be used in real software. “I’m trying to look at this from a practical perspective and say what can we do now,” she says.
Only additions and a few multiplications can be done on a piece of encrypted data sent to the system, but that’s enough for many services, says Lauter. “You can still do a lot of statistical functions and perform analysis like logistical regression, which is used to do things like predict how likely a person is to have a heart attack,” she says.
As techniques for fully homomorphic encryption evolve, it might be possible to gradually increase the complexity of calculations that can be performed practically. Today, however, performing calculations using fully homomorphic encryption often takes around 30 minutes, not a few milliseconds, says Daniele Micciancio, a cryptographer at the University of California, San Diego.
A key area of ongoing research in cryptography focuses on bridging this speed gap, and recent developments pioneered by Microsoft Research’s (MSR) cryptography group have made significant improvements. In December 2017, MSR released version 2.3 of its Simple Encrypted Arithmetic Library (SEAL), a fast C++ implementation of the homomorphic encryption system described by Fan and Vercauteren in their paper “Somewhat Practical Fully Homomorphic Encryption”. The encryption system proceeds in two separate stages: First, numerical data are converted into polynomials and embedded in a specified polynomial ring.
Then, the ciphertexts (encryptions) of the polynomials are computed by applying noisy linear transformations involving the public key to each polynomial. Arithmetic operations are then performed on the encrypted data, and as long as the accumulated noise in our computations does not exceed a threshold amount, we can decrypt the computational output with a linear transformation involving the private key and obtain the correct result. The MSR implementation has a number of nice features in addition to the basic encryption apparatus, such as recommendation methods that provide optimal parameters for the initial encryption setup, and a noise budget that reflects the noise incurred when performing a given computational procedure.
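The role of the noise budget can be illustrated with a deliberately simplified, insecure LWE-style scheme (a single scalar secret rather than the vectors and polynomial rings SEAL uses): each ciphertext carries a small random error, homomorphic addition adds the errors together, and decryption stays correct only while the accumulated noise remains well below the scaling factor:

```python
import random

# Toy LWE-style additive scheme -- illustrative only, NOT secure.
q = 2**32                      # ciphertext modulus
t = 16                         # plaintext modulus
delta = q // t                 # scaling factor: plaintext lives in the high bits
s = random.randrange(q)        # secret key (a single scalar here, not a vector)

def encrypt(m, noise=5):
    a = random.randrange(q)
    e = random.randrange(-noise, noise + 1)   # small random error term
    b = (a * s + delta * m + e) % q
    return (a, b)

def decrypt(ct):
    a, b = ct
    scaled = (b - a * s) % q          # recovers delta*m + accumulated noise
    return round(scaled / delta) % t  # correct while |noise| < delta / 2

def add(ct1, ct2):
    # component-wise addition adds the plaintexts -- and adds the noise
    return ((ct1[0] + ct2[0]) % q, (ct1[1] + ct2[1]) % q)

c = encrypt(3)
for _ in range(4):
    c = add(c, encrypt(2))     # homomorphically compute 3 + 2 + 2 + 2 + 2
assert decrypt(c) == 11
```

Here five ciphertexts contribute at most 25 units of noise against a threshold of delta/2 (over 134 million), so the "budget" is barely touched; long chains of operations, and especially multiplications in real schemes, consume it far faster.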
Research on FHE has pushed it to the point where it’s feasible for practical use with some applications, Archer said, but programming for it “is a bit worse than using Assembly language 30 years ago. We need a system where domain experts, not necessarily programmers, should be able to take a dataset, homomorphically encrypt it, then send it to my server and it will run and I don’t need to know anything about the code.”
Galois, a Portland, Ore., security services company, intends to change that with help from a $1 million Intelligence Advanced Research Projects Activity contract that aims to assess how feasible it is to easily program with FHE. The one-year Rapid MAchine-learning Processing Applications and Reconfigurable Targeting of Security (RAMPARTS) initiative could finally move FHE to where domain experts who have little cryptographic expertise can use it.
Galois pointed to various areas where this kind of encrypted computing would be valuable:
- Public health analysts could use patient medical records to compute trends such as opioid addiction without having to use unencrypted personal information.
- Nations could work together to make sure their satellites don’t collide without having to share sensitive data such as trajectory information.
- Law enforcement could use facial recognition to identify criminals in videos without risking the privacy of individuals.
- Cyberthreat information could be more easily shared without running the risk of unintentionally revealing proprietary data.
“For starters, homomorphic encryption is very compute-intensive. This is an area where Intel can really shine in terms of building optimized silicon to handle this fundamentally new way of computing. We are in a unique position, because Intel works closely with hyperscale, data-centric users of hardware who are building all kinds of exciting AI applications. At the same time, we are a neutral party with respect to data. To make these technologies perform at scale is going to require the kinds of complex software and hardware co-design that Intel is uniquely positioned to provide. We get to collaborate actively with a fascinating range of players across industry,” says Casimir Wierzynski.
As with any crypto scheme, people will use it when there’s interoperability and trust in the underlying math and technology. We recognize that to really scale up, to bring all the homomorphic encryption goodness to the world, we need to have standards around it. As interest in privacy preserving methods for machine learning grows, it’s essential for standards to be debated and agreed upon by the community, spanning business, government, and academia. So in August, Intel co-hosted an industry gathering of individuals from Microsoft, IBM, Google, and 15 other companies interested in this space. We’ve identified many points of broad agreement. Now we’re trying to figure out the right approach to bring it to standards bodies, like ISO, IEEE, ITU, and others. They all have different structures and timelines. We’re strategizing on how best to move that forward.