Significant technological advances are being made across a range of fields, including information and communications technology (ICT); artificial intelligence (AI), particularly machine learning and robotics; nanotechnology; space technology; biotechnology; and quantum computing. These advances promise significant social and economic benefits, increased efficiency, and enhanced productivity across a host of sectors. Most of these technologies are dual use, however: they can serve malicious or lethal purposes in the hands of hackers and terrorists just as readily as beneficial ones.
One threat posed by emerging technologies is the threat to privacy. ICT technologies such as Wi-Fi hotspots, mobile internet, and broadband connections keep us online for a predominant part of our daily lives. While living in an ever-more connected world gives us easier access to many useful services and information, it also exposes large amounts of personal information, including our personal data, habits, and lives, to a wider world. Depending on your browsing habits and the websites and services you visit, all manner of data, from your birthday and address to your marital status, can be harvested from your online presence. Such data is valuable and can cause harm if released: for example, medical records, purchase history, or internal company documents.
Some governments are themselves responsible for privacy breaches. They carry out online surveillance and do not really allow their citizens to browse the web privately. In the UK, the Investigatory Powers Act allows government authorities to legally monitor the browsing and internet use of British citizens. As such, the government can directly breach your online privacy if it suspects you may be involved in criminal activity, though it needs to apply for a warrant to do so, which should mean the average person isn’t being spied on by MI5.
In recent years, privacy has become a critical issue in the development of IT systems. This reflects customers’ growing attention to their personal data and the increasing number of statutes, directives, and regulations intended to safeguard the right to privacy. The goal of privacy, as a security property, is to ensure that only a limited amount of information about individual users can be learned. It sits alongside two related goals. Confidentiality means that data is inaccessible to external parties, and is typically ensured via cryptography. Integrity means that data cannot be modified by external parties.
Several security measures help ensure privacy. Access control ensures that only the “right” users can perform various operations. It typically relies on authentication, a way to verify user identity (e.g. a password), and authorization, a way to specify which users may take which actions (e.g. file permissions). An auditing system can record a tamper-resistant audit trail of who performed each action. Many security policies can be employed to counter privacy threats: for example, each user can see only data from their friends, or an analyst can query only aggregate data. Regulation plays a role too: the EU’s General Data Protection Regulation (GDPR), in force since 2018, enables users to ask to see and delete their data.
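As a rough illustration of how authorization and auditing fit together, the sketch below enforces the "users can only see data from their friends" policy and logs every attempt. The class name FriendsOnlyStore, its fields, and the sample data are all hypothetical; authentication is assumed to have happened elsewhere, and a plain in-memory list is of course not a tamper-proof audit trail.

```python
from dataclasses import dataclass, field


@dataclass
class FriendsOnlyStore:
    """Toy data store enforcing 'users can only see data from their friends'."""
    records: dict[str, str]          # owner -> that owner's private record
    friends: dict[str, set[str]]     # owner -> users allowed to read the owner's record
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def is_authorized(self, requester: str, owner: str) -> bool:
        # Authorization: owners may read their own data; anyone else must be a friend.
        return requester == owner or requester in self.friends.get(owner, set())

    def read(self, requester: str, owner: str) -> str:
        allowed = self.is_authorized(requester, owner)
        # Auditing: record every attempt, whether it was allowed or denied.
        self.audit_log.append((requester, owner, allowed))
        if not allowed:
            raise PermissionError(f"{requester} may not read {owner}'s data")
        return self.records[owner]


if __name__ == "__main__":
    store = FriendsOnlyStore(
        records={"alice": "allergy: penicillin"},   # hypothetical sample data
        friends={"alice": {"bob"}},
    )
    print(store.read("bob", "alice"))        # bob is alice's friend
    try:
        store.read("carol", "alice")         # carol is not
    except PermissionError as err:
        print("denied:", err)
    print(store.audit_log)
```

Logging denied attempts as well as successful ones is a deliberate choice here: an audit trail that only records successes cannot show who tried to get at data they were not entitled to.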
One notable way to protect citizens’ personal data is to anonymize it so that no one can narrow bulk statistical data down to individuals. This was attempted at scale by the U.S. Census Bureau, which perturbed data labels in census data without changing the totals for each age or ethnic group. The approach suggests that injecting noise into data without affecting overall figures and totals can prevent the malicious identification of individuals from bulk data collected about a population. If employed at scale, it can prevent covert estimation of individuals’ voting patterns, likes, and preferences.
Differentially private techniques can strip data of their identifying characteristics so that they can’t be used by anyone — hackers, government agencies, and even the company that collects them — to compromise any user’s privacy. That’s important for anyone who cares about protecting the rights of at-risk users, whose privacy is vital for their safety. Ideally, differential privacy will enable companies to collect information, while reducing the risk that it will be accessed and used in a way that harms human rights.
Privacy can be quantified. Better yet, we can rank privacy-preserving strategies and say which one is more effective. Better still, we can design strategies that are robust even against hackers that have auxiliary information. And as if that wasn’t good enough, we can do all of these things simultaneously. These solutions, and more, reside in a probabilistic theory called differential privacy.
Is anonymizing the data good enough? If an adversary has auxiliary information from other data sources, anonymization is not enough. For example, in 2006, Netflix released a dataset of its user ratings as part of a competition to see if anyone could outperform its collaborative filtering algorithm. The dataset did not contain personally identifying information, but researchers were still able to breach privacy; they recovered 99% of all the personal information that was removed from the dataset. In this case, the researchers breached privacy using auxiliary information from IMDb.
How can anyone defend against an adversary who has unknown, and possibly unlimited, help from the outside world? Differential privacy offers a solution. Differentially private algorithms are resilient to adaptive attacks that use auxiliary information. They incorporate random noise so that everything an adversary receives is noisy and imprecise, which makes it much more difficult to breach privacy (if it is feasible at all).
Differential privacy is a rigorous mathematical definition of privacy. In the simplest setting, consider an algorithm that analyzes a dataset and computes statistics about it (such as the data’s mean, variance, median, mode, etc.). Such an algorithm is said to be differentially private if by looking at the output, one cannot tell whether any individual’s data was included in the original dataset or not. In other words, the guarantee of a differentially private algorithm is that its behavior hardly changes when a single individual joins or leaves the dataset — anything the algorithm might output on a database containing some individual’s information is almost as likely to have come from a database without that individual’s information. Most notably, this guarantee holds for any individual and any dataset. Therefore, regardless of how eccentric any single individual’s details are, and regardless of the details of anyone else in the database, the guarantee of differential privacy still holds. This gives a formal guarantee that individual-level information about participants in the database is not leaked.
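The guarantee described above is usually written as ε-differential privacy. The notation below does not appear in the original text and follows the standard formulation: M is the randomized algorithm, D and D' are any two datasets that differ in one individual's record, S is any set of possible outputs, and ε ≥ 0 is the privacy parameter.

```latex
\Pr\bigl[\,M(D) \in S\,\bigr] \;\le\; e^{\varepsilon}\,\Pr\bigl[\,M(D') \in S\,\bigr]
\qquad \text{for all neighboring } D, D' \text{ and all } S \subseteq \operatorname{Range}(M)
```

A smaller ε forces the two output distributions closer together, which is the formal sense in which the algorithm's behavior "hardly changes" when one person joins or leaves the dataset. With ε = 0 the output would be entirely independent of any single individual, and also useless for analysis.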
Global vs. local
In general, there are two ways a differentially private system can work: with global privacy and with local privacy. In a globally private system, one trusted party, whom we’ll call the curator, has access to raw, private data from lots of different people. She does analysis on the raw data and adds noise to answers after the fact. For example, suppose the curator is Alice, a hospital administrator. Bob, a researcher, wants to know how many patients have the new Human Stigmatization Virus (HSV), a disease whose symptoms include inexplicable social marginalization. Alice uses her records to count the real number. To apply global privacy, she chooses a number at random (using a probability distribution, like the Laplace distribution, which both parties know). She adds the random “noise” to the real count, and tells Bob the noisy sum. The number Bob gets is likely to be very close to the real answer. Still, even if Bob knows the HSV status of all but one of the patients in the hospital, he cannot tell from Alice’s answer, with any confidence, whether the remaining patient is sick.
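The curator's procedure can be sketched in a few lines. The function names, the sample records, and the choice of ε = 0.5 are illustrative assumptions, not part of the original text. The sketch uses Python's standard random module, which is neither cryptographically secure nor hardened against floating-point attacks, so it should be read as a demonstration of the idea rather than a production mechanism.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records: list[bool], epsilon: float, rng: random.Random) -> float:
    """Return the number of True records plus Laplace noise (global privacy)."""
    true_count = sum(records)
    # Adding or removing one patient changes a count by at most 1 (sensitivity 1),
    # so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)                       # fixed seed only to make the demo repeatable
    patients = [True] * 40 + [False] * 960       # hypothetical hospital records
    print(noisy_count(patients, epsilon=0.5, rng=rng))
```

The noise scale depends only on ε and on how much one person can change the answer, not on the size of the hospital. That is why the noisy count stays close to the truth for large datasets while still masking any single patient.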
With local privacy, there is no trusted party; each person is responsible for adding noise to their own data before they share it. It’s as though each person is a curator in charge of their own private database. Usually, a locally private system involves an untrusted party (let’s call them the aggregator) who collects data from a big group of people at once. Imagine Bob the researcher has turned his attention to politics. He surveys everyone in his town, asking, “Are you or have you ever been a member of the Communist party?” To protect their privacy, he has each participant flip a coin in secret. If the coin lands heads, they tell the truth. If it lands tails, they flip again and let that second coin decide their answer for them (heads = yes, tails = no). On average, half of the participants will tell the truth; the other half will give random answers. Each participant can plausibly deny that their response was truthful, so their privacy is protected. Even so, with enough answers, Bob can accurately estimate the proportion of his community who support the dictatorship of the proletariat. This technique, known as “randomized response,” is an example of local privacy in action.
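The two-coin protocol and Bob's estimate can be simulated as follows. The function names and the made-up town (100,000 people, about 10% of whom would truthfully answer yes) are assumptions for illustration only.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Two-coin protocol: heads -> tell the truth, tails -> a second coin decides."""
    if rng.random() < 0.5:           # first coin lands heads
        return truth
    return rng.random() < 0.5        # first coin tails: second coin, heads = yes


def estimate_true_fraction(responses: list[bool]) -> float:
    """Invert the noise: P(yes) = 0.5 * p + 0.25, so p = 2 * (P(yes) - 0.25)."""
    observed_yes = sum(responses) / len(responses)
    return 2.0 * (observed_yes - 0.25)


if __name__ == "__main__":
    rng = random.Random(0)                                   # fixed seed for a repeatable demo
    town = [rng.random() < 0.10 for _ in range(100_000)]     # hypothetical true answers
    responses = [randomized_response(t, rng) for t in town]
    print(estimate_true_fraction(responses))
```

Under this protocol a participant whose true answer is yes says yes with probability 0.75, and one whose true answer is no says yes with probability 0.25. The ratio of 3 between those probabilities means each individual response satisfies differential privacy with ε = ln 3, while the aggregate estimate tightens as more people respond.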
Globally private systems are generally more accurate: all the analysis happens on “clean” data, and only a small amount of noise needs to be added at the end of the process. However, for global privacy to work, everyone involved has to trust the curator. Local privacy is a more conservative, safer model. Under local privacy, each individual data point is extremely noisy and not very useful on its own. In very large numbers, though, the noise from the data can be filtered out, and aggregators who collect enough locally private data can do useful analysis on trends in the whole dataset.
Differential privacy has limits, however. A single noisy answer reveals little, but an adversary who can repeat a query many times can average out the noise and estimate the ground truth. This “estimation from repeated queries” is one of the fundamental limitations of differential privacy. If the adversary can make enough queries to a differentially private database, they can still estimate the sensitive data. In other words, with more and more queries, privacy is breached.
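The effect of repeated queries can be seen in a short simulation. This is a sketch under stated assumptions: the query has sensitivity 1, each answer carries fresh Laplace noise with scale 1/ε, and the function names and the true value of 40 are invented for the example.

```python
import random


def noisy_answer(true_value: float, epsilon: float, rng: random.Random) -> float:
    """One answer to a sensitivity-1 query with fresh Laplace noise of scale 1/epsilon."""
    scale = 1.0 / epsilon
    # Difference of two exponentials with mean `scale` is Laplace(0, scale).
    return true_value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def averaged_estimate(true_value: float, epsilon: float, n_queries: int,
                      rng: random.Random) -> float:
    """Adversary's estimate after asking the same query n_queries times."""
    answers = [noisy_answer(true_value, epsilon, rng) for _ in range(n_queries)]
    return sum(answers) / n_queries


if __name__ == "__main__":
    rng = random.Random(0)                      # fixed seed for a repeatable demo
    epsilon = 0.5
    for n in (1, 100, 10_000):
        estimate = averaged_estimate(40.0, epsilon, n, rng)
        # Under basic sequential composition, n answers cost n * epsilon in total.
        print(n, round(estimate, 2), "total epsilon:", n * epsilon)
```

Because the noise has zero mean, the average of many answers converges on the true value. A common countermeasure, not described in the text above, is to give each analyst a fixed privacy budget and refuse further queries once the accumulated ε reaches it.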
“Privacy by Design” Approach
When building IT systems that store and process personal data, designers need to define system requirements and to ensure that personal data are handled in accordance with applicable laws and regulations. In most applications, informally stated and implicit privacy requirements are as pressing as functional and security ones, but they are rarely analyzed and designed carefully from the beginning of the development process. Rather, designers often add privacy as an afterthought, exposing the system to higher costs while endangering overall
Privacy shouldn’t be treated as a feature added once the product or service has been built and marketed. It must be built into the entire design. In 2020, more businesses will adopt the Privacy by Design approach when creating new technologies and systems. Privacy will be incorporated into technology and systems by default, which means products and services will be designed with privacy as a priority, alongside whatever other purpose the system serves.