Is differential privacy a data perturbation technique? Privacy-preserving data publishing (PPDP) techniques based on differential privacy perturb a dataset before it is published, so that sensitive information in the dataset cannot be inferred from the released data. Existing PPDP solutions focus on publishing a single snapshot of the data under the assumption that all data users share the same level of privilege and access the data at a fixed privacy level. Such schemes therefore do not directly support data release when users have different levels of access to the published data. In many real-world scenarios, however, users of a dataset have different privileges and may need to access the same dataset at different privacy/utility levels, which requires multi-level access to the published data.
Can I recommend that you look at k-anonymity by Sweeney? The method is not sufficient, but the intention makes sense: not disclosing anything which could harm the people described in the dataset.
I would also recommend looking at quantization, as used in information theory to represent and transmit data efficiently.
Assume that John's height is 1.8555 m and that he is the only one in the database with this exact value, while 2000 other members have a height of 1.85 m recorded to cm accuracy. Quantizing to cm precision assigns 1.85 m to John as well, so his record no longer stands out.
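A minimal sketch of that idea (the helper name quantize_height and the step size are illustrative assumptions, not part of any standard library):

```python
import math

def quantize_height(height_m: float, step: float = 0.01) -> float:
    """Truncate a height in metres to the nearest lower multiple of `step` (here: cm)."""
    return round(math.floor(height_m / step) * step, 2)

heights = [1.8555] + [1.85] * 2000      # John plus 2000 other members
quantized = [quantize_height(h) for h in heights]
print(quantized[0])                      # 1.85 -- John's value is no longer unique
```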
Renaud Di Francesco, sir, I went through the techniques proposed by Sweeney, such as k-anonymity, l-diversity and t-closeness, but they all have limitations in a big-data environment. Differential privacy, on the other hand, perturbs data probabilistically by adding random noise, which increases the privacy of the data when it is published to the open world.
While recognising the limitations of k-anonymity and its evolutions, I recommend looking at these in mathematical depth:
f: name -> data record
it's a mapping
What Sweeney says is: look at k-surjectivity. Consider the inverse map which associates to every "data record" the set of all "name"s whose image f(name) is that data record; denote it by f^(-1)(data record). Before disclosure (as part of an SDC, Statistical Disclosure Control, methodology), require that the set f^(-1)(data record) has at least k elements.
My point is: yes, that is a first approach, Card(f^(-1)(data record)) >= k,
where Card(X) is the number of elements in the set X.
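A small sketch of that check (the toy dataset, the choice of quasi-identifiers and the function name check_k_anonymity are invented here purely for illustration):

```python
from collections import defaultdict

def check_k_anonymity(f: dict, k: int) -> bool:
    """f maps each name to its disclosed data record.
    Build G = f^(-1) and require Card(f^(-1)(record)) >= k for every record."""
    G = defaultdict(set)                  # record -> set of names mapping to it
    for name, record in f.items():
        G[record].add(name)
    return all(len(names) >= k for names in G.values())

# Toy example where the disclosed record is (age band, postcode prefix)
f = {
    "Alice": ("30-39", "SW1"),
    "Bob":   ("30-39", "SW1"),
    "Carol": ("30-39", "SW1"),
    "Dave":  ("40-49", "N1"),             # only one name maps to this record
}
print(check_k_anonymity(f, k=3))          # False: Card(f^(-1)(Dave's record)) = 1 < 3
```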
Now let us define G = f^(-1) and study its properties.
G is a set-valued map. There is a deep mathematical framework for set-valued maps, developed by Aubin, Cellina and Frankowska.
It includes subdifferentials and similar replacements, for set-valued maps, of the differentials used for point-valued maps (classical f: x -> y, where x and y are elements, unlike G: y -> X, where y is an element and X is a set), and there is a complete construction by Aubin called viability theory. See for instance:
the book Viability Theory: New Directions (Aubin, Bayen and Saint-Pierre).
The above defines a massive research domain, to be explored further, beyond the k-anonymity (but inspired by it) and differential privacy (but informed by it).
In this case, you need to distinguish between the definition of differential privacy, and the various algorithms that may be used to achieve it.
Differential privacy itself is defined as a property that helps to measure the degree of privacy achieved with a randomized function. In that sense, it is not a technique for achieving privacy but measures the degree to which such techniques are successful.
To achieve differential privacy, there are many different algorithms which are all based on some form of data perturbation.
Put differently, differential privacy is not itself a data perturbation technique, but you need such a technique to achieve differential privacy.
The time and space complexity can of course only be evaluated for any specific algorithm for achieving differential privacy, not for differential privacy itself. It might be possible to prove that there cannot be any algorithm for achieving differential privacy with a space or time complexity better than some defined value, but I do not know whether any such work exists.
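To make that distinction concrete: a randomized mechanism M is epsilon-differentially private if, for any two neighbouring datasets D and D' (differing in one record) and any set of outputs S, Pr[M(D) in S] <= exp(epsilon) * Pr[M(D') in S]. The sketch below uses classical randomized response on a single yes/no attribute purely as one example of an algorithm that happens to satisfy this property (the binary attribute and the fair-coin construction are assumptions made for the illustration):

```python
import math
import random

def randomized_response(true_answer: bool) -> bool:
    """Classical randomized response: with probability 1/2 report the truth,
    otherwise report a uniformly random answer. This algorithm satisfies
    epsilon-differential privacy with epsilon = ln(3)."""
    if random.random() < 0.5:
        return true_answer
    return random.random() < 0.5

# The differential-privacy *property* is about output probabilities, not the code:
# for the two neighbouring inputs True / False, each output's probability may
# change by at most a factor exp(epsilon).
p_true_given_true  = 0.5 + 0.5 * 0.5    # P[output True | truth True]  = 0.75
p_true_given_false = 0.5 * 0.5          # P[output True | truth False] = 0.25
epsilon = math.log(p_true_given_true / p_true_given_false)
print(epsilon)                           # 1.0986... = ln(3), the privacy level achieved
```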
Differential privacy is a data perturbation mechanism. Dwork et al. proposed the definition of differential privacy, which addresses two shortcomings of traditional privacy protection models: 1) differential privacy does not depend on the adversary's background knowledge; 2) it rests on a rigorous mathematical foundation and provides a quantitative way to evaluate privacy protection. The core concept of differential privacy spans research from privacy protection to data science (e.g. machine learning, data mining, statistics and learning theory). Differential privacy is usually implemented through the Laplace mechanism or the exponential mechanism, and the perturbation can be applied as input perturbation, objective (target) perturbation, gradient perturbation or output perturbation.
Reference
J. Jia and W. Qiu, "Research on an Ensemble Classification Algorithm Based on Differential Privacy," in IEEE Access, vol. 8, pp. 93499-93513, 2020, doi: 10.1109/ACCESS.2020.2995058.
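As one concrete instance of the Laplace mechanism mentioned above (a sketch only; the counting query, its sensitivity and the epsilon value are illustrative assumptions, not taken from the cited paper):

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Output perturbation: add Laplace(0, sensitivity/epsilon) noise to a query result."""
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: a counting query (sensitivity 1, since adding or removing one
# person changes the count by at most 1) released with epsilon = 0.5.
ages = np.array([34, 45, 29, 61, 52, 38, 47])
true_count = np.sum(ages > 40)               # 4
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(noisy_count)                           # e.g. 4.73 -- value varies per run
```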