Motivation:

My plan is to obtain the overall Euclidean distance matrix for all the vectors across N datasets. Each dataset is an array of n-dimensional points, e.g. [[1,2,3,...,n],[1,2,3,...,n],...,[1,2,3,...,n]]. However, the datasets will not be shared with any single entity, so I cannot combine them and will not know all the points needed to calculate the pairwise Euclidean distances for similarity calculation, clustering analysis, and modeling. I will only be told which vectors are common to the datasets, and the distance of each vector in a dataset to those common vectors, without the points' coordinates being known or passed around. If the datasets could be shared within a single model, I would not face this distributed calculation problem.

Details:

A and C are two n-dimensional vectors from two different datasets. They have a common vector B. I want to calculate the Euclidean distance between A and C without exposing A and C, using B instead. Let's say ED(A,B) is calculated in model_1 and ED(B,C) is calculated in model_2; I can then use ED(A,B), i.e. AB, and ED(B,C), i.e. BC, to compute ED(A,C). However, if someone knows B, they can easily find out A and C. Even if B is a randomly created common point, A and C can still be found once B is available. Is it possible to use B in such a way that A and C cannot be retrieved in any way? I have looked into differential privacy, PATE, and SMPC techniques, but they have their limitations in preserving privacy.
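
To make the geometry concrete, here is a minimal sketch (an illustration only, not a privacy-preserving protocol). As far as I can tell, the two scalar distances AB and BC on their own only bound ED(A,C) through the triangle inequality, while the exact value follows from the law of cosines if the cosine of the angle at B is also available. The function names and the toy vectors are hypothetical, and the cosine is computed in the clear here, which a real distributed setup would have to replace with some secure step.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_bounds(ab, bc):
    """Triangle-inequality bounds on ED(A, C) given only AB and BC."""
    return abs(ab - bc), ab + bc


def distance_from_cosine(ab, bc, cos_abc):
    """Law of cosines: ED(A, C) when the cosine of the angle at B is known."""
    squared = ab * ab + bc * bc - 2.0 * ab * bc * cos_abc
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors


# Toy vectors, assumed purely for illustration.
A = [1.0, 2.0, 3.0]
B = [0.0, 0.0, 0.0]
C = [4.0, 0.0, -1.0]

ab = euclidean(A, B)  # would be computed in model_1
bc = euclidean(B, C)  # would be computed in model_2
lower, upper = distance_bounds(ab, bc)

# The cosine needs the inner product <A - B, C - B>, i.e. information from
# both parties; it is computed directly here only to check the formula.
dot = sum((a - b) * (c - b) for a, b, c in zip(A, B, C))
cos_abc = dot / (ab * bc)
ac = distance_from_cosine(ab, bc, cos_abc)

print(f"bounds on AC: [{lower:.4f}, {upper:.4f}]")
print(f"AC from law of cosines: {ac:.4f}")
```

The extra quantity the exact formula needs is the inner product of (A - B) and (C - B), so that is the piece any private computation would have to supply without revealing A or C.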

Work done so far:

https://www.online-python.com/Pq7OhM8FJe

Question:

Given the Euclidean distance between vector A and vector B, and that between vector B and vector C, how can I calculate the Euclidean distance between A and C such that there is no way to retrieve A and C even if B is known?

Is there any computation technique I can follow?

Thanks in advance
