How can I get the Euclidean distance between vectors A and C, without any way of retrieving them, when their distances to a common vector B are known?

15 August 2021

Motivation:

My plan is to build the overall Euclidean distance matrix for all the vectors across N datasets. Each dataset is essentially an array of n-dimensional points, e.g. [[1,2,3,...,n],[1,2,3,...,n],...,[1,2,3,...,n]]. However, the datasets will not be shared with any single entity, so I cannot pool them and will not know all the points needed to compute the pairwise Euclidean distances for similarity calculation, clustering analysis and modeling. I will only be told which vectors the datasets have in common, and the distance of each vector in a dataset to those common vectors, without the points' coordinates being known or passed around. If the datasets could be shared within a single model, I would not face this distributed calculation problem.

Details:

A and C are two n-dimensional vectors from two different datasets. They have a common vector B. I want to calculate the Euclidean distance between A and C without exposing A and C, using B instead. Let's say ED(A,B) is calculated in model_1 and ED(B,C) is calculated in model_2; I can easily use ED(A,B), i.e. AB, and ED(B,C), i.e. BC, to compute ED(A,C). However, if someone knows B, they can easily find A and C. Even if B is a randomly created common point, A and C can be found once B is available. Is it possible to use B in such a way that A and C cannot be retrieved in any way? I have looked into differential privacy, PATE and SMPC techniques, but they have their limitations in preserving privacy.
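A minimal sketch of the geometry involved, not taken from the linked script. By the law of cosines, ED(A,C)^2 = ED(A,B)^2 + ED(B,C)^2 - 2 (A-B)·(C-B), so the exact value needs the inner product of the two offsets from B in addition to the two scalar distances. The function names and sample points below are hypothetical and chosen only for illustration.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def distance_via_common_point(a, b, c):
    """ED(A, C) from the law of cosines around B.

    Uses |A-B|^2 + |C-B|^2 - 2 (A-B).(C-B), i.e. it needs the dot product
    of the two offsets, not only their lengths.
    """
    ab = [x - y for x, y in zip(a, b)]
    cb = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(ab, cb))
    squared = sum(x * x for x in ab) + sum(x * x for x in cb) - 2 * dot
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative rounding


# Illustrative points only (assumed, not from the thread).
A = [1.0, 2.0, 3.0]
B = [0.0, 0.0, 0.0]
C1 = [3.0, 0.0, 0.0]
C2 = [0.0, 3.0, 0.0]

# The identity reproduces the direct distance.
print(distance_via_common_point(A, B, C1), euclidean(A, C1))

# C1 and C2 are equally far from B, yet their distances to A differ,
# so ED(A, B) and ED(B, C) alone do not fix ED(A, C).
print(euclidean(B, C1), euclidean(B, C2))
print(euclidean(A, C1), euclidean(A, C2))
```

The last two lines print equal distances to B but different distances to A, which suggests that any exact scheme has to exchange something about direction as well as length.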

Work done so far:

https://www.online-python.com/Pq7OhM8FJe

Question

Given the Euclidean distance between vector A and vector B, and that between vector B and vector C, how can I calculate the Euclidean distance between A and C such that there is no way to retrieve A and C even if B is known?

Is there any computation technique I can follow?

Thanks in advance

David Eugene Booth

I would consult Kaufman and Rousseeuw, Finding Groups in Data, for some ideas. The z-library has the book. The software is in the R package cluster. Best wishes, David Booth

Aravinda C V

https://math.stackexchange.com/questions/4217205/get-distance-between-point-a-and-c-when-their-distances-with-a-common-point-b-is

Md Shihab Ullah

@Aravinda, this is a similar question that I posted on Math Stack Exchange. My approach, given in the Python script attached to the question, has a huge privacy issue.

I think the main privacy problem is in line 19, where we can get the direction of the vectors from the difference:

https://www.online-python.com/puVbWm8yce
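To illustrate the concern, here is a small sketch (with made-up vectors, and not a copy of the linked script) of why sharing a difference vector is equivalent to sharing the point itself for anyone who also holds B. The helper name `subtract` is hypothetical.

```python
def subtract(u, v):
    """Element-wise u - v for two equal-length sequences."""
    return [x - y for x, y in zip(u, v)]


# Illustrative values only (assumed for this sketch).
B = [2.0, 5.0, 1.0]          # common vector, known to both parties
A = [1.0, 2.0, 3.0]          # private vector held by one party

shared = subtract(B, A)      # the difference B - A that would be sent out

# Anyone holding B undoes the subtraction: B - (B - A) = A.
recovered = subtract(B, shared)
print(recovered)
```

The recovered list matches A up to floating-point rounding, so the difference carries both the direction and the length of A relative to B.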

Let's say that, instead of (vectorB - VectorAD1) and (vectorB - VectorAD2), we take the Euclidean distance between AD1 and B and that between B and AD2, which are scalars.

Is there any way to compute the aggregated Euclidean distance between AD1 and AD2 using those scalars? Minimizing the error as much as possible would also be great.
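One possible starting point, offered as a sketch rather than an answer from the thread: with scalars only, the triangle inequality gives |AB - BC| <= AC <= AB + BC, which bounds the distance without determining it. If several common vectors are available, the bounds can be intersected by taking the largest lower bound and the smallest upper bound. The function `distance_bounds`, the sample points and the midpoint estimate are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def distance_bounds(dists_a, dists_c):
    """Bounds on ED(A, C) from scalar distances to shared reference vectors.

    dists_a[i] is ED(A, B_i) and dists_c[i] is ED(B_i, C) for the same B_i.
    Each pair gives |da - dc| <= ED(A, C) <= da + dc; the bounds are intersected.
    """
    lower = max(abs(da - dc) for da, dc in zip(dists_a, dists_c))
    upper = min(da + dc for da, dc in zip(dists_a, dists_c))
    return lower, upper


# Illustrative data only (assumed for this sketch).
A = [1.0, 2.0, 3.0]                      # held by the first party
C = [4.0, 0.0, 1.0]                      # held by the second party
refs = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [1.0, 0.0, 3.0]]  # common vectors

dists_a = [euclidean(A, b) for b in refs]   # computed by the first party
dists_c = [euclidean(b, C) for b in refs]   # computed by the second party

lower, upper = distance_bounds(dists_a, dists_c)
estimate = (lower + upper) / 2              # crude heuristic, not an exact value
true_distance = euclidean(A, C)             # shown only to check the bounds
print(lower, upper, estimate, true_distance)
```

More common vectors tend to tighten the interval, but they also work against privacy: distances to enough known points in general position can locate a point again by trilateration, so the number of scalars released per vector would need to be limited.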
