For a classification task, I want to use a siamese network in which the class assignment is based on the closest match between a given query and a fixed support set. The match is computed as the L1-norm distance between the query embedding and each support embedding, and the result is passed to a softmax layer to make the binary prediction. Now I want to replace or enhance the L1-norm distance calculation with an attention mechanism. Any ideas or related research?
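To make the question concrete, here is a minimal sketch in plain Python. It assumes the shared siamese encoder has already produced the query and support embeddings. `l1_match` approximates the current setup as a softmax over negative L1 distances. `attention_match` swaps in an attention readout over the support set: cosine similarity to each support item, softmax into attention weights, then the weights are summed per class. This is the style of attention kernel used by Matching Networks. All function names and the `temperature` parameter are hypothetical and chosen for illustration. In a real model the similarity would normally sit on top of learned projections (for example query/key layers) rather than raw embeddings.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """L1 norm of (a - b): sum of absolute component-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score before exp)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(weights: List[float], labels: List[str]) -> Dict[str, float]:
    """Aggregate per-support-item weights into per-class probabilities."""
    probs: Dict[str, float] = {}
    for w, y in zip(weights, labels):
        probs[y] = probs.get(y, 0.0) + w
    return probs


def l1_match(query: Vector, support: List[Vector], labels: List[str]) -> Dict[str, float]:
    """Baseline (hypothetical): closer support items (smaller L1) get higher weight."""
    weights = softmax([-l1_distance(query, s) for s in support])
    return class_probs(weights, labels)


def attention_match(
    query: Vector,
    support: List[Vector],
    labels: List[str],
    temperature: float = 0.1,  # hypothetical; sharpens or flattens the attention
) -> Dict[str, float]:
    """Attention readout: the query attends over the support set via cosine similarity."""
    weights = softmax([cosine(query, s) / temperature for s in support])
    return class_probs(weights, labels)


if __name__ == "__main__":
    support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    labels = ["A", "A", "B"]
    query = [0.8, 0.2]
    print(l1_match(query, support, labels))
    print(attention_match(query, support, labels))
```

Because the attention weights form a distribution over the whole support set, every support example contributes to the prediction in proportion to its similarity, rather than only the single closest match. Making the similarity function learnable (for example, a bilinear or dot-product score on projected embeddings) is what turns this into a trainable attention mechanism.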