Homogeneous person re-identification based on RGB images has been well studied. Driven by public-security needs and the demand for pedestrian retrieval in complex real-world settings, a growing body of work now explores pedestrian matching across multi-modal heterogeneous data sources, a task known as cross-modality heterogeneous person re-identification.
Graph neural networks can represent an image as a graph and measure the similarity between two images by matching their graphs. But can graph neural networks also work in cross-modal scenarios?