Consider three facts: evaluation metrics are diverse, different algorithms detect distinct communities in the same network, and the ground truth does not always correspond to the topological structure. In this context, how should the quality of a community structure be properly evaluated?