I wanted to see what are the algorithms proposed to the day to handle {detect communities or clusters of densely interconnected nodes} large graphes with millions of nodes and edges, any resources or suggestions would be appreciated.
I think we could first answer to your question by another question: what kind of community detection do you want ?
I think until maybe 4/5 years ago, there was a rush to always faster and more scalable algorithms, with a common consensus (sort of) that "best communities" was a synonym of "best global modularity". The Louvain algorithm kind of solved this problem, it's super efficient and gives very good modularity scores. There have been some improvements on it but nothing radical. However, since then, it has been shown that the best modularity was NOT synonym with best communities. So, since then, people are mostly asking the question: WHAT ARE good communities ?. And the answer is not out yet, I feel. So people explore many different aspect (overlap of communities, hierarchical decompositions, modifications of the modularity, other global metrics, local based approaches ...) but there is no clear winner yet to my knowledge. So that was my question, what kind of communities are you interested in ?
So basically if you just care about speed, most people still use one of these 2 for "basic", large scale, non overlapping community detection: Louvain (if you're a modularity believer) or Infomap (if you're not). Both are very fast and scalable, using the codes provided by the authors (I've used them on large graphs). And give meaningful results on most networks. There are optimizations of them (Louvain at least) for multi-thread and so-on.
Of course there are plenty of other methods and some people might have their own favorites.
This is my impression though (I'm not involved with any of these 2 methods)
In terms of @Hassan Abedi's interest in detection algorithms, this thesis is good. See Section 6.3.4 (Visualizing the backbone tree), starting on page 71. Clusters with large numbers of nodes can be visualized by large visual elements (see, e.g., Fig. 6.6 on page 72). The top view of the layout of individual clusters is effective (see Fig. 6.7, page 73). Tracing node paths is shown in Fig. 6.11, page 78. The mathematics used to characterize particular clusters is given in Section 6.4.3, page 79. There are lots of nice examples in Fig. 6.13, page 82. A particularly interesting representation of clusters in a graph with 25,898 nodes is shown in Fig. 6.14, page 83. What is beginning to look like a Hadamard matrix in visualizing graphs is shown on page 90.
Very large graphs are common in biochemistry, biological systems (the brain), bonds between molecules in solids, the internet (consider social networks), traffic systems in large cities, electrical grids, triangulations of large number so sites (e.g., up to a million sites in some digital images) with corresponding graphs and huge clusters. To gain insight into the complexity of some triangulation graphs, see Fig. 7.1, page 84, in
E. Csoka, Sampling and local algorithms in large graphs, Ph.D. thesis, Eoetvoes Lorand University, 2012:
F. Rahimian, Gossip-based algorithms for information dissemination and graph clustering, KTH School of Information and Communication Technology, Sweden, 2014:
See Chapter 10, starting on page 127, on community detection. The approach in this thesis is strictly algorithmic and short when it comes to the underlying mathematics. To complete the picture, it is necessary to take into account not only Rahimian's approach but also Csoka's approach.
thanks @Andreas, i've read that, it's really really a good read about the subject but i'm more interested in knowing about the recent progress in these 5 years, mainly related to the proposed algorithms that could be used for large scale real world graphs;
I think we could first answer to your question by another question: what kind of community detection do you want ?
I think until maybe 4/5 years ago, there was a rush to always faster and more scalable algorithms, with a common consensus (sort of) that "best communities" was a synonym of "best global modularity". The Louvain algorithm kind of solved this problem, it's super efficient and gives very good modularity scores. There have been some improvements on it but nothing radical. However, since then, it has been shown that the best modularity was NOT synonym with best communities. So, since then, people are mostly asking the question: WHAT ARE good communities ?. And the answer is not out yet, I feel. So people explore many different aspect (overlap of communities, hierarchical decompositions, modifications of the modularity, other global metrics, local based approaches ...) but there is no clear winner yet to my knowledge. So that was my question, what kind of communities are you interested in ?
So basically if you just care about speed, most people still use one of these 2 for "basic", large scale, non overlapping community detection: Louvain (if you're a modularity believer) or Infomap (if you're not). Both are very fast and scalable, using the codes provided by the authors (I've used them on large graphs). And give meaningful results on most networks. There are optimizations of them (Louvain at least) for multi-thread and so-on.
Of course there are plenty of other methods and some people might have their own favorites.
This is my impression though (I'm not involved with any of these 2 methods)
We have introduced a new community detection algorithm, or more fundamentally a new way of thinking for community detection, or classification in general.
Simple networks are like a mechanical watch, which is decomposable, while complex networks like a human brain, which is hard to decompose.
Jiang B. and Ma D. (2015), Defining least community as a homogeneous group in complex networks, Physica A: Statistical Mechanics and its Applications, 428, 154-160.
The head/tail breaks enables us to see the underlying scaling structure of far more less-connected nodes than well-connected ones.
Jiang B., Duan Y., Lu F., Yang T. and Zhao J. (2014), Topological structure of urban street networks from the perspective of degree correlations, Environment and Planning B: Planning and Design, 41(5), 813-828.