What are the state of the art community detection algorithms for very large graphs?

Remy Cazabet

I think we could first answer your question with another question: what kind of community detection do you want?

Until maybe four or five years ago, there was a rush toward ever faster and more scalable algorithms, with a rough consensus that "best communities" was a synonym for "best global modularity". The Louvain algorithm more or less solved that problem: it is very efficient and gives very good modularity scores. There have been some improvements to it, but nothing radical. Since then, however, it has been shown that the best modularity is NOT synonymous with the best communities. So people are now mostly asking: WHAT ARE good communities? I feel the answer is not out yet. People explore many different aspects (overlapping communities, hierarchical decompositions, modifications of modularity, other global metrics, local approaches ...) but, to my knowledge, there is no clear winner yet. Hence my question: what kind of communities are you interested in?
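
To make "global modularity" concrete, here is a minimal sketch of how the score of a given partition can be computed for an unweighted, undirected graph. It uses the standard Newman-Girvan definition (for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it). The function name, the edge-list input and the toy graph are my own assumptions for illustration, not code from any of the packages mentioned in this thread.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition (unweighted, undirected graph).

    edges        -- list of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping every node to a community label
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # edge endpoints attached to the community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_two = modularity(edges, two_groups)  # 5/14, roughly 0.357
q_one = modularity(edges, one_group)   # 0.0
print(q_two, q_one)
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of preference modularity-based methods optimize for.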

So basically, if you just care about speed, most people still use one of these two for "basic", large-scale, non-overlapping community detection: Louvain (if you're a modularity believer) or Infomap (if you're not). Both are very fast and scalable using the code provided by the authors (I've used them on large graphs), and both give meaningful results on most networks. There are optimized versions (of Louvain at least) for multi-threading and so on.
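
For readers who have not seen why Louvain is fast, here is a rough sketch of its first phase only (greedy local moving of nodes between neighbouring communities). This is not the authors' code: it assumes an unweighted, undirected graph without self-loops, skips the aggregation phase that the full algorithm repeats on the coarsened graph, and all names and the toy graph are hypothetical. For real work, use the authors' implementations mentioned above.

```python
def local_moving(adj):
    """One Louvain-style local-moving phase on an unweighted, undirected graph.

    adj -- dict mapping each node to a list of its neighbours (no self-loops)
    Returns a dict mapping each node to a community label.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    comm = {v: v for v in adj}                       # start: one node per community
    tot = {v: len(adj[v]) for v in adj}              # summed degree per community
    improved = True
    while improved:
        improved = False
        for v in adj:
            k_v = len(adj[v])
            # Count the links from v to each neighbouring community.
            links = {}
            for u in adj[v]:
                links[comm[u]] = links.get(comm[u], 0) + 1
            old = comm[v]
            tot[old] -= k_v  # take v out of its current community
            # Modularity gain of joining community c is proportional to
            # k_in(c) - tot[c] * k_v / (2m); staying put is the baseline.
            best = old
            best_gain = links.get(old, 0) - tot[old] * k_v / (2 * m)
            for c, k_in in links.items():
                gain = k_in - tot[c] * k_v / (2 * m)
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += k_v
            if best != old:
                comm[v] = best
                improved = True
    return comm

# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

communities = local_moving(adj)
print(communities)
```

Each node only looks at its direct neighbours, so one sweep costs time proportional to the number of edges, which is a large part of why the method scales to big graphs.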

Of course there are plenty of other methods and some people might have their own favorites.

This is just my impression, though (I'm not involved with either of these two methods).

James F Peters

This is a good question with many possible answers.

Clustering of elements in very large graphs is considered in great detail in

F.J.J. van Ham, Interactive visualization of large graphs, Proefschrift, Technische Universiteit Eindhoven, 2005:

http://www.win.tue.nl/vis1/home/fvham/DL/thesis.pdf

In terms of @Hassan Abedi's interest in detection algorithms, this thesis is good. See Section 6.3.4 (Visualizing the backbone tree), starting on page 71. Clusters with large numbers of nodes can be visualized by large visual elements (see, e.g., Fig. 6.6 on page 72). The top view of the layout of individual clusters is effective (see Fig. 6.7, page 73). Tracing node paths is shown in Fig. 6.11, page 78. The mathematics used to characterize particular clusters is given in Section 6.4.3, page 79. There are lots of nice examples in Fig. 6.13, page 82. A particularly interesting representation of clusters in a graph with 25,898 nodes is shown in Fig. 6.14, page 83. What is beginning to look like a Hadamard matrix in visualizing graphs is shown on page 90.

Very large graphs are common in biochemistry, biological systems (the brain), bonds between molecules in solids, the internet (consider social networks), traffic systems in large cities, electrical grids, and triangulations of large numbers of sites (e.g., up to a million sites in some digital images), with corresponding graphs and huge clusters. To gain insight into the complexity of some triangulation graphs, see Fig. 7.1, page 84, in

E. Csoka, Sampling and local algorithms in large graphs, Ph.D. thesis, Eoetvoes Lorand University, 2012:

https://www.cs.elte.hu/math/phd_th/Csoka_Endre.pdf

More to the point, consider

F. Rahimian, Gossip-based algorithms for information dissemination and graph clustering, KTH School of Information and Communication Technology, Sweden, 2014:

https://www.sics.se/~fatemeh/files/thesis/phd_thesis.pdf

See Chapter 10, starting on page 127, on community detection. The approach in this thesis is strictly algorithmic and short when it comes to the underlying mathematics. To complete the picture, it is necessary to take into account not only Rahimian's approach but also Csoka's approach.

Andreas Kanavos

For a complete survey, you can take a look here

Community detection in graphs, http://arxiv.org/abs/0906.0612

Hassan Abedi

Thanks @Andreas, I've read that. It's a really good read on the subject, but I'm more interested in the progress made in the last 5 years, mainly the proposed algorithms that could be used on large-scale, real-world graphs.

Bin Jiang

We have introduced a new community detection algorithm or, more fundamentally, a new way of thinking about community detection, and about classification in general.

Simple networks are like a mechanical watch, which is decomposable, while complex networks are like a human brain, which is hard to decompose.

Jiang B. and Ma D. (2015), Defining least community as a homogeneous group in complex networks, Physica A: Statistical Mechanics and its Applications, 428, 154-160.

Head/tail breaks enables us to see the underlying scaling structure, in which there are far more less-connected nodes than well-connected ones.

Jiang B., Duan Y., Lu F., Yang T. and Zhao J. (2014), Topological structure of urban street networks from the perspective of degree correlations, Environment and Planning B: Planning and Design, 41(5), 813-828.
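
As a rough illustration of the head/tail breaks idea mentioned above, here is a minimal sketch applied to a plain list of values such as node degrees. It only shows the generic classification step (split around the mean, keep recursing into the head while the head remains a minority); it is not the community detection procedure of the cited papers. The function name, the 40% head limit used as a default, and the sample degrees are assumptions for illustration.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the list of break values (successive means) for heavy-tailed data.

    values     -- iterable of numbers, e.g. node degrees
    head_limit -- stop recursing once the head is no longer a minority
                  (0.4 is an assumed default)
    """
    data = list(values)
    breaks = []
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [x for x in data if x > mean]  # the few large values
        if not head:                          # all remaining values are equal
            break
        breaks.append(mean)
        if len(head) / len(data) >= head_limit:
            break                             # head is not a minority any more
        data = head                           # recurse into the head
    return breaks

# Hypothetical degree sequence: many poorly connected nodes, few hubs.
degrees = [1] * 16 + [2] * 8 + [4] * 4 + [8] * 2 + [16]
breaks = head_tail_breaks(degrees)
print(breaks)  # two breaks: 80/31 (about 2.58) and 48/7 (about 6.86)
```

The number of breaks gives a rough idea of how many hierarchical levels the data has: here the low-degree majority, a middle group, and the hubs.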
