SUMMARY:
The authors in this paper propose X-Stream, which is a system for scaling-up graph processing on a single shared-memory machine. It keeps state in the vertices and disclosures a scatter-gather programming model. The computation is structured as a loop, each iteration of which consists of a scatter phase followed by a gather phase. The approach to scale-up graph processing is to sort the edges of the graph by originating vertex and build an index over the sorted edge list. The results show that X-Stream scales pretty well and is in fact an appealing approach compared to the other approaches in indexing the edge list and performing random access through the index.
Pros:
For me the design of a statically sized and statically allocated data structure, which they called the stream buffer, to store variable-sized data items was a well-designed approach to avoid the overhead of dynamic memory allocation.
Also supporting interfaces other than edge-centric scatter-gather was a plus. For example, X-Stream supports the semi-streaming model for graphs or graph algorithms that are built on top of the W-Stream model.
Absorbing new edges with little overhead because it supports efficient recomputation on graphs with the newly added edges.
In general it seems it provides a pretty fast approach.
Cons:
One thing that I’m concerned about is the fact of edge streaming at the cost of random access into the set of vertices. This might not be good for graphs in which the vertex set is bigger compared to the edge set.
Also not sure if it could be counted as a con, but X-Stream is limited to only use 16GB of main memory, forcing the graph to go to SSD.
Also it is just able to run on a single machine, and that limits the memory.
Thoughts for further development:
Support of running on multiple machines as opposed to a single one could be a good approach, maybe if needed to increase the limit of 16GB memory X-Stream is now using.
Questions/Critiques:
I’m generally willing to know why not working on multiple machines and/or increased memory to provide a better scalability as well..even in this initial study?!