Paper Review: X-Stream: Edge-centric Graph Processing using Streaming Partitions

09 September 2014

SUMMARY:

In this paper the authors propose X-Stream, a system for scaling up graph processing on a single shared-memory machine. It keeps state in the vertices and exposes a scatter-gather programming model. The computation is structured as a loop, each iteration of which consists of a scatter phase followed by a gather phase. The traditional approach to scaling up graph processing is to sort the edges of the graph by originating vertex and build an index over the sorted edge list; X-Stream instead takes an edge-centric approach and streams the unordered edge list sequentially. The results show that X-Stream scales well and is an appealing alternative to approaches that index the edge list and perform random access through the index.
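To make the scatter-gather loop concrete, here is a minimal sketch in an edge-centric style, as the paper's title suggests. It is not X-Stream's code: the function name `edge_centric_components`, the choice of connected-component label propagation as the example algorithm, and the in-memory `updates` list are all assumptions made for illustration. The point is only that both phases iterate over streams (edges, then updates) rather than looking up edges through an index.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (source, destination); hypothetical representation


def edge_centric_components(num_vertices: int, edges: List[Edge]) -> List[int]:
    """Label propagation for connected components, edge-centric style.

    Illustrative only: state lives in the vertices, and each iteration is
    a scatter phase over all edges followed by a gather phase over updates.
    """
    # Vertex state: every vertex starts with its own id as its label.
    label = list(range(num_vertices))
    changed = True
    while changed:
        # Scatter: stream over the edges in whatever order they are stored
        # and emit an update for the vertex at the other end of each edge.
        updates = []
        for src, dst in edges:
            updates.append((dst, label[src]))
            updates.append((src, label[dst]))  # treat edges as undirected
        # Gather: stream over the updates and fold each one into the
        # state of its target vertex, keeping the smallest label seen.
        changed = False
        for target, value in updates:
            if value < label[target]:
                label[target] = value
                changed = True
    return label
```

For example, `edge_centric_components(5, [(0, 1), (1, 2), (3, 4)])` groups vertices 0 to 2 under one label and vertices 3 and 4 under another, and the result does not depend on the order in which the edges are listed.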

Pros:

For me, the stream buffer, a statically sized and statically allocated data structure for storing variable-sized data items, was a well-designed way to avoid the overhead of dynamic memory allocation.
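The idea behind such a buffer can be sketched roughly as follows. This is a simplified illustration, not the paper's layout: the class name `StreamBuffer`, the 4-byte length prefix, and the `append`/`records`/`reset` methods are hypothetical. It shows one way a single up-front allocation can hold variable-sized items and be reused without any further allocation.

```python
import struct
from typing import Iterator


class StreamBuffer:
    """Fixed-capacity byte buffer holding length-prefixed records.

    Hypothetical sketch: the backing storage is allocated once and is
    never resized; variable-sized records are packed back to back.
    """

    _HEADER = struct.Struct("<I")  # 4-byte little-endian record length

    def __init__(self, capacity: int) -> None:
        self._data = bytearray(capacity)  # the only allocation for storage
        self._used = 0                    # number of bytes currently filled

    def append(self, record: bytes) -> bool:
        """Copy a record into the buffer; return False if it does not fit."""
        need = self._HEADER.size + len(record)
        if self._used + need > len(self._data):
            return False  # caller is expected to drain and reset the buffer
        self._HEADER.pack_into(self._data, self._used, len(record))
        start = self._used + self._HEADER.size
        # Same-length slice assignment overwrites in place without resizing.
        self._data[start:start + len(record)] = record
        self._used += need
        return True

    def records(self) -> Iterator[bytes]:
        """Yield the stored records sequentially, in insertion order."""
        offset = 0
        while offset < self._used:
            (length,) = self._HEADER.unpack_from(self._data, offset)
            offset += self._HEADER.size
            yield bytes(self._data[offset:offset + length])
            offset += length

    def reset(self) -> None:
        """Mark the buffer empty so the same storage can be reused."""
        self._used = 0
```

A full buffer signals the caller through the `False` return value instead of growing, which is what keeps memory use fixed and predictable in this sketch.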

Supporting interfaces other than edge-centric scatter-gather was also a plus. For example, X-Stream supports the semi-streaming model for graphs, as well as graph algorithms that are built on top of the W-Stream model.

It can absorb new edges with little overhead, because it supports efficient recomputation on graphs with newly added edges.

In general, it seems to provide a pretty fast approach.

Cons:

One thing that concerns me is that edge streaming comes at the cost of random access into the set of vertices. This might not work well for graphs in which the vertex set is large compared to the edge set.

I am also not sure whether this counts as a con, but X-Stream is limited to using only 16GB of main memory, which forces the graph onto SSD.

It also runs only on a single machine, which limits the available memory.

Thoughts for further development:

Support for running on multiple machines rather than a single one could be a good direction, possibly along with raising the 16GB memory limit X-Stream currently uses, if needed.

Questions/Critiques:

I would like to know why the authors did not consider multiple machines and/or more memory to provide better scalability, even in this initial study.
