Summary:
In this paper the authors propose Trinity.RDF, a distributed and scalable RDF system able to handle web-scale RDF data (billions or even trillions of triples). Trinity.RDF models RDF data as an in-memory graph and supports fast random access on that graph. The authors develop novel techniques that use efficient in-memory graph exploration instead of join operations for SPARQL processing. The results show that, even without a smart graph partitioning scheme, Trinity.RDF achieves several orders of magnitude speed-up over state-of-the-art RDF systems on web-scale RDF data.
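To make the core idea concrete, the sketch below contrasts a join-based plan with a graph-exploration plan on a toy two-pattern query. This is only an illustrative assumption of mine, not the authors' code: the triples, predicates, and helper functions (scan, adj) are all invented for the example, and the point is simply that exploration avoids materializing bindings that can never join.

```python
# Minimal sketch (not the authors' code): contrasts join-based evaluation with
# graph exploration for a two-pattern SPARQL query
#   ?person  livesIn   ?city .
#   ?city    locatedIn "Europe" .
# All triples, predicates, and helper names here are illustrative assumptions.

triples = [
    ("alice", "livesIn", "berlin"),
    ("bob", "livesIn", "tokyo"),
    ("carol", "livesIn", "paris"),
    ("berlin", "locatedIn", "Europe"),
    ("paris", "locatedIn", "Europe"),
    ("tokyo", "locatedIn", "Asia"),
]

def scan(p, o=None):
    """Return (subject, object) bindings for one triple pattern."""
    return [(s, obj) for (s, pred, obj) in triples
            if pred == p and (o is None or obj == o)]

# Join-based plan: materialize each pattern fully, then join on ?city.
lives = scan("livesIn")                      # 3 intermediate rows
located = dict(scan("locatedIn", "Europe"))  # 2 intermediate rows
join_results = [(person, city) for (person, city) in lives if city in located]

# Exploration-based plan: start from the selective pattern and walk the graph,
# so bindings that cannot join are never produced as intermediate results.
adj = {}  # reverse adjacency: city -> people living there
for s, p, o in triples:
    if p == "livesIn":
        adj.setdefault(o, []).append(s)

explore_results = [(person, city)
                   for (city, _) in scan("locatedIn", "Europe")
                   for person in adj.get(city, [])]

assert sorted(join_results) == sorted(explore_results)
print(explore_results)  # [('alice', 'berlin'), ('carol', 'paris')]
```

In a distributed setting this difference matters even more, since fewer intermediate bindings also means less data shipped between machines, which is the scalability argument the paper makes.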
Pros:
The main advantage is that the graph-exploration approach significantly reduces the volume of intermediate results, which boosts query performance in a distributed environment and allows the system to scale.
The novel cost model used to improve query performance over RDF data is another strength of the paper.
The evaluation is based on both real-world and synthetic datasets, which covers a sufficiently broad range of cases.
Cons:
My main concern is that the authors repeatedly claim novelty for their approach. Some parts, such as distributed query plan generation, may be new in this context, but overall the approach does not strike me as fundamentally novel; it is rather a combination of existing approaches and concepts, e.g., the use of basic graph operators.
One aspect of the evaluation struck me as odd: Trinity.RDF is implemented in C# and evaluated on 64-bit Windows Server 2008 R2 Enterprise with Service Pack 1. Restricting the system to a Windows-dependent platform does not seem like a good choice.
Thoughts for further development:
One suggestion regarding the implementation and evaluation of the system is to use a platform-independent setup, for example running on a virtual machine rather than being tied to a Windows platform.
Questions/Critiques:
What was the main reason for choosing a Windows-based platform for the evaluation, and for implementing the system in C#?