Summary:
In this paper the authors propose Trinity.RDF, a distributed and scalable RDF system able to handle web-scale RDF data (billions or even trillions of triples). Trinity.RDF models RDF data as an in-memory graph and supports fast random access on that graph. The authors develop novel techniques that use efficient in-memory graph exploration instead of join operations for SPARQL processing. The results show that, even without a smart graph partitioning scheme, Trinity.RDF achieves several orders of magnitude speed-up over state-of-the-art RDF systems on web-scale RDF data.
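To make the core idea concrete, the sketch below contrasts a join-based plan with a graph-exploration plan on a toy two-pattern query. This is only an illustrative assumption of mine, not the authors' code: the triples, predicates, and helper functions (scan, adj) are all invented for the example, and the point is simply that exploration avoids materializing bindings that can never join.

```python
# Minimal sketch (not the authors' code): contrasts join-based evaluation with
# graph exploration for a two-pattern SPARQL query
#   ?person  livesIn   ?city .
#   ?city    locatedIn "Europe" .
# All triples, predicates, and helper names here are illustrative assumptions.

triples = [
    ("alice", "livesIn", "berlin"),
    ("bob", "livesIn", "tokyo"),
    ("carol", "livesIn", "paris"),
    ("berlin", "locatedIn", "Europe"),
    ("paris", "locatedIn", "Europe"),
    ("tokyo", "locatedIn", "Asia"),
]

def scan(p, o=None):
    """Return (subject, object) bindings for one triple pattern."""
    return [(s, obj) for (s, pred, obj) in triples
            if pred == p and (o is None or obj == o)]

# Join-based plan: materialize each pattern fully, then join on ?city.
lives = scan("livesIn")                      # 3 intermediate rows
located = dict(scan("locatedIn", "Europe"))  # 2 intermediate rows
join_results = [(person, city) for (person, city) in lives if city in located]

# Exploration-based plan: start from the selective pattern and walk the graph,
# so bindings that cannot join are never produced as intermediate results.
adj = {}  # reverse adjacency: city -> people living there
for s, p, o in triples:
    if p == "livesIn":
        adj.setdefault(o, []).append(s)

explore_results = [(person, city)
                   for (city, _) in scan("locatedIn", "Europe")
                   for person in adj.get(city, [])]

assert sorted(join_results) == sorted(explore_results)
print(explore_results)  # [('alice', 'berlin'), ('carol', 'paris')]
```

In a distributed setting this difference matters even more, since fewer intermediate bindings also means less data shipped between machines, which is the scalability argument the paper makes.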
Pros:
The main advantage is that the graph-exploration approach significantly reduces the volume of intermediate results, which boosts query performance in a distributed environment and allows the system to scale.
The novel cost model used to improve query performance over RDF data is another strength of the paper.
The evaluation is based on both real-world and synthetic datasets, which covers a sufficiently broad range of cases.
Cons:
My main concern is that the authors repeatedly claim novelty for their approach. Some parts, such as distributed query plan generation, may be new in this context, but overall the approach does not strike me as fundamentally novel; it is rather a combination of existing approaches and concepts, e.g., the use of basic graph operators.
One aspect of the evaluation struck me as odd: Trinity.RDF is implemented in C# and evaluated on 64-bit Windows Server 2008 R2 Enterprise with Service Pack 1. Restricting the system to a Windows-dependent platform does not seem like a good choice.
Thoughts for further development:
One suggestion regarding the implementation and evaluation of the system is to use a platform-independent setup, for example running on a virtual machine rather than being tied to a Windows platform.
Questions/Critiques:
What was the main reason for choosing a Windows-based platform for the evaluation, and for implementing the system in C#?