Paper Review: Discretized Streams: Fault-Tolerant Streaming Computation at Scale

09 September 2014

Summary:

The authors propose a system for large-scale, real-time big-data processing that uses parallel recovery in a distributed environment. It addresses features missing from existing approaches, most importantly sub-second recovery from both faults and stragglers, by keeping state in an in-memory data structure, the resilient distributed dataset (RDD), instead of relying on replicated on-disk state for recovery. Experimental results from an implementation built on the Spark engine show that the system can process over 60 million records per second on 100 nodes at sub-second latency.

Pros:

The main advantage of the proposed system, which the authors claim much of the other work does not offer, is that it addresses both faults and stragglers. Also, unlike centralized approaches, this work proposes a parallel recovery mechanism in a distributed environment, which enables great scalability. The other major advantage is the sub-second recovery latency for both faults and stragglers.

Cons:

I’m not a distributed systems guy myself, but what I would like to point out concerns the scalability of Conviva’s Internet video streaming application, which I am a bit familiar with. The authors claim that on 64 EC2 nodes the system can process enough concurrent viewers to exceed the peak load Conviva has experienced so far. It’s worth mentioning that on mobile devices alone, video accounts for more than 67% of all data traffic, and with increasing video resolution and quality this share is growing rapidly. I would have liked to see some information on this particular video streaming scalability, and some predictions based on the incremental trend shown in Figure 14(a).

Thoughts for further development:

One option for the optimization model could be somewhat similar to what TCP does when estimating RTT: provide an approximation based on the priority of the real-time data. Given the data history, we could give higher recovery priority to the latest hot data and lower priority to older data, and look for a memory-accuracy trade-off.
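To make the idea a bit more concrete, here is a minimal sketch of recency-weighted recovery. It is not something the paper proposes. The `ewma` function mirrors the smoothing TCP applies to RTT samples (gain 1/8, as in RFC 6298), while `Batch`, `recovery_priority`, `plan_recovery`, the decay factor, and the memory budget are all hypothetical names and parameters I made up for illustration.

```python
from dataclasses import dataclass

ALPHA = 0.125  # TCP's SRTT gain (RFC 6298); reusing it here is an assumption


def ewma(samples, alpha=ALPHA):
    """Exponentially weighted moving average, as TCP uses for smoothed RTT."""
    estimate = None
    for sample in samples:
        if estimate is None:
            estimate = sample  # first sample initializes the estimate
        else:
            estimate = (1 - alpha) * estimate + alpha * sample
    return estimate


@dataclass
class Batch:
    """Hypothetical stand-in for one lost interval of stream state."""
    batch_id: int
    age: int        # intervals since the batch was produced (0 = newest)
    size_mb: float  # memory needed to recover it


def recovery_priority(batch, decay=0.5):
    """Newer batches get exponentially higher priority (assumed policy)."""
    return decay ** batch.age


def plan_recovery(batches, memory_budget_mb, decay=0.5):
    """Greedily recover the highest-priority batches that fit the budget."""
    ordered = sorted(batches, key=lambda b: recovery_priority(b, decay),
                     reverse=True)
    chosen, used = [], 0.0
    for batch in ordered:
        if used + batch.size_mb <= memory_budget_mb:
            chosen.append(batch)
            used += batch.size_mb
    return chosen
```

Batches left out of the plan are where accuracy is given up in exchange for memory, which is the trade-off mentioned above; how much accuracy is actually lost would depend on the workload.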

Also, regarding Figure 14(a): given today’s ever-increasing video traffic, it could be interesting to provide a prediction model for how many cluster nodes are needed to support a growing number of active sessions!

Critiques/Questions:

As I mentioned in the cons section, can we assume a roughly linear model (perhaps with a high R-squared value) to predict how the number of nodes in the cluster must scale to support more active sessions? Perhaps the results should also be updated to reflect today’s ever-increasing video traffic.
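As a rough sketch of what such a check could look like, the snippet below fits an ordinary least-squares line to (nodes, sessions) pairs, reports R-squared, and inverts the line to estimate the nodes needed for a target load. The function names are my own, and the sample numbers are purely hypothetical placeholders, not values read from Figure 14(a).

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def nodes_needed(target_sessions, slope, intercept):
    """Invert the fitted line and round up to a whole number of nodes."""
    return math.ceil((target_sessions - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical measurements (millions of active sessions per cluster
    # size); replace with real data before drawing any conclusion.
    nodes = [16, 32, 48, 64]
    sessions = [0.9, 2.1, 2.9, 4.1]
    slope, intercept, r2 = fit_line(nodes, sessions)
    print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r2:.4f}")
    print("nodes for 8M sessions:", nodes_needed(8.0, slope, intercept))
```

A high R-squared on the measured range would only support linear extrapolation if nothing else (network, scheduling overhead) becomes a bottleneck at larger cluster sizes, so any prediction beyond 64 nodes should be treated with caution.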
