Paper Review: STREAM: The Stanford Data Stream Management System

09 September 2014

Summary:

STREAM, a system developed at Stanford, provides a framework for long-running, continuous query processing over both continuous data streams and traditional stored data sets. The authors propose the Continuous Query Language (CQL), which implements the abstract semantics needed to achieve this goal. STREAM includes StreaMon, a monitoring and adaptation component that lets the system adjust to varying load in order to improve performance. It also includes approximation modules to cope with processing and memory limitations.
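To make the idea of a continuous query concrete, here is a minimal sketch of a time-based sliding-window average that emits a new result on every arrival. It is only an illustration in the spirit of CQL's window-over-stream queries, not code from STREAM; the function name `sliding_window_avg`, the `(timestamp, value)` tuple layout, and the eviction rule are all assumptions of mine.

```python
from collections import deque


def sliding_window_avg(stream, range_seconds):
    """Yield (timestamp, average) after each arrival.

    The average covers tuples whose timestamp lies in the half-open
    interval (now - range_seconds, now]. Hypothetical helper, not STREAM code.
    """
    window = deque()  # (timestamp, value) pairs currently inside the window
    total = 0.0       # running sum of the values in the window
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Evict tuples that have fallen out of the time range.
        while window and window[0][0] <= ts - range_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


readings = [(0, 10.0), (1, 20.0), (5, 30.0)]
results = list(sliding_window_avg(readings, range_seconds=3))
```

Unlike a one-shot query over a stored table, the result is itself a stream: each arrival can change the answer, and old tuples expire without being explicitly deleted.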

Pros:

Aside from the major advantage of proposing a unified and robust query management system for both continuous queries and large-scale continuous streams, the paper provides high-level abstractions for the system. However, I'm not sure whether this counts as an advantage.

One of the main advantages is definitely the adaptive module, which monitors conditional selectivities and orders stream joins to minimize overall work under current conditions. Further, the authors state that they have studied the tradeoffs among runtime overhead, adaptation speed, and convergence to good strategies once conditions stabilize, which is also a plus.
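The intuition behind ordering by conditional selectivity can be sketched as follows. This is a simplified illustration under my own assumptions, not the algorithm from the paper: it uses plain filter predicates as a stand-in for join operators, profiles them on a sample, and greedily picks next the operator that passes the smallest fraction of the tuples that survived the operators already chosen. The name `greedy_operator_order` and the example predicates are hypothetical.

```python
def greedy_operator_order(sample, predicates):
    """Return operator names ordered greedily by conditional pass rate.

    sample:     list of tuples used for profiling (assumed representative)
    predicates: dict mapping a name to a boolean function of one tuple
    """
    remaining = dict(predicates)
    survivors = list(sample)  # tuples that passed every operator chosen so far
    order = []
    while remaining:
        def pass_rate(name):
            # Selectivity conditioned on having passed the earlier operators.
            if not survivors:
                return 0.0
            passed = sum(1 for t in survivors if remaining[name](t))
            return passed / len(survivors)

        best = min(remaining, key=pass_rate)  # most selective goes next
        predicate = remaining.pop(best)
        order.append(best)
        survivors = [t for t in survivors if predicate(t)]
    return order


sample = list(range(100))
predicates = {
    "even": lambda x: x % 2 == 0,  # passes 50% of the sample
    "small": lambda x: x < 10,     # passes 10% of the sample
    "big": lambda x: x >= 2,       # passes 98% of the sample
}
order = greedy_operator_order(sample, predicates)
```

An adaptive system would re-run this kind of profiling as the data changes, which is where the tradeoff between runtime overhead and adaptation speed comes from.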

Cons:

One thing that was not really clear concerns the paper itself: it reads more like a position paper than a journal/conference paper. The design claims made by the authors lack detailed experimental results. It would have been better if the framework had been accompanied by some preliminary experimental results.

Aside from the point above, I wonder whether, given the advances in parallel computing for big-data processing, CPU limitations are still a major issue. Depending on the type of application, sacrificing accuracy by dropping elements seems naive and might not be acceptable. However, I understand that it was a big concern at the time.

Last but not least, due to the centralized nature of the proposed solution, I'm concerned about its scalability to an extremely large number of queries and about the single point of failure (SPOF) it introduces.

Thoughts for further development:

The main direction for the authors is clearly to design a distributed continuous query and stream management system that addresses both the scalability and the SPOF issues. Such a design would also help with the processing and memory limitations.

Questions/Critiques:

How much can the system achieve using approximation approaches such as load shedding? Proposals of approximation methods for optimizing resource use should be accompanied by some preliminary quality/performance trade-off analysis, backed by experimental evidence.
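The kind of trade-off I have in mind can be illustrated with a small sketch of random load shedding for a SUM aggregate. It assumes a basic scheme in which each tuple is kept with probability `keep_prob` and the sum over the kept tuples is scaled by `1 / keep_prob`; this is my own simplification for illustration and not necessarily how STREAM sheds load. The function name `shed_and_sum` and its parameters are hypothetical.

```python
import random


def shed_and_sum(values, keep_prob, seed=0):
    """Estimate sum(values) while processing only a random fraction of tuples.

    Returns (estimate, processed), where processed is the number of tuples
    actually handled. Hypothetical helper for illustration only.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = random.Random(seed)  # seeded so that runs are repeatable
    kept_sum = 0.0
    processed = 0
    for v in values:
        if rng.random() < keep_prob:  # otherwise the tuple is dropped
            kept_sum += v
            processed += 1
    # Scale up to compensate for the dropped tuples.
    return kept_sum / keep_prob, processed


values = [1.0] * 1000
exact_estimate, exact_processed = shed_and_sum(values, keep_prob=1.0)
shed_estimate, shed_processed = shed_and_sum(values, keep_prob=0.5)
```

Plotting the relative error of the estimate against `keep_prob` (that is, against the CPU work saved) for realistic workloads is the sort of experimental result I would have liked to see in the paper.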
