Summary:
STREAM, a system proposed at Stanford, introduces a framework for continuous, long-running data management and query processing over both continuous streams and traditional stored data sets. The authors propose a Continuous Query Language (CQL) that implements the abstract semantics for this goal. STREAM includes StreaMon, a component that lets the system adapt to varying load in order to improve performance. It also includes approximation modules to cope with processing and memory limitations.
Pros:
Aside from the major advantage of proposing a unified and robust query management system for both continuous queries and continuous large-scale streams, the paper provides high-level abstractions for the system. However, I'm not sure whether this counts as an advantage.
One of the main advantages is definitely the adaptive module, which monitors conditional selectivities and orders stream joins to minimize overall work under current conditions. Further, the authors claim to have studied the trade-offs among runtime overhead, adaptation speed, and convergence to good strategies when conditions stabilize, which is also a plus.
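To make the adaptive-ordering idea concrete for myself, here is a minimal sketch of greedy ordering by conditional selectivity. It is not the authors' algorithm: it uses simple filter predicates as stand-ins for stream joins, computes the order once from a sample instead of monitoring continuously, and the names `greedy_order` and `evaluations` are my own.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[int], bool]


def greedy_order(filters: Sequence[Predicate], sample: Sequence[int]) -> List[int]:
    """Return filter indices, ordered greedily by conditional selectivity.

    At each step, pick the filter that passes the fewest of the sample
    tuples that survived all filters chosen so far.
    """
    remaining = list(range(len(filters)))
    survivors = list(sample)
    order: List[int] = []
    while remaining:
        best = min(
            remaining,
            key=lambda i: sum(1 for t in survivors if filters[i](t)),
        )
        order.append(best)
        remaining.remove(best)
        # Later choices are conditioned on tuples that passed this filter.
        survivors = [t for t in survivors if filters[best](t)]
    return order


def evaluations(filters: Sequence[Predicate], order: Sequence[int],
                tuples: Sequence[int]) -> int:
    """Count predicate evaluations when filters run in the given order."""
    count = 0
    for t in tuples:
        for i in order:
            count += 1
            if not filters[i](t):
                break  # tuple dropped, later filters are skipped
    return count


# Hypothetical filters over integer tuples, for illustration only.
filters = [lambda x: x % 2 == 0, lambda x: x < 10, lambda x: x >= 0]
sample = list(range(100))
order = greedy_order(filters, sample)
```

Comparing `evaluations(filters, order, sample)` with `evaluations(filters, [0, 1, 2], sample)` shows how much work the reordering saves on this sample. An adaptive system would have to recompute the order as selectivities drift, which is where the overhead versus adaptation-speed trade-off comes from.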
Cons:
One thing that was not really clear concerns the paper itself: it reads more like a position paper than a journal/conference paper. The design claims made by the authors lack detailed experimental results. It would have been better if the framework had been accompanied by some preliminary experimental results.
Aside from the point above, given the advances in parallel computing for big-data processing, I wonder whether CPU limitations are still a major issue. Depending on the type of application, sacrificing accuracy by dropping elements seems naïve and might not be acceptable. However, I understand that at the time it was a big concern.
Last but not least, due to the centralized nature of the proposed solution, I’m concerned about its scalability to an extremely large number of queries and about the single point of failure (SPOF) in the proposed system.
Thoughts for further development:
The main direction for the authors is surely to design a distributed continuous query and stream management system that addresses both the scalability and the SPOF issues. Such a design would also help with the processing and memory limitations.
Questions/Critiques:
How much can the system achieve using approximation approaches such as load shedding? A proposal of approximation methods for saving resources should be accompanied by preliminary quality/performance trade-offs, backed by experimental evidence.
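As an illustration of the kind of trade-off measurement I have in mind, here is a minimal sketch of random load shedding on a SUM aggregate. This is my own toy setup, not the mechanism described in the paper: tuples are dropped independently with a fixed probability, and the sum over the kept tuples is scaled up to compensate. The function name `shed_and_sum` and the input data are assumptions.

```python
import random
from typing import Iterable, Tuple


def shed_and_sum(values: Iterable[float], drop_rate: float,
                 rng: random.Random) -> Tuple[float, int]:
    """Estimate the sum of a stream while randomly dropping tuples.

    Returns (estimated_sum, tuples_processed). Each tuple is kept with
    probability 1 - drop_rate, and the kept sum is scaled by the inverse
    of that probability so the estimate is unbiased.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep_prob = 1.0 - drop_rate
    kept_sum = 0.0
    processed = 0
    for v in values:
        if rng.random() < keep_prob:
            kept_sum += v
            processed += 1
    return kept_sum / keep_prob, processed


# Hypothetical stream: the integers 1..10000.
stream = list(range(1, 10001))
true_sum = sum(stream)
rng = random.Random(0)  # fixed seed so runs are repeatable

exact_estimate, exact_processed = shed_and_sum(stream, 0.0, rng)
shed_estimate, shed_processed = shed_and_sum(stream, 0.5, rng)
relative_error = abs(shed_estimate - true_sum) / true_sum
```

Sweeping `drop_rate` and plotting `relative_error` against `shed_processed` would give exactly the sort of quality/performance curve I would have liked to see in the paper, although the real system would need to do this per query and per operator.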