Paper REVIEW: Rhea: Automatic Filtering for Unstructured Cloud Storage

09 September 2014

Summary:

In this paper, the authors propose Rhea, a system that automatically generates and executes storage-side filters for unstructured text data. It extracts both row filters (which discard irrelevant rows/lines in the input) and column filters (which discard irrelevant columns in the surviving rows). It uses static analysis of the application code to generate filters that are safe and stateless. In the evaluation, Rhea's filters reduce job runtime by up to 5 times and dollar costs by up to 13 times.
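To make the two kinds of filters concrete, here is a minimal sketch in Python. It is only an illustration: the log format, the `mapper` function, and the filter names are hypothetical and not taken from the paper, and the filters are written by hand rather than derived by static analysis as Rhea does. Blanking unused columns while keeping the separators is also an assumption made here so that column positions stay stable.

```python
# Hypothetical job: count 404 responses per user.
# Assumed input format (not from the paper): "timestamp user status url".

def mapper(line):
    """Application code: emits (user, 1) for every line with status 404."""
    fields = line.split(" ")
    if fields[2] == "404":
        yield fields[1], 1

def row_filter(line):
    """Storage-side row filter: True if the line may affect the output."""
    fields = line.split(" ")
    return len(fields) > 2 and fields[2] == "404"

def column_filter(line):
    """Storage-side column filter: blank out columns the mapper never reads.

    Separators are kept so that the remaining columns keep their positions.
    """
    used = {1, 2}  # the mapper only reads the user and status columns
    fields = line.split(" ")
    return " ".join(f if i in used else "" for i, f in enumerate(fields))

def run(lines):
    """Runs the mapper over all lines and collects its output."""
    return [kv for line in lines for kv in mapper(line)]

lines = [
    "t1 alice 200 /index.html",
    "t2 bob 404 /missing",
    "t3 alice 404 /gone",
]

# Both filters are stateless: each looks at one line at a time.
filtered = [column_filter(l) for l in lines if row_filter(l)]

baseline = run(lines)
with_filters = run(filtered)
bytes_before = sum(len(l) for l in lines)
bytes_after = sum(len(l) for l in filtered)
```

In this sketch the mapper produces the same output on the filtered input as on the full input, while fewer bytes would have to leave the storage layer.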

Pros:

The main advantage of Rhea is that it reduces the bandwidth cost of transferring redundant data from storage to computation, while the data stays in its unstructured form in cloud storage.

In addition, it is a plus that the filters are conservative: a filter can have false positives (returning true for records that do not affect the output), but it cannot have false negatives, so filtering never changes the job's result.
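The difference between the two kinds of error can be sketched as follows. This is a hypothetical example, not code from the paper: the record format, the `mapper`, and both filters are assumptions chosen only to show why false positives are harmless and false negatives are not.

```python
# Assumed record format (hypothetical): "user status url".

def mapper(line):
    """Application code: emits (url, 1) for server errors on /api URLs."""
    user, status, url = line.split(" ")
    if int(status) >= 500 and url.startswith("/api"):
        yield url, 1

def conservative_filter(line):
    """Over-approximation: checks only part of the mapper's condition.

    It may pass lines the mapper ignores (false positives),
    but it never drops a line the mapper would use.
    """
    return "/api" in line

def unsafe_filter(line):
    """Under-approximation: drops relevant lines such as status 503."""
    return " 500 " in line

def run(lines):
    """Runs the mapper over all lines and collects its output."""
    return [kv for line in lines for kv in mapper(line)]

lines = [
    "alice 500 /api/a",
    "bob 503 /api/b",
    "carol 200 /api/c",  # passes the conservative filter, ignored by the mapper
    "dave 500 /home",
]

baseline = run(lines)
safe = run([l for l in lines if conservative_filter(l)])
unsafe = run([l for l in lines if unsafe_filter(l)])
```

Here `safe` equals `baseline` even though one irrelevant record is still transferred, whereas `unsafe` loses the record with status 503. A false positive only costs some extra bandwidth; a false negative would change the result.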

Cons:

Unfortunately, at this point Rhea supports only MapReduce and the Java language.

Also, the general overhead of the filters was not clear to me, for example in terms of CPU and energy usage.

Thoughts for further development:

One option, which the authors also mention themselves, is to generalize Rhea to support other formats, such as binary formats and XML. Data processing tools and runtimes other than Hadoop and Java could also be considered.

Critiques/Questions:

As I said above, I'd like to know which tools other than MapReduce could be leveraged.

