12 December 2015 0 9K Report

Space Saving algorithm is an approximation algorithm for approximating frequencies of items in data stream. Suppose a continues stream of elements [$e_1,e_1,e_2,e_1,e_5,e_3....$], Space Saving algorithm maintains a summary of stream in k tuples, which are of the form [$e_i,f_{e_i},\Delta_{e_i}$]. For each incoming element $e$ Space Saving algorithm operate as follows (1) If $e$ is already maintained in the summary, its counter is increased by 1. (2) If $e$ is not maintained and the number of tuples are fewer than k, we add $e$ into the summary with its counter set to 1. (3) if the summary is full and the new item $e$ is not currently maintained, we find any element $e_i$ with the minimum counter value, replace $e_i$ with $e$, and increase the counter by 1, and assign $\Delta_e=f_{e_i}$. Now using this strategy Space Saving can estimate the frequency with the following bounds

(a) $f_e\leq \hat{f}_e\leq f_e+\frac{N}{k}$, where $f_e$ and $\hat{f}_e$ are the true and estimated frequencies of element $e$ and $N$ is the number of elements processed thus far. (b) Space Saving algorithm guarantees that any element with $f_e\geq \frac{N}{k}$ is maintained in the summary.

This is very well established algorithm. We designed a modified version of Space Saving algorithm, which works as follows;

We process stream by an instance of Space Saving ($SS_1$) algorithm as above. In addition to this we also maintained a second instance of Space Saving ($SS_2$) algorithm for deleted elements. Now all the elements that are replaced (deleted) by $SS_1$ are processed by $SS_2$. For example if $e_i$ is replaced by $e$ in operation (3) of $SS_1$, we take $e_i$ and process them by $SS_2$.

Now we want to find the error bound of elements by this algorithm. We have the following two guess but are not sure.

Guess 1. $f_e\leq \hat{f}_e\leq f_e+\frac{N}{k^2}$

Guess 2. $f_e\leq \hat{f}_e\leq f_e+\frac{N}{2k}$

Can some please explain how to find these error bounds, any idea.

More Zubair Shah's questions See All
Similar questions and discussions