Summary:
In this paper, the authors propose Camdoop, a MapReduce-like system that supports full on-path aggregation of data streams. It builds aggregation trees with the sources of the intermediate data as the children and the servers executing the final reduction as the roots. Camdoop runs on CamCube, which distributes the switch functionality across the servers instead of using traditional switches. Because each server aggregates the data it receives and forwards only a fraction of it at each hop, the approach reduces network traffic, and the authors show that it outperforms Camdoop running over a traditional switch.
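To make the traffic argument concrete, here is a minimal sketch of on-path aggregation for a word-count style job. It is not Camdoop's implementation: the tree shape, the server names, and the data are all made up for illustration, and traffic is counted in key/value pairs rather than bytes.

```python
from collections import Counter

# Hypothetical aggregation tree (child -> parent); "root" runs the final reduce.
PARENT = {"a": "c", "b": "c", "c": "root", "d": "root"}

# Hypothetical intermediate (word -> count) data held by each server.
LOCAL = {
    "a": Counter({"x": 3, "y": 1, "z": 2}),
    "b": Counter({"x": 2, "y": 2, "z": 4}),
    "c": Counter({"x": 1, "y": 5, "z": 1}),
    "d": Counter({"x": 1, "y": 1, "z": 1}),
    "root": Counter(),
}

def children(node):
    return [c for c, p in PARENT.items() if p == node]

def aggregate(node, sent):
    """Merge a node's data with its children's partial results.

    `sent` records how many key/value pairs each node forwards to its parent.
    """
    merged = Counter(LOCAL[node])
    for child in children(node):
        merged += aggregate(child, sent)
    if node in PARENT:
        sent[node] = len(merged)
    return merged

def raw_traffic():
    """Pairs crossing links if every server ships unmerged data to the root."""
    total = 0
    for node, data in LOCAL.items():
        hops = 0
        while node in PARENT:
            node = PARENT[node]
            hops += 1
        total += len(data) * hops
    return total

sent = {}
result = aggregate("root", sent)
on_path_pairs = sum(sent.values())
raw_pairs = raw_traffic()
print(result, on_path_pairs, raw_pairs)
```

In this toy setup the link from `c` to `root` carries 3 pairs instead of the 9 it would carry if `c` simply relayed its children's data, which is the effect the summary describes.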
Pros:
The main advantage of CamCube is that, by using a direct-connect topology and letting the servers handle packet forwarding, it removes the distinction between the logical and the physical network.
Showing that on-path aggregation still provides benefits even when the reduce function is not associative was also a plus. Supporting the same functions used in MapReduce and Hadoop makes the system more user-friendly.
Overall, the paper shows that Camdoop can provide significant performance gains, up to two orders of magnitude, when associative and commutative reduce functions are used.
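The role of associativity and commutativity can be illustrated with a small sketch. The functions and values below are hypothetical and stand in for a user-supplied reduce function. When both properties hold, partial results can be combined at intermediate servers in any grouping and arrival order without changing the final answer.

```python
from functools import reduce
from itertools import permutations

def combine(a, b):
    # Example reduce function: addition is associative and commutative.
    return a + b

def sub(a, b):
    # Counter-example: subtraction is neither associative nor commutative.
    return a - b

values = [4, 8, 15, 16]

# Reduce everything at a single server.
at_root = reduce(combine, values)

def on_path(fn, order):
    """Reduce two partial groups separately, then merge the partial results."""
    left = reduce(fn, order[:2])
    right = reduce(fn, order[2:])
    return fn(left, right)

# Any arrival order and grouping gives the same result for `combine`.
all_match = all(on_path(combine, list(p)) == at_root for p in permutations(values))

# For `sub`, partial reduction changes the answer.
sub_root = reduce(sub, values)
sub_partial = on_path(sub, values)
print(at_root, all_match, sub_root, sub_partial)
```

With `sub`, the single-server result and the partially reduced result differ, so intermediate servers could not safely pre-reduce with such a function.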
Cons:
In their prototype, the user-space runtime is written in C#. A platform-independent environment would be preferable to a Windows-based platform.
The tree topology was chosen as a specific trade-off. Measurements showing the scenarios in which a different tree topology would yield a different trade-off would have been useful.
A drawback of using six disjoint trees is that this increases the packet hop count compared to using a single shortest-path tree.
Thoughts for further development:
It would be useful to provide experiments showing how much is gained by avoiding the overhead of TCP/IP. Protocols such as Google’s SPDY could be tested as a point of comparison for protocol overhead.
Testing a platform-independent user-space runtime instead of C# would also be worthwhile.
Critiques/Questions:
I would like to see whether customized transmission protocols such as SPDY could reduce overhead.