I am trying to implement a depth computation algorithm on a low-cost single-board computer (ARM ODROID XU4). I'm using C++ with OpenCV, where OpenCV is used only for simple operations such as reading and displaying images. The algorithm is executed and tested on the CPU. In my implementation I'm using an STL `std::deque` to mimic the behaviour of a shift register: each time I finish processing one pixel, I pop the front of the deque and push a new element at the back. However, the cost of executing this operation is very high. Am I right in choosing the deque? Note that its size is fixed at only 8 elements.
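To make the pattern concrete, here is a minimal sketch rather than my actual code. `shift_deque` shows the pop-front/push-back step described above, and `ShiftRegister` is a hypothetical fixed-size circular buffer built on `std::array` that could be benchmarked against it. The names, the `int` element type, and the power-of-two size restriction are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <deque>

// The deque-based shift described in the question: drop the oldest
// element and append the newest one.
inline void shift_deque(std::deque<int>& window, int value) {
    window.pop_front();
    window.push_back(value);
}

// Hypothetical fixed-capacity alternative: a circular buffer that never
// allocates. N is assumed to be a power of two (8 here) so that the
// wrap-around can be done with a bit mask instead of a modulo.
template <typename T, std::size_t N>
class ShiftRegister {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Overwrite the oldest element with the new value and advance the head.
    void push(const T& value) {
        data_[head_] = value;
        head_ = (head_ + 1) & (N - 1);
    }

    // Logical indexing: i == 0 is the oldest element, i == N - 1 the newest.
    const T& operator[](std::size_t i) const {
        return data_[(head_ + i) & (N - 1)];
    }

private:
    std::array<T, N> data_{};  // value-initialised (zeros for int)
    std::size_t head_ = 0;     // index of the oldest element
};
```

With `ShiftRegister<int, 8> reg;`, one `reg.push(v)` per pixel would replace the `pop_front()`/`push_back()` pair, and `reg[0]` through `reg[7]` would read the window from oldest to newest. Whether this is actually faster on the XU4 would need to be measured with optimisations enabled.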