The complexity of the architecture depends on the task at hand: if the data is complex, you need more parameters, either as more layers or as deeper filters in the network. With limited hardware resources, however, I would suggest reducing the number of layers rather than the depth of the filters.
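To get a concrete feel for that tradeoff, here is a rough parameter count for a small stack of 3x3 convolutions (all layer sizes are illustrative, not from the text above):

```python
def conv_params(k, c_in, c_out):
    # weights (k * k * c_in per output channel) plus one bias per output channel
    return k * k * c_in * c_out + c_out

# Baseline: four 3x3 conv layers, 64 channels throughout (illustrative sizes)
baseline = conv_params(3, 3, 64) + 3 * conv_params(3, 64, 64)

# Option A: drop one layer, keep the channel width
fewer_layers = conv_params(3, 3, 64) + 2 * conv_params(3, 64, 64)

# Option B: keep all four layers, halve the channel width
thinner = conv_params(3, 3, 32) + 3 * conv_params(3, 32, 32)

print(baseline, fewer_layers, thinner)  # 112576 75648 28640
```

Note that parameters scale linearly with the number of layers but roughly quadratically with filter depth, so removing a layer is the gentler cut; the suggestion above is that accuracy tends to suffer less from trimming depth than from thinning every layer's filters.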
It is essential to understand the capabilities and limitations of the target hardware platform before deciding on a network architecture. For resource-constrained hardware, the ultimate objective is to reduce computation while keeping performance as high as possible. Depth-wise separable convolutions are one strategy among others to improve efficiency. However, the more aggressive approaches to minimizing computation are 1) parameter quantization (from FP32 to FP16 or even INT8), followed by fine-tuning, and 2) network pruning. An interesting paper on the topic is:
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (Han et al., ICLR 2016)
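A back-of-the-envelope sketch of the savings mentioned above (layer sizes are illustrative): a depth-wise separable convolution replaces one k x k standard convolution with a per-channel k x k depthwise convolution followed by a 1x1 pointwise convolution, and INT8 quantization cuts weight storage to a quarter of FP32.

```python
def standard_conv_params(k, c_in, c_out):
    # one k x k x c_in kernel per output channel (biases ignored)
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise: one k x k kernel per input channel
    # pointwise: a 1x1 convolution that mixes channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 256, 256)          # 589824
sep = depthwise_separable_params(3, 256, 256)    # 2304 + 65536 = 67840
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")

# Quantization: FP32 (4 bytes/weight) -> INT8 (1 byte/weight) is a 4x storage cut
fp32_bytes = std * 4
int8_bytes = std * 1
print(f"FP32: {fp32_bytes} B, INT8: {int8_bytes} B")
```

For a 3x3 kernel the separable form saves close to the theoretical factor of about k^2 (here roughly 8.7x), which is why it is a common first step before the more aggressive quantization and pruning.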