The complexity of the architecture depends on the task at hand. If the data is complex, you need more parameters, either by adding layers or by increasing the number of filters per layer. On limited hardware, however, I would suggest using fewer layers rather than reducing the number of filters per layer.
It is essential to understand the capabilities and limitations of the target hardware platform before deciding on a network architecture. For resource-constrained hardware, the ultimate objective is to reduce computation while keeping performance as high as possible. Depthwise separable convolutions are one strategy for improving efficiency. More aggressive approaches to minimizing computation are 1) parameter quantization (from FP32 to FP16 or even INT8), followed by fine-tuning, and 2) network pruning. An interesting paper on the topic is:
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (Han et al., 2015)
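To make the quantization idea concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain NumPy. The function names (`quantize_int8`, `dequantize`) are my own for illustration; production toolchains (e.g. framework quantization APIs) handle per-channel scales, zero points, and calibration, which this sketch omits.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: FP32 -> INT8 plus a scale factor."""
    scale = np.abs(weights).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values for inference or fine-tuning."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 3, 3, 3).astype(np.float32)  # e.g. one conv layer's weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q.nbytes / w.nbytes)  # 0.25 -> 4x smaller storage
```

The reconstruction error is bounded by half a quantization step (`0.5 * scale`), which is why a short fine-tuning pass afterwards is usually enough to recover most of the lost accuracy.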
There has been a lot of interest in making models smaller for resource-efficient training and inference on such constrained devices. Two main approaches have been used:
1. The first approach is to train neural networks that are lightweight by design. Good examples are spatially separable convolutions, depthwise separable convolutions, flattened convolutions, and dilated convolutions.
2. The second approach is to take a large, pre-trained neural network and compress it to reduce its size while minimizing the loss in performance. A good example is network pruning, where criteria such as the L1 norm, L2 norm, or Average Percentage of Zeros (APoZ) can be used to decide which filters to remove.
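The parameter savings from the first approach are easy to quantify. A standard k×k convolution needs k·k·C_in·C_out weights, while a depthwise separable convolution factors this into a k×k depthwise stage (k·k·C_in) plus a 1×1 pointwise stage (C_in·C_out). A small sketch with illustrative channel counts:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k per input channel, then 1 x 1 pointwise across channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 128, 256)                  # 294912
sep = depthwise_separable_params(3, 128, 256)   # 1152 + 32768 = 33920
print(round(std / sep, 1))  # roughly 8.7x fewer parameters
```

For 3×3 kernels the reduction approaches a factor of 9 as the output channel count grows, which is the efficiency gain MobileNet-style architectures rely on.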
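For the second approach, here is a minimal sketch of L1-norm filter pruning in NumPy: rank each filter by the sum of absolute weights and keep only the strongest fraction. The function name and the fixed `keep_ratio` are illustrative; real pipelines prune iteratively, adjust downstream layers whose input channels shrink, and fine-tune after each round.

```python
import numpy as np

def prune_filters_l1(weights, keep_ratio=0.5):
    """Keep the fraction of conv filters with the largest L1 norm.
    weights has shape (num_filters, c_in, k, k)."""
    norms = np.abs(weights).sum(axis=(1, 2, 3))   # L1 norm of each filter
    n_keep = max(1, int(len(norms) * keep_ratio))
    keep = np.argsort(norms)[::-1][:n_keep]        # indices of the strongest filters
    return weights[np.sort(keep)]                  # preserve original filter order

w = np.random.randn(64, 32, 3, 3).astype(np.float32)
pruned = prune_filters_l1(w, keep_ratio=0.25)
print(pruned.shape)  # (16, 32, 3, 3)
```

APoZ-based pruning works the same way, except filters are ranked by how often their activations are zero on a validation set rather than by weight magnitude.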